Elasticsearch

Connects to Elasticsearch nodes to automatically write data to an index, in both batch and streaming modes.

Code repository: https://github.com/AmadeusITGroup/dataio-framework/tree/main/src/main/scala/com/amadeus/dataio/pipes/elk

Common

The following fields are available for all Elasticsearch components:

| Name | Mandatory | Description | Example | Default |
|------|-----------|-------------|---------|---------|
| Index | Yes | The index to which the data will be written. | | |
| DateField | Yes | The document date field used for sub-index partitioning. | DateField = "docTimestamp" | |
| SubIndexDatePattern | No | The date pattern used to compute the sub-index suffix. See the Java DateTimeFormatter documentation. | SubIndexDatePattern = "yyyy.MM" | yyyy.MM |
| Options | Yes | Spark options, as key = value pairs. "es.nodes" and "es.port" are currently mandatory. | Options { es.nodes = "elk.mycompany.com", es.port = "9000" } | |
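As a sketch, the common fields above combine into a configuration block like the following. The component name, index name, and node address are illustrative, not part of the framework:

```hocon
// Hypothetical component entry showing only the common Elasticsearch fields.
MyElkComponent {
  Index = "my-index"                // base index name (illustrative)
  DateField = "docTimestamp"        // document field driving sub-index partitioning
  SubIndexDatePattern = "yyyy.MM"   // optional; defaults to yyyy.MM
  Options {
    es.nodes = "elk.mycompany.com"  // mandatory Spark option (illustrative host)
    es.port = "9000"                // mandatory Spark option
  }
}
```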

Batch

No batch input is currently available for Elasticsearch in Data I/O.

Output

Type: com.amadeus.dataio.pipes.elk.batch.ElkOutput

| Name | Mandatory | Description | Example | Default |
|------|-----------|-------------|---------|---------|
| Mode | Yes | The Spark SQL write mode. | Mode = "overwrite" | error |
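A minimal batch output configuration might therefore look as follows, combining the common fields with the batch-specific Mode field (the component name, index, and host are illustrative):

```hocon
// Hypothetical batch output writing to Elasticsearch with Spark SQL.
MyBatchOutput {
  Type = "com.amadeus.dataio.pipes.elk.batch.ElkOutput"
  Index = "my-index"                // illustrative index name
  DateField = "docTimestamp"
  Mode = "overwrite"                // Spark SQL write mode; defaults to error
  Options {
    es.nodes = "elk.mycompany.com"  // illustrative host
    es.port = "9000"
  }
}
```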

Streaming

No streaming input is currently available for Elasticsearch in Data I/O.

Output

Type: com.amadeus.dataio.pipes.elk.streaming.ElkOutput

| Name | Mandatory | Description | Example | Default |
|------|-----------|-------------|---------|---------|
| Duration | Yes | The trigger interval for the streaming query, passed to Spark's trigger() function. | Duration = "60 seconds" | |
| Timeout | Yes | The amount of time, in hours, before returning from the streaming query. May be a String or an Int. | Timeout = 24 | |
| Mode | Yes | The Spark Structured Streaming output mode. | Mode = "complete" | append |
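Putting the streaming-specific fields together with the common ones, a streaming output configuration could be sketched as follows (the component name, index, and host are illustrative):

```hocon
// Hypothetical streaming output writing to Elasticsearch with Structured Streaming.
MyStreamingOutput {
  Type = "com.amadeus.dataio.pipes.elk.streaming.ElkOutput"
  Index = "my-index"                // illustrative index name
  DateField = "docTimestamp"
  Duration = "60 seconds"           // trigger interval for the stream
  Timeout = 24                      // hours before returning from the query
  Mode = "append"                   // Structured Streaming output mode (default)
  Options {
    es.nodes = "elk.mycompany.com"  // illustrative host
    es.port = "9000"
  }
}
```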