Elasticsearch
Allows Data I/O to connect to Elasticsearch nodes and write data to an index, in both batch and streaming.
Code repository: https://github.com/AmadeusITGroup/dataio-framework/tree/main/src/main/scala/com/amadeus/dataio/pipes/elk
Common
The following fields are available for all Elasticsearch components:
Name | Mandatory | Description | Example | Default |
---|---|---|---|---|
Index | Yes | The index the data is written to. | | |
DateField | Yes | The document date field used to partition documents into sub-indexes. | DateField = "docTimestamp" | |
SubIndexDatePattern | No | The date pattern used to compute the sub-index suffix. See the Java DateTimeFormatter documentation for the pattern syntax. | SubIndexDatePattern = "yyyy.MM" | yyyy.MM |
Options | Yes | Spark options, as key = value pairs. "es.nodes" and "es.port" are currently mandatory. | Options { es.nodes = "elk.mycompany.com", es.port = "9000" } | |
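The DateField and SubIndexDatePattern fields imply that each document is routed to a sub-index whose suffix comes from its date formatted with the pattern. The Scala sketch below illustrates that naming step with java.time.format.DateTimeFormatter. It assumes the suffix is appended to Index with a "." separator; both the separator and the helper name subIndexName are hypothetical and not taken from the Data I/O code.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical helper (not Data I/O API): formats a document's date with the
// sub-index pattern and appends it to the base index name.
// The "." separator is an assumption made for illustration only.
def subIndexName(index: String, docDate: LocalDateTime, pattern: String = "yyyy.MM"): String = {
  val suffix = DateTimeFormatter.ofPattern(pattern).format(docDate)
  s"$index.$suffix"
}

val docTimestamp = LocalDateTime.of(2024, 3, 15, 10, 30)
println(subIndexName("flights", docTimestamp))               // flights.2024.03
println(subIndexName("flights", docTimestamp, "yyyy.MM.dd")) // flights.2024.03.15
```

With the default yyyy.MM pattern, all documents from the same month land in the same sub-index. A finer pattern such as yyyy.MM.dd produces one sub-index per day.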
Batch
No batch input is currently available for Elasticsearch in Data I/O.
Output
Type: com.amadeus.dataio.pipes.elk.batch.ElkOutput
Name | Mandatory | Description | Example | Default |
---|---|---|---|---|
Mode | Yes | The Spark SQL save mode: append, overwrite, ignore or error (errorifexists). | Mode = "overwrite" | error |
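The batch output therefore needs a valid Spark save mode and the two mandatory connection options. The following Scala sketch is a hypothetical pre-flight check, not part of Data I/O, that validates both constraints before a write. The accepted mode strings mirror those that Spark's DataFrameWriter.mode(String) recognizes. The function name validateBatchOutput is illustrative only.

```scala
import java.util.Locale

// Mode strings accepted by Spark's DataFrameWriter.mode(String);
// "error", "errorifexists" and "default" all mean ErrorIfExists.
val batchSaveModes = Set("append", "overwrite", "ignore", "error", "errorifexists", "default")

// Options documented as mandatory for the Elasticsearch components.
val requiredOptions = Seq("es.nodes", "es.port")

// Hypothetical check: returns the normalized mode, or an error message.
def validateBatchOutput(mode: String, options: Map[String, String]): Either[String, String] = {
  val missing = requiredOptions.filterNot(options.contains)
  val normalized = mode.toLowerCase(Locale.ROOT)
  if (missing.nonEmpty) Left(s"Missing mandatory options: ${missing.mkString(", ")}")
  else if (!batchSaveModes.contains(normalized)) Left(s"Unsupported save mode: $mode")
  else Right(normalized)
}

val options = Map("es.nodes" -> "elk.mycompany.com", "es.port" -> "9000")
println(validateBatchOutput("Overwrite", options)) // Right(overwrite)
println(validateBatchOutput("overwrite", Map("es.nodes" -> "elk.mycompany.com")))
```

Failing early on a missing option or a mistyped mode is cheaper than discovering the problem when the Spark job reaches the write stage.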
Streaming
No streaming input is currently available for Elasticsearch in Data I/O.
Output
Type: com.amadeus.dataio.pipes.elk.streaming.ElkOutput
Name | Mandatory | Description | Example | Default |
---|---|---|---|---|
Duration | Yes | The trigger interval of the streaming query, passed to Spark's trigger() function. | Duration = "60 seconds" | |
Timeout | Yes | How long, in hours, to wait for the streaming query before returning. Accepts a String or an Int. | Timeout = 24 | |
Mode | Yes | The Spark Structured Streaming output mode: append, complete or update. | Mode = "complete" | append |
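The streaming fields mix units and types: Duration is a human-readable interval, Timeout is a number of hours given as an Int or a String, and Mode must be one of Spark's streaming output modes. The Scala sketch below shows one way to normalize them. The helpers are hypothetical and do not reflect how Data I/O parses its configuration. It uses scala.concurrent.duration only for illustration; Spark parses trigger intervals with its own interval syntax. Converting to milliseconds is shown because Spark's StreamingQuery.awaitTermination takes a timeout in milliseconds.

```scala
import java.util.Locale
import scala.concurrent.duration._

// Hypothetical helper: Timeout is documented in hours and may be an Int
// (Left) or a String such as "24" (Right). Returns milliseconds.
def timeoutMillis(timeout: Either[Int, String]): Long = {
  val h: Int = timeout match {
    case Left(n)  => n
    case Right(s) => s.trim.toInt
  }
  h.hours.toMillis
}

// Hypothetical helper: parses an interval such as "60 seconds" into milliseconds.
def triggerMillis(duration: String): Long = Duration(duration).toMillis

// Output modes supported by Spark Structured Streaming.
val streamingModes = Set("append", "complete", "update")

// Hypothetical check: normalizes the mode or throws IllegalArgumentException.
def checkOutputMode(mode: String): String = {
  val normalized = mode.toLowerCase(Locale.ROOT)
  require(streamingModes.contains(normalized), s"Unsupported streaming output mode: $mode")
  normalized
}

println(timeoutMillis(Left(24)))      // 86400000
println(triggerMillis("60 seconds"))  // 60000
println(checkOutputMode("Complete"))  // complete
```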