File System Storage

Allows reading data from and writing data to the default Spark file system, in both batch and streaming modes.

Code repository: https://github.com/AmadeusITGroup/dataio-framework/tree/main/src/main/scala/com/amadeus/dataio/pipes/storage

Common

The following fields are available for all storage components:

| Name | Mandatory | Description | Example | Default |
|------|-----------|-------------|---------|---------|
| Path | Yes | The directory where the data is stored. Note that you may rely on path templatization. | Path = "hdfs://path/to/data" | |
| Format | No | The format to use to read or write the data. | Format = "csv" | The value of spark.sql.sources.default in the Spark configuration |
| Schema | No | The schema of the input data. See the schema definitions page for more information. | Schema = "myproject.models.MySchema" | |
| DateFilter | No | Pre-filters the input to focus on a specific date range. | | |
| Repartition | No | Matches the Spark Dataset repartition function, by number, by columns, or both. One of the Number or Columns arguments is mandatory. | Repartition { Number = 10, Columns = "upd_date" } | |
| Coalesce | No | Matches the Spark Dataset coalesce function. | Coalesce = 10 | |
| Options | No | Spark options, as key = value pairs. The list of available options can be found in the official Spark API documentation for DataFrameReader and DataFrameWriter. | Options { header = true } | |

The DateFilter field is never mandatory, but be aware that omitting it could result in processing years of data.
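As an illustration, several of the common fields above can be combined in a single component configuration. This is a minimal sketch using only the fields documented in the table; the values are placeholders, and the surrounding block that owns these fields is described in the Batch and Streaming sections below:

```
Path = "hdfs://path/to/data"
Format = "csv"
Schema = "myproject.models.MySchema"
Repartition { Number = 10, Columns = "upd_date" }
Options { header = true }
```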


Batch

Input

Type: com.amadeus.dataio.pipes.storage.batch.StorageInput

Output

Type: com.amadeus.dataio.pipes.storage.batch.StorageOutput
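As a sketch of how a batch input and output pair might look in a configuration file, assuming HOCON-style Input and Output blocks (the block names and nesting are assumptions for this example; only the Type values and the common fields come from this page):

```
Input {
  Type = "com.amadeus.dataio.pipes.storage.batch.StorageInput"
  Path = "hdfs://path/to/input"
  Format = "csv"
  Options { header = true }
}

Output {
  Type = "com.amadeus.dataio.pipes.storage.batch.StorageOutput"
  Path = "hdfs://path/to/output"
  Format = "parquet"
  Coalesce = 10
}
```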


Streaming

Input

Type: com.amadeus.dataio.pipes.storage.streaming.StorageInput

Output

Type: com.amadeus.dataio.pipes.storage.streaming.StorageOutput
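A streaming pair could be sketched the same way, again assuming HOCON-style Input and Output blocks with placeholder values. Note that Spark Structured Streaming file sources generally require an explicit schema unless schema inference is enabled, so Schema is typically set on the streaming input:

```
Input {
  Type = "com.amadeus.dataio.pipes.storage.streaming.StorageInput"
  Path = "hdfs://path/to/input"
  Format = "json"
  Schema = "myproject.models.MySchema"
}

Output {
  Type = "com.amadeus.dataio.pipes.storage.streaming.StorageOutput"
  Path = "hdfs://path/to/output"
  Format = "parquet"
}
```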