Path templatization

Data I/O includes a path templatization feature that lets you build input and output paths dynamically wherever a path can be configured.

The feature generates paths by replacing placeholders in a path template with values taken from the configuration. It is particularly useful for addressing date ranges and for generating unique identifiers.
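The placeholder substitution described above can be pictured as a simple pattern replacement. The following is a minimal sketch, not the Data I/O implementation: the class `PlaceholderFiller`, its `fill` method and the rule that unknown placeholders are left untouched are all hypothetical, chosen only to illustrate the `%{name}` syntax.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Data I/O): replaces every %{name}
// placeholder with the value bound to "name" in the supplied map.
final class PlaceholderFiller {
    private static final Pattern PLACEHOLDER = Pattern.compile("%\\{(\\w+)\\}");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Assumption: a placeholder without a value is kept as-is.
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String path = fill("hdfs://path/to/data/file_%{from}_%{to}.csv",
                Map.of("from", "20220119", "to", "20220120"));
        System.out.println(path); // hdfs://path/to/data/file_20220119_20220120.csv
    }
}
```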

Placeholder fields

Some placeholders only make sense in output configurations, even though they can technically be used in inputs (e.g. %{uuid}, which produces a new random value every time).

%{from} %{to}

Name	Mandatory	Description	Example
template	Yes	The path template to fill.	template = "file_%{from}_%{to}.csv"
date_reference	Yes	The reference date to use when detemplatizing.	date_reference = "2022-01-27"
date_offset	Yes	The offset to apply to date_reference when detemplatizing.	date_offset = "+5D"
date_pattern	Yes	The output format to use when detemplatizing. It applies to both %{from} and %{to}.	date_pattern = "yyyyMMdd"
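To make the interplay of date_reference, date_offset and date_pattern concrete, here is a sketch of how the %{from} and %{to} values could be computed. It rests on assumptions the table does not confirm: offsets are written as a sign, an integer and the unit `D` (days, the only unit shown here), and %{from} is always the earlier of the reference date and the shifted date. The class `FromToRange` is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of %{from}/%{to} resolution. Assumptions: offsets look
// like "+5D" or "-1D" (days only), and %{from} is the earlier date.
final class FromToRange {
    private static final Pattern OFFSET = Pattern.compile("([+-])(\\d+)D");

    static String[] resolve(String dateReference, String dateOffset, String datePattern) {
        Matcher m = OFFSET.matcher(dateOffset);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported offset: " + dateOffset);
        }
        long days = Long.parseLong(m.group(2)) * (m.group(1).equals("-") ? -1 : 1);
        LocalDate reference = LocalDate.parse(dateReference); // ISO yyyy-MM-dd
        LocalDate shifted = reference.plusDays(days);
        // Order the two dates so that the range always reads from -> to.
        LocalDate from = shifted.isBefore(reference) ? shifted : reference;
        LocalDate to = shifted.isBefore(reference) ? reference : shifted;
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(datePattern);
        return new String[] {from.format(fmt), to.format(fmt)};
    }

    public static void main(String[] args) {
        String[] range = resolve("2022-01-20", "-1D", "yyyyMMdd");
        String path = "hdfs://path/to/data/file_%{from}_%{to}.csv"
                .replace("%{from}", range[0])
                .replace("%{to}", range[1]);
        System.out.println(path); // hdfs://path/to/data/file_20220119_20220120.csv
    }
}
```

With the table's own example values (date_reference = "2022-01-27", date_offset = "+5D"), this sketch would yield 20220127 for %{from} and 20220201 for %{to}.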

%{date}

Name	Mandatory	Description	Example	Default
template	Yes	The path template to fill.	template = "file_%{date}.csv"
date	No	The date to use when detemplatizing.	date = "2022-01-27"	Current date
date_pattern	No	The output format to use when detemplatizing.	date_pattern = "yyyyMMdd"	yyyyMMdd
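A sketch of the %{date} behaviour described in this table, including both defaults. It assumes that `date` is an ISO `yyyy-MM-dd` string and that date_pattern follows Java `DateTimeFormatter` syntax; the class `DatePlaceholder` is hypothetical and passing `null` stands in for an omitted field.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of %{date} filling. A null argument models an omitted
// field: date falls back to today, date_pattern falls back to yyyyMMdd.
final class DatePlaceholder {
    static String fill(String template, String date, String datePattern) {
        LocalDate value = (date == null) ? LocalDate.now() : LocalDate.parse(date);
        String pattern = (datePattern == null) ? "yyyyMMdd" : datePattern;
        return template.replace("%{date}", value.format(DateTimeFormatter.ofPattern(pattern)));
    }

    public static void main(String[] args) {
        System.out.println(fill("file_%{date}.csv", "2022-01-27", null)); // file_20220127.csv
        System.out.println(fill("file_%{date}.csv", null, "yyyy-MM-dd")); // uses today's date
    }
}
```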

%{year} %{month} %{day}

Name	Mandatory	Description	Example	Default
template	Yes	The path template to fill.	template = "file_%{year}.csv"
date	No	The date to use when detemplatizing.	date = "2022-01-27"	Current datetime
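The three date-part placeholders can be illustrated with a similar sketch. The zero-padding used here (four-digit year, two-digit month and day) is an assumption, since the table does not specify how each part is formatted; the class `YearMonthDay` is hypothetical.

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical sketch of %{year}, %{month} and %{day} filling.
// Assumption: month and day are zero-padded to two digits.
final class YearMonthDay {
    static String fill(String template, String date) {
        LocalDate d = (date == null) ? LocalDate.now() : LocalDate.parse(date);
        // Locale.ROOT keeps ASCII digits regardless of the JVM default locale.
        return template
                .replace("%{year}", String.format(Locale.ROOT, "%04d", d.getYear()))
                .replace("%{month}", String.format(Locale.ROOT, "%02d", d.getMonthValue()))
                .replace("%{day}", String.format(Locale.ROOT, "%02d", d.getDayOfMonth()));
    }

    public static void main(String[] args) {
        System.out.println(fill("data/%{year}/%{month}/%{day}/file.csv", "2022-01-27"));
        // data/2022/01/27/file.csv
    }
}
```

This layout is typical of date-partitioned directories, which is where these placeholders are most likely to be useful.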

%{uuid}

Name	Mandatory	Description	Example	Default
template	Yes	The template to fill with a random 16-byte (128-bit) UUID.	template = "file_%{uuid}.csv"
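The following sketch shows what %{uuid} filling might look like on the JVM, assuming the value comes from `java.util.UUID.randomUUID()` (a random version 4 UUID, printed in its canonical 36-character form). Whether Data I/O uses this exact generator is an assumption, and the class `UuidPlaceholder` is hypothetical.

```java
import java.util.UUID;

// Hypothetical sketch of %{uuid} filling. A single UUID is generated per call,
// so every %{uuid} occurrence in one template receives the same value.
final class UuidPlaceholder {
    static String fill(String template) {
        return template.replace("%{uuid}", UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        // Prints something like file_3f2b...-....csv; the value differs on every run.
        System.out.println(fill("file_%{uuid}.csv"));
    }
}
```

Because the value changes on each run, such a path is convenient for writing outputs that must never overwrite each other, but it cannot be used to locate an existing input, which is why %{uuid} is mainly relevant to outputs.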

Examples

Here’s an example of an output using a static path, without templatization:

(...)

output {
  name = "my-output"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkOutput"
  format = "csv"
  path = "hdfs://path/to/data/static.csv"
}

(...)

Here’s an example of an output using a templatized path:

(...)

output {
  name = "my-output"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkOutput"
  format = "csv"
  path {
    template = "hdfs://path/to/data/file_%{from}_%{to}.csv"
    date_reference = "2022-01-20"
    date_offset = "-1D"
    date_pattern = "yyyyMMdd"
  }
}

(...)

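This results in the following output path: hdfs://path/to/data/file_20220119_20220120.csv

The -1D offset moves one day back from the reference date, so %{from} is filled with 20220119 and %{to} with 20220120, both formatted with the yyyyMMdd pattern.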