Paths templatization

Data I/O includes a path templatization feature that lets you customize input and output paths for the pipes that support it.

This feature facilitates dynamic path generation by replacing placeholders with values from the configuration. Path templatization is particularly useful for tasks such as managing date ranges and generating unique identifiers.

Placeholder fields

Some placeholders only make sense in output configurations, even though they can technically be used in inputs (e.g. a random UUID).

%{from} %{to}

| Name | Mandatory | Description | Example | Default |
|---|---|---|---|---|
| template | Yes | The path template to fill. | template = "file_%{from}_%{to}.csv" | |
| date_reference | Yes | The reference date to use when detemplatizing. | date_reference = "2022-01-27" | |
| date_offset | Yes | The offset to apply to date_reference when detemplatizing. | date_offset = "+5D" | |
| date_pattern | Yes | The output format to use when detemplatizing. It applies to both %{from} and %{to} when both are present. | date_pattern = "yyyyMMdd" | |
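
As an illustration of a positive offset, the sketch below places the example values from the table above in an output path block. It assumes, by analogy with the negative-offset example in the Examples section, that %{from} resolves to the earlier bound of the range and %{to} to the later one. Under that assumption, the path would be expected to resolve to hdfs://path/to/data/file_20220127_20220201.csv.

```hocon
output {
  name = "my-output"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkOutput"
  format = "csv"
  path {
    template = "hdfs://path/to/data/file_%{from}_%{to}.csv"
    date_reference = "2022-01-27"  # reference date of the range
    date_offset = "+5D"            # assumed to mean five days after date_reference
    date_pattern = "yyyyMMdd"      # applied to both %{from} and %{to}
  }
}
```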

%{date}

| Name | Mandatory | Description | Example | Default |
|---|---|---|---|---|
| template | Yes | The path template to fill. | template = "file_%{date}.csv" | |
| date | No | The date to use when detemplatizing. | date = "2022-01-27" | Current date |
| date_pattern | No | The output format to use when detemplatizing. | date_pattern = "yyyyMMdd" | yyyyMMdd |
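
A minimal sketch of a path block using %{date}, built only from the fields listed above. The output name, type and HDFS location are reused from the Examples section for illustration. With these values, the path would be expected to resolve to hdfs://path/to/data/file_20220127.csv.

```hocon
output {
  name = "my-output"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkOutput"
  format = "csv"
  path {
    template = "hdfs://path/to/data/file_%{date}.csv"
    date = "2022-01-27"        # optional, defaults to the current date
    date_pattern = "yyyyMMdd"  # optional, defaults to yyyyMMdd
  }
}
```

Omitting both date and date_pattern should, according to the defaults in the table, produce a file name stamped with the current date in yyyyMMdd format.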

%{year} %{month} %{day}

| Name | Mandatory | Description | Example | Default |
|---|---|---|---|---|
| template | Yes | The path template to fill. | template = "file_%{year}.csv" | |
| date | No | The date to use when detemplatizing. | date = "2022-01-27" | Current datetime |
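
The three placeholders can be combined, for example to build a date-partitioned directory layout. The sketch below is illustrative: the directory structure is hypothetical, and it assumes that %{month} and %{day} are rendered as zero-padded two-digit values, which the table above does not state. Under that assumption, the path would resolve to something like hdfs://path/to/data/2022/01/27/file.csv.

```hocon
output {
  name = "my-output"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkOutput"
  format = "csv"
  path {
    # Hypothetical layout: one directory level per date component
    template = "hdfs://path/to/data/%{year}/%{month}/%{day}/file.csv"
    date = "2022-01-27"  # optional, defaults to the current datetime
  }
}
```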

%{uuid}

| Name | Mandatory | Description | Example | Default |
|---|---|---|---|---|
| template | Yes | The template to fill with a random 16-byte UUID. | template = "file_%{uuid}.csv" | |
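
A minimal sketch of an output using %{uuid}, reusing the output name, type and HDFS location from the Examples section. Because the UUID is random, the resolved file name cannot be predicted in advance, which is why this placeholder is mostly useful for outputs.

```hocon
output {
  name = "my-output"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkOutput"
  format = "csv"
  path {
    # %{uuid} is replaced with a random UUID; no other field is required
    template = "hdfs://path/to/data/file_%{uuid}.csv"
  }
}
```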

Examples

Here’s an example of an output using the path feature without templatization:

(...)

output {
  name = "my-output"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkOutput"
  format = "csv"
  path = "hdfs://path/to/data/static.csv"
}

(...)

Here’s an example of an output using the path feature with templatization:

(...)

output {
  name = "my-output"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkOutput"
  format = "csv"
  path {
    template = "hdfs://path/to/data/file_%{from}_%{to}.csv"
    date_reference = "2022-01-20"
    date_offset = "-1D"
    date_pattern = "yyyyMMdd"
  }
}

(...)

This results in the following output path: hdfs://path/to/data/file_20220119_20220120.csv