Path templatization
Data I/O includes a path templatization feature that lets you customize input and output paths where applicable.
Paths are generated dynamically by replacing placeholders with values taken from the configuration. This is particularly useful for managing date ranges and generating unique identifiers.
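
As a rough illustration of the mechanism, the sketch below replaces each `%{name}` placeholder in a template with a value looked up in a map. The class `TemplateSketch` and its `fill` method are hypothetical and not part of the Data I/O API. Leaving unknown placeholders untouched is also an assumption, not documented library behavior.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Data I/O: substitutes %{name} placeholders.
class TemplateSketch {
    // Matches %{name} and captures the placeholder name.
    private static final Pattern PLACEHOLDER = Pattern.compile("%\\{(\\w+)\\}");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Assumption: placeholders without a value are kept as-is.
            String replacement = values.getOrDefault(m.group(1), m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String path = fill("hdfs://path/to/data/file_%{from}_%{to}.csv",
                Map.of("from", "20220119", "to", "20220120"));
        System.out.println(path); // hdfs://path/to/data/file_20220119_20220120.csv
    }
}
```

The per-placeholder sections below describe how each value is derived from the configuration.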
Placeholder fields
Some placeholders only make sense in output configurations, even though they can technically be used in inputs (e.g. %{uuid}, which generates a random value).
%{from} %{to}
Name | Mandatory | Description | Example | Default |
---|---|---|---|---|
template | Yes | The path template to fill. | template = "file_%{from}_%{to}.csv" | |
date_reference | Yes | The reference date used when detemplatizing. | date_reference = "2022-01-27" | |
date_offset | Yes | The offset applied to date_reference when detemplatizing; the two dates delimit the %{from}/%{to} range. | date_offset = "+5D" | |
date_pattern | Yes | The output date format used when detemplatizing. It applies to both %{from} and %{to} when both are present. | date_pattern = "yyyyMMdd" | |
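
The following is a minimal sketch of how %{from} and %{to} could be resolved from these fields. It assumes that `date_offset` has the form `+nD`/`-nD` with `D` meaning days (other units, if any, are not covered), and that %{from} is the earlier and %{to} the later of the reference and shifted dates. The negative-offset case matches the templatized output example on this page; the positive-offset ordering is an assumption. `FromToSketch` is a hypothetical name, not the library's implementation.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of %{from}/%{to} resolution; not the Data I/O implementation.
class FromToSketch {
    // Assumption: offsets look like "+5D", "-1D" or "5D", where D means days.
    private static final Pattern OFFSET = Pattern.compile("([+-]?)(\\d+)D");

    static String fill(String template, String dateReference, String dateOffset, String datePattern) {
        Matcher m = OFFSET.matcher(dateOffset);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported offset: " + dateOffset);
        }
        long days = Long.parseLong(m.group(2));
        if ("-".equals(m.group(1))) {
            days = -days;
        }

        LocalDate reference = LocalDate.parse(dateReference); // ISO yyyy-MM-dd
        LocalDate shifted = reference.plusDays(days);
        // Assumption: %{from} is the earlier date and %{to} the later one.
        LocalDate from = shifted.isBefore(reference) ? shifted : reference;
        LocalDate to = shifted.isBefore(reference) ? reference : shifted;

        // date_pattern applies to both placeholders.
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(datePattern);
        return template.replace("%{from}", from.format(fmt))
                       .replace("%{to}", to.format(fmt));
    }

    public static void main(String[] args) {
        System.out.println(fill("hdfs://path/to/data/file_%{from}_%{to}.csv",
                "2022-01-20", "-1D", "yyyyMMdd"));
        // hdfs://path/to/data/file_20220119_20220120.csv
    }
}
```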
%{date}
Name | Mandatory | Description | Example | Default |
---|---|---|---|---|
template | Yes | The path template to fill. | template = "file_%{date}.csv" | |
date | No | The date to use when detemplatizing. | date = "2022-01-27" | Current date |
date_pattern | No | The output format to use when detemplatizing. | date_pattern = "yyyyMMdd" | yyyyMMdd |
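
A minimal sketch of %{date} resolution, using the documented defaults (current date, `yyyyMMdd` pattern). Passing `null` stands in for an omitted optional field; using the JVM's default time zone for "current date" is an assumption, as the library may use a different zone. `DateSketch` is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of %{date} resolution; not the Data I/O implementation.
class DateSketch {
    static final String DEFAULT_PATTERN = "yyyyMMdd"; // documented default

    // date and datePattern may be null, mirroring the optional fields.
    static String fill(String template, String date, String datePattern) {
        // Assumption: "current date" is taken in the JVM's default time zone.
        LocalDate d = (date == null) ? LocalDate.now() : LocalDate.parse(date);
        String pattern = (datePattern == null) ? DEFAULT_PATTERN : datePattern;
        return template.replace("%{date}", d.format(DateTimeFormatter.ofPattern(pattern)));
    }

    public static void main(String[] args) {
        System.out.println(fill("file_%{date}.csv", "2022-01-27", null)); // file_20220127.csv
        System.out.println(fill("file_%{date}.csv", null, null));         // uses today's date
    }
}
```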
%{year} %{month} %{day}
Name | Mandatory | Description | Example | Default |
---|---|---|---|---|
template | Yes | The path template to fill. | template = "file_%{year}.csv" | |
date | No | The date to use when detemplatizing. | date = "2022-01-27" | Current datetime |
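
A minimal sketch of the three date-part placeholders. The page does not state their exact formatting, so the four-digit year and two-digit, zero-padded month and day used here are assumptions. `YearMonthDaySketch` is a hypothetical name.

```java
import java.time.LocalDate;

// Hypothetical sketch of %{year}, %{month}, %{day}; not the Data I/O implementation.
class YearMonthDaySketch {
    // date may be null, in which case the current date is used.
    static String fill(String template, String date) {
        LocalDate d = (date == null) ? LocalDate.now() : LocalDate.parse(date);
        // Assumption: four-digit year, two-digit zero-padded month and day.
        return template
                .replace("%{year}", String.format("%04d", d.getYear()))
                .replace("%{month}", String.format("%02d", d.getMonthValue()))
                .replace("%{day}", String.format("%02d", d.getDayOfMonth()));
    }

    public static void main(String[] args) {
        System.out.println(fill("data/%{year}/%{month}/%{day}/file.csv", "2022-01-27"));
        // data/2022/01/27/file.csv
    }
}
```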
%{uuid}
Name | Mandatory | Description | Example | Default |
---|---|---|---|---|
template | Yes | The template to fill with a random 16-byte UUID. | template = "file_%{uuid}.csv" | |
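
A minimal sketch of %{uuid} substitution using `java.util.UUID.randomUUID()`, which produces a random 128-bit (16-byte) UUID. Rendering it in the standard 36-character hyphenated form is an assumption about the library's output. `UuidSketch` is a hypothetical name.

```java
import java.util.UUID;

// Hypothetical sketch of %{uuid} resolution; not the Data I/O implementation.
class UuidSketch {
    static String fill(String template) {
        // Random (version 4) UUID: 128 bits, i.e. 16 bytes.
        // Assumption: rendered in the standard hyphenated form.
        // String.replace reuses the same UUID for every %{uuid} occurrence.
        return template.replace("%{uuid}", UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        // Prints a different path on each run.
        System.out.println(fill("file_%{uuid}.csv"));
    }
}
```

Because the value is random, the resulting path is not reproducible across runs, which is why this placeholder is mostly relevant for outputs.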
Examples
Here is an example of an output using the path feature without templatization:
(...)
output {
  name = "my-output"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkOutput"
  format = "csv"
  path = "hdfs://path/to/data/static.csv"
}
(...)
Here is an example of an output using the path feature with templatization:
(...)
output {
  name = "my-output"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkOutput"
  format = "csv"
  path {
    template = "hdfs://path/to/data/file_%{from}_%{to}.csv"
    date_reference = "2022-01-20"
    date_offset = "-1D"
    date_pattern = "yyyyMMdd"
  }
}
(...)
This results in the following output path: hdfs://path/to/data/file_20220119_20220120.csv