Your applications may need to filter input datasets by a specific date range. Data I/O supports this
directly in the configuration file, via the date_filter input field.
The availability of date_filter is decided at pipe level. Refer to each pipe's documentation to check whether it
is supported.
date_filter always requires a column field (the column on which the filter is applied), plus one of the following range definitions:
- reference + offset
- from and/or until (you can provide both, or only one of them)

reference + offset and from/until are mutually exclusive.
column (required): the date column used for filtering.

reference + offset:
- reference (required): the anchor date (ISO date format).
- offset (required): a duration relative to reference (e.g. -7D, -1M, +3D), defining the other bound.
The resulting interval is the range between reference and reference + offset (order doesn’t matter: the earliest becomes the lower bound, the latest becomes the upper bound).

from / until:
- from (optional): lower bound (inclusive).
- until (optional): upper bound (exclusive).
If only from is provided, the filter is open-ended on the upper side.
If only until is provided, the filter is open-ended on the lower side.
If both from and until are provided, from must be strictly lower than until.
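The rules above can be sketched in a few lines of Python. This is only an illustration of the documented semantics, not Data I/O's actual implementation: the function name resolve_range and the dictionary-based configuration are hypothetical, and the sketch handles day offsets only (such as -7D or +3D), leaving month offsets like -1M aside.

```python
import re
from datetime import date, timedelta


def resolve_range(cfg):
    """Return (lower, upper) as dates; None means that side is open-ended.

    Hypothetical helper: cfg mirrors the keys of a date_filter block.
    """
    if "column" not in cfg:
        raise ValueError("date_filter requires a column")

    has_ref = "reference" in cfg or "offset" in cfg
    has_bounds = "from" in cfg or "until" in cfg
    if has_ref and has_bounds:
        raise ValueError("reference/offset and from/until are mutually exclusive")

    if has_ref:
        if "reference" not in cfg or "offset" not in cfg:
            raise ValueError("reference and offset must be provided together")
        ref = date.fromisoformat(cfg["reference"])
        # Assumption of this sketch: only day offsets are parsed.
        match = re.fullmatch(r"([+-]?)(\d+)D", cfg["offset"])
        if match is None:
            raise ValueError("this sketch only handles day offsets such as -7D")
        days = int(match.group(2)) * (-1 if match.group(1) == "-" else 1)
        other = ref + timedelta(days=days)
        # Order does not matter: the earliest date becomes the lower bound.
        return min(ref, other), max(ref, other)

    if not has_bounds:
        raise ValueError("a range definition is required")
    lower = date.fromisoformat(cfg["from"]) if "from" in cfg else None
    upper = date.fromisoformat(cfg["until"]) if "until" in cfg else None
    if lower is not None and upper is not None and not lower < upper:
        raise ValueError("from must be strictly lower than until")
    return lower, upper
```

Under these assumptions, a reference of 2023-07-01 with an offset of -7D resolves to the same two bounds as from = 2023-06-24 with until = 2023-07-01.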
With reference and offset:
input {
    name = "my-input"
    type = "com.amadeus.dataio.pipes.spark.batch.SparkInput"
    format = "delta"
    path = "hdfs://path/to/data"
    date_filter {
        reference = "2023-07-01"
        offset = "-7D"
        column = "date"
    }
}
With from and until:
input {
    name = "my-input"
    type = "com.amadeus.dataio.pipes.spark.batch.SparkInput"
    format = "delta"
    path = "hdfs://path/to/data"
    date_filter {
        from = "2023-06-24"
        until = "2023-07-01"
        column = "date"
    }
}
Only from:
date_filter {
    from = "2023-06-24"
    column = "date"
}
Only until:
date_filter {
    until = "2023-07-01"
    column = "date"
}
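To make the inclusive lower bound, the exclusive upper bound, and the open-ended cases concrete, here is a small Python sketch that applies them to a few sample dates. The helper in_range and the sample rows are hypothetical and only mirror the semantics described above; they do not show how Data I/O applies the filter internally.

```python
from datetime import date


def in_range(value, lower=None, upper=None):
    """Half-open check: lower is inclusive, upper is exclusive, None is unbounded."""
    return (lower is None or value >= lower) and (upper is None or value < upper)


# Hypothetical values of the "date" column.
rows = [date(2023, 6, 23), date(2023, 6, 24), date(2023, 6, 30), date(2023, 7, 1)]

# Only from: everything on or after 2023-06-24 is kept.
only_from = [d for d in rows if in_range(d, lower=date(2023, 6, 24))]

# Only until: everything strictly before 2023-07-01 is kept.
only_until = [d for d in rows if in_range(d, upper=date(2023, 7, 1))]

# Both bounds: 2023-06-24 is kept, 2023-07-01 is excluded.
both = [d for d in rows if in_range(d, date(2023, 6, 24), date(2023, 7, 1))]
```

With both bounds set, a row dated exactly 2023-07-01 is dropped because until is exclusive, while a row dated 2023-06-24 is kept because from is inclusive.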