Date Filters

Your applications may need to filter input datasets by a specific date range. Data I/O supports this directly in the configuration file, through the date_filter input field.

Whether date_filter is available is decided at pipe level. Refer to each pipe's documentation to check whether it supports date_filter.

Fields

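date_filter always requires a column field (the column the filter is applied to), plus one of the following range definitions: a relative range (reference + offset) or an absolute range (from / until).

The two range definitions are mutually exclusive: reference + offset cannot be combined with from / until.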

Common field

Relative range: reference + offset

The resulting interval is the range between reference and reference + offset (order doesn’t matter: the earliest becomes the lower bound, the latest becomes the upper bound).
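To make the ordering rule concrete, here is a minimal Python sketch of how such an interval could be resolved. It is not Data I/O's implementation: the function name resolve_relative_range is hypothetical, and the sketch assumes that offset is a signed whole number of days followed by "D" (the only form shown on this page) and that reference is an ISO date. It does not model whether the bounds are inclusive or exclusive.

```python
import re
from datetime import date, timedelta


def resolve_relative_range(reference: str, offset: str) -> tuple[date, date]:
    """Return (lower, upper) for a reference date and a signed day offset.

    Hypothetical helper: assumes offsets look like "-7D" or "+3D".
    """
    match = re.fullmatch(r"([+-]?\d+)D", offset)
    if match is None:
        raise ValueError(f"unsupported offset: {offset!r}")

    ref = date.fromisoformat(reference)
    other = ref + timedelta(days=int(match.group(1)))

    # Order does not matter: the earliest date becomes the lower bound,
    # the latest becomes the upper bound.
    return min(ref, other), max(ref, other)


print(resolve_relative_range("2023-07-01", "-7D"))
```

With reference = "2023-07-01" and offset = "-7D", the sketch yields 2023-06-24 as the lower bound and 2023-07-01 as the upper bound, even though the offset points backwards from the reference.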

Absolute range: from / until

If only from is provided, the filter is open-ended on the upper side. If only until is provided, the filter is open-ended on the lower side.

If both from and until are provided, from must be strictly earlier than until.
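The rules for absolute ranges can be summarized in a short Python sketch. This only illustrates the constraints described above and is not Data I/O's own validation code: resolve_absolute_range is a hypothetical name, a missing bound is represented as None, and the check that at least one bound is present is an assumption drawn from the requirement that a range definition be supplied.

```python
from datetime import date
from typing import Optional, Tuple


def resolve_absolute_range(
    from_: Optional[str] = None,
    until: Optional[str] = None,
) -> Tuple[Optional[date], Optional[date]]:
    """Return (lower, upper); None marks an open-ended side.

    Hypothetical helper illustrating the from / until rules.
    """
    # Assumption: an absolute range needs at least one bound.
    if from_ is None and until is None:
        raise ValueError("at least one of from / until is required")

    lower = date.fromisoformat(from_) if from_ is not None else None
    upper = date.fromisoformat(until) if until is not None else None

    # When both bounds are given, from must be strictly earlier than until.
    if lower is not None and upper is not None and not lower < upper:
        raise ValueError("from must be strictly earlier than until")

    return lower, upper


print(resolve_absolute_range(from_="2023-06-24"))
```

Calling it with only from_ returns a range whose upper bound is None, which corresponds to a filter that is open-ended on the upper side.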

Example

Relative range (reference + offset)

input {
  name = "my-input"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkInput"
  format = "delta"
  path = "hdfs://path/to/data"

  date_filter {
    reference = "2023-07-01"
    offset = "-7D"
    column = "date"
  }
}

Absolute range (from + until)

input {
  name = "my-input"
  type = "com.amadeus.dataio.pipes.spark.batch.SparkInput"
  format = "delta"
  path = "hdfs://path/to/data"

  date_filter {
    from = "2023-06-24"
    until = "2023-07-01"
    column = "date"
  }
}

Absolute range (only one bound)

Only from:

date_filter {
  from = "2023-06-24"
  column = "date"
}

Only until:

date_filter {
  until = "2023-07-01"
  column = "date"
}