PyData I/O

Automated handling of inputs and outputs through a unified interface.

Welcome to the PyData I/O documentation!

PyData I/O is a Python framework for Spark SQL that automates reading and writing data. It gives you the tools to write ETL pipelines that focus on transforming the data, regardless of its origin or destination.
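
To make the idea concrete, here is a minimal sketch in plain Python of keeping the transformation separate from the input and output handling. It does not use the PyData I/O API: every name below (`read_csv`, `write_json_lines`, `keep_adults`, `run_pipeline`) is hypothetical and chosen only for illustration, and in-memory strings stand in for real sources and sinks.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


# Hypothetical reader: hides the details of one input format (CSV text).
def read_csv(text: str) -> List[Row]:
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical writer: hides the details of one output format (JSON lines).
def write_json_lines(rows: Iterable[Row]) -> str:
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


# The transformation only sees rows; it knows nothing about CSV or JSON.
def keep_adults(rows: Iterable[Row]) -> List[Row]:
    return [row for row in rows if int(row["age"]) >= 18]


# The pipeline wires reader, transformation, and writer together, so any
# of the three can be swapped without touching the other two.
def run_pipeline(
    source: str,
    reader: Callable[[str], List[Row]],
    transform: Callable[[Iterable[Row]], List[Row]],
    writer: Callable[[Iterable[Row]], str],
) -> str:
    return writer(transform(reader(source)))


SOURCE = "name,age\nAda,36\nTim,12\n"
result = run_pipeline(SOURCE, read_csv, keep_adults, write_json_lines)
print(result)
```

Because `keep_adults` never touches the storage format, changing the origin or destination means swapping only the reader or the writer passed to `run_pipeline`.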

Be sure to check out the getting started page for an overview of PyData I/O’s features, or visit the main concepts page for a deeper dive.

If you know your way around PyData I/O and want to learn how to create your own pipes, head to the advanced section.