package documentation

Provides I/O modules for Apache Spark.

Module deltatable_dataset: AbstractDataSet implementation to access DeltaTables using delta-spark.
Module spark_dataset: AbstractVersionedDataSet implementation to access Spark dataframes using pyspark.
Module spark_hive_dataset: AbstractDataSet implementation to access Spark dataframes using pyspark on Apache Hive.
Module spark_jdbc_dataset: SparkJDBCDataSet loads and saves a PySpark DataFrame via JDBC.

From __init__.py:

Class DeltaTableDataSet: DeltaTableDataSet loads data into DeltaTable objects.
Class SparkDataSet: SparkDataSet loads and saves Spark dataframes.
Class SparkHiveDataSet: SparkHiveDataSet loads and saves Spark dataframes stored on Hive. This data set also handles some otherwise incompatible storage formats, such as partitioned Parquet on Hive, which does not normally allow upserts to existing data without completely replacing the existing file or partition.
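
The SparkHiveDataSet description refers to upserts, so a small illustration of what an upsert means may help. The sketch below is plain Python over lists of dicts, not Spark or Hive code, and it is not how SparkHiveDataSet is implemented. The function name `upsert`, the `key_columns` parameter, and the sample rows are all hypothetical, chosen only for illustration. It shows the merge rule: an incoming row replaces an existing row with the same key, and an incoming row with a new key is appended.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def upsert(
    existing: Iterable[Row],
    incoming: Iterable[Row],
    key_columns: Sequence[str],
) -> List[Row]:
    """Merge `incoming` into `existing`, matching rows on `key_columns`.

    Hypothetical helper for illustration only: rows whose key already
    exists are replaced (update), all other rows are added (insert).
    """
    merged: Dict[Tuple[Hashable, ...], Row] = {}

    # Index the existing rows by their key, copying so inputs stay untouched.
    for row in existing:
        merged[tuple(row[column] for column in key_columns)] = dict(row)

    # Incoming rows overwrite matching keys and add new ones.
    for row in incoming:
        merged[tuple(row[column] for column in key_columns)] = dict(row)

    return list(merged.values())


# Made-up sample data: id 2 is updated, id 3 is inserted, id 1 is unchanged.
existing_rows = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]
incoming_rows = [{"id": 2, "city": "Quito"}, {"id": 3, "city": "Kyiv"}]

result = upsert(existing_rows, incoming_rows, key_columns=["id"])
print(result)
```

On a partitioned Parquet table, files cannot be edited row by row, which is why a merge like this normally means rewriting the affected file or partition in full.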