Package documentation

Provides I/O modules for Apache Spark.
Modules:

Module | Description
------ | -----------
deltatable | AbstractDataSet implementation to access DeltaTables using delta-spark
spark | AbstractVersionedDataSet implementation to access Spark dataframes using pyspark
spark | AbstractDataSet implementation to access Spark dataframes using pyspark on Apache Hive
spark | SparkJDBCDataSet to load and save a PySpark DataFrame via JDBC
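In a Kedro project, these datasets are typically declared in the data catalog rather than instantiated directly. A minimal sketch of a catalog entry, assuming Kedro's standard `catalog.yml` conventions; the entry name, bucket, and path are illustrative placeholders:

```yaml
# catalog.yml -- illustrative entry; the dataset name and filepath are placeholders
weather:
  type: spark.SparkDataSet        # the Spark dataframe dataset listed above
  filepath: s3a://your-bucket/data/weather.csv
  file_format: csv
  load_args:                      # passed through to the underlying Spark reader
    header: True
    inferSchema: True
  save_args:                      # passed through to the underlying Spark writer
    header: True
    mode: overwrite
```

Loading this entry yields a `pyspark.sql.DataFrame`; `load_args` and `save_args` are forwarded to Spark's reader and writer respectively.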
Classes (from __init__.py):

Class | Description
----- | -----------
DeltaTableDataSet | Loads data into DeltaTable objects
SparkDataSet | Loads and saves Spark dataframes
SparkHiveDataSet | Loads and saves Spark dataframes stored on Hive. This data set also handles file types that are normally incompatible with upserts, such as partitioned Parquet on Hive, which does not allow upserts to existing data without completely replacing the existing file/partition.
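The upsert behaviour described for SparkHiveDataSet is selected through its write mode. A hedged sketch of a catalog entry, assuming the dataset accepts `database`, `table`, `write_mode`, and `table_pk` parameters; the database, table, and key column names are illustrative:

```yaml
# catalog.yml -- illustrative entry; database, table, and key names are placeholders
hive_weather:
  type: spark.SparkHiveDataSet
  database: analytics_db
  table: weather
  write_mode: upsert   # merge new rows into existing data instead of a full overwrite
  table_pk:            # primary-key columns used to match existing rows for the upsert
    - station_id
    - observed_at
```

With `write_mode: upsert`, rows whose primary-key columns match existing data are updated in place, which is how the dataset works around formats (such as partitioned Parquet on Hive) that would otherwise require replacing the whole file or partition.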