Class documentation
DeltaTableDataSet loads data into DeltaTable objects.
Example usage for the YAML API:
    weather@spark:
      type: spark.SparkDataSet
      filepath: data/02_intermediate/data.parquet
      file_format: "delta"

    weather@delta:
      type: spark.DeltaTableDataSet
      filepath: data/02_intermediate/data.parquet
Example usage for the Python API:
    >>> from pyspark.sql import SparkSession
    >>> from pyspark.sql.types import (StructField, StringType,
    >>>                                IntegerType, StructType)
    >>>
    >>> from kedro.extras.datasets.spark import DeltaTableDataSet, SparkDataSet
    >>>
    >>> schema = StructType([StructField("name", StringType(), True),
    >>>                      StructField("age", IntegerType(), True)])
    >>>
    >>> data = [('Alex', 31), ('Bob', 12), ('Clarke', 65), ('Dave', 29)]
    >>>
    >>> spark_df = SparkSession.builder.getOrCreate().createDataFrame(data, schema)
    >>>
    >>> data_set = SparkDataSet(filepath="test_data", file_format="delta")
    >>> data_set.save(spark_df)
    >>> deltatable_dataset = DeltaTableDataSet(filepath="test_data")
    >>> delta_table = deltatable_dataset.load()
    >>>
    >>> delta_table.update()
Method            | __init__    | Creates a new instance of DeltaTableDataSet.
Static Method     | _get_spark  | Undocumented
Method            | _describe   | Undocumented
Method            | _exists     | Undocumented
Method            | _load       | Undocumented
Method            | _save       | Undocumented
Constant          | _SINGLE_PROCESS | Undocumented
Instance Variable | _filepath   | Undocumented
Instance Variable | _fs         | Undocumented
Inherited from AbstractDataSet:

Class Method | from_config | Create a data set instance using the configuration provided.
Method       | __str__     | Undocumented
Method       | exists      | Checks whether a data set's output already exists by calling the provided _exists() method.
Method       | load        | Loads data by delegation to the provided _load() method.
Method       | release     | Release any cached data.
Method       | save        | Saves data by delegation to the provided _save() method.
Method       | _copy       | Undocumented
Method       | _release    | Undocumented
Property     | _logger     | Undocumented
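The inherited methods above follow a template-method pattern: the public load(), save(), and exists() entry points on AbstractDataSet delegate to the protected _load(), _save(), and _exists() hooks that each concrete dataset implements. A minimal pure-Python sketch of that delegation (the class names and error handling below are illustrative, not Kedro's actual source):

```python
# Simplified sketch of the AbstractDataSet delegation pattern.
# NOT Kedro's real implementation: class names, error messages, and the
# in-memory dataset are illustrative only.
class AbstractDataSetSketch:
    def load(self):
        # Public entry point: delegates to the subclass hook and wraps
        # failures in a single error type (Kedro uses DataSetError).
        try:
            return self._load()
        except Exception as exc:
            raise RuntimeError(f"Failed while loading data: {exc}") from exc

    def exists(self):
        # Delegates the actual existence check to _exists().
        return self._exists()

    def _load(self):
        raise NotImplementedError

    def _exists(self):
        return False


class InMemoryDataSetSketch(AbstractDataSetSketch):
    """Toy concrete dataset: holds a value in memory."""

    def __init__(self, data=None):
        self._data = data

    def _load(self):
        if self._data is None:
            raise ValueError("no data stored")
        return self._data

    def _exists(self):
        return self._data is not None


ds = InMemoryDataSetSketch(data=[1, 2, 3])
print(ds.exists())  # True
print(ds.load())    # [1, 2, 3]
```

Callers only ever use the public methods; subclasses such as DeltaTableDataSet only ever override the protected hooks, which is why the table above documents both sets.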
Creates a new instance of DeltaTableDataSet.
Parameters:

filepath (str) | Filepath in POSIX format to a Spark dataframe. When using Databricks and working with data written to mount path points, specify ``filepath``s for (versioned) ``SparkDataSet``s starting with ``/dbfs/mnt``.
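On Databricks, a catalog entry following that rule might look like the fragment below (the entry name and mount path are illustrative, not prescribed by the API):

```yaml
weather@delta:
  type: spark.DeltaTableDataSet
  filepath: /dbfs/mnt/my-mount/02_intermediate/data.parquet
```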