class documentation

class DeltaTableDataSet(AbstractDataSet[None, DeltaTable]):

DeltaTableDataSet loads data into DeltaTable objects.

Example usage for the YAML API:

weather@spark:
  type: spark.SparkDataSet
  filepath: data/02_intermediate/data.parquet
  file_format: "delta"

weather@delta:
  type: spark.DeltaTableDataSet
  filepath: data/02_intermediate/data.parquet
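
The two catalog entries above share the base name `weather` and differ only in the suffix after `@`. This is Kedro's transcoding convention, where one underlying dataset is exposed through two dataset types. The sketch below groups such entry names by base name; `split_transcoded_name` is a hypothetical helper written for this example (not part of Kedro's public API), and a plain dict stands in for the parsed YAML.

```python
from collections import defaultdict

# Plain-dict stand-in for the parsed catalog YAML shown above.
catalog = {
    "weather@spark": {
        "type": "spark.SparkDataSet",
        "filepath": "data/02_intermediate/data.parquet",
        "file_format": "delta",
    },
    "weather@delta": {
        "type": "spark.DeltaTableDataSet",
        "filepath": "data/02_intermediate/data.parquet",
    },
}


def split_transcoded_name(entry_name):
    """Hypothetical helper: split 'base@suffix' into (base, suffix or None)."""
    base, _, suffix = entry_name.partition("@")
    return base, suffix or None


# Group the dataset types by base name, keyed by their transcoding suffix.
views = defaultdict(dict)
for entry_name, config in catalog.items():
    base, suffix = split_transcoded_name(entry_name)
    views[base][suffix] = config["type"]

# Both entries describe the same data only if they point at the same location.
filepaths = {config["filepath"] for config in catalog.values()}
```

In a pipeline, one node would typically write `weather@spark` and a later node would read `weather@delta`, so the same files are saved as a Spark DataFrame and loaded as a DeltaTable.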

Example usage for the Python API:

>>> from pyspark.sql import SparkSession
>>> from pyspark.sql.types import (StructField, StringType,
...                                IntegerType, StructType)
>>>
>>> from kedro.extras.datasets.spark import DeltaTableDataSet, SparkDataSet
>>>
>>> schema = StructType([StructField("name", StringType(), True),
...                      StructField("age", IntegerType(), True)])
>>>
>>> data = [('Alex', 31), ('Bob', 12), ('Clarke', 65), ('Dave', 29)]
>>>
>>> spark_df = SparkSession.builder.getOrCreate().createDataFrame(data, schema)
>>>
>>> # Write the DataFrame in Delta format, then read it back as a DeltaTable.
>>> data_set = SparkDataSet(filepath="test_data", file_format="delta")
>>> data_set.save(spark_df)
>>> deltatable_dataset = DeltaTableDataSet(filepath="test_data")
>>> delta_table = deltatable_dataset.load()
>>>
>>> # DeltaTable.update() needs a `set` mapping; the values here are illustrative.
>>> delta_table.update(condition="name = 'Alex'", set={"age": "32"})

Method             __init__          Creates a new instance of DeltaTableDataSet.
Static Method      _get_spark        Undocumented
Method             _describe         Undocumented
Method             _exists           Undocumented
Method             _load             Undocumented
Method             _save             Undocumented
Constant           _SINGLE_PROCESS   Undocumented
Instance Variable  _filepath         Undocumented
Instance Variable  _fs_prefix        Undocumented

Inherited from AbstractDataSet:

Class Method  from_config  Create a data set instance using the configuration provided.
Method        __str__      Undocumented
Method        exists       Checks whether a data set's output already exists by calling the provided _exists() method.
Method        load         Loads data by delegation to the provided load method.
Method        release      Release any cached data.
Method        save         Saves data by delegation to the provided save method.
Method        _copy        Undocumented
Method        _release     Undocumented
Property      _logger      Undocumented
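
The inherited public methods delegate to underscore-prefixed hooks that a subclass provides. A simplified sketch of that delegation pattern follows. `MiniDataSet` and `InMemoryDataSet` are hypothetical stand-ins written for illustration; they are not Kedro's AbstractDataSet, which does more than this sketch shows.

```python
from abc import ABC, abstractmethod


class MiniDataSet(ABC):
    """Hypothetical, stripped-down stand-in for an abstract dataset base class."""

    def load(self):
        # Public entry point: delegate to the hook supplied by the subclass.
        return self._load()

    def save(self, data):
        self._save(data)

    def exists(self):
        return self._exists()

    @abstractmethod
    def _load(self):
        ...

    @abstractmethod
    def _save(self, data):
        ...

    def _exists(self):
        # Default chosen for this sketch only: assume the data is absent.
        return False


class InMemoryDataSet(MiniDataSet):
    """Hypothetical subclass that keeps its data in an attribute."""

    _MISSING = object()

    def __init__(self):
        self._data = self._MISSING

    def _load(self):
        return self._data

    def _save(self, data):
        self._data = data

    def _exists(self):
        return self._data is not self._MISSING


dataset = InMemoryDataSet()
existed_before = dataset.exists()
dataset.save([1, 2, 3])
loaded = dataset.load()
```

Callers only use `load`, `save` and `exists`; each concrete dataset decides what those operations mean by implementing the hooks.
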

def __init__(self, filepath: str):

Creates a new instance of DeltaTableDataSet.

Parameters
filepath: str
    Filepath in POSIX format to a Spark dataframe. When using Databricks and working with data written to mount points, specify filepaths for (versioned) SparkDataSets starting with /dbfs/mnt.
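
The `_fs_prefix` and `_filepath` instance variables suggest that the constructor separates a protocol prefix from the path itself. The sketch below shows one way to do such a split; `split_filepath` is a hypothetical helper written for this example, and the `s3a://` location is made up, so the actual implementation may differ.

```python
from pathlib import PurePosixPath


def split_filepath(filepath):
    """Hypothetical helper: return (prefix, PurePosixPath) for a filepath string."""
    prefix, separator, rest = filepath.partition("://")
    if separator:
        # A protocol such as "s3a://" was present; keep it apart from the path.
        return prefix + separator, PurePosixPath(rest)
    # No protocol: the whole string is the path and the prefix is empty.
    return "", PurePosixPath(filepath)


fs_prefix, path = split_filepath("s3a://bucket/data/02_intermediate/data.parquet")
local_prefix, local_path = split_filepath("/dbfs/mnt/data/test_data")

# Joining the prefix and the path again reproduces the original location string.
load_path = fs_prefix + str(path)
```
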

@staticmethod
def _get_spark():

Undocumented

def _describe(self):

Undocumented

def _exists(self) -> bool:

Undocumented

def _load(self) -> DeltaTable:

Undocumented

def _save(self, data: None) -> NoReturn:

Undocumented
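
The `data: None` parameter and the `NoReturn` return type on `_save` indicate that the method never completes normally, which is consistent with a dataset that can only be loaded. A minimal sketch of that pattern is shown below. `ReadOnlyDataSet` and `ReadOnlyError` are hypothetical names introduced for this example; the exception type and message that Kedro uses are not shown on this page.

```python
from typing import NoReturn


class ReadOnlyError(Exception):
    """Stand-in exception for this sketch."""


class ReadOnlyDataSet:
    """Hypothetical dataset that supports loading but rejects saving."""

    def __init__(self, payload):
        self._payload = payload

    def _load(self):
        return self._payload

    def _save(self, data: None) -> NoReturn:
        # NoReturn signals to type checkers that this method always raises.
        raise ReadOnlyError(f"{self.__class__.__name__} does not support saving")


dataset = ReadOnlyDataSet(payload={"rows": 4})

try:
    dataset._save(None)
    save_failed = False
except ReadOnlyError as error:
    save_failed = True
    message = str(error)
```

In the Python API example earlier on this page, changes are applied by calling methods on the loaded DeltaTable object rather than by calling `save()` on the dataset.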

_SINGLE_PROCESS: bool = True

Undocumented

_filepath

Undocumented

_fs_prefix

Undocumented