Class documentation
DeltaTableDataSet loads data into DeltaTable objects.
Example usage for the YAML API:
    weather@spark:
      type: spark.SparkDataSet
      filepath: data/02_intermediate/data.parquet
      file_format: "delta"

    weather@delta:
      type: spark.DeltaTableDataSet
      filepath: data/02_intermediate/data.parquet
Example usage for the Python API:
    >>> from pyspark.sql import SparkSession
    >>> from pyspark.sql.types import (StructField, StringType,
    >>>                                IntegerType, StructType)
    >>>
    >>> from kedro.extras.datasets.spark import DeltaTableDataSet, SparkDataSet
    >>>
    >>> schema = StructType([StructField("name", StringType(), True),
    >>>                      StructField("age", IntegerType(), True)])
    >>>
    >>> data = [('Alex', 31), ('Bob', 12), ('Clarke', 65), ('Dave', 29)]
    >>>
    >>> spark_df = SparkSession.builder.getOrCreate().createDataFrame(data, schema)
    >>>
    >>> data_set = SparkDataSet(filepath="test_data", file_format="delta")
    >>> data_set.save(spark_df)
    >>> deltatable_dataset = DeltaTableDataSet(filepath="test_data")
    >>> delta_table = deltatable_dataset.load()
    >>>
    >>> delta_table.update()
Method            | __init__    | Creates a new instance of DeltaTableDataSet.
Static Method     | _get_spark  | Undocumented
Method            | _describe   | Undocumented
Method            | _exists     | Undocumented
Method            | _load       | Undocumented
Method            | _save       | Undocumented
Constant          | _SINGLE_PROCESS | Undocumented
Instance Variable | _filepath   | Undocumented
Instance Variable | _fs         | Undocumented
Inherited from AbstractDataSet:

Class Method | from_config | Create a data set instance using the configuration provided.
Method       | __str__     | Undocumented
Method       | exists      | Checks whether a data set's output already exists by calling the provided _exists() method.
Method       | load        | Loads data by delegation to the provided _load() method.
Method       | release     | Release any cached data.
Method       | save        | Saves data by delegation to the provided _save() method.
Method       | _copy       | Undocumented
Method       | _release    | Undocumented
Property     | _logger     | Undocumented
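The inherited methods above follow a template-method pattern: the public load(), save(), and exists() entry points on AbstractDataSet delegate to the protected _load(), _save(), and _exists() hooks that each concrete dataset implements. A minimal pure-Python sketch of that delegation (the class names and error handling below are illustrative, not Kedro's actual source):

```python
# Simplified sketch of the AbstractDataSet delegation pattern.
# NOT Kedro's real implementation: class names, error messages, and the
# in-memory dataset are illustrative only.
class AbstractDataSetSketch:
    def load(self):
        # Public entry point: delegates to the subclass hook and wraps
        # failures in a single error type (Kedro uses DataSetError).
        try:
            return self._load()
        except Exception as exc:
            raise RuntimeError(f"Failed while loading data: {exc}") from exc

    def exists(self):
        # Delegates the actual existence check to _exists().
        return self._exists()

    def _load(self):
        raise NotImplementedError

    def _exists(self):
        return False


class InMemoryDataSetSketch(AbstractDataSetSketch):
    """Toy concrete dataset: holds a value in memory."""

    def __init__(self, data=None):
        self._data = data

    def _load(self):
        if self._data is None:
            raise ValueError("no data stored")
        return self._data

    def _exists(self):
        return self._data is not None


ds = InMemoryDataSetSketch(data=[1, 2, 3])
print(ds.exists())  # True
print(ds.load())    # [1, 2, 3]
```

Callers only ever use the public methods; subclasses such as DeltaTableDataSet only ever override the protected hooks, which is why the table above documents both sets.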
Creates a new instance of DeltaTableDataSet.
Parameters:

filepath (str) | Filepath in POSIX format to a Spark dataframe. When using Databricks and working with data written to mount path points, specify ``filepath``s for (versioned) ``SparkDataSet``s starting with ``/dbfs/mnt``.
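On Databricks, a catalog entry following that rule might look like the fragment below (the entry name and mount path are illustrative, not prescribed by the API):

```yaml
weather@delta:
  type: spark.DeltaTableDataSet
  filepath: /dbfs/mnt/my-mount/02_intermediate/data.parquet
```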