class AbstractVersionedDataSet(AbstractDataSet[
Known subclasses:
kedro.extras.datasets.email.EmailMessageDataSet,
kedro.extras.datasets.geopandas.GeoJSONDataSet,
kedro.extras.datasets.holoviews.HoloviewsWriter,
kedro.extras.datasets.json.JSONDataSet,
kedro.extras.datasets.matplotlib.MatplotlibWriter,
kedro.extras.datasets.networkx.GMLDataSet,
kedro.extras.datasets.networkx.GraphMLDataSet,
kedro.extras.datasets.networkx.JSONDataSet,
kedro.extras.datasets.pandas.CSVDataSet,
kedro.extras.datasets.pandas.ExcelDataSet,
kedro.extras.datasets.pandas.FeatherDataSet,
kedro.extras.datasets.pandas.GenericDataSet,
kedro.extras.datasets.pandas.HDFDataSet,
kedro.extras.datasets.pandas.JSONDataSet,
kedro.extras.datasets.pandas.ParquetDataSet,
kedro.extras.datasets.pandas.XMLDataSet,
kedro.extras.datasets.pickle.PickleDataSet,
kedro.extras.datasets.pillow.ImageDataSet,
kedro.extras.datasets.plotly.JSONDataSet,
kedro.extras.datasets.spark.SparkDataSet,
kedro.extras.datasets.svmlight.SVMLightDataSet,
kedro.extras.datasets.tensorflow.TensorFlowModelDataset,
kedro.extras.datasets.text.TextDataSet,
kedro.extras.datasets.yaml.YAMLDataSet
AbstractVersionedDataSet is the base class for all versioned data set implementations. All data sets that implement versioning should extend this abstract class and implement the methods marked as abstract.
Example:
>>> from pathlib import Path, PurePosixPath
>>> import pandas as pd
>>> from kedro.io import AbstractVersionedDataSet
>>>
>>>
>>> class MyOwnDataSet(AbstractVersionedDataSet):
>>>     def __init__(self, filepath, version, param1, param2=True):
>>>         super().__init__(PurePosixPath(filepath), version)
>>>         self._param1 = param1
>>>         self._param2 = param2
>>>
>>>     def _load(self) -> pd.DataFrame:
>>>         load_path = self._get_load_path()
>>>         return pd.read_csv(load_path)
>>>
>>>     def _save(self, df: pd.DataFrame) -> None:
>>>         save_path = self._get_save_path()
>>>         df.to_csv(str(save_path))
>>>
>>>     def _exists(self) -> bool:
>>>         path = self._get_load_path()
>>>         return Path(path.as_posix()).exists()
>>>
>>>     def _describe(self):
>>>         return dict(version=self._version, param1=self._param1, param2=self._param2)
Example catalog.yml specification:
my_dataset:
  type: <path-to-my-own-dataset>.MyOwnDataSet
  filepath: data/01_raw/my_data.csv
  versioned: true
  param1: <param1-value> # param1 is a required argument
  # param2 will be True by default
Method | __init__ | Creates a new instance of AbstractVersionedDataSet. |
Method | exists | Checks whether a data set's output already exists by calling the provided _exists() method. |
Method | load | Loads data by delegation to the provided load method. |
Method | resolve | Compute the version the dataset should be loaded with. |
Method | resolve | Compute the version the dataset should be saved with. |
Method | save | Saves data by delegation to the provided save method. |
Method | _fetch_latest_load_version | Undocumented |
Method | _fetch_latest_save_version | Generate and cache the current save version |
Method | _get | Undocumented |
Method | _get | Undocumented |
Method | _get | Undocumented |
Method | _release | Undocumented |
Instance Variable | _exists | Undocumented |
Instance Variable | _filepath | Undocumented |
Instance Variable | _glob | Undocumented |
Instance Variable | _version | Undocumented |
Instance Variable | _version | Undocumented |
Inherited from AbstractDataSet:
Class Method | from | Create a data set instance using the configuration provided. |
Method | __str__ | Undocumented |
Method | release | Release any cached data. |
Method | _copy | Undocumented |
Method | _describe | Undocumented |
Method | _exists | Undocumented |
Method | _load | Undocumented |
Method | _save | Undocumented |
Property | _logger | Undocumented |
def __init__(self, filepath: PurePosixPath, version: Optional[Version], exists_function: Callable[[str], bool] = None, glob_function: Callable[[str], List[str]] = None):
Overridden in:
kedro.extras.datasets.email.EmailMessageDataSet,
kedro.extras.datasets.geopandas.GeoJSONDataSet,
kedro.extras.datasets.holoviews.HoloviewsWriter,
kedro.extras.datasets.json.JSONDataSet,
kedro.extras.datasets.matplotlib.MatplotlibWriter,
kedro.extras.datasets.networkx.GMLDataSet,
kedro.extras.datasets.networkx.GraphMLDataSet,
kedro.extras.datasets.networkx.JSONDataSet,
kedro.extras.datasets.pandas.CSVDataSet,
kedro.extras.datasets.pandas.ExcelDataSet,
kedro.extras.datasets.pandas.FeatherDataSet,
kedro.extras.datasets.pandas.GenericDataSet,
kedro.extras.datasets.pandas.HDFDataSet,
kedro.extras.datasets.pandas.JSONDataSet,
kedro.extras.datasets.pandas.ParquetDataSet,
kedro.extras.datasets.pandas.XMLDataSet,
kedro.extras.datasets.pickle.PickleDataSet,
kedro.extras.datasets.pillow.ImageDataSet,
kedro.extras.datasets.plotly.JSONDataSet,
kedro.extras.datasets.spark.SparkDataSet,
kedro.extras.datasets.svmlight.SVMLightDataSet,
kedro.extras.datasets.tensorflow.TensorFlowModelDataset,
kedro.extras.datasets.text.TextDataSet,
kedro.extras.datasets.yaml.YAMLDataSet
Creates a new instance of AbstractVersionedDataSet.
Parameters | |
filepath: PurePosixPath | Filepath in POSIX format to a file. |
version: Optional[Version] | If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, the save version will be autogenerated. |
exists_function: Callable[[str], bool] | Function used to determine whether a path exists in a filesystem. |
glob_function: Callable[[str], List[str]] | Function used to find all paths in a filesystem that match a given pattern. |
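The rules for the version argument can be illustrated without Kedro itself. The snippet below is a minimal sketch, not Kedro's implementation: the Version named tuple is a local stand-in for kedro.io.core.Version, the helpers pick_load_version and pick_save_version are hypothetical, and treating a missing version as "versioning disabled" is an assumption.

```python
from typing import List, NamedTuple, Optional


class Version(NamedTuple):
    """Local stand-in for kedro.io.core.Version (assumed to carry load and save)."""

    load: Optional[str]
    save: Optional[str]


def pick_load_version(version: Optional[Version], available: List[str]) -> Optional[str]:
    """Hypothetical helper: an explicit load version wins, otherwise use the latest."""
    if version is None:
        return None  # assumption: no Version means the data set is not versioned
    if version.load is not None:
        return version.load
    # Assumption: version strings sort chronologically, so max() is the latest.
    return max(available) if available else None


def pick_save_version(version: Optional[Version], generated: str) -> Optional[str]:
    """Hypothetical helper: an explicit save version wins, otherwise use a generated one."""
    if version is None:
        return None
    return version.save if version.save is not None else generated
```

With Version(None, None), loading falls back to the latest available version and saving falls back to the generated value, which mirrors the two None cases described in the parameter table.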
Overrides kedro.io.AbstractDataSet.exists
Checks whether a data set's output already exists by calling the provided _exists() method.
Returns | |
bool | Flag indicating whether the output already exists. |
Raises | |
DataSetError | When the underlying exists method raises an error. |
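The delegation described here, where the public exists() calls the provided _exists() and reports failures as DataSetError, can be sketched as follows. This is an illustration under stated assumptions rather than Kedro's code: DataSetError is redefined locally as a stand-in, SketchDataSet is a hypothetical class, and the error message wording is invented.

```python
from typing import Callable


class DataSetError(Exception):
    """Local stand-in for Kedro's DataSetError."""


class SketchDataSet:
    """Hypothetical class showing the public-to-underscore delegation pattern."""

    def __init__(self, exists_impl: Callable[[], bool]) -> None:
        self._exists_impl = exists_impl

    def _exists(self) -> bool:
        # In a real data set this would inspect the filesystem.
        return self._exists_impl()

    def exists(self) -> bool:
        try:
            return self._exists()
        except Exception as exc:
            # Wrap any underlying failure so callers see one error type.
            raise DataSetError(f"Failed during exists check: {exc}") from exc
```

Callers then only need to handle DataSetError, while the original exception stays reachable through the __cause__ attribute.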
Overrides kedro.io.AbstractDataSet.load
Loads data by delegation to the provided load method.
Returns | |
_DO | Data returned by the provided load method. |
Raises | |
DataSetError | When the underlying load method raises an error. |
Overrides kedro.io.AbstractDataSet.save
Saves data by delegation to the provided save method.
Parameters | |
data: _DI | The value to be saved by the provided save method. |
Raises | |
DataSetError | When the underlying save method raises an error. |
FileNotFoundError | On Windows, when the save method is given a file where a directory is expected. |
NotADirectoryError | On Unix, when the save method is given a file where a directory is expected. |
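The two platform-specific errors listed above come from the operating system, and the situation can be reproduced with the standard library alone. The sketch below assumes a versioned layout of the form <filepath>/<version>/<filename>, which is not stated in this reference, and uses a made-up version string. It writes a plain file first and then tries to write beneath it as if it were a directory.

```python
import tempfile
from pathlib import Path

error_name = None
with tempfile.TemporaryDirectory() as tmp:
    # An unversioned file already occupies the data set's filepath.
    plain_file = Path(tmp) / "my_data.csv"
    plain_file.write_text("a,b\n1,2\n")

    # Assumed layout: <filepath>/<version>/<filename>; the version is made up.
    target = plain_file / "2023-01-01T00.00.00.000Z" / plain_file.name
    try:
        target.write_text("a,b\n3,4\n")
    except (FileNotFoundError, NotADirectoryError) as exc:
        # NotADirectoryError on Unix, FileNotFoundError on Windows.
        error_name = type(exc).__name__

print(error_name)
```

The printed class name depends on the platform, which is why both exceptions appear in the table.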
def _fetch_latest_load_version(self) -> str:
Undocumented
def _fetch_latest_save_version(self) -> str:
Generate and cache the current save version
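A generate-once-then-reuse pattern of this kind could look like the sketch below. It is not Kedro's implementation: SaveVersionCache is a hypothetical helper, and using a UTC timestamp in this particular format as the version string is an assumption made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


class SaveVersionCache:
    """Hypothetical helper: generate a save version once, then reuse it."""

    def __init__(self) -> None:
        self._cached: Optional[str] = None

    def fetch(self) -> str:
        if self._cached is None:
            # Assumption: a UTC timestamp serves as the version string.
            now = datetime.now(timezone.utc)
            self._cached = now.strftime("%Y-%m-%dT%H.%M.%S.%fZ")
        return self._cached

    def clear(self) -> None:
        # Forget the cached value so the next fetch() generates a new one.
        self._cached = None
```

Caching matters because several calls within one run should agree on a single save version rather than each producing a slightly different timestamp.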
Overrides kedro.io.AbstractDataSet._release
Overridden in:
kedro.extras.datasets.email.EmailMessageDataSet,
kedro.extras.datasets.geopandas.GeoJSONDataSet,
kedro.extras.datasets.holoviews.HoloviewsWriter,
kedro.extras.datasets.json.JSONDataSet,
kedro.extras.datasets.matplotlib.MatplotlibWriter,
kedro.extras.datasets.networkx.GMLDataSet,
kedro.extras.datasets.networkx.GraphMLDataSet,
kedro.extras.datasets.networkx.JSONDataSet,
kedro.extras.datasets.pandas.CSVDataSet,
kedro.extras.datasets.pandas.ExcelDataSet,
kedro.extras.datasets.pandas.FeatherDataSet,
kedro.extras.datasets.pandas.GenericDataSet,
kedro.extras.datasets.pandas.HDFDataSet,
kedro.extras.datasets.pandas.JSONDataSet,
kedro.extras.datasets.pandas.ParquetDataSet,
kedro.extras.datasets.pandas.XMLDataSet,
kedro.extras.datasets.pickle.PickleDataSet,
kedro.extras.datasets.pillow.ImageDataSet,
kedro.extras.datasets.plotly.JSONDataSet,
kedro.extras.datasets.svmlight.SVMLightDataSet,
kedro.extras.datasets.tensorflow.TensorFlowModelDataset,
kedro.extras.datasets.text.TextDataSet,
kedro.extras.datasets.yaml.YAMLDataSet
Undocumented