class documentation

AbstractVersionedDataSet is the base class for all versioned data set implementations. All data sets that implement versioning should extend this abstract class and implement the methods marked as abstract.

Example:

>>> from pathlib import Path, PurePosixPath
>>> import pandas as pd
>>> from kedro.io import AbstractVersionedDataSet
>>>
>>>
>>> class MyOwnDataSet(AbstractVersionedDataSet):
...     def __init__(self, filepath, version, param1, param2=True):
...         super().__init__(PurePosixPath(filepath), version)
...         self._param1 = param1
...         self._param2 = param2
...
...     def _load(self) -> pd.DataFrame:
...         load_path = self._get_load_path()
...         return pd.read_csv(load_path)
...
...     def _save(self, df: pd.DataFrame) -> None:
...         save_path = self._get_save_path()
...         df.to_csv(str(save_path))
...
...     def _exists(self) -> bool:
...         path = self._get_load_path()
...         return Path(path.as_posix()).exists()
...
...     def _describe(self):
...         return dict(version=self._version, param1=self._param1, param2=self._param2)

Example catalog.yml specification:

my_dataset:
    type: <path-to-my-own-dataset>.MyOwnDataSet
    filepath: data/01_raw/my_data.csv
    versioned: true
    param1: <param1-value> # param1 is a required argument
    # param2 will be True by default

Method             __init__                    Creates a new instance of AbstractVersionedDataSet.
Method             exists                      Checks whether a data set's output already exists by calling the provided _exists() method.
Method             load                        Loads data by delegation to the provided load method.
Method             resolve_load_version        Compute the version the dataset should be loaded with.
Method             resolve_save_version        Compute the version the dataset should be saved with.
Method             save                        Saves data by delegation to the provided save method.
Method             _fetch_latest_load_version  Undocumented
Method             _fetch_latest_save_version  Generate and cache the current save version
Method             _get_load_path              Undocumented
Method             _get_save_path              Undocumented
Method             _get_versioned_path         Undocumented
Method             _release                    Undocumented
Instance Variable  _exists_function            Undocumented
Instance Variable  _filepath                   Undocumented
Instance Variable  _glob_function              Undocumented
Instance Variable  _version                    Undocumented
Instance Variable  _version_cache              Undocumented

Inherited from AbstractDataSet:

Class Method  from_config  Create a data set instance using the configuration provided.
Method        __str__      Undocumented
Method        release      Release any cached data.
Method        _copy        Undocumented
Method        _describe    Undocumented
Method        _exists      Undocumented
Method        _load        Undocumented
Method        _save        Undocumented
Property      _logger      Undocumented
def __init__(self, filepath: PurePosixPath, version: Optional[Version], exists_function: Callable[[str], bool] = None, glob_function: Callable[[str], List[str]] = None):

Creates a new instance of AbstractVersionedDataSet.

Parameters
    filepath (PurePosixPath): Filepath in POSIX format to a file.
    version (Optional[Version]): If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, the save version will be autogenerated.
    exists_function (Callable[[str], bool]): Function used to determine whether a path exists in a filesystem.
    glob_function (Callable[[str], List[str]]): Function used to find all paths in a filesystem that match a given pattern.
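
The two optional callables only need to match the signatures listed above. A minimal sketch for a local filesystem, using only the standard library, might look like this. The names local_exists and local_glob are hypothetical, and this page does not say what the base class uses when the arguments are left as None, so treat this as an illustration rather than the library's behaviour.

```python
import glob
import tempfile
from pathlib import Path
from typing import List


def local_exists(path: str) -> bool:
    """Return True if `path` points at an existing file or directory."""
    return Path(path).exists()


def local_glob(pattern: str) -> List[str]:
    """Return every local path that matches the glob `pattern`."""
    return glob.glob(pattern)


# Quick check against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my_data.csv"
    target.write_text("a,b\n1,2\n")
    found = local_exists(str(target))
    matches = local_glob(str(Path(tmp) / "*.csv"))
```

A subclass could then forward them as exists_function=local_exists and glob_function=local_glob in its call to super().__init__.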
def exists(self) -> bool:

Checks whether a data set's output already exists by calling the provided _exists() method.

Returns
    bool: Flag indicating whether the output already exists.
Raises
    DataSetError: When the underlying exists method raises an error.

def load(self) -> _DO:

Loads data by delegation to the provided load method.

Returns
    _DO: Data returned by the provided load method.
Raises
    DataSetError: When the underlying load method raises an error.

def resolve_load_version(self) -> Optional[str]:

Compute the version the dataset should be loaded with.
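
One way to picture this lookup: if a load version was pinned, use it; otherwise pick the most recent version found on disk. The sketch below is an illustration under assumptions, not the library's code. It assumes versions are stored as <filepath>/<version>/<filename> and that version strings sort chronologically; pick_load_version is a hypothetical helper.

```python
import glob
import tempfile
from pathlib import Path, PurePosixPath
from typing import Optional


def pick_load_version(filepath: PurePosixPath, pinned: Optional[str]) -> Optional[str]:
    """Return the pinned version, else the newest version found on disk."""
    if pinned is not None:
        return pinned
    # Assumed layout: <filepath>/<version>/<filename>
    pattern = str(filepath / "*" / filepath.name)
    candidates = sorted(glob.glob(pattern), reverse=True)
    if not candidates:
        return None
    # The version is the name of the directory that holds the file.
    return Path(candidates[0]).parent.name


with tempfile.TemporaryDirectory() as tmp:
    base = PurePosixPath(Path(tmp).as_posix()) / "my_data.csv"
    for version in ("2023-01-01T00.00.00.000Z", "2023-06-01T00.00.00.000Z"):
        version_dir = Path(str(base)) / version
        version_dir.mkdir(parents=True)
        (version_dir / "my_data.csv").write_text("a\n1\n")
    latest = pick_load_version(base, pinned=None)
    pinned_result = pick_load_version(base, pinned="2023-01-01T00.00.00.000Z")
```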

def resolve_save_version(self) -> Optional[str]:

Compute the version the dataset should be saved with.
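
As noted for the version parameter, a save version left as None is autogenerated. A minimal sketch of one way to produce such a version, assuming a UTC timestamp with millisecond precision. The format string and the helper name make_save_version are assumptions for illustration and are not taken from this page.

```python
from datetime import datetime, timezone
from typing import Optional


def make_save_version(pinned: Optional[str] = None) -> str:
    """Return the pinned save version, else a fresh UTC timestamp string."""
    if pinned is not None:
        return pinned
    now = datetime.now(tz=timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H.%M.%S.%f")  # %f gives 6 microsecond digits
    return stamp[:-3] + "Z"  # keep only the millisecond digits


auto_version = make_save_version()
fixed_version = make_save_version("2023-01-01T00.00.00.000Z")
```

A timestamp in this shape contains no colons, so it is safe as a directory name on common filesystems, and it sorts lexicographically in chronological order.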

def save(self, data: _DI):

Saves data by delegation to the provided save method.

Parameters
    data (_DI): The value to be saved by the provided save method.
Raises
    DataSetError: When the underlying save method raises an error.
    FileNotFoundError: When the save method encounters a file where a directory is expected, on Windows.
    NotADirectoryError: When the save method encounters a file where a directory is expected, on Unix.

@cachedmethod(cache=attrgetter('_version_cache'), key=partial(hashkey, 'load'))
def _fetch_latest_load_version(self) -> str:

Undocumented

@cachedmethod(cache=attrgetter('_version_cache'), key=partial(hashkey, 'save'))
def _fetch_latest_save_version(self) -> str:

Generate and cache the current save version
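
The decorator above stores the result in _version_cache under a key that includes the string 'save', so repeated calls return the same version. A rough standard-library stand-in for that pattern, with a plain dict playing the role of the cache. CachedVersions, its counter-based generator, and clear_cache are hypothetical names that only illustrate the caching behaviour.

```python
import itertools
from typing import Dict


class CachedVersions:
    """Toy object that computes a save version once and then reuses it."""

    def __init__(self) -> None:
        self._version_cache: Dict[str, str] = {}
        self._counter = itertools.count(1)

    def _generate(self) -> str:
        # Stand-in for real version generation; each call yields a new value.
        return f"version-{next(self._counter)}"

    def fetch_latest_save_version(self) -> str:
        # Compute on the first call, then serve the cached value.
        if "save" not in self._version_cache:
            self._version_cache["save"] = self._generate()
        return self._version_cache["save"]

    def clear_cache(self) -> None:
        # Emptying the cache forces the next call to recompute.
        self._version_cache.clear()


versions = CachedVersions()
first = versions.fetch_latest_save_version()
second = versions.fetch_latest_save_version()
versions.clear_cache()
third = versions.fetch_latest_save_version()
```

Caching matters here because a generated version must stay stable across calls on the same instance; regenerating it on every call would yield a different value each time.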

def _get_load_path(self) -> PurePosixPath:

Undocumented

def _get_save_path(self) -> PurePosixPath:

Undocumented

def _get_versioned_path(self, version: str) -> PurePosixPath:

Undocumented
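
Since this method is undocumented here, the following is only a sketch of what a versioned path could look like. It assumes the layout <filepath>/<version>/<filename>, which this page does not state; versioned_path is a hypothetical free function, not the method itself.

```python
from pathlib import PurePosixPath


def versioned_path(filepath: PurePosixPath, version: str) -> PurePosixPath:
    """Nest the file under a directory named after the version."""
    return filepath / version / filepath.name


path = versioned_path(
    PurePosixPath("data/01_raw/my_data.csv"), "2023-01-01T00.00.00.000Z"
)
print(path)  # data/01_raw/my_data.csv/2023-01-01T00.00.00.000Z/my_data.csv
```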

_exists_function

Undocumented

_filepath

Undocumented

_glob_function

Undocumented

_version

Undocumented

_version_cache: Cache

Undocumented