kedro.io.CachedDataSet

class documentation

class CachedDataSet(AbstractDataSet):

CachedDataSet is a dataset wrapper that keeps saved data in memory, so that later loads avoid repeated I/O operations against slow storage media.
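The caching behaviour described above can be illustrated with a minimal, library-independent sketch. The `SlowStore` and `CachingWrapper` classes below are hypothetical names invented for this illustration. They are not part of Kedro and do not reproduce its implementation; they only show the general pattern of serving loads from memory once data has been saved.

```python
from typing import Any


class SlowStore:
    """Hypothetical stand-in for a dataset backed by slow storage."""

    def __init__(self) -> None:
        self._data: Any = None
        self.load_calls = 0  # counts how often the slow load path is taken

    def save(self, data: Any) -> None:
        self._data = data

    def load(self) -> Any:
        self.load_calls += 1
        return self._data


class CachingWrapper:
    """Hypothetical wrapper that keeps the last saved or loaded data in memory."""

    def __init__(self, dataset: SlowStore) -> None:
        self._dataset = dataset
        self._cache: Any = None
        self._has_cache = False

    def save(self, data: Any) -> None:
        # Write through to the wrapped store, then remember the data.
        self._dataset.save(data)
        self._cache = data
        self._has_cache = True

    def load(self) -> Any:
        # Only touch the wrapped store when nothing is cached yet.
        if not self._has_cache:
            self._cache = self._dataset.load()
            self._has_cache = True
        return self._cache

    def release(self) -> None:
        # Drop the in-memory copy; the next load goes back to the store.
        self._cache = None
        self._has_cache = False


store = SlowStore()
cached = CachingWrapper(store)
cached.save([1, 2, 3])
first = cached.load()   # served from memory, the store is not read
cached.release()
second = cached.load()  # cache was released, so the store is read once
```

In this sketch a load that follows a save never reaches the wrapped store, and releasing the wrapper forces the next load to read from the store again.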

You can also specify a CachedDataSet in catalog.yml:

>>> test_ds:
>>>    type: CachedDataSet
>>>    versioned: true
>>>    dataset:
>>>       type: pandas.CSVDataSet
>>>       filepath: example.csv

Note that if your dataset is versioned, declare `versioned: true` on the CachedDataSet wrapper entry, as shown above.

Method	`__getstate__`	Undocumented
Method	`__init__`	Creates a new instance of `CachedDataSet` pointing to the provided Python object.
Static Method	`_from_config`	Undocumented
Method	`_describe`	Undocumented
Method	`_exists`	Undocumented
Method	`_load`	Undocumented
Method	`_release`	Undocumented
Method	`_save`	Undocumented
Constant	`_SINGLE_PROCESS`	Undocumented
Instance Variable	`_cache`	Undocumented
Instance Variable	`_dataset`	Undocumented

Inherited from AbstractDataSet:

Class Method	`from_config`	Create a data set instance using the configuration provided.
Method	`__str__`	Undocumented
Method	`exists`	Checks whether a data set's output already exists by calling the provided _exists() method.
Method	`load`	Loads data by delegation to the provided load method.
Method	`release`	Release any cached data.
Method	`save`	Saves data by delegation to the provided save method.
Method	`_copy`	Undocumented
Property	`_logger`	Undocumented


def __init__(self, dataset: Union[AbstractDataSet, Dict], version: Version = None, copy_mode: str = None):

Creates a new instance of CachedDataSet pointing to the provided Python object.

Parameters
dataset:`Union[AbstractDataSet, Dict]`	A Kedro DataSet object or a dictionary to cache.
version:`Version`	If specified, should be an instance of `kedro.io.core.Version`. If its `load` attribute is None, the latest version will be loaded. If its `save` attribute is None, the save version will be autogenerated.
copy_mode:`str`	The copy mode used to copy the data. Possible values are: "deepcopy", "copy" and "assign". If not provided, it is inferred based on the data type.
Raises
`ValueError`	If the provided dataset is not a valid dict/YAML representation of a dataset or an actual dataset.
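
The three `copy_mode` values listed in the parameters above can be illustrated with the standard-library `copy` module. This is a sketch under the assumption that "deepcopy", "copy" and "assign" correspond to `copy.deepcopy`, `copy.copy` and plain assignment, as their names suggest. The helper `apply_copy_mode` is a hypothetical function written for this example, not a Kedro API.

```python
import copy
from typing import Any


def apply_copy_mode(data: Any, copy_mode: str) -> Any:
    """Hypothetical helper: return data according to the named copy mode."""
    if copy_mode == "deepcopy":
        return copy.deepcopy(data)  # fully independent copy
    if copy_mode == "copy":
        return copy.copy(data)      # new container, shared contents
    if copy_mode == "assign":
        return data                 # same object, no copy at all
    raise ValueError(f"Unknown copy mode: {copy_mode!r}")


original = {"rows": [1, 2, 3]}
deep = apply_copy_mode(original, "deepcopy")
shallow = apply_copy_mode(original, "copy")
same = apply_copy_mode(original, "assign")

# Mutating the nested list is visible through the shallow copy and the
# assigned reference, but not through the deep copy.
original["rows"].append(4)
```

The trade-off is between isolation and cost: a deep copy protects cached data from mutation by callers but is the most expensive, while assignment is free but shares state.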