class PickleDataSet(AbstractVersionedDataSet[Any, Any]):
PickleDataSet loads/saves data from/to a Pickle file using an underlying filesystem (e.g. local, S3, GCS). Serialisation and deserialisation are delegated to the specified backend library (defaults to the built-in pickle module), so all of that backend's options for loading and saving pickle files are supported.
Example usage for the YAML API:
    test_model: # simple example without compression
      type: pickle.PickleDataSet
      filepath: data/07_model_output/test_model.pkl
      backend: pickle

    final_model: # example with load and save args
      type: pickle.PickleDataSet
      filepath: s3://your_bucket/final_model.pkl.lz4
      backend: joblib
      credentials: s3_credentials
      save_args:
        compress: lz4
Example usage for the Python API:
    >>> from kedro.extras.datasets.pickle import PickleDataSet
    >>> import pandas as pd
    >>>
    >>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
    >>>                      'col3': [5, 6]})
    >>>
    >>> data_set = PickleDataSet(filepath="test.pkl", backend="pickle")
    >>> data_set.save(data)
    >>> reloaded = data_set.load()
    >>> assert data.equals(reloaded)
    >>>
    >>> data_set = PickleDataSet(filepath="test.pickle.lz4",
    >>>                          backend="compress_pickle",
    >>>                          load_args={"compression": "lz4"},
    >>>                          save_args={"compression": "lz4"})
    >>> data_set.save(data)
    >>> reloaded = data_set.load()
    >>> assert data.equals(reloaded)
Method | __init__ | Creates a new instance of PickleDataSet pointing to a concrete Pickle file on a specific filesystem. PickleDataSet supports custom backends to serialise/deserialise objects.
Constant | DEFAULT_LOAD_ARGS | Undocumented
Constant | DEFAULT_SAVE_ARGS | Undocumented
Method | _describe | Undocumented
Method | _exists | Undocumented
Method | _invalidate_cache | Invalidate underlying filesystem caches.
Method | _load | Undocumented
Method | _release | Undocumented
Method | _save | Undocumented
Instance Variable | _backend | Undocumented
Instance Variable | _fs | Undocumented
Instance Variable | _fs_open_args_load | Undocumented
Instance Variable | _fs_open_args_save | Undocumented
Instance Variable | _load_args | Undocumented
Instance Variable | _protocol | Undocumented
Instance Variable | _save_args | Undocumented
Inherited from AbstractVersionedDataSet:

Method | exists | Checks whether a data set's output already exists by calling the provided _exists() method.
Method | load | Loads data by delegation to the provided load method.
Method | resolve_load_version | Compute the version the dataset should be loaded with.
Method | resolve_save_version | Compute the version the dataset should be saved with.
Method | save | Saves data by delegation to the provided save method.
Method | _fetch_latest_load_version | Undocumented
Method | _fetch_latest_save_version | Generate and cache the current save version.
Method | _get_load_path | Undocumented
Method | _get_save_path | Undocumented
Method | _get_versioned_path | Undocumented
Instance Variable | _exists_function | Undocumented
Instance Variable | _filepath | Undocumented
Instance Variable | _glob_function | Undocumented
Instance Variable | _version | Undocumented
Instance Variable | _version_cache | Undocumented
Inherited from AbstractDataSet (via AbstractVersionedDataSet):

Class Method | from_config | Create a data set instance using the configuration provided.
Method | __str__ | Undocumented
Method | release | Release any cached data.
Method | _copy | Undocumented
Property | _logger | Undocumented
def __init__(self, filepath: str, backend: str = 'pickle', load_args: Dict[str, Any] = None, save_args: Dict[str, Any] = None, version: Version = None, credentials: Dict[str, Any] = None, fs_args: Dict[str, Any] = None):
Creates a new instance of PickleDataSet pointing to a concrete Pickle file on a specific filesystem. PickleDataSet supports custom backends to serialise/deserialise objects.
- Example backends that are compatible (non-exhaustive):
pickle
joblib
dill
compress_pickle
- Example backends that are incompatible:
torch
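What "compatible" means here is the file-object contract: a backend must expose dump(obj, file) and load(file) functions. A stdlib-only round trip sketches that contract; any of the compatible backends above could be swapped in for pickle, while torch fails it because it exposes save rather than dump:

```python
import pickle
import tempfile

data = {"weights": [0.1, 0.2, 0.3]}

# Any backend satisfying the pickle interface works the same way:
# dump(obj, file) writes the object, load(file) reads it back.
with tempfile.TemporaryFile() as f:  # opened in binary mode by default
    pickle.dump(data, f)
    f.seek(0)
    reloaded = pickle.load(f)

print(reloaded == data)  # → True
```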
Parameters | |
filepath: str | Filepath in POSIX format to a Pickle file prefixed with a protocol like s3://. If no prefix is provided, the file protocol (local filesystem) will be used. The prefix can be any protocol supported by fsspec. Note: http(s) doesn't support versioning. |
backend: str | Backend to use; must be an import path to a module which satisfies the pickle interface, i.e. one that contains load and dump functions. Defaults to 'pickle'. |
load_args: Dict[str, Any] | Pickle options for loading pickle files. You can pass in any arguments that the specified backend's load function accepts, e.g.:
pickle.load: https://docs.python.org/3/library/pickle.html#pickle.load
joblib.load: https://joblib.readthedocs.io/en/latest/generated/joblib.load.html
dill.load: https://dill.readthedocs.io/en/latest/index.html#dill.load
compress_pickle.load: https://lucianopaz.github.io/compress_pickle/html/api/compress_pickle.html#compress_pickle.compress_pickle.load
All defaults are preserved. |
save_args: Dict[str, Any] | Pickle options for saving pickle files. You can pass in any arguments that the specified backend's dump function accepts, e.g.:
pickle.dump: https://docs.python.org/3/library/pickle.html#pickle.dump
joblib.dump: https://joblib.readthedocs.io/en/latest/generated/joblib.dump.html
dill.dump: https://dill.readthedocs.io/en/latest/index.html#dill.dump
compress_pickle.dump: https://lucianopaz.github.io/compress_pickle/html/api/compress_pickle.html#compress_pickle.compress_pickle.dump
All defaults are preserved. |
version:Version | If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, save version will be autogenerated. |
credentials: Dict[str, Any] | Credentials required to get access to the underlying filesystem. E.g. for GCSFileSystem it should look like {"token": None}. |
fs_args: Dict[str, Any] | Extra arguments to pass into the underlying filesystem class constructor (e.g. {"project": "my-project"} for GCSFileSystem), as well as to pass to the filesystem's open method through the nested keys open_args_load and open_args_save. All available arguments for open can be found here:
https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.open
All defaults are preserved, except mode, which is set to wb when saving. |
Raises | |
ValueError | If backend does not satisfy the pickle interface. |
ImportError | If the backend module could not be imported. |