class SVMLightDataSet(AbstractVersionedDataSet[
SVMLightDataSet loads/saves data from/to a svmlight/libsvm file using an underlying filesystem (e.g. local, S3, GCS). It uses the sklearn function dump_svmlight_file to save a file and load_svmlight_file to load one.
Data is loaded as a tuple of features and labels. The labels are a NumPy array, and the features are a Compressed Sparse Row (CSR) matrix.
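The round trip described above can be sketched directly with the two sklearn functions, without Kedro. This is a minimal illustration, assuming an in-memory io.BytesIO buffer in place of a file on a real filesystem; the variable names are illustrative and not part of the dataset class.

```python
import io

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Dense features and labels, matching the shapes used in the Python API example.
dense_features = np.array([[0, 1], [2, 3.14159]])
dense_labels = np.array([7, 3])

# Write to an in-memory binary buffer (an assumption; the dataset writes to a file).
buffer = io.BytesIO()
dump_svmlight_file(dense_features, dense_labels, buffer, zero_based=False)

# Rewind and read back. The loader returns a (features, labels) tuple.
buffer.seek(0)
features, labels = load_svmlight_file(buffer, zero_based=False)

print(type(features).__name__, features.format)  # sparse container and its storage format
print(type(labels).__name__, labels.dtype)
```

Passing the same zero_based value to both calls keeps the column indices aligned, which is why the YAML example sets it under both load_args and save_args.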
This is a text-based format with one sample per line. It does not store zero-valued features, which makes it suitable for sparse datasets.
This format is used as the default format for both svmlight and the libsvm command line programs.
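To make the layout concrete, the sketch below formats dense rows as svmlight-style text lines of the form `label index:value ...`, skipping zero-valued features. The helper to_svmlight_line is hypothetical and only illustrates the format; it is not how sklearn or SVMLightDataSet writes files.

```python
from typing import List, Sequence


def to_svmlight_line(label: float, features: Sequence[float], zero_based: bool = False) -> str:
    """Format one sample as 'label index:value ...', omitting zero-valued features."""
    offset = 0 if zero_based else 1  # one-based indices unless zero_based is set
    pairs = [f"{i + offset}:{v:g}" for i, v in enumerate(features) if v != 0]
    return " ".join([f"{label:g}", *pairs])


# Same data as in the Python API example: two samples with two features each.
rows = [(7, [0, 1]), (3, [2, 3.14159])]
lines: List[str] = [to_svmlight_line(label, feats) for label, feats in rows]

for line in lines:
    print(line)
# 7 2:1
# 3 1:2 2:3.14159
```

The first sample has a zero in its first column, so only the second feature appears on its line.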
Example usage for the YAML API:
    svm_dataset:
      type: svmlight.SVMLightDataSet
      filepath: data/01_raw/location.svm
      load_args:
        zero_based: False
      save_args:
        zero_based: False

    cars:
      type: svmlight.SVMLightDataSet
      filepath: gcs://your_bucket/cars.svm
      fs_args:
        project: my-project
      credentials: my_gcp_credentials
      load_args:
        zero_based: False
      save_args:
        zero_based: False
Example usage for the Python API:
    >>> from kedro.extras.datasets.svmlight import SVMLightDataSet
    >>> import numpy as np
    >>>
    >>> # Features and labels.
    >>> data = (np.array([[0, 1], [2, 3.14159]]), np.array([7, 3]))
    >>>
    >>> data_set = SVMLightDataSet(filepath="test.svm")
    >>> data_set.save(data)
    >>> reloaded_features, reloaded_labels = data_set.load()
    >>> assert (data[0] == reloaded_features).all()
    >>> assert (data[1] == reloaded_labels).all()
Method | __init__ | Creates a new instance of AbstractVersionedDataSet. |
Constant | DEFAULT | Undocumented |
Constant | DEFAULT | Undocumented |
Method | _describe | Undocumented |
Method | _exists | Undocumented |
Method | _invalidate | Invalidate underlying filesystem caches. |
Method | _load | Undocumented |
Method | _release | Undocumented |
Method | _save | Undocumented |
Instance Variable | _fs | Undocumented |
Instance Variable | _fs | Undocumented |
Instance Variable | _fs | Undocumented |
Instance Variable | _load | Undocumented |
Instance Variable | _protocol | Undocumented |
Instance Variable | _save | Undocumented |
Inherited from AbstractVersionedDataSet:
Method | exists | Checks whether a data set's output already exists by calling the provided _exists() method. |
Method | load | Loads data by delegation to the provided load method. |
Method | resolve | Compute the version the dataset should be loaded with. |
Method | resolve | Compute the version the dataset should be saved with. |
Method | save | Saves data by delegation to the provided save method. |
Method | _fetch | Undocumented |
Method | _fetch | Generate and cache the current save version. |
Method | _get | Undocumented |
Method | _get | Undocumented |
Method | _get | Undocumented |
Instance Variable | _exists | Undocumented |
Instance Variable | _filepath | Undocumented |
Instance Variable | _glob | Undocumented |
Instance Variable | _version | Undocumented |
Instance Variable | _version | Undocumented |
Inherited from AbstractDataSet (via AbstractVersionedDataSet):
Class Method | from | Create a data set instance using the configuration provided. |
Method | __str__ | Undocumented |
Method | release | Release any cached data. |
Method | _copy | Undocumented |
Property | _logger | Undocumented |
def __init__(self, filepath: str, load_args: Dict[str, Any] = None, save_args: Dict[str, Any] = None, version: Optional[Version] = None, credentials: Dict[str, Any] = None, fs_args: Dict[str, Any] = None):
Creates a new instance of AbstractVersionedDataSet.
Parameters | |
filepath: str | Filepath in POSIX format to a file. |
load_args: Dict[str, Any] | Undocumented |
save_args: Dict[str, Any] | Undocumented |
version: Optional[Version] | If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, the save version will be autogenerated. |
credentials: Dict[str, Any] | Undocumented |
fs_args: Dict[str, Any] | Undocumented |
exists | Function that is used for determining whether a path exists in a filesystem. |
glob | Function that is used for finding all paths in a filesystem, which match a given pattern. |
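The version rule in the parameter table (a load of None means the latest version, a save of None means an autogenerated one) can be sketched as follows. This is an illustration only: the Version tuple here is a local stand-in rather than kedro.io.core.Version, the helpers resolve_load and resolve_save are hypothetical, and the timestamp layout is an assumption chosen so that version strings sort chronologically.

```python
from collections import namedtuple
from datetime import datetime, timezone
from typing import Optional, Sequence

# Local stand-in with the two attributes the parameter table mentions.
Version = namedtuple("Version", ["load", "save"])


def resolve_load(version: Version, available: Sequence[str]) -> str:
    """Return the pinned load version, or the latest available one if load is None."""
    if version.load is not None:
        return version.load
    # Assumes version strings sort chronologically, so max() is the latest.
    return max(available)


def resolve_save(version: Version, now: Optional[datetime] = None) -> str:
    """Return the pinned save version, or generate a timestamp if save is None."""
    if version.save is not None:
        return version.save
    now = now or datetime.now(timezone.utc)
    # Hypothetical layout: millisecond precision with a trailing Z.
    return now.strftime("%Y-%m-%dT%H.%M.%S.%f")[:-3] + "Z"


existing = ["2023-01-01T00.00.00.000Z", "2023-06-01T00.00.00.000Z"]
fixed_now = datetime(2023, 1, 2, 3, 4, 5, 678000, tzinfo=timezone.utc)

latest = resolve_load(Version(None, None), existing)
pinned = resolve_load(Version("2023-01-01T00.00.00.000Z", None), existing)
generated = resolve_save(Version(None, None), now=fixed_now)
```

Passing a fixed `now` keeps the generated value deterministic, which is convenient when testing code that depends on the save version.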