kedro.extras.datasets.pandas.XMLDataSet

class documentation

class XMLDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]):

XMLDataSet loads data from and saves data to an XML file using an underlying filesystem (e.g. local, S3, GCS). It uses pandas to handle the XML file.

Example usage for the Python API:

>>> from kedro.extras.datasets.pandas import XMLDataSet
>>> import pandas as pd
>>>
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
...                      'col3': [5, 6]})
>>>
>>> data_set = XMLDataSet(filepath="test.xml")
>>> data_set.save(data)
>>> reloaded = data_set.load()
>>> assert data.equals(reloaded)

Method	`__init__`	Creates a new instance of `XMLDataSet` pointing to a concrete XML file on a specific filesystem.
Constant	`DEFAULT_LOAD_ARGS`	Undocumented
Constant	`DEFAULT_SAVE_ARGS`	Undocumented
Method	`_describe`	Undocumented
Method	`_exists`	Undocumented
Method	`_invalidate_cache`	Invalidate underlying filesystem caches.
Method	`_load`	Undocumented
Method	`_release`	Undocumented
Method	`_save`	Undocumented
Instance Variable	`_fs`	Undocumented
Instance Variable	`_load_args`	Undocumented
Instance Variable	`_protocol`	Undocumented
Instance Variable	`_save_args`	Undocumented
Instance Variable	`_storage_options`	Undocumented

Inherited from AbstractVersionedDataSet:

Method	`exists`	Checks whether a data set's output already exists by calling the provided _exists() method.
Method	`load`	Loads data by delegation to the provided load method.
Method	`resolve_load_version`	Compute the version the dataset should be loaded with.
Method	`resolve_save_version`	Compute the version the dataset should be saved with.
Method	`save`	Saves data by delegation to the provided save method.
Method	`_fetch_latest_load_version`	Undocumented
Method	`_fetch_latest_save_version`	Generate and cache the current save version.
Method	`_get_load_path`	Undocumented
Method	`_get_save_path`	Undocumented
Method	`_get_versioned_path`	Undocumented
Instance Variable	`_exists_function`	Undocumented
Instance Variable	`_filepath`	Undocumented
Instance Variable	`_glob_function`	Undocumented
Instance Variable	`_version`	Undocumented
Instance Variable	`_version_cache`	Undocumented

Inherited from AbstractDataSet (via AbstractVersionedDataSet):

Class Method	`from_config`	Create a data set instance using the configuration provided.
Method	`__str__`	Undocumented
Method	`release`	Release any cached data.
Method	`_copy`	Undocumented
Property	`_logger`	Undocumented

def __init__(self, filepath: str, load_args: Dict[str, Any] = None, save_args: Dict[str, Any] = None, version: Version = None, credentials: Dict[str, Any] = None, fs_args: Dict[str, Any] = None):

overrides kedro.io.AbstractVersionedDataSet.__init__

Creates a new instance of XMLDataSet pointing to a concrete XML file on a specific filesystem.

Parameters
filepath:`str`	Filepath in POSIX format to an XML file, optionally prefixed with a protocol like `s3://`. If no prefix is provided, the `file` protocol (local filesystem) is used. The prefix can be any protocol supported by `fsspec`. Note: `http(s)` doesn't support versioning.
load_args:`Dict[str, Any]`	Pandas options for loading XML files. All defaults are preserved. Available arguments: https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html
save_args:`Dict[str, Any]`	Pandas options for saving XML files. All defaults are preserved except "index", which is set to False. Available arguments: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_xml.html
version:`Version`	If specified, should be an instance of `kedro.io.core.Version`. If its `load` attribute is None, the latest version will be loaded. If its `save` attribute is None, the save version will be autogenerated.
credentials:`Dict[str, Any]`	Credentials required to get access to the underlying filesystem. E.g. for `GCSFileSystem` it should look like `{"token": None}`.
fs_args:`Dict[str, Any]`	Extra arguments to pass into underlying filesystem class constructor (e.g. `{"project": "my-project"}` for `GCSFileSystem`).
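
The effect of `load_args` and `save_args` can be previewed with pandas alone. The following is a minimal sketch, not the data set's implementation: it assumes the two dictionaries are forwarded as keyword arguments to `pandas.read_xml` and `DataFrame.to_xml`, and it calls those functions directly on an in-memory string instead of a file. The dictionary contents (`root_name`, `row_name`, `xpath`) are illustrative choices, and `parser="etree"` is used only so that the sketch does not depend on lxml.

```python
from io import StringIO

import pandas as pd

data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})

# Hypothetical argument dictionaries, shaped like the ``save_args`` and
# ``load_args`` parameters described above. ``index=False`` mirrors the
# documented default of the data set; the other keys are illustrative.
save_args = {
    "index": False,
    "root_name": "records",
    "row_name": "record",
    "parser": "etree",
}
load_args = {"xpath": "./record", "parser": "etree"}

# With no path given, ``to_xml`` returns the document as a string.
xml_text = data.to_xml(**save_args)

# Read the same document back, selecting the <record> rows under the root.
reloaded = pd.read_xml(StringIO(xml_text), **load_args)

assert data.equals(reloaded)
```

Under that forwarding assumption, the same dictionaries could be passed as `XMLDataSet(filepath="test.xml", load_args=load_args, save_args=save_args)`.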