class documentation

class GeoJSONDataSet(AbstractVersionedDataSet[gpd.GeoDataFrame, Union[gpd.GeoDataFrame, Dict[str, gpd.GeoDataFrame]]]): (source)

GeoJSONDataSet loads and saves data to a GeoJSON file using an underlying filesystem (e.g. local, S3, GCS). The underlying functionality is provided by geopandas, so all geopandas (pandas) options allowed for loading and saving GeoJSON files are supported.

Example:

>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> from kedro.extras.datasets.geopandas import GeoJSONDataSet
>>>
>>> # Build a small GeoDataFrame with two point geometries.
>>> data = gpd.GeoDataFrame({'col1': [1, 2], 'col2': [4, 5],
...                          'col3': [5, 6]}, geometry=[Point(1, 1), Point(2, 4)])
>>> data_set = GeoJSONDataSet(filepath="test.geojson", save_args=None)
>>> data_set.save(data)  # save_args=None keeps the default 'GeoJSON' driver
>>> reloaded = data_set.load()
>>>
>>> assert data.equals(reloaded)
Method __init__ Creates a new instance of GeoJSONDataSet pointing to a concrete GeoJSON file on a specific fsspec filesystem.
Method invalidate_cache Invalidate underlying filesystem cache.
Constant DEFAULT_LOAD_ARGS Undocumented
Constant DEFAULT_SAVE_ARGS Undocumented
Method _describe Undocumented
Method _exists Undocumented
Method _load Undocumented
Method _release Undocumented
Method _save Undocumented
Instance Variable _fs Undocumented
Instance Variable _fs_open_args_load Undocumented
Instance Variable _fs_open_args_save Undocumented
Instance Variable _load_args Undocumented
Instance Variable _protocol Undocumented
Instance Variable _save_args Undocumented

Inherited from AbstractVersionedDataSet:

Method exists Checks whether a data set's output already exists by calling the provided _exists() method.
Method load Loads data by delegation to the provided load method.
Method resolve_load_version Compute the version the dataset should be loaded with.
Method resolve_save_version Compute the version the dataset should be saved with.
Method save Saves data by delegation to the provided save method.
Method _fetch_latest_load_version Undocumented
Method _fetch_latest_save_version Generate and cache the current save version
Method _get_load_path Undocumented
Method _get_save_path Undocumented
Method _get_versioned_path Undocumented
Instance Variable _exists_function Undocumented
Instance Variable _filepath Undocumented
Instance Variable _glob_function Undocumented
Instance Variable _version Undocumented
Instance Variable _version_cache Undocumented
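
The version-related members listed above can be illustrated with a minimal sketch. It assumes that a versioned dataset is stored as <filepath>/<version>/<filename> and that version strings are timestamps that sort lexicographically, so the latest version is simply the largest string. The helper names versioned_path and pick_load_version are hypothetical and are not part of kedro; they only mirror the idea behind _get_versioned_path and resolve_load_version.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def versioned_path(filepath: PurePosixPath, version: str) -> PurePosixPath:
    """Build the path of one version (assumed layout: <filepath>/<version>/<filename>)."""
    return filepath / version / filepath.name


def pick_load_version(requested: Optional[str], available: Iterable[str]) -> str:
    """Return the requested version, or the latest available one if none was requested."""
    if requested is not None:
        return requested
    versions = sorted(available)  # timestamp strings are assumed to sort chronologically
    if not versions:
        raise ValueError("no versions available to load")
    return versions[-1]


if __name__ == "__main__":
    base = PurePosixPath("data/test.geojson")
    latest = pick_load_version(None, ["2024-01-01T00.00.00.000Z", "2024-03-05T12.30.00.000Z"])
    print(versioned_path(base, latest))
```

The sketch works on pure paths only, so it does not touch any filesystem; the real class resolves versions against the underlying fsspec filesystem.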

Inherited from AbstractDataSet (via AbstractVersionedDataSet):

Class Method from_config Create a data set instance using the configuration provided.
Method __str__ Undocumented
Method release Release any cached data.
Method _copy Undocumented
Property _logger Undocumented
def __init__(self, filepath: str, load_args: Dict[str, Any] = None, save_args: Dict[str, Any] = None, version: Version = None, credentials: Dict[str, Any] = None, fs_args: Dict[str, Any] = None): (source)

Creates a new instance of GeoJSONDataSet pointing to a concrete GeoJSON file on a specific fsspec filesystem.

Parameters
filepath: str
    Filepath in POSIX format to a GeoJSON file, prefixed with a protocol such as s3://. If no prefix is provided, the file protocol (local filesystem) is used. The prefix can be any protocol supported by fsspec. Note: http(s) doesn't support versioning.
load_args: Dict[str, Any]
    GeoPandas options for loading GeoJSON files. Here you can find all available arguments: https://geopandas.org/en/stable/docs/reference/api/geopandas.read_file.html
save_args: Dict[str, Any]
    GeoPandas options for saving GeoJSON files. Here you can find all available arguments: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_file.html The default driver in save_args is 'GeoJSON'; all other defaults are preserved.
version: Version
    If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save
credentials: Dict[str, Any]
    Credentials required to access the underlying filesystem. E.g. for GCSFileSystem it would look like {'token': None}.
fs_args: Dict[str, Any]
    Extra arguments to pass into the underlying filesystem class constructor (e.g. {"project": "my-project"} for GCSFileSystem), as well as arguments to pass to the filesystem's open method through the nested keys open_args_load and open_args_save. Here you can find all available arguments for open: https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.open All defaults are preserved, except mode, which is set to wb when saving.
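
To make the filepath and fs_args descriptions more concrete, here is a rough sketch of the behaviour they describe: a missing protocol prefix falls back to file, the nested keys open_args_load and open_args_save are separated from the filesystem constructor arguments, and mode defaults to wb for saving. The helpers split_protocol and split_fs_args are hypothetical names introduced for illustration; this is not the library's implementation, and the protocol parsing is deliberately simplified.

```python
from copy import deepcopy
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlsplit


def split_protocol(filepath: str) -> Tuple[str, str]:
    """Split 'protocol://path' into (protocol, path); default to the 'file' protocol."""
    parsed = urlsplit(filepath)
    protocol = parsed.scheme or "file"
    if protocol == "file":
        return protocol, parsed.path
    # For remote protocols the bucket/host is kept as part of the path.
    return protocol, parsed.netloc + parsed.path


def split_fs_args(
    fs_args: Optional[Dict[str, Any]]
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Return (constructor_args, open_args_load, open_args_save) without mutating the input."""
    args = deepcopy(fs_args) if fs_args else {}
    open_args_load = args.pop("open_args_load", {})
    open_args_save = args.pop("open_args_save", {})
    # Only 'mode' gets a non-default value, and only for saving.
    open_args_save.setdefault("mode", "wb")
    return args, open_args_load, open_args_save


if __name__ == "__main__":
    print(split_protocol("s3://bucket/data/test.geojson"))
    print(split_fs_args({"project": "my-project", "open_args_load": {"mode": "rb"}}))
```

Everything left in the first returned dictionary would go to the filesystem constructor, while the other two would be forwarded to the filesystem's open method on load and save respectively.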
def invalidate_cache(self): (source)

Invalidate underlying filesystem cache.

DEFAULT_LOAD_ARGS: Dict[str, Any] = (source)

Undocumented

Value
{}
DEFAULT_SAVE_ARGS: dict[str, str] = (source)

Undocumented

Value
{'driver': 'GeoJSON'}
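
The two constants act as baselines that user-supplied load_args and save_args are layered on top of. The following sketch shows one plausible way such a merge could work, assuming a copy of the defaults is updated with the user's values; merge_args is a hypothetical helper, not a kedro function, and the encoding option is only an illustrative value.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

# Values copied from the constants documented above.
DEFAULT_LOAD_ARGS: Dict[str, Any] = {}
DEFAULT_SAVE_ARGS: Dict[str, Any] = {"driver": "GeoJSON"}


def merge_args(defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of the defaults updated with user overrides (None means no overrides)."""
    merged = deepcopy(defaults)  # never mutate the class-level defaults
    if overrides is not None:
        merged.update(overrides)
    return merged


if __name__ == "__main__":
    # save_args=None keeps the default driver.
    print(merge_args(DEFAULT_SAVE_ARGS, None))
    # Extra options are added alongside the default driver (illustrative option).
    print(merge_args(DEFAULT_SAVE_ARGS, {"encoding": "utf-8"}))
```

Copying before updating matters because the defaults are shared at class level; updating them in place would leak one instance's options into every other instance.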
def _describe(self) -> Dict[str, Any]: (source)

Undocumented

def _exists(self) -> bool: (source)

Undocumented

def _load(self) -> Union[gpd.GeoDataFrame, Dict[str, gpd.GeoDataFrame]]: (source)

Undocumented

def _release(self): (source)

Undocumented

def _save(self, data: gpd.GeoDataFrame): (source)

Undocumented


_fs_open_args_load = (source)

Undocumented

_fs_open_args_save = (source)

Undocumented

_load_args = (source)

Undocumented

_protocol = (source)

Undocumented

_save_args = (source)

Undocumented