module documentation

AbstractVersionedDataSet implementation for accessing Spark DataFrames using pyspark.

Class KedroHdfsInsecureClient: Subclasses hdfs.InsecureClient and implements the hdfs_exists and hdfs_glob methods required by SparkDataSet.
Function _dbfs_exists: Perform an ls operation in DBFS using the provided pattern. It is assumed that version paths are managed by Kedro. A broad Exception is caught because dbutils.fs.ExecutionError cannot be imported directly.
Function _dbfs_glob: Perform a custom glob search in DBFS using the provided pattern. It is assumed that version paths are managed by Kedro only.
Function _get_dbutils: Return the 'dbutils' instance, or None if it cannot be found.
Function _parse_glob_pattern: Undocumented
Function _split_filepath: Undocumented
Function _strip_dbfs_prefix: Undocumented

def _dbfs_exists(pattern: str, dbutils: Any) -> bool:

Perform an ls operation in DBFS using the provided pattern. It is assumed that version paths are managed by Kedro. A broad Exception is caught because dbutils.fs.ExecutionError cannot be imported directly.

Parameters
    pattern (str): Filepath to search for.
    dbutils (Any): dbutils instance to operate with DBFS.
Returns
    bool: True if the filepath exists, False otherwise.

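To illustrate the "broad Exception" note above, here is a minimal sketch of an existence check built on a listing call. It assumes that `dbutils.fs.ls` raises when the path is missing; the name `dbfs_exists_sketch` and the `fake_dbutils` stand-in are hypothetical and exist only so the example runs outside Databricks. It is not the module's actual implementation.

```python
from types import SimpleNamespace
from typing import Any


def dbfs_exists_sketch(path: str, dbutils: Any) -> bool:
    """Return True if listing `path` succeeds, False if the call raises."""
    try:
        dbutils.fs.ls(path)
        return True
    except Exception:  # broad on purpose: the specific error type cannot be imported
        return False


# Hypothetical stand-in for dbutils (assumption, for illustration only).
_known_paths = {"/data/out.parquet"}


def _fake_ls(path: str) -> list:
    if path not in _known_paths:
        raise RuntimeError(f"no such path: {path}")
    return []


fake_dbutils = SimpleNamespace(fs=SimpleNamespace(ls=_fake_ls))

print(dbfs_exists_sketch("/data/out.parquet", fake_dbutils))  # True
print(dbfs_exists_sketch("/data/missing", fake_dbutils))      # False
```

Catching `Exception` trades precision for portability: any failure of the listing call, not only a missing path, is reported as "does not exist".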
def _dbfs_glob(pattern: str, dbutils: Any) -> List[str]:

Perform a custom glob search in DBFS using the provided pattern. It is assumed that version paths are managed by Kedro only.

Parameters
    pattern (str): Glob pattern to search for.
    dbutils (Any): dbutils instance to operate with DBFS.
Returns
    List[str]: DBFS paths, each prefixed with '/dbfs', that match the glob pattern.

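The following sketch shows one way such a glob could work for versioned paths of the form `<dataset>/<version>/<filename>`. It is an illustration under stated assumptions, not the module's confirmed implementation: it assumes `dbutils.fs.ls` returns entries exposing a `path` attribute (prefixed with `dbfs:`) and an `isDir()` method. `FakeFileInfo`, `fake_dbutils`, and `dbfs_glob_sketch` are hypothetical names.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from types import SimpleNamespace
from typing import Any, List, NamedTuple


class FakeFileInfo(NamedTuple):
    """Hypothetical stand-in for the entries returned by dbutils.fs.ls."""
    path: str
    is_dir: bool

    def isDir(self) -> bool:
        return self.is_dir


def dbfs_glob_sketch(pattern: str, dbutils: Any) -> List[str]:
    # Work with DBFS-root paths: drop a leading '/dbfs' if present.
    if pattern.startswith("/dbfs"):
        pattern = pattern[len("/dbfs"):]
    # The directory to list is everything before the first wildcard component.
    literal_parts = []
    for part in pattern.split("/"):
        if any(ch in part for ch in "*?["):
            break
        literal_parts.append(part)
    prefix = "/".join(literal_parts)
    filename = pattern.split("/")[-1]

    matched = set()
    for info in dbutils.fs.ls(prefix):
        if not info.isDir():
            continue  # only version directories are candidates
        directory = info.path
        if directory.startswith("dbfs:"):
            directory = directory[len("dbfs:"):]
        candidate = str(PurePosixPath(directory) / filename)
        if fnmatch(candidate, pattern):
            matched.add("/dbfs" + candidate)
    return sorted(matched)


listing = {
    "/data/out.parquet": [
        FakeFileInfo("dbfs:/data/out.parquet/v2/", True),
        FakeFileInfo("dbfs:/data/out.parquet/v1/", True),
        FakeFileInfo("dbfs:/data/out.parquet/_SUCCESS", False),
    ]
}
fake_dbutils = SimpleNamespace(fs=SimpleNamespace(ls=lambda p: listing[p]))

print(dbfs_glob_sketch("/dbfs/data/out.parquet/*/out.parquet", fake_dbutils))
```

Only one directory level is listed, which is why the sketch relies on the version layout being managed by Kedro rather than supporting arbitrary glob patterns.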
def _get_dbutils(spark: SparkSession) -> Optional[Any]:

Return the 'dbutils' instance, or None if it cannot be found.
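The page does not say where 'dbutils' is looked up. As a hedged sketch, one plausible lookup order is: a `dbutils` object already present in a namespace, then `pyspark.dbutils.DBUtils` (assumed to be available only on Databricks runtimes), then the IPython user namespace. The function name `find_dbutils` and its `namespace` parameter are hypothetical and were added so the lookup can be exercised without Spark.

```python
from typing import Any, Dict, Optional


def find_dbutils(namespace: Dict[str, Any], spark: Any = None) -> Optional[Any]:
    """Return a dbutils-like object, or None if none can be located."""
    # 1. An object already injected into the namespace (e.g. a notebook global).
    found = namespace.get("dbutils")
    if found is not None:
        return found

    # 2. Assumption: Databricks runtimes ship pyspark.dbutils.DBUtils.
    try:
        from pyspark.dbutils import DBUtils
    except ImportError:
        pass
    else:
        return DBUtils(spark)

    # 3. Fall back to the interactive IPython namespace, if there is one.
    try:
        import IPython
    except ImportError:
        return None
    shell = IPython.get_ipython()
    return shell.user_ns.get("dbutils") if shell else None


sentinel = object()
print(find_dbutils({"dbutils": sentinel}) is sentinel)  # True
```

Returning None instead of raising lets the caller decide whether DBFS-specific behaviour is needed at all.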

def _parse_glob_pattern(pattern: str) -> str:

Undocumented
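The page gives only the signature for this helper. Going by the name and the `str -> str` shape, one plausible reading (an assumption, not confirmed here) is that it reduces a glob pattern to its literal leading directory, that is, the path components before the first wildcard. `parse_glob_pattern_sketch` is a hypothetical name.

```python
def parse_glob_pattern_sketch(pattern: str) -> str:
    """Return the path components that precede the first glob wildcard."""
    special = ("*", "?", "[")
    literal_parts = []
    for part in pattern.split("/"):
        if any(char in part for char in special):
            break  # stop at the first component containing a wildcard
        literal_parts.append(part)
    return "/".join(literal_parts)


print(parse_glob_pattern_sketch("/data/out.parquet/*/out.parquet"))  # /data/out.parquet
print(parse_glob_pattern_sketch("/data/file.csv"))                   # /data/file.csv
```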

def _split_filepath(filepath: str) -> Tuple[str, str]:

Undocumented
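This helper is also undocumented. Given the `Tuple[str, str]` return type, a reasonable guess (labelled as such) is that it separates a protocol prefix such as `s3a://` from the rest of the path. The sketch below shows that interpretation; `split_filepath_sketch` is a hypothetical name.

```python
from typing import Tuple


def split_filepath_sketch(filepath: str) -> Tuple[str, str]:
    """Split 'scheme://rest' into ('scheme://', 'rest'); no scheme gives ('', filepath)."""
    parts = filepath.split("://", 1)
    if len(parts) == 2:
        return parts[0] + "://", parts[1]
    return "", parts[0]


print(split_filepath_sketch("s3a://bucket/data.parquet"))  # ('s3a://', 'bucket/data.parquet')
print(split_filepath_sketch("/dbfs/data.parquet"))         # ('', '/dbfs/data.parquet')
```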

def _strip_dbfs_prefix(path: str, prefix: str = '/dbfs') -> str:

Undocumented
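Although undocumented, the signature and the default `prefix='/dbfs'` suggest that this helper removes a leading prefix when present and otherwise returns the path unchanged. The following is a minimal sketch of that assumed behaviour; `strip_dbfs_prefix_sketch` is a hypothetical name.

```python
def strip_dbfs_prefix_sketch(path: str, prefix: str = "/dbfs") -> str:
    """Remove `prefix` from the start of `path` if it is there."""
    return path[len(prefix):] if path.startswith(prefix) else path


print(strip_dbfs_prefix_sketch("/dbfs/data/out.parquet"))           # /data/out.parquet
print(strip_dbfs_prefix_sketch("/data/out.parquet"))                # /data/out.parquet
print(strip_dbfs_prefix_sketch("dbfs:/data/out.parquet", "dbfs:"))  # /data/out.parquet
```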