class documentation

class BioSequenceDataSet(AbstractDataSet[List, List]): (source)

BioSequenceDataSet loads data from, and saves data to, a sequence file.

Example:

>>> from kedro.extras.datasets.biosequence import BioSequenceDataSet
>>> from io import StringIO
>>> from Bio import SeqIO
>>>
>>> # Parse two FASTA records from an in-memory string.
>>> data = ">Alpha\nACCGGATGTA\n>Beta\nAGGCTCGGTTA\n"
>>> raw_data = []
>>> for record in SeqIO.parse(StringIO(data), "fasta"):
...     raw_data.append(record)
...
>>> # Save the records to a FASTA file, then load them back.
>>> data_set = BioSequenceDataSet(filepath="ls_orchid.fasta",
...                               load_args={"format": "fasta"},
...                               save_args={"format": "fasta"})
>>> data_set.save(raw_data)
>>> sequence_list = data_set.load()
>>>
>>> assert raw_data[0].id == sequence_list[0].id
>>> assert raw_data[0].seq == sequence_list[0].seq

Method __init__ Creates a new instance of BioSequenceDataSet pointing to a concrete filepath.
Method invalidate_cache Invalidate underlying filesystem caches.
Constant DEFAULT_LOAD_ARGS Undocumented
Constant DEFAULT_SAVE_ARGS Undocumented
Method _describe Undocumented
Method _exists Undocumented
Method _load Undocumented
Method _release Undocumented
Method _save Undocumented
Instance Variable _filepath Undocumented
Instance Variable _fs Undocumented
Instance Variable _fs_open_args_load Undocumented
Instance Variable _fs_open_args_save Undocumented
Instance Variable _load_args Undocumented
Instance Variable _protocol Undocumented
Instance Variable _save_args Undocumented

Inherited from AbstractDataSet:

Class Method from_config Create a data set instance using the configuration provided.
Method __str__ Undocumented
Method exists Checks whether a data set's output already exists by calling the provided _exists() method.
Method load Loads data by delegation to the provided load method.
Method release Release any cached data.
Method save Saves data by delegation to the provided save method.
Method _copy Undocumented
Property _logger Undocumented
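
The inherited public methods are described as delegating to the underscore-prefixed hooks that a subclass provides. A minimal sketch of that delegation pattern follows. SketchDataSet and InMemoryListDataSet are hypothetical names used only for illustration; this is not kedro's actual AbstractDataSet code, which may also handle logging and errors.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class SketchDataSet(ABC):
    """Hypothetical stand-in for a base data set class (not kedro's code)."""

    def load(self) -> Any:
        # Public entry point: delegate to the subclass hook.
        return self._load()

    def save(self, data: Any) -> None:
        # Public entry point: delegate to the subclass hook.
        self._save(data)

    def exists(self) -> bool:
        # Public entry point: delegate to the subclass hook.
        return self._exists()

    @abstractmethod
    def _load(self) -> Any:
        ...

    @abstractmethod
    def _save(self, data: Any) -> None:
        ...

    def _exists(self) -> bool:
        # Assumed default for this sketch: nothing exists until a subclass says so.
        return False


class InMemoryListDataSet(SketchDataSet):
    """Hypothetical subclass that keeps a list in memory instead of a file."""

    def __init__(self) -> None:
        self._data: Optional[List[Any]] = None

    def _load(self) -> List[Any]:
        return list(self._data or [])

    def _save(self, data: List[Any]) -> None:
        self._data = list(data)

    def _exists(self) -> bool:
        return self._data is not None


ds = InMemoryListDataSet()
ds.save(["Alpha", "Beta"])
loaded = ds.load()
```

Callers only touch load, save and exists; a concrete data set such as BioSequenceDataSet supplies the hooks.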
def __init__(self, filepath: str, load_args: Dict[str, Any] = None, save_args: Dict[str, Any] = None, credentials: Dict[str, Any] = None, fs_args: Dict[str, Any] = None): (source)

Creates a new instance of BioSequenceDataSet pointing to a concrete filepath.

Note: All supported file formats are listed at https://biopython.org/wiki/SeqIO

Parameters
filepath: str
    Filepath in POSIX format to the sequence file, optionally prefixed with a protocol such as s3://. If no prefix is provided, the file protocol (local filesystem) is used. The prefix can be any protocol supported by fsspec.
load_args: Dict[str, Any]
    Options for parsing sequence files with Biopython SeqIO.parse().
save_args: Dict[str, Any]
    Options for writing sequence files with Biopython SeqIO.write(), such as the file format. E.g. {"format": "fasta"}.
credentials: Dict[str, Any]
    Credentials required to access the underlying filesystem. E.g. for GCSFileSystem it should look like {"token": None}.
fs_args: Dict[str, Any]
    Extra arguments to pass to the underlying filesystem class constructor (e.g. {"project": "my-project"} for GCSFileSystem), as well as arguments to pass to the filesystem's open method through the nested keys open_args_load and open_args_save. All available arguments for open are listed at https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.open All defaults are preserved, except mode, which is set to r when loading and to w when saving.
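
The layout of fs_args can be easier to see in code. The sketch below shows one way the nested open_args_load and open_args_save keys could be separated from the constructor arguments. split_fs_args is a hypothetical helper written for illustration, not part of kedro's API, and it assumes that a mode supplied by the caller is kept rather than overwritten.

```python
from copy import deepcopy
from typing import Any, Dict, Optional, Tuple


def split_fs_args(
    fs_args: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: split fs_args into constructor and open() arguments."""
    # Copy so the caller's dictionary is never mutated.
    remaining = deepcopy(fs_args) if fs_args else {}

    # The nested keys hold arguments for the filesystem's open method.
    open_args_load = remaining.pop("open_args_load", {})
    open_args_save = remaining.pop("open_args_save", {})

    # Assumption: mode defaults to "r" for loading and "w" for saving,
    # and an explicitly supplied mode is left untouched.
    open_args_load.setdefault("mode", "r")
    open_args_save.setdefault("mode", "w")

    # Whatever is left goes to the filesystem class constructor.
    return remaining, open_args_load, open_args_save


fs_args = {
    "project": "my-project",
    "open_args_load": {"encoding": "utf-8"},
}
constructor_args, load_open_args, save_open_args = split_fs_args(fs_args)
```

Here constructor_args holds only the project entry, while the two open-argument dictionaries each carry their own mode.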
def invalidate_cache(self): (source)

Invalidate underlying filesystem caches.

DEFAULT_LOAD_ARGS: Dict[str, Any] = (source)

Undocumented

Value
{}
DEFAULT_SAVE_ARGS: Dict[str, Any] = (source)

Undocumented

Value
{}
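
Both default constants are empty dictionaries, so any load_args or save_args passed to the constructor are effectively used as given. A minimal sketch of how such defaults could be combined with user-supplied arguments follows. merge_args is a hypothetical helper, and the copy-then-update approach is an assumption, not confirmed kedro behaviour.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

# Values as documented above: both defaults are empty.
DEFAULT_LOAD_ARGS: Dict[str, Any] = {}
DEFAULT_SAVE_ARGS: Dict[str, Any] = {}


def merge_args(
    defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: start from the defaults, then apply user overrides."""
    merged = deepcopy(defaults)  # never mutate the class-level constant
    if overrides is not None:
        merged.update(overrides)
    return merged


load_args = merge_args(DEFAULT_LOAD_ARGS, {"format": "fasta"})
save_args = merge_args(DEFAULT_SAVE_ARGS)
```

Copying the defaults first keeps the shared constants unchanged across instances.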
def _describe(self) -> Dict[str, Any]: (source)

Undocumented

def _exists(self) -> bool: (source)

Undocumented

def _load(self) -> List: (source)

Undocumented

def _release(self): (source)

Undocumented

def _save(self, data: List): (source)

Undocumented

_filepath = (source)

Undocumented

_fs = (source)

Undocumented

_fs_open_args_load = (source)

Undocumented

_fs_open_args_save = (source)

Undocumented

_load_args = (source)

Undocumented

_protocol = (source)

Undocumented

_save_args = (source)

Undocumented