class documentation
class BioSequenceDataSet(AbstractDataSet[
BioSequenceDataSet loads data from, and saves data to, a sequence file.
Example:
>>> from kedro.extras.datasets.biosequence import BioSequenceDataSet
>>> from io import StringIO
>>> from Bio import SeqIO
>>>
>>> data = ">Alpha\nACCGGATGTA\n>Beta\nAGGCTCGGTTA\n"
>>> raw_data = []
>>> for record in SeqIO.parse(StringIO(data), "fasta"):
...     raw_data.append(record)
>>>
>>> data_set = BioSequenceDataSet(filepath="ls_orchid.fasta",
...                               load_args={"format": "fasta"},
...                               save_args={"format": "fasta"})
>>> data_set.save(raw_data)
>>> sequence_list = data_set.load()
>>>
>>> assert raw_data[0].id == sequence_list[0].id
>>> assert raw_data[0].seq == sequence_list[0].seq
Method | __init__ |
Creates a new instance of BioSequenceDataSet pointing to a concrete filepath. |
Method | invalidate |
Invalidate underlying filesystem caches. |
Constant | DEFAULT |
Undocumented |
Constant | DEFAULT |
Undocumented |
Method | _describe |
Undocumented |
Method | _exists |
Undocumented |
Method | _load |
Undocumented |
Method | _release |
Undocumented |
Method | _save |
Undocumented |
Instance Variable | _filepath |
Undocumented |
Instance Variable | _fs |
Undocumented |
Instance Variable | _fs |
Undocumented |
Instance Variable | _fs |
Undocumented |
Instance Variable | _load |
Undocumented |
Instance Variable | _protocol |
Undocumented |
Instance Variable | _save |
Undocumented |
Inherited from AbstractDataSet:
Class Method | from |
Create a data set instance using the configuration provided. |
Method | __str__ |
Undocumented |
Method | exists |
Checks whether a data set's output already exists by calling the provided _exists() method. |
Method | load |
Loads data by delegation to the provided load method. |
Method | release |
Release any cached data. |
Method | save |
Saves data by delegation to the provided save method. |
Method | _copy |
Undocumented |
Property | _logger |
Undocumented |
def __init__(self, filepath: str, load_args: Dict[str, Any] = None, save_args: Dict[str, Any] = None, credentials: Dict[str, Any] = None, fs_args: Dict[str, Any] = None):
Creates a new instance of BioSequenceDataSet pointing to a concrete filepath.
Note: Here you can find all supported file formats: https://biopython.org/wiki/SeqIO
Parameters | |
filepath:str | Filepath in POSIX format to a sequence file, prefixed with a protocol like
s3://. If no prefix is provided, the file protocol (local filesystem) is used.
The prefix can be any protocol supported by fsspec. |
load_args:Dict[str, Any] | Options for parsing sequence files with Biopython SeqIO.parse(). |
save_args:Dict[str, Any] | Options for writing sequence files with Biopython SeqIO.write(), such as the file format.
E.g. {"format": "fasta"}. |
credentials:Dict[str, Any] | Credentials required to get access to the underlying filesystem.
E.g. for GCSFileSystem it should look like {"token": None}. |
fs_args:Dict[str, Any] | Extra arguments to pass to the underlying filesystem class constructor
(e.g. {"project": "my-project"} for GCSFileSystem), as well as arguments
to pass to the filesystem's open method through the nested keys
open_args_load and open_args_save.
All available arguments for open are listed here:
https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.open
All defaults are preserved, except mode, which is set to r when loading
and to w when saving. |
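The way filepath and fs_args are interpreted can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: split_protocol and split_fs_args are hypothetical helpers, the bucket name is made up, and applying mode only when the caller has not supplied one is an assumption about how the default is set.

```python
from copy import deepcopy
from typing import Any, Dict, Optional, Tuple


def split_protocol(filepath: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'protocol://path' into (protocol, path).

    Falls back to the local "file" protocol when no prefix is present.
    """
    if "://" in filepath:
        protocol, path = filepath.split("://", 1)
        return protocol, path
    return "file", filepath


def split_fs_args(
    fs_args: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate constructor args from open() args."""
    # Copy so the caller's dictionary is never mutated.
    constructor_args = deepcopy(fs_args) if fs_args else {}
    # The nested keys are removed; what remains goes to the constructor.
    open_args_load = constructor_args.pop("open_args_load", {})
    open_args_save = constructor_args.pop("open_args_save", {})
    # Assumption: "mode" is filled in only when the caller did not set it.
    open_args_load.setdefault("mode", "r")
    open_args_save.setdefault("mode", "w")
    return constructor_args, open_args_load, open_args_save


# "my-bucket" is an illustrative name, not taken from the documentation.
protocol, path = split_protocol("s3://my-bucket/ls_orchid.fasta")
constructor_args, load_open, save_open = split_fs_args(
    {"project": "my-project", "open_args_load": {"encoding": "utf-8"}}
)
```

In this sketch, project stays with the constructor arguments, while encoding travels with the load-time open call alongside the mode default.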