class documentation

class DataSource:

Known subclasses: numpy.lib._datasource.Repository

DataSource(destpath='.')

A generic data source file (file, http, ftp, ...).

DataSources can be local files or remote files/URLs. The files may also be compressed or uncompressed. DataSource hides some of the low-level details of downloading the file, allowing you to simply pass in a valid file path (or URL) and obtain a file object.

Notes

URLs require a scheme string (http://) to be used; without it, they will fail:

>>> repos = np.DataSource()
>>> repos.exists('www.google.com/index.html')
False
>>> repos.exists('http://www.google.com/index.html')
True

Temporary directories are deleted when the DataSource is deleted.

Examples

>>> ds = np.DataSource('/home/guido')
>>> urlname = 'http://www.google.com/'
>>> gfile = ds.open(urlname)
>>> ds.abspath(urlname)
'/home/guido/www.google.com/index.html'

>>> ds = np.DataSource(None)  # use with temporary file
>>> ds.open('/home/guido/foobar.txt')
<open file '/home/guido/foobar.txt', mode 'r' at 0x91d4430>
>>> ds.abspath('/home/guido/foobar.txt')
'/tmp/.../home/guido/foobar.txt'

Parameters
destpath : str or None, optional
    Path to the directory where source files are downloaded and stored for use. If destpath is None, a temporary directory will be created. The default path is the current directory.

Method __del__ Undocumented
Method __init__ Create a DataSource with a local path at destpath.
Method abspath Return absolute path of file in the DataSource directory.
Method exists Test if path exists.
Method open Open and return file-like object.
Method _cache Cache the file specified by path.
Method _findfile Searches for path and returns full path if found.
Method _isurl Test if path is a net location. Tests the scheme and netloc.
Method _iswritemode Test if the given mode will open a file for writing.
Method _iszip Test if the filename is a zip file by looking at the file extension.
Method _possible_names Return a tuple containing compressed filename variations.
Method _sanitize_relative_path Return a sanitised relative path for which os.path.abspath(os.path.join(base, path)).startswith(base)
Method _splitzipext Split zip extension from filename and return filename.
Instance Variable _destpath Undocumented
Instance Variable _istmpdest Undocumented

def __del__(self):

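Remove the temporary destination directory if the DataSource created one (destpath=None). A caller-supplied destpath is left untouched.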

def __init__(self, destpath=os.curdir):

Create a DataSource with a local path at destpath.
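The destination-directory lifecycle described on this page (destpath=None creates a temporary directory that is deleted together with the DataSource) can be sketched as follows. This is a minimal illustration under that assumption, not NumPy's implementation; the class name TmpDestSketch is hypothetical, and only the attribute names _destpath and _istmpdest are taken from this page.

```python
import os
import shutil
import tempfile


class TmpDestSketch:
    """Hypothetical stand-in showing only the destpath handling."""

    def __init__(self, destpath=os.curdir):
        if destpath is None:
            # No destination given: create a private temporary directory.
            self._destpath = tempfile.mkdtemp()
            self._istmpdest = True
        else:
            # Otherwise keep an absolute path to the caller's directory.
            self._destpath = os.path.abspath(destpath)
            self._istmpdest = False

    def __del__(self):
        # Remove only a directory this object created; getattr guards against
        # a failed __init__ that never set the attribute.
        if getattr(self, "_istmpdest", False):
            shutil.rmtree(self._destpath, ignore_errors=True)
```

A caller-supplied directory is never deleted, because _istmpdest stays False for it.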

def abspath(self, path):

Return absolute path of file in the DataSource directory.

If path is a URL, abspath returns either the location where the file exists locally or the location where it would exist when opened using the open method.

Notes

The functionality is based on os.path.abspath.

Parameters
path : str
    Can be a local file or a remote URL.

Returns
out : str
    Complete path, including the DataSource destination directory.
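A minimal sketch of the URL-to-local-path mapping, assuming the destpath/<netloc>/<path> layout suggested by the class Examples. abspath_sketch is a hypothetical helper; unlike the real method, it does not guard against '..' components (the class lists _sanitize_relative_path for that purpose).

```python
import os
from urllib.parse import urlparse


def abspath_sketch(destpath, path):
    """Map a local path or URL to a location under destpath (hypothetical helper)."""
    parts = urlparse(path)
    # Strip the leading separator so os.path.join keeps destpath as the root.
    upath = parts.path.lstrip("/")
    # Assumed layout: destpath/<netloc>/<url path>; local paths have an empty netloc.
    return os.path.join(os.path.abspath(destpath), parts.netloc, upath)
```

On a POSIX system, abspath_sketch('/home/guido', 'http://www.google.com/index.html') gives '/home/guido/www.google.com/index.html'.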

def exists(self, path):

Test if path exists.

Test if path exists as (and in this order):

  • a local file.
  • a remote URL that has been downloaded and stored locally in the DataSource directory.
  • a remote URL that has not been downloaded, but is valid and accessible.

Notes

When path is a URL, exists returns True if it is either stored locally in the DataSource directory or is a valid remote URL. DataSource does not discriminate between the two; the file is accessible if it exists in either location.

Parameters
path : str
    Can be a local file or a remote URL.

Returns
out : bool
    True if path exists.
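The three-step lookup order listed above can be sketched as below. The helper exists_sketch and its destpath/<netloc>/<path> cache layout are assumptions for illustration; the remote check simply tries to open the URL and treats any failure as "does not exist".

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def exists_sketch(path, destpath):
    """Check path as a local file, a cached download, then a live URL (hypothetical helper)."""
    # 1. A local file.
    if os.path.exists(path):
        return True
    # 2. A copy previously downloaded into destpath (assumed destpath/<netloc>/<path> layout).
    parts = urlparse(path)
    cached = os.path.join(destpath, parts.netloc, parts.path.lstrip("/"))
    if os.path.exists(cached):
        return True
    # 3. A remote URL that can be opened right now; only tried when scheme and netloc exist.
    if parts.scheme and parts.netloc:
        try:
            with urlopen(path):
                return True
        except (OSError, ValueError):  # URLError and HTTPError are OSError subclasses
            return False
    return False
```

Because the cached copy is checked before the network, a URL that was downloaded earlier is reported as existing without any network access.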

def open(self, path, mode='r', encoding=None, newline=None):

Open and return file-like object.

If path is a URL, it will be downloaded, stored in the DataSource directory and opened from there.

Parameters
path : str
    Local file path or URL to open.
mode : {'r', 'w', 'a'}, optional
    Mode in which to open path: 'r' for reading, 'w' for writing, 'a' for appending. Available modes depend on the type of object specified by path. Default is 'r'.
encoding : {None, str}, optional
    Open the text file with the given encoding. The default encoding is whatever io.open uses.
newline : {None, str}, optional
    Newline to use when reading a text file.

Returns
out : file object
    File object.
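Opening a possibly compressed file comes down to choosing an opener by extension. The sketch below shows only that dispatch step for local paths, assuming a hypothetical extension table; downloading and caching URLs is left out. Note that gzip.open, bz2.open and lzma.open default to binary mode, so text mode must be requested explicitly for encoding and newline to apply.

```python
import bz2
import gzip
import lzma
import os

# Hypothetical extension -> opener table; all three modules ship with CPython.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_local_sketch(path, mode="r", encoding=None, newline=None):
    """Open a local, possibly compressed, file (hypothetical helper)."""
    ext = os.path.splitext(path)[1]
    opener = _OPENERS.get(ext)
    if opener is None:
        # Uncompressed file: the builtin open already defaults to text mode.
        return open(path, mode, encoding=encoding, newline=newline)
    # Compressed openers default to binary, so ask for text mode explicitly.
    if "b" not in mode and "t" not in mode:
        mode += "t"
    return opener(path, mode, encoding=encoding, newline=newline)
```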

def _cache(self, path):

Cache the file specified by path.

Creates a copy of the file in the datasource cache.
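Caching a remote file amounts to streaming it into the destination directory. A sketch, assuming the destpath/<netloc>/<path> layout; cache_sketch is hypothetical and, unlike a production version, does not sanitise the URL path.

```python
import os
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def cache_sketch(url, destpath):
    """Copy the resource at url into destpath and return the local path (hypothetical)."""
    parts = urlparse(url)
    target = os.path.join(destpath, parts.netloc, parts.path.lstrip("/"))
    # Create intermediate directories such as destpath/<netloc>/.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # Stream the response to disk instead of reading it into memory at once.
    with urlopen(url) as remote, open(target, "wb") as local:
        shutil.copyfileobj(remote, local)
    return target
```

Because urllib.request.urlopen also accepts file:// URLs, the helper can be exercised without network access.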

def _findfile(self, path):

Searches for path and returns full path if found.

If path is a URL, _findfile will cache a local copy and return the path to the cached file. If path is a local file, _findfile will return a path to that local file.

The search will include possible compressed versions of the file and return the first occurrence found.
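The search order can be sketched as follows: try the path and its compressed variants as given, then the same names re-rooted under the destination directory, and return the first one that exists. findfile_sketch and its suffix list are assumptions; the URL-caching branch is omitted.

```python
import os

# Hypothetical compression suffixes tried after the bare name ("").
_SUFFIXES = ("", ".gz", ".bz2", ".xz")


def findfile_sketch(path, destpath):
    """Return the first existing variant of path, or None (hypothetical helper)."""
    # Candidates: the path as given, then the same path re-rooted under destpath.
    roots = [path, os.path.join(destpath, path.lstrip("/"))]
    for root in roots:
        for suffix in _SUFFIXES:
            candidate = root + suffix
            if os.path.exists(candidate):
                return candidate
    return None
```

Trying the bare name first means an uncompressed copy wins over a compressed one when both exist.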

def _isurl(self, path):

Test if path is a net location. Tests the scheme and netloc.
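The scheme-and-netloc test, which is why 'www.google.com/index.html' in the class Notes is not treated as a URL, can be sketched with urllib.parse.urlparse. isurl_sketch is a hypothetical helper.

```python
from urllib.parse import urlparse


def isurl_sketch(path):
    """True when path has both a scheme and a network location (hypothetical helper)."""
    parts = urlparse(path)
    # Without "http://", urlparse puts the host into .path and leaves .netloc empty.
    return bool(parts.scheme and parts.netloc)
```

One consequence of requiring a netloc: a URL such as file:///tmp/data.txt, whose netloc is empty, is not classified as a URL by this sketch.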

def _iswritemode(self, mode):

Test if the given mode will open a file for writing.
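A one-line sketch. Which mode characters count as "writing" is an assumption here (this version checks 'w', 'a', 'x' and '+'), and iswritemode_sketch is hypothetical.

```python
def iswritemode_sketch(mode):
    """True if mode would open a file for writing (hypothetical helper)."""
    # 'w' truncate, 'a' append, 'x' exclusive create, '+' update.
    return any(c in "wax+" for c in mode)
```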

def _iszip(self, filename):

Test if the filename is a zip file by looking at the file extension.
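A sketch that checks the extension with os.path.splitext against an assumed set of compression suffixes; the real list may differ. Here "zip" means "compressed" in general, and the assumed set does not include .zip archives. iszip_sketch is hypothetical.

```python
import os

# Assumed set of compression extensions.
_ZIP_EXTS = {".gz", ".bz2", ".xz", ".lzma"}


def iszip_sketch(filename):
    """True if filename ends in a known compression extension (hypothetical helper)."""
    # splitext looks only at the last suffix, so "data.csv.gz" yields ".gz".
    return os.path.splitext(filename)[1] in _ZIP_EXTS
```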

def _possible_names(self, filename):

Return a tuple containing compressed filename variations.
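A sketch of the variations, assuming a hypothetical suffix list: an uncompressed name is returned together with one compressed variant per suffix, while a name that is already compressed yields only itself. possible_names_sketch is hypothetical.

```python
import os

# Assumed compression suffixes, in the order they are tried.
_ZIP_EXTS = (".gz", ".bz2", ".xz", ".lzma")


def possible_names_sketch(filename):
    """Return filename plus its compressed variants (hypothetical helper)."""
    if os.path.splitext(filename)[1] in _ZIP_EXTS:
        # Already compressed: no further variants to try.
        return (filename,)
    return (filename,) + tuple(filename + ext for ext in _ZIP_EXTS)
```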

def _sanitize_relative_path(self, path):

Return a sanitised relative path for which os.path.abspath(os.path.join(base, path)).startswith(base) holds.
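One way to satisfy this invariant, not necessarily the method's actual algorithm: normalise the path, then drop empty, '.' and '..' components so the result can only descend from base. sanitize_sketch is hypothetical and works on '/'-separated (URL-style) paths via posixpath.

```python
import posixpath


def sanitize_sketch(path):
    """Return a relative path that cannot escape its base (hypothetical helper)."""
    # normpath collapses inner '..' segments, so any that remain are leading.
    parts = posixpath.normpath(path).split("/")
    # Dropping '' removes leading slashes; dropping '..' blocks upward traversal.
    return "/".join(p for p in parts if p not in ("", ".", ".."))
```

For example, both '../../etc/passwd' and '/etc/passwd' become 'etc/passwd', which joins back under any base directory.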

def _splitzipext(self, filename):

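Split the zip extension from filename and return the base name and the extension.

Returns
base, zip_ext : tuple
    zip_ext is None when filename has no recognised compression extension; base is then filename itself.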
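A sketch of the split, assuming a hypothetical set of compression extensions; splitzipext_sketch is not NumPy's code.

```python
import os

# Assumed set of compression extensions.
_ZIP_EXTS = {".gz", ".bz2", ".xz", ".lzma"}


def splitzipext_sketch(filename):
    """Return (base, zip_ext); zip_ext is None if filename is not compressed (hypothetical)."""
    base, ext = os.path.splitext(filename)
    if ext in _ZIP_EXTS:
        return base, ext
    # Not compressed: keep the full name, including any ordinary extension.
    return filename, None
```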

_destpath =

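Absolute path of the destination directory. When destpath is None, this is a temporary directory created by __init__.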

_istmpdest: bool =

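True if _destpath is a temporary directory created by __init__ (and therefore removed by __del__), False otherwise.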