class documentation

Abstract pipeline that implements file downloading.

This pipeline tries to minimize network transfers and file processing by running a stat on each file and determining whether it is new, up to date, or expired.

``new`` files are those the pipeline has never processed and that need to be downloaded from the supplier site for the first time.

``uptodate`` files are those the pipeline has already processed and that are still valid.

``expired`` files are those the pipeline has already processed but whose last modification was made a long time ago, so reprocessing is recommended to refresh them in case they have changed.
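The three states described above can be illustrated with a small standalone sketch. It is not the pipeline's own code: `classify_file` and `EXPIRES_DAYS` are hypothetical names, and the sketch assumes the expiry threshold (the `EXPIRES` constant, 90) is measured in days, which this page does not state.

```python
import time

# Assumption: mirrors the EXPIRES constant on this page, taken to mean days.
EXPIRES_DAYS = 90
SECONDS_PER_DAY = 60 * 60 * 24


def classify_file(last_modified, now=None, expires_days=EXPIRES_DAYS):
    """Return 'new', 'uptodate' or 'expired' for a stored file.

    last_modified is a POSIX timestamp taken from a stat of the stored
    file, or None when the store has no record of the file.
    """
    if last_modified is None:
        # Never processed: must be downloaded for the first time.
        return "new"
    now = time.time() if now is None else now
    age_days = (now - last_modified) / SECONDS_PER_DAY
    # Older than the threshold: reprocessing is recommended.
    return "expired" if age_days > expires_days else "uptodate"
```

Passing `now` explicitly keeps the function deterministic, which makes the threshold easy to check in isolation.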

Class Method from_settings Undocumented
Method __init__ Undocumented
Method file_downloaded Undocumented
Method file_path Returns the path where downloaded media should be stored
Method get_media_requests Returns the media requests to download
Method inc_stats Undocumented
Method item_completed Called per item when all media requests have been processed
Method media_downloaded Handler for successful downloads
Method media_failed Handler for failed downloads
Method media_to_download Check request before starting download
Constant DEFAULT_FILES_RESULT_FIELD Undocumented
Constant DEFAULT_FILES_URLS_FIELD Undocumented
Constant EXPIRES Undocumented
Constant MEDIA_NAME Undocumented
Constant STORE_SCHEMES Undocumented
Instance Variable expires Undocumented
Instance Variable FILES_RESULT_FIELD Undocumented
Instance Variable files_result_field Undocumented
Instance Variable FILES_URLS_FIELD Undocumented
Instance Variable files_urls_field Undocumented
Instance Variable store Undocumented
Method _get_store Undocumented

Inherited from MediaPipeline:

Class SpiderInfo Undocumented
Class Method from_crawler Undocumented
Method open_spider Undocumented
Method process_item Undocumented
Constant LOG_FAILED_RESULTS Undocumented
Instance Variable allow_redirects Undocumented
Instance Variable download_func Undocumented
Instance Variable handle_httpstatus_list Undocumented
Instance Variable spiderinfo Undocumented
Method _cache_result_and_execute_waiters Undocumented
Method _check_media_to_download Undocumented
Method _check_signature Undocumented
Method _compatible Wrapper for overridable methods to allow backwards compatibility
Method _handle_statuses Undocumented
Method _key_for_pipe
    >>> MediaPipeline()._key_for_pipe("IMAGES")
    'IMAGES'
    >>> class MyPipe(MediaPipeline):
    ...     pass
    >>> MyPipe()._key_for_pipe("IMAGES", base_class_name="MediaPipeline")
    'MYPIPE_IMAGES'
Method _make_compatible Make overridable methods of MediaPipeline and subclasses backwards compatible
Method _modify_media_request Undocumented
Method _process_request Undocumented
Instance Variable _expects_item Undocumented
@classmethod
def from_settings(cls, settings): (source)

Undocumented
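Since `from_settings` is undocumented here, the following settings fragment is only a sketch of how a project might enable the pipeline. It assumes the standard Scrapy setting names `ITEM_PIPELINES`, `FILES_STORE` and `FILES_EXPIRES` and the import path `scrapy.pipelines.files.FilesPipeline`; the storage directory is a hypothetical placeholder.

```python
# settings.py (illustrative fragment, not taken from this page)

# Enable the pipeline; the integer is its order among item pipelines.
ITEM_PIPELINES = {"scrapy.pipelines.files.FilesPipeline": 1}

# Hypothetical storage location; its scheme selects an entry of STORE_SCHEMES.
FILES_STORE = "/path/to/downloads"

# Assumed to override the EXPIRES default (90) documented on this page.
FILES_EXPIRES = 90
```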

def __init__(self, store_uri, download_func=None, settings=None): (source)

Undocumented

def file_downloaded(self, response, request, info, *, item=None): (source)

Undocumented

def file_path(self, request, response=None, info=None, *, item=None): (source)

Returns the path where downloaded media should be stored
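As an illustration of what such a path might look like, the sketch below derives a stable relative path from a URL. The naming scheme (a `full/` prefix, a SHA-1 digest of the URL, and the original extension) is an assumption and not confirmed by this page; `sketch_file_path` is a hypothetical standalone helper, not the method itself.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse


def sketch_file_path(url):
    """Build a relative storage path from a URL (illustrative scheme)."""
    # Hashing the URL gives a fixed-length, filesystem-safe name.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    # Keep the extension of the URL path, ignoring any query string.
    ext = PurePosixPath(urlparse(url).path).suffix
    return f"full/{digest}{ext}"
```

A subclass that wants human-readable names could override `file_path` and return a different string, as long as the result is unique per file.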

def get_media_requests(self, item, info): (source)

Returns the media requests to download

def inc_stats(self, spider, status): (source)

Undocumented

def item_completed(self, results, item, info): (source)

Called per item when all media requests have been processed
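A rough sketch of what this hook might do with its `results` argument follows. It assumes `results` is a list of `(success, value)` pairs, one per media request, and that successful values are collected into the result field (default `'files'`, per `DEFAULT_FILES_RESULT_FIELD`). `sketch_item_completed` is a hypothetical function operating on a plain dict.

```python
def sketch_item_completed(results, item, result_field="files"):
    """Store the successful download results on the item and return it.

    results: list of (success, value) pairs (assumed shape).
    item: a dict-like item; a plain dict is used here for illustration.
    """
    # Keep only the values of requests that succeeded.
    item[result_field] = [value for ok, value in results if ok]
    return item
```

Failed entries are simply dropped in this sketch; a real subclass might log them or discard the item instead.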

def media_downloaded(self, response, request, info, *, item=None): (source)

Handler for successful downloads

def media_failed(self, failure, request, info): (source)

Handler for failed downloads

def media_to_download(self, request, info, *, item=None): (source)

Check request before starting download

DEFAULT_FILES_RESULT_FIELD: str = (source)

Undocumented

Value
'files'
DEFAULT_FILES_URLS_FIELD: str = (source)

Undocumented

Value
'file_urls'
EXPIRES: int = (source)

Undocumented

Value
90
MEDIA_NAME: str = (source)

Undocumented

Value
'file'
STORE_SCHEMES = (source)

Undocumented

Value
{'': FSFilesStore,
 'file': FSFilesStore,
 's3': S3FilesStore,
 'gs': GCSFilesStore,
 'ftp': FTPFilesStore}
FILES_RESULT_FIELD = (source)

Undocumented

files_result_field = (source)

Undocumented

FILES_URLS_FIELD = (source)

Undocumented

files_urls_field = (source)

Undocumented

store = (source)

Undocumented

def _get_store(self, uri: str): (source)

Undocumented
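Because `_get_store` is undocumented, the following is only a guess at how a store URI could be matched against `STORE_SCHEMES`. It assumes absolute filesystem paths are treated as the `file` scheme and everything else is keyed by its URI scheme. Store classes are replaced by their names so the sketch runs without Scrapy, and `sketch_get_store` is a hypothetical name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Class names stand in for the store classes listed under STORE_SCHEMES.
STORE_SCHEMES = {
    "": "FSFilesStore",
    "file": "FSFilesStore",
    "s3": "S3FilesStore",
    "gs": "GCSFilesStore",
    "ftp": "FTPFilesStore",
}


def sketch_get_store(uri):
    """Pick a store name from the URI scheme (POSIX paths only)."""
    if PurePosixPath(uri).is_absolute():
        # A plain absolute path has no scheme; treat it as a local file.
        scheme = "file"
    else:
        scheme = urlparse(uri).scheme
    # An unknown scheme raises KeyError in this sketch.
    return STORE_SCHEMES[scheme]
```

For example, `sketch_get_store("s3://bucket/prefix")` selects the S3 entry, while a relative path such as `downloads` falls through to the empty-scheme entry.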