class documentation

class Scraper: (source)

The Scraper component parses downloaded responses by invoking the spider's callbacks and feeds the extracted items into the item pipeline.

Method __init__ Initialize the scraper from the given crawler's settings, signals, and log formatter
Method call_spider Call the request's callback (or errback, for a Failure) with the given result
Method close_spider Close a spider being scraped and release its resources
Method enqueue_scrape Add a downloaded result to the scraping queue and return a Deferred that fires when it has been processed
Method handle_spider_error Log a failure raised by a spider callback and send the spider_error signal
Method handle_spider_output Process every Request and item in the spider's output, at most concurrent_items at a time
Method is_idle Return True if there are no more spiders to process
Method open_spider Open the given spider for scraping and allocate resources for it
Instance Variable concurrent_items Maximum number of items to process in parallel for each response
Instance Variable crawler The Crawler object this scraper belongs to
Instance Variable itemproc The item processor built from the ITEM_PROCESSOR setting
Instance Variable logformatter The crawler's log formatter
Instance Variable signals The crawler's signal manager
Instance Variable slot Per-spider slot holding the queue of results waiting to be scraped
Instance Variable spidermw The spider middleware manager
Method _check_if_closing Fire the slot's closing Deferred once the spider is closing and its slot has been drained
Method _itemproc_finished ItemProcessor finished for the given ``item`` and returned ``output``
Method _log_download_errors Log and silence errors that come from the engine (typically download errors that got propagated through here)
Method _process_spidermw_output Process each Request/Item (given in the output parameter) returned from the given spider
Method _scrape Handle the downloaded response or failure through the spider callback/errback
Method _scrape2 Handle the different cases of the request's result being a Response or a Failure
Method _scrape_next Pull the next queued result from the slot and start scraping it
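
The entries above map onto a simple lifecycle: open a spider, enqueue downloaded results, and finally close the spider. A minimal sketch of how an engine-like caller might drive it, assuming an existing ``crawler``, ``spider``, ``response``, and ``request`` (in practice the ExecutionEngine owns the Scraper; this is not public API)::

    from twisted.internet import defer

    from scrapy.core.scraper import Scraper

    @defer.inlineCallbacks
    def run_scrape(crawler, spider, response, request):
        scraper = Scraper(crawler)
        # Allocate the spider's slot and open the item processor.
        yield scraper.open_spider(spider)
        # Queue the downloaded response; the Deferred fires once the
        # spider callback and the item pipeline are done with it.
        yield scraper.enqueue_scrape(response, request, spider)
        # Drain the slot and release the spider's resources.
        yield scraper.close_spider(spider)

``is_idle()`` can be polled between these steps to check whether any spider is still being processed.
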
def __init__(self, crawler: Crawler): (source)

Initialize the scraper from the given crawler: build the spider middleware manager and the item processor from the crawler's settings, and keep references to its signals and log formatter.

def call_spider(self, result: Union[Response, Failure], request: Request, spider: Spider) -> Deferred: (source)

Call the request's callback (or its errback, when ``result`` is a Failure) with the given result, returning a Deferred that fires with the spider's output normalized to an iterable.

def close_spider(self, spider: Spider) -> Deferred: (source)

Close a spider being scraped and release its resources

def enqueue_scrape(self, result: Union[Response, Failure], request: Request, spider: Spider) -> Deferred: (source)

Add a downloaded result to the scraping queue; the returned Deferred fires once the result has been fully scraped.
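
The queueing contract pairs each enqueued result with a Deferred that fires when its scrape completes. A self-contained illustration of that pattern (``SlotSketch`` is an invented stand-in, not the scraper's real slot class)::

    from collections import deque

    from twisted.internet import defer

    class SlotSketch:
        """Invented stand-in for the scraper's per-spider slot."""

        def __init__(self):
            self.queue = deque()  # pending (result, request, deferred) triples
            self.active = set()   # requests currently being scraped

        def add_response_request(self, result, request):
            d = defer.Deferred()  # fires when this result is fully scraped
            self.queue.append((result, request, d))
            return d

        def next_response_request_deferred(self):
            result, request, d = self.queue.popleft()
            self.active.add(request)
            return result, request, d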

def handle_spider_error(self, _failure: Failure, request: Request, response: Response, spider: Spider): (source)

Log a failure raised by a spider callback and send the ``spider_error`` signal.

def handle_spider_output(self, result: Union[Iterable, AsyncIterable], request: Request, response: Response, spider: Spider) -> Deferred: (source)

Process every Request and item in the spider's output, with at most ``concurrent_items`` of them in flight at once.
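
Given the ``concurrent_items`` instance variable, the natural reading is that the spider's output iterable is consumed with bounded parallelism. A sketch of that pattern using ``scrapy.utils.defer.parallel``, a helper Scrapy ships (``process_one`` is a placeholder, not the real processing step)::

    from scrapy.utils.defer import parallel

    def handle_output_sketch(output_iterable, concurrent_items):
        def process_one(request_or_item):
            # Placeholder: the real step schedules Requests back onto
            # the engine and sends items through the item pipeline.
            print("processing", request_or_item)

        # Consume the iterable with at most `concurrent_items`
        # invocations of process_one in flight at once.
        return parallel(iter(output_iterable), concurrent_items, process_one)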

def is_idle(self) -> bool: (source)

Return True if there are no more spiders to process

@inlineCallbacks
def open_spider(self, spider: Spider): (source)

Open the given spider for scraping and allocate resources for it

concurrent_items = (source)

Maximum number of items to process in parallel for each response (the CONCURRENT_ITEMS setting).

crawler = (source)

The Crawler object this scraper belongs to.

itemproc = (source)

The item processor (by default the item pipeline manager) built from the ITEM_PROCESSOR setting.

logformatter = (source)

The crawler's log formatter, used to build messages for scraped and dropped items.

signals = (source)

The crawler's signal manager.

slot = (source)

Per-spider slot holding the queue of results waiting to be scraped; None until open_spider() is called.

spidermw = (source)

The spider middleware manager that wraps calls into the spider.

def _check_if_closing(self, spider: Spider): (source)

Fire the slot's closing Deferred once the spider is marked as closing and its slot has been drained.

def _itemproc_finished(self, output: Any, item: Any, response: Response, spider: Spider): (source)

ItemProcessor finished for the given ``item`` and returned ``output``
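
Based on the signature, ``output`` is either the item as returned by the pipeline or a Failure. A hedged sketch of the usual dispatch, assuming the standard ``DropItem`` exception and Scrapy's built-in signals (the real method also builds log messages through ``logformatter``)::

    from twisted.python.failure import Failure

    from scrapy import signals
    from scrapy.exceptions import DropItem

    def itemproc_finished_sketch(crawler, output, item, response, spider):
        if isinstance(output, Failure):
            if isinstance(output.value, DropItem):
                # A pipeline stage deliberately dropped the item.
                return crawler.signals.send_catch_log_deferred(
                    signal=signals.item_dropped, item=item,
                    response=response, spider=spider, exception=output.value)
            # A pipeline stage raised an unexpected error.
            return crawler.signals.send_catch_log_deferred(
                signal=signals.item_error, item=item,
                response=response, spider=spider, failure=output)
        # The pipeline succeeded; output is the (possibly modified) item.
        return crawler.signals.send_catch_log_deferred(
            signal=signals.item_scraped, item=output,
            response=response, spider=spider)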

def _log_download_errors(self, spider_failure: Failure, download_failure: Failure, request: Request, spider: Spider) -> Union[Failure, None]: (source)

Log and silence errors that come from the engine (typically download errors that got propagated through here).

spider_failure: the value passed into the errback of self.call_spider()

download_failure: the value passed into _scrape2() from ExecutionEngine._handle_downloader_output() as "result"
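
The two failures in the docstring lead to a two-step rule: log the download error unless the request was deliberately ignored, then re-raise only a failure that the spider's errback produced itself. A simplified sketch of that logic (the real method logs through the standard logger rather than ``print``)::

    from twisted.python.failure import Failure

    from scrapy.exceptions import IgnoreRequest

    def log_download_errors_sketch(spider_failure, download_failure, request):
        # Log the download error unless the request was ignored on purpose.
        if isinstance(download_failure, Failure) \
                and not download_failure.check(IgnoreRequest):
            print(f"Error downloading {request}: "
                  f"{download_failure.getErrorMessage()}")
        # Propagate only errors the errback raised itself; returning None
        # silences the original download error so it is not reported twice.
        if spider_failure is not download_failure:
            return spider_failure
        return None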

def _process_spidermw_output(self, output: Any, request: Request, response: Response, spider: Spider) -> Optional[Deferred]: (source)

Process each Request/Item (given in the output parameter) returned from the given spider

def _scrape(self, result: Union[Response, Failure], request: Request, spider: Spider) -> Deferred: (source)

Handle the downloaded response or failure through the spider callback/errback

def _scrape2(self, result: Union[Response, Failure], request: Request, spider: Spider) -> Deferred: (source)

Handle the different cases of the request's result being a Response or a Failure
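
The two cases split cleanly: a Response goes through the spider middleware chain on its way to the callback, while a Failure goes straight to the errback path. A hedged sketch of that dispatch (the body is inferred from the signatures on this page, not copied from the source)::

    from scrapy.http import Response

    def scrape2_sketch(scraper, result, request, spider):
        if isinstance(result, Response):
            # Successful download: let the spider middleware chain wrap
            # the call into the spider's callback.
            return scraper.spidermw.scrape_response(
                scraper.call_spider, result, request, spider)
        # Download failure: hand it to the spider's errback and log the
        # error if the errback does not swallow it.
        dfd = scraper.call_spider(result, request, spider)
        return dfd.addErrback(
            scraper._log_download_errors, result, request, spider)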

def _scrape_next(self, spider: Spider): (source)

Pull the next queued result from the slot and start scraping it.