class documentation

class Scraper: (source)

The Scraper component parses downloaded responses by invoking the spider's callbacks and feeds the extracted items into the item pipeline.

Method __init__ Initialize the scraper from the given crawler's settings, signals, and log formatter
Method call_spider Call the request's callback (or errback, for a Failure) with the given result
Method close_spider Close a spider being scraped and release its resources
Method enqueue_scrape Add a downloaded result to the scraping queue and return a Deferred that fires when it has been processed
Method handle_spider_error Log a failure raised by a spider callback and send the spider_error signal
Method handle_spider_output Process every Request and item in the spider's output, at most concurrent_items at a time
Method is_idle Return True if there are no more spiders to process
Method open_spider Open the given spider for scraping and allocate resources for it
Instance Variable concurrent_items Maximum number of items to process in parallel for each response
Instance Variable crawler The Crawler object this scraper belongs to
Instance Variable itemproc The item processor built from the ITEM_PROCESSOR setting
Instance Variable logformatter The crawler's log formatter
Instance Variable signals The crawler's signal manager
Instance Variable slot Per-spider slot holding the queue of results waiting to be scraped
Instance Variable spidermw The spider middleware manager
Method _check_if_closing Fire the slot's closing Deferred once the spider is closing and its slot has been drained
Method _itemproc_finished ItemProcessor finished for the given ``item`` and returned ``output``
Method _log_download_errors Log and silence errors that come from the engine (typically download errors that got propagated through here)
Method _process_spidermw_output Process each Request/Item (given in the output parameter) returned from the given spider
Method _scrape Handle the downloaded response or failure through the spider callback/errback
Method _scrape2 Handle the different cases of the request's result being a Response or a Failure
Method _scrape_next Pull the next queued result from the slot and start scraping it
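
The entries above map onto a simple lifecycle: open a spider, enqueue downloaded results, and finally close the spider. A minimal sketch of how an engine-like caller might drive it, assuming an existing ``crawler``, ``spider``, ``response``, and ``request`` (in practice the ExecutionEngine owns the Scraper; this is not public API)::

    from twisted.internet import defer

    from scrapy.core.scraper import Scraper

    @defer.inlineCallbacks
    def run_scrape(crawler, spider, response, request):
        scraper = Scraper(crawler)
        # Allocate the spider's slot and open the item processor.
        yield scraper.open_spider(spider)
        # Queue the downloaded response; the Deferred fires once the
        # spider callback and the item pipeline are done with it.
        yield scraper.enqueue_scrape(response, request, spider)
        # Drain the slot and release the spider's resources.
        yield scraper.close_spider(spider)

``is_idle()`` can be polled between these steps to check whether any spider is still being processed.
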
def __init__(self, crawler: Crawler): (source)

Initialize the scraper from the given crawler: build the spider middleware manager and the item processor from the crawler's settings, and keep references to its signals and log formatter.

def call_spider(self, result: Union[Response, Failure], request: Request, spider: Spider) -> Deferred: (source)

Call the request's callback (or its errback, when ``result`` is a Failure) with the given result, returning a Deferred that fires with the spider's output normalized to an iterable.

def close_spider(self, spider: Spider) -> Deferred: (source)

Close a spider being scraped and release its resources

def enqueue_scrape(self, result: Union[Response, Failure], request: Request, spider: Spider) -> Deferred: (source)

Add a downloaded result to the scraping queue; the returned Deferred fires once the result has been fully scraped.
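
The queueing contract pairs each enqueued result with a Deferred that fires when its scrape completes. A self-contained illustration of that pattern (``SlotSketch`` is an invented stand-in, not the scraper's real slot class)::

    from collections import deque

    from twisted.internet import defer

    class SlotSketch:
        """Invented stand-in for the scraper's per-spider slot."""

        def __init__(self):
            self.queue = deque()  # pending (result, request, deferred) triples
            self.active = set()   # requests currently being scraped

        def add_response_request(self, result, request):
            d = defer.Deferred()  # fires when this result is fully scraped
            self.queue.append((result, request, d))
            return d

        def next_response_request_deferred(self):
            result, request, d = self.queue.popleft()
            self.active.add(request)
            return result, request, d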

def handle_spider_error(self, _failure: Failure, request: Request, response: Response, spider: Spider): (source)

Log a failure raised by a spider callback and send the ``spider_error`` signal.

def handle_spider_output(self, result: Union[Iterable, AsyncIterable], request: Request, response: Response, spider: Spider) -> Deferred: (source)

Process every Request and item in the spider's output, with at most ``concurrent_items`` of them in flight at once.
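
Given the ``concurrent_items`` instance variable, the natural reading is that the spider's output iterable is consumed with bounded parallelism. A sketch of that pattern using ``scrapy.utils.defer.parallel``, a helper Scrapy ships (``process_one`` is a placeholder, not the real processing step)::

    from scrapy.utils.defer import parallel

    def handle_output_sketch(output_iterable, concurrent_items):
        def process_one(request_or_item):
            # Placeholder: the real step schedules Requests back onto
            # the engine and sends items through the item pipeline.
            print("processing", request_or_item)

        # Consume the iterable with at most `concurrent_items`
        # invocations of process_one in flight at once.
        return parallel(iter(output_iterable), concurrent_items, process_one)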

def is_idle(self) -> bool: (source)

Return True if there are no more spiders to process

@inlineCallbacks
def open_spider(self, spider: Spider): (source)

Open the given spider for scraping and allocate resources for it

concurrent_items = (source)

Maximum number of items to process in parallel for each response (the CONCURRENT_ITEMS setting).

crawler = (source)

The Crawler object this scraper belongs to.

itemproc = (source)

The item processor (by default the item pipeline manager) built from the ITEM_PROCESSOR setting.

logformatter = (source)

The crawler's log formatter, used to build messages for scraped and dropped items.

signals = (source)

The crawler's signal manager.

slot = (source)

Per-spider slot holding the queue of results waiting to be scraped; None until open_spider() is called.

spidermw = (source)

The spider middleware manager that wraps calls into the spider.

def _check_if_closing(self, spider: Spider): (source)

Fire the slot's closing Deferred once the spider is marked as closing and its slot has been drained.

def _itemproc_finished(self, output: Any, item: Any, response: Response, spider: Spider): (source)

ItemProcessor finished for the given ``item`` and returned ``output``
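
Based on the signature, ``output`` is either the item as returned by the pipeline or a Failure. A hedged sketch of the usual dispatch, assuming the standard ``DropItem`` exception and Scrapy's built-in signals (the real method also builds log messages through ``logformatter``)::

    from twisted.python.failure import Failure

    from scrapy import signals
    from scrapy.exceptions import DropItem

    def itemproc_finished_sketch(crawler, output, item, response, spider):
        if isinstance(output, Failure):
            if isinstance(output.value, DropItem):
                # A pipeline stage deliberately dropped the item.
                return crawler.signals.send_catch_log_deferred(
                    signal=signals.item_dropped, item=item,
                    response=response, spider=spider, exception=output.value)
            # A pipeline stage raised an unexpected error.
            return crawler.signals.send_catch_log_deferred(
                signal=signals.item_error, item=item,
                response=response, spider=spider, failure=output)
        # The pipeline succeeded; output is the (possibly modified) item.
        return crawler.signals.send_catch_log_deferred(
            signal=signals.item_scraped, item=output,
            response=response, spider=spider)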

def _log_download_errors(self, spider_failure: Failure, download_failure: Failure, request: Request, spider: Spider) -> Union[Failure, None]: (source)

Log and silence errors that come from the engine (typically download errors that got propagated through here).

spider_failure: the value passed into the errback of self.call_spider()

download_failure: the value passed into _scrape2() from ExecutionEngine._handle_downloader_output() as "result"
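
The two failures in the docstring lead to a two-step rule: log the download error unless the request was deliberately ignored, then re-raise only a failure that the spider's errback produced itself. A simplified sketch of that logic (the real method logs through the standard logger rather than ``print``)::

    from twisted.python.failure import Failure

    from scrapy.exceptions import IgnoreRequest

    def log_download_errors_sketch(spider_failure, download_failure, request):
        # Log the download error unless the request was ignored on purpose.
        if isinstance(download_failure, Failure) \
                and not download_failure.check(IgnoreRequest):
            print(f"Error downloading {request}: "
                  f"{download_failure.getErrorMessage()}")
        # Propagate only errors the errback raised itself; returning None
        # silences the original download error so it is not reported twice.
        if spider_failure is not download_failure:
            return spider_failure
        return None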

def _process_spidermw_output(self, output: Any, request: Request, response: Response, spider: Spider) -> Optional[Deferred]: (source)

Process each Request/Item (given in the output parameter) returned from the given spider

def _scrape(self, result: Union[Response, Failure], request: Request, spider: Spider) -> Deferred: (source)

Handle the downloaded response or failure through the spider callback/errback

def _scrape2(self, result: Union[Response, Failure], request: Request, spider: Spider) -> Deferred: (source)

Handle the different cases of the request's result being a Response or a Failure
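
The two cases split cleanly: a Response goes through the spider middleware chain on its way to the callback, while a Failure goes straight to the errback path. A hedged sketch of that dispatch (the body is inferred from the signatures on this page, not copied from the source)::

    from scrapy.http import Response

    def scrape2_sketch(scraper, result, request, spider):
        if isinstance(result, Response):
            # Successful download: let the spider middleware chain wrap
            # the call into the spider's callback.
            return scraper.spidermw.scrape_response(
                scraper.call_spider, result, request, spider)
        # Download failure: hand it to the spider's errback and log the
        # error if the errback does not swallow it.
        dfd = scraper.call_spider(result, request, spider)
        return dfd.addErrback(
            scraper._log_download_errors, result, request, spider)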

def _scrape_next(self, spider: Spider): (source)

Pull the next queued result from the slot and start scraping it.