class documentation

class BaseScheduler: (source)

Known subclasses: scrapy.core.scheduler.Scheduler


The scheduler component is responsible for storing requests received from the engine, and feeding them back upon request (also to the engine).

The original sources of said requests are:

* Spider: ``start_requests`` method, requests created for URLs in the ``start_urls`` attribute, request callbacks
* Spider middleware: ``process_spider_output`` and ``process_spider_exception`` methods
* Downloader middleware: ``process_request``, ``process_response`` and ``process_exception`` methods

The order in which the scheduler returns its stored requests (via the ``next_request`` method) plays a great part in determining the order in which those requests are downloaded.

The methods defined in this class constitute the minimal interface that the Scrapy engine will interact with.
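As an illustration only (not part of Scrapy itself), a minimal in-memory implementation of that interface might look like the sketch below; the class name and attributes are made up for the example:

    from collections import deque
    from typing import Optional

    from scrapy import Request
    from scrapy.core.scheduler import BaseScheduler


    class SimpleFifoScheduler(BaseScheduler):
        """Toy scheduler that keeps requests in an in-memory FIFO queue."""

        def __init__(self) -> None:
            self.queue: "deque[Request]" = deque()

        def enqueue_request(self, request: Request) -> bool:
            # Accept everything; a real scheduler would usually consult a
            # dupefilter here and return False for duplicates.
            self.queue.append(request)
            return True

        def next_request(self) -> Optional[Request]:
            # FIFO order: requests come back roughly in the order the
            # engine handed them over.
            return self.queue.popleft() if self.queue else None

        def has_pending_requests(self) -> bool:
            return len(self.queue) > 0

Such a class could be enabled by pointing the ``SCHEDULER`` setting at its import path; ``from_crawler``, ``open`` and ``close`` fall back to the base-class defaults here.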

Class Method from_crawler Factory method that receives the current :class:`~scrapy.crawler.Crawler` object as its argument.
Method close Called when the spider is closed by the engine. It receives the reason the crawl finished as an argument and is a good place to run cleanup code.
Method enqueue_request Process a request received by the engine.
Method has_pending_requests Return ``True`` if the scheduler has enqueued requests, ``False`` otherwise.
Method next_request Return the next :class:`~scrapy.http.Request` to be processed, or ``None`` to indicate that there are no requests to be considered ready at the moment.
Method open Called when the spider is opened by the engine. It receives the spider instance as an argument and is a good place to run initialization code.
@classmethod
def from_crawler(cls, crawler: Crawler): (source)

Factory method that receives the current :class:`~scrapy.crawler.Crawler` object as its argument.
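For instance, a subclass might use this hook to read settings and keep a reference to the stats collector; the setting name and attributes below are hypothetical, not part of the Scrapy API:

    @classmethod
    def from_crawler(cls, crawler):
        scheduler = cls()
        # Illustrative setting name; getint() falls back to the default
        # when the setting is not defined.
        scheduler.max_queue_size = crawler.settings.getint("MYPROJECT_MAX_QUEUE_SIZE", 10000)
        scheduler.stats = crawler.stats
        return scheduler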

def close(self, reason: str) -> Optional[Deferred]: (source)

Called when the spider is closed by the engine. It receives the reason the crawl finished as an argument and is a good place to run cleanup code.

:param reason: a string describing why the spider was closed
:type reason: :class:`str`
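A sketch of a typical override, building on the in-memory queue and ``stats`` attribute from the earlier examples (both assumptions of this example, not part of the base class):

    def close(self, reason: str) -> Optional[Deferred]:
        # ``reason`` is a short string such as "finished" or "shutdown".
        # Record how many requests were still pending; return a Deferred
        # only if the cleanup is asynchronous, otherwise None is enough.
        if getattr(self, "stats", None):
            self.stats.set_value("myscheduler/pending_at_close", len(self.queue))
        return None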

@abstractmethod
def enqueue_request(self, request: Request) -> bool: (source)

Process a request received by the engine. Return ``True`` if the request is stored correctly, ``False`` otherwise. If ``False``, the engine will fire a ``request_dropped`` signal, and will not make further attempts to schedule the request at a later time. For reference, the default Scrapy scheduler returns ``False`` when the request is rejected by the dupefilter.
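A sketch that mirrors that contract, assuming ``self.df`` is a dupefilter (for example an ``RFPDupeFilter``) created during ``open`` or ``from_crawler``:

    def enqueue_request(self, request: Request) -> bool:
        # Reject duplicates unless the request opts out via dont_filter;
        # returning False makes the engine fire request_dropped and the
        # request will not be rescheduled later.
        if not request.dont_filter and self.df.request_seen(request):
            return False
        self.queue.append(request)
        return True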

@abstractmethod
def has_pending_requests(self) -> bool: (source)

Return ``True`` if the scheduler has enqueued requests, ``False`` otherwise.

@abstractmethod
def next_request(self) -> Optional[Request]: (source)

Return the next :class:`~scrapy.http.Request` to be processed, or ``None`` to indicate that there are no requests to be considered ready at the moment. Returning ``None`` implies that no request from the scheduler will be sent to the downloader in the current reactor cycle. The engine will continue calling ``next_request`` until ``has_pending_requests`` is ``False``.
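For example, a FIFO-based implementation (building on the earlier sketch) where returning ``None`` simply means nothing is ready yet:

    def next_request(self) -> Optional[Request]:
        if not self.queue:
            return None  # nothing ready this reactor cycle; not an error
        request = self.queue.popleft()
        if getattr(self, "stats", None):
            self.stats.inc_value("myscheduler/dequeued")  # illustrative stat key
        return request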

def open(self, spider: Spider) -> Optional[Deferred]: (source)

Called when the spider is opened by the engine. It receives the spider instance as an argument and is a good place to run initialization code.

:param spider: the spider object for the current crawl
:type spider: :class:`~scrapy.spiders.Spider`
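A sketch of a typical override, using the in-memory queue from the earlier examples; synchronous initialization can simply return ``None``, while asynchronous setup would return a ``Deferred``:

    def open(self, spider: Spider) -> Optional[Deferred]:
        # Keep a reference to the spider and build the queue used by the
        # other methods in these sketches.
        self.spider = spider
        self.queue = deque()
        return None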