The scheduler component is responsible for storing requests received from the engine, and feeding them back upon request (also to the engine).

The original sources of said requests are:

* Spider: ``start_requests`` method, requests created for URLs in the ``start_urls`` attribute, request callbacks
* Spider middleware: ``process_spider_output`` and ``process_spider_exception`` methods
* Downloader middleware: ``process_request``, ``process_response`` and ``process_exception`` methods

The order in which the scheduler returns its stored requests (via the ``next_request`` method) plays a major role in determining the order in which those requests are downloaded.

The methods defined in this class constitute the minimal interface that the Scrapy engine will interact with.
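The interface above can be sketched as a minimal in-memory scheduler. This is illustrative only: it mirrors the method names described in this document but does not subclass Scrapy's scheduler class, and plain objects stand in for :class:`~scrapy.http.Request` instances.

```python
from collections import deque


class MemoryScheduler:
    """Minimal sketch of the scheduler interface (not a Scrapy subclass)."""

    @classmethod
    def from_crawler(cls, crawler):
        # Factory hook: a real implementation would read crawler.settings here.
        return cls()

    def __init__(self):
        self._queue = deque()

    def open(self, spider):
        # Initialization code runs when the engine opens the spider.
        self.spider = spider

    def close(self, reason):
        # Cleanup code runs when the engine closes the spider.
        self._queue.clear()

    def enqueue_request(self, request):
        # Return True if stored; False would make the engine fire request_dropped.
        self._queue.append(request)
        return True

    def has_pending_requests(self):
        return len(self._queue) > 0

    def next_request(self):
        # FIFO order here; return None when nothing is ready at the moment.
        return self._queue.popleft() if self._queue else None
```

Because ``next_request`` pops from the left of the deque, this sketch downloads requests in the order they were enqueued; a priority-queue-backed implementation would change that order.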
Class Method | ``from_crawler`` | Factory method which receives the current :class:`~scrapy.crawler.Crawler` object as argument.
Method | ``close`` | Called when the spider is closed by the engine. It receives the reason why the crawl finished as argument and is useful for executing cleanup code.
Method | ``enqueue_request`` | Process a request received by the engine.
Method | ``has_pending_requests`` | ``True`` if the scheduler has enqueued requests, ``False`` otherwise.
Method | ``next_request`` | Return the next :class:`~scrapy.http.Request` to be processed, or ``None`` to indicate that there are no requests considered ready at the moment.
Method | ``open`` | Called when the spider is opened by the engine. It receives the spider instance as argument and is useful for executing initialization code.
scrapy.core.scheduler.Scheduler.from_crawler

Factory method which receives the current :class:`~scrapy.crawler.Crawler` object as argument.
scrapy.core.scheduler.Scheduler.close

Called when the spider is closed by the engine. It receives the reason why the crawl finished as argument and is useful for executing cleanup code.

:param reason: a string which describes the reason why the spider was closed
:type reason: :class:`str`
scrapy.core.scheduler.Scheduler.enqueue_request

Process a request received by the engine. Return ``True`` if the request is stored correctly, ``False`` otherwise.

If ``False``, the engine will fire a ``request_dropped`` signal and will not make further attempts to schedule the request at a later time. For reference, the default Scrapy scheduler returns ``False`` when the request is rejected by the dupefilter.
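The dupefilter behavior described above can be sketched with a simple seen-set. This is a hypothetical stand-in, not Scrapy's dupefilter: it uses the request itself (here, a URL string) as the fingerprint instead of Scrapy's request fingerprinting.

```python
class DedupQueue:
    """Sketch of enqueue_request semantics with duplicate filtering."""

    def __init__(self):
        self._seen = set()       # fingerprints of every request ever accepted
        self._pending = []       # requests accepted but not yet downloaded

    def enqueue_request(self, request):
        if request in self._seen:
            # Rejected as a duplicate: the engine would fire the
            # request_dropped signal and never retry this request.
            return False
        self._seen.add(request)
        self._pending.append(request)
        return True
```

Note that the fingerprint stays in ``_seen`` even after the request leaves ``_pending``, so a URL revisited later in the crawl is still rejected.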
scrapy.core.scheduler.Scheduler.has_pending_requests

``True`` if the scheduler has enqueued requests, ``False`` otherwise.
scrapy.core.scheduler.Scheduler.next_request

Return the next :class:`~scrapy.http.Request` to be processed, or ``None`` to indicate that there are no requests considered ready at the moment.

Returning ``None`` implies that no request from the scheduler will be sent to the downloader in the current reactor cycle. The engine will continue calling ``next_request`` until ``has_pending_requests`` is ``False``.
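The polling contract above can be sketched as a drain loop. The ``ListScheduler`` and ``drain`` names are hypothetical stand-ins for the engine side of the interaction, not Scrapy code; the point is that ``None`` means "nothing ready this cycle" while ``has_pending_requests`` decides when polling stops.

```python
from collections import deque


class ListScheduler:
    """Tiny stand-in scheduler used to demonstrate the polling contract."""

    def __init__(self, requests):
        self._queue = deque(requests)

    def has_pending_requests(self):
        return bool(self._queue)

    def next_request(self):
        # None means "nothing ready right now", not "crawl finished".
        return self._queue.popleft() if self._queue else None


def drain(scheduler):
    """Poll the scheduler the way the engine does, collecting requests."""
    sent = []
    while scheduler.has_pending_requests():
        request = scheduler.next_request()
        if request is None:
            break  # pending but not ready this cycle; try again later
        sent.append(request)
    return sent
```

A real engine would hand each request to the downloader instead of collecting it, and would return to the loop on the next reactor cycle rather than breaking out.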
scrapy.core.scheduler.Scheduler.open

Called when the spider is opened by the engine. It receives the spider instance as argument and is useful for executing initialization code.

:param spider: the spider object for the current crawl
:type spider: :class:`~scrapy.spiders.Spider`