class documentation

class CrawlerProcess(CrawlerRunner):

A class to run multiple scrapy crawlers in a process simultaneously.

This class extends :class:`~scrapy.crawler.CrawlerRunner` by adding support for starting a :mod:`~twisted.internet.reactor` and handling shutdown signals, like the keyboard interrupt command Ctrl-C. It also configures top-level logging.

This utility should be a better fit than :class:`~scrapy.crawler.CrawlerRunner` if you aren't running another :mod:`~twisted.internet.reactor` within your application.

The CrawlerProcess object must be instantiated with a :class:`~scrapy.settings.Settings` object.

:param install_root_handler: whether to install the root logging handler (default: True)

This class shouldn't be needed (since Scrapy is responsible for using it accordingly) unless writing scripts that manually handle the crawling process. See :ref:`run-from-script` for an example.
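
A minimal sketch of the run-from-script pattern described above; the spider class, its URL and the chosen settings are illustrative assumptions, while ``CrawlerProcess``, ``crawl()`` and ``start()`` are the documented API::

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class QuotesSpider(scrapy.Spider):
        # Hypothetical spider, used only to have something to run.
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]

        def parse(self, response):
            for quote in response.css("div.quote"):
                yield {"text": quote.css("span.text::text").get()}

    # A Settings object is expected; a plain dict is also accepted and
    # converted to a Settings object internally.
    process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
    process.crawl(QuotesSpider)
    process.start()  # the script blocks here until the crawl is finished

With the default ``stop_after_crawl=True``, :meth:`start` also stops the reactor once the crawl has finished, so the script simply exits afterwards.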

Method __init__ Undocumented
Method start This method starts a :mod:`~twisted.internet.reactor`, adjusts its pool size to :setting:`REACTOR_THREADPOOL_MAXSIZE`, and installs a DNS cache based on :setting:`DNSCACHE_ENABLED` and :setting:`DNSCACHE_SIZE`.
Method _create_crawler Undocumented
Method _graceful_stop_reactor Undocumented
Method _signal_kill Undocumented
Method _signal_shutdown Undocumented
Method _stop_reactor Undocumented
Instance Variable _initialized_reactor Undocumented

Inherited from CrawlerRunner:

Method crawl Run a crawler with the provided arguments; a usage sketch follows this member list.
Method create_crawler Return a :class:`~scrapy.crawler.Crawler` object.
Method join Returns a deferred that is fired when all managed crawlers have completed their executions.
Method stop Simultaneously stops all the crawling jobs that are taking place.
Class Variable crawlers Undocumented
Instance Variable bootstrap_failed Undocumented
Instance Variable settings Undocumented
Instance Variable spider_loader Undocumented
Property spiders Undocumented
Static Method _get_spider_loader Get SpiderLoader instance from settings
Method _crawl Undocumented
Instance Variable _active Undocumented
Instance Variable _crawlers Undocumented
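
To illustrate the inherited :meth:`crawl` behaviour in a ``CrawlerProcess``, the sketch below schedules two crawlers before starting the reactor; both spider classes and their URLs are hypothetical::

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]

        def parse(self, response):
            yield {"url": response.url}

    class AuthorsSpider(scrapy.Spider):
        name = "authors"
        start_urls = ["https://quotes.toscrape.com/author/albert-einstein/"]

        def parse(self, response):
            yield {"url": response.url}

    process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
    # Each crawl() call schedules one crawler; all of them run concurrently
    # in the same reactor once start() is called.
    process.crawl(QuotesSpider)
    process.crawl(AuthorsSpider)
    process.start()  # uses join() internally and stops the reactor when both finish
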
def __init__(self, settings=None, install_root_handler=True):

Undocumented

def start(self, stop_after_crawl=True, install_signal_handlers=True):

This method starts a :mod:`~twisted.internet.reactor`, adjusts its pool size to :setting:`REACTOR_THREADPOOL_MAXSIZE`, and installs a DNS cache based on :setting:`DNSCACHE_ENABLED` and :setting:`DNSCACHE_SIZE`.

If ``stop_after_crawl`` is True, the reactor will be stopped after all crawlers have finished, using :meth:`join`.

:param bool stop_after_crawl: whether to stop the reactor once all crawlers have finished
:param bool install_signal_handlers: whether to install the shutdown handlers (default: True)
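
As a sketch of the ``stop_after_crawl`` flag, assuming the default Twisted reactor (no custom ``TWISTED_REACTOR`` setting): passing ``False`` keeps the reactor running after the crawls complete, so the caller must stop it explicitly, for example by chaining :meth:`join` to ``reactor.stop()``, which is essentially what the default behaviour does internally. The spider below is again a hypothetical placeholder::

    import scrapy
    from twisted.internet import reactor
    from scrapy.crawler import CrawlerProcess

    class PingSpider(scrapy.Spider):
        # Hypothetical spider used only for illustration.
        name = "ping"
        start_urls = ["https://example.com/"]

        def parse(self, response):
            yield {"status": response.status}

    process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
    process.crawl(PingSpider)

    # Manual equivalent of stop_after_crawl=True: stop the reactor once
    # join() reports that every scheduled crawl has finished.
    process.join().addBoth(lambda _: reactor.stop())
    process.start(stop_after_crawl=False)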

def _create_crawler(self, spidercls):

Undocumented

def _graceful_stop_reactor(self):

Undocumented

def _signal_kill(self, signum, _):

Undocumented

def _signal_shutdown(self, signum, _):

Undocumented

def _stop_reactor(self, _=None):

Undocumented

_initialized_reactor: bool =

Undocumented