class CrawlerProcess(CrawlerRunner):
A class to run multiple scrapy crawlers in a process simultaneously.

This class extends :class:`~scrapy.crawler.CrawlerRunner` by adding support for starting a :mod:`~twisted.internet.reactor` and handling shutdown signals, like the keyboard interrupt command Ctrl-C. It also configures top-level logging.

This utility should be a better fit than :class:`~scrapy.crawler.CrawlerRunner` if you aren't running another :mod:`~twisted.internet.reactor` within your application.

The CrawlerProcess object must be instantiated with a :class:`~scrapy.settings.Settings` object.

:param install_root_handler: whether to install the root logging handler (default: True)

This class shouldn't be needed (since Scrapy is responsible for using it accordingly) unless writing scripts that manually handle the crawling process. See :ref:`run-from-script` for an example.
Method | __init__ | Undocumented
Method | start | This method starts a :mod:`~twisted.internet.reactor`, adjusts its pool size to :setting:`REACTOR_THREADPOOL_MAXSIZE`, and installs a DNS cache based on :setting:`DNSCACHE_ENABLED` and :setting:`DNSCACHE_SIZE`.
Method | _create | Undocumented
Method | _graceful | Undocumented
Method | _signal | Undocumented
Method | _signal | Undocumented
Method | _stop | Undocumented
Instance Variable | _initialized | Undocumented
Inherited from CrawlerRunner:
Method | crawl | Run a crawler with the provided arguments.
Method | create | Return a :class:`~scrapy.crawler.Crawler` object.
Method | join | Returns a deferred that is fired when all managed crawlers have completed their executions.
Method | stop | Simultaneously stops all the crawling jobs taking place.
Class Variable | crawlers | Undocumented
Instance Variable | bootstrap | Undocumented
Instance Variable | settings | Undocumented
Instance Variable | spider | Undocumented
Property | spiders | Undocumented
Static Method | _get | Get a SpiderLoader instance from settings.
Method | _crawl | Undocumented
Instance Variable | _active | Undocumented
Instance Variable | _crawlers | Undocumented
This method starts a :mod:`~twisted.internet.reactor`, adjusts its pool size to :setting:`REACTOR_THREADPOOL_MAXSIZE`, and installs a DNS cache based on :setting:`DNSCACHE_ENABLED` and :setting:`DNSCACHE_SIZE`.

If ``stop_after_crawl`` is True, the reactor will be stopped after all crawlers have finished, using :meth:`join`.

:param bool stop_after_crawl: whether to stop the reactor once all crawlers have finished
:param bool install_signal_handlers: whether to install the shutdown handlers (default: True)