class documentation

class SitemapSpider(Spider):

Undocumented

Method __init__ Undocumented
Method sitemap_filter Filter sitemap entries by their attributes. For example, you can keep only the locs whose lastmod is later than a given date (see docs).
Method start_requests Undocumented
Class Variable sitemap_alternate_links Undocumented
Class Variable sitemap_follow Undocumented
Class Variable sitemap_rules Undocumented
Class Variable sitemap_urls Undocumented
Method _get_sitemap_body Return the sitemap body contained in the given response, or None if the response is not a sitemap.
Method _parse_sitemap Undocumented
Instance Variable _cbs Undocumented
Instance Variable _follow Undocumented

Inherited from Spider:

Class Method from_crawler Undocumented
Class Method handles_request Undocumented
Class Method update_settings Undocumented
Static Method close Undocumented
Method __repr__ Undocumented
Method log Log the given message at the given log level
Method parse Undocumented
Class Variable custom_settings Undocumented
Instance Variable crawler Undocumented
Instance Variable name Undocumented
Instance Variable settings Undocumented
Instance Variable start_urls Undocumented
Property logger Undocumented
Method _parse Undocumented
Method _set_crawler Undocumented

Inherited from object_ref (via Spider):

Method __new__ Undocumented
Class Variable __slots__ Undocumented

def __init__(self, *a, **kw):

Undocumented

def sitemap_filter(self, entries):

Filter sitemap entries by their attributes. For example, you can keep only the locs whose lastmod is later than a given date (see docs).
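
A minimal sketch of such a filter follows. It assumes each entry behaves like a dict with an optional "lastmod" string in W3C date format, as in the Scrapy user guide. The helper modified_since, the spider name, the sitemap URL and the cutoff date are hypothetical, and the import fallback exists only so the helper can be tried without Scrapy installed.

```python
from datetime import date

try:
    from scrapy.spiders import SitemapSpider
except ImportError:
    # Fallback so the helper below can run without Scrapy installed.
    SitemapSpider = object


def modified_since(entries, cutoff):
    """Yield the entries whose lastmod date is on or after cutoff."""
    for entry in entries:
        lastmod = entry.get("lastmod")
        if not lastmod:
            continue  # assumption: entries without lastmod are dropped
        try:
            # Keep only the YYYY-MM-DD part; any time or timezone is ignored.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # assumption: partial or malformed dates are dropped
        if modified >= cutoff:
            yield entry


class RecentPagesSpider(SitemapSpider):
    name = "recent_pages"  # hypothetical spider name
    sitemap_urls = ["https://example.com/sitemap.xml"]  # hypothetical URL

    def sitemap_filter(self, entries):
        # Only entries yielded here are processed further by the spider.
        yield from modified_since(entries, date(2024, 1, 1))
```

Keeping the date logic in a plain function makes it easy to check against a list of dicts before wiring it into a spider. Whether entries without a usable lastmod should be dropped or kept is a design choice; this sketch drops them.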

def start_requests(self):

Undocumented

sitemap_alternate_links: bool

Undocumented

sitemap_follow: list[str]

Undocumented

sitemap_rules: list

Undocumented

sitemap_urls: tuple

Undocumented
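
The class variables above carry no docstrings on this page. As a rough illustration based on the Scrapy user guide rather than on this page, a subclass might configure them as sketched below: sitemap_urls lists the sitemaps (or robots.txt files) to start from, sitemap_rules pairs a URL regex with a callback name, sitemap_follow restricts which nested sitemaps are followed, and sitemap_alternate_links controls whether alternate-language links are also crawled. The spider name, URL, patterns and callback names are hypothetical, and the import fallback exists only so the snippet can be loaded without Scrapy installed.

```python
try:
    from scrapy.spiders import SitemapSpider
except ImportError:
    # Fallback so the class definition can be loaded without Scrapy installed.
    SitemapSpider = object


class ShopSitemapSpider(SitemapSpider):
    name = "shop_sitemap"  # hypothetical spider name
    sitemap_urls = ["https://example.com/sitemap.xml"]  # hypothetical URL

    # (regex, callback name) pairs; the first matching rule handles a URL.
    sitemap_rules = [
        ("/product/", "parse_product"),
        ("/category/", "parse_category"),
    ]

    # Only nested sitemaps whose URL matches one of these regexes are followed.
    sitemap_follow = ["/sitemap_products"]

    # Do not schedule alternate-language links for the same page.
    sitemap_alternate_links = False

    def parse_product(self, response):
        yield {"kind": "product", "url": response.url}

    def parse_category(self, response):
        yield {"kind": "category", "url": response.url}
```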

def _get_sitemap_body(self, response):

Return the sitemap body contained in the given response, or None if the response is not a sitemap.

def _parse_sitemap(self, response):

Undocumented

_cbs

Undocumented

_follow

Undocumented