scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor

class documentation

class LxmlLinkExtractor:

Undocumented

Method	`__init__`	Undocumented
Method	`extract_links`	Returns a list of :class:`~scrapy.link.Link` objects from the specified :class:`response <scrapy.http.Response>`.
Method	`matches`	Undocumented
Instance Variable	`allow_domains`	Undocumented
Instance Variable	`allow_res`	Undocumented
Instance Variable	`canonicalize`	Undocumented
Instance Variable	`deny_domains`	Undocumented
Instance Variable	`deny_extensions`	Undocumented
Instance Variable	`deny_res`	Undocumented
Instance Variable	`link_extractor`	Undocumented
Instance Variable	`restrict_text`	Undocumented
Instance Variable	`restrict_xpaths`	Undocumented
Method	`_extract_links`	Undocumented
Method	`_link_allowed`	Undocumented
Method	`_process_links`	Undocumented
Class Variable	`_csstranslator`	Undocumented

def __init__(self, allow=(), deny=(), allow_domains=(), deny_domains=(), restrict_xpaths=(), tags=('a', 'area'), attrs=('href',), canonicalize=False, unique=True, process_value=None, deny_extensions=None, restrict_css=(), strip=True, restrict_text=None):

Undocumented

def extract_links(self, response):

Returns a list of :class:`~scrapy.link.Link` objects from the specified :class:`response <scrapy.http.Response>`. Only links that match the settings passed to the ``__init__`` method of the link extractor are returned. Duplicate links are omitted if the ``unique`` argument of ``__init__`` was set to ``True``; otherwise they are returned.

def matches(self, url):

Undocumented
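
Since ``matches`` is undocumented here, the following is only a standard-library sketch of the kind of URL check that the attribute names ``allow_res``, ``deny_res``, ``allow_domains`` and ``deny_domains`` suggest. The helpers ``host_in_domains`` and ``url_matches`` are hypothetical names, not part of Scrapy, and the actual method may differ in detail.

```python
import re
from urllib.parse import urlparse


def host_in_domains(url, domains):
    """True if the URL's host equals, or is a subdomain of, a listed domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d.lower() or host.endswith("." + d.lower()) for d in domains)


def url_matches(url, allow_res=(), deny_res=(), allow_domains=(), deny_domains=()):
    """Hypothetical stand-in for a matches()-style check (an assumption)."""
    # Domain filters are checked first; an empty filter accepts everything.
    if allow_domains and not host_in_domains(url, allow_domains):
        return False
    if deny_domains and host_in_domains(url, deny_domains):
        return False
    # With no allow patterns, every URL counts as allowed.
    allowed = any(r.search(url) for r in allow_res) if allow_res else True
    denied = any(r.search(url) for r in deny_res)
    return allowed and not denied


items = [re.compile(r"/items/")]
print(url_matches("http://example.com/items/1", allow_res=items))
print(url_matches("http://example.com/about", allow_res=items))
```

In this sketch a deny pattern always wins over an allow pattern, and domain filters are applied before any regular expression is tried.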

allow_domains

Undocumented

allow_res

Undocumented

canonicalize

Undocumented

deny_domains

Undocumented

deny_extensions

Undocumented