
class LxmlParserLinkExtractor:

Undocumented

Method             __init__                Undocumented
Method             extract_links           Undocumented
Instance Variable  link_key                Undocumented
Instance Variable  process_attr            Undocumented
Instance Variable  scan_attr               Undocumented
Instance Variable  scan_tag                Undocumented
Instance Variable  strip                   Undocumented
Instance Variable  unique                  Undocumented
Method             _deduplicate_if_needed  Undocumented
Method             _extract_links          Undocumented
Method             _iter_links             Undocumented
Method             _process_links          Normalize and filter extracted links

def __init__(self, tag='a', attr='href', process=None, unique=False, strip=True, canonicalized=False):

Undocumented
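The `tag`, `attr` and `process` arguments may be plain values or callables. The sketch below shows one way they can be normalized into the predicate-style instance variables `scan_tag`, `scan_attr` and `process_attr` listed above. It is a minimal sketch under that assumption; `normalize_args` and `_identity` are hypothetical helpers, not part of Scrapy's API.

```python
import operator
from functools import partial


def _identity(value):
    # Hypothetical default "process" step: return the attribute value unchanged.
    return value


def normalize_args(tag="a", attr="href", process=None):
    """Hypothetical helper: turn tag/attr/process into callables."""
    # A plain string becomes an equality test; a callable is used as-is.
    scan_tag = tag if callable(tag) else partial(operator.eq, tag)
    scan_attr = attr if callable(attr) else partial(operator.eq, attr)
    process_attr = process if callable(process) else _identity
    return scan_tag, scan_attr, process_attr


scan_tag, scan_attr, process_attr = normalize_args(
    tag=lambda t: t in ("a", "area"), attr="href"
)
print(scan_tag("area"), scan_attr("src"), process_attr("/next"))  # True False /next
```

Normalizing up front means the scanning loop never has to check whether it was given a string or a function.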

def extract_links(self, response):

Undocumented

link_key =

Undocumented
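Assuming `link_key` is the key function used to compare links when `unique=True`, the `canonicalized` flag decides whether URLs are compared as-is (already canonical) or canonicalized just for the comparison. In the sketch below, `simple_canonicalize` is a hypothetical, simplified stand-in for w3lib's `canonicalize_url` (it only lowercases scheme and host and sorts query arguments), and `Link` is a stand-in namedtuple, not Scrapy's class.

```python
import operator
from collections import namedtuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical stand-in for Scrapy's Link class; only "url" matters here.
Link = namedtuple("Link", ["url", "text"])


def simple_canonicalize(url):
    """Simplified stand-in for w3lib's canonicalize_url (not equivalent)."""
    parts = urlsplit(url)
    # Sort query arguments so that ?b=2&a=1 and ?a=1&b=2 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, parts.fragment)
    )


def make_link_key(canonicalized):
    # Already-canonical URLs can be compared directly; otherwise
    # canonicalize each URL only for the purpose of comparison.
    if canonicalized:
        return operator.attrgetter("url")
    return lambda link: simple_canonicalize(link.url)


key = make_link_key(canonicalized=False)
a = Link("http://Example.com/list?b=2&a=1", "one")
b = Link("http://example.com/list?a=1&b=2", "two")
print(key(a) == key(b))  # True
```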

process_attr =

Undocumented
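`process_attr` holds the `process` callable passed to the constructor. Assuming it follows the same contract as the `process_value` argument of Scrapy's `LxmlLinkExtractor` (receive an attribute value, return a URL or `None` to drop the link), a typical callable unwraps JavaScript pseudo-links. The function name and the `goToPage` pattern below are illustrative only.

```python
import re


def process_value(value):
    """Example process callable: unwrap javascript:goToPage('...') links."""
    match = re.search(r"javascript:goToPage\('(.*?)'", value)
    if match:
        return match.group(1)
    # Drop any other javascript: pseudo-link by returning None.
    if value.startswith("javascript:"):
        return None
    # Ordinary URLs pass through unchanged.
    return value


values = [
    "javascript:goToPage('../other/page.html'); return false",
    "/about",
    "javascript:void(0)",
]
print([process_value(v) for v in values])  # ['../other/page.html', '/about', None]
```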

scan_attr =

Undocumented

scan_tag =

Undocumented
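When a page is parsed as XHTML, lxml reports tag names in Clark notation, e.g. `{http://www.w3.org/1999/xhtml}a`. A minimal sketch, assuming `scan_tag` should match such tags by their local name; `strip_xhtml_namespace` is a hypothetical helper, and only the XHTML namespace is stripped so that same-named tags from other vocabularies are not matched.

```python
import operator
from functools import partial

XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def strip_xhtml_namespace(tag):
    """Hypothetical helper: '{http://www.w3.org/1999/xhtml}a' -> 'a'."""
    prefix = "{" + XHTML_NAMESPACE + "}"
    if isinstance(tag, str) and tag.startswith(prefix):
        return tag[len(prefix):]
    # Other namespaces and non-string tags (e.g. lxml comments) are left alone.
    return tag


scan_tag = partial(operator.eq, "a")
tags = ["a", "{http://www.w3.org/1999/xhtml}a", "{urn:other}a", "div"]
print([scan_tag(strip_xhtml_namespace(t)) for t in tags])  # [True, True, False, False]
```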

strip =

Undocumented

unique =

Undocumented

def _deduplicate_if_needed(self, links):

Undocumented
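Judging by its name and the `unique` instance variable, this method removes duplicate links only when `unique` is true. The sketch below assumes an order-preserving deduplication that keeps the first occurrence of each key; `deduplicate_if_needed` and `Link` are hypothetical stand-ins, and the default key compares raw URLs.

```python
from collections import namedtuple

Link = namedtuple("Link", ["url", "text"])  # hypothetical stand-in for scrapy.link.Link


def deduplicate_if_needed(links, unique, key=lambda link: link.url):
    """Sketch: drop later links whose key was already seen, keeping order."""
    if not unique:
        return links
    seen = set()
    result = []
    for link in links:
        k = key(link)
        if k in seen:
            continue  # a link with this key was already kept
        seen.add(k)
        result.append(link)
    return result


links = [Link("/a", "first"), Link("/b", "b"), Link("/a", "second")]
print([link.text for link in deduplicate_if_needed(links, unique=True)])  # ['first', 'b']
```

Keeping the first occurrence preserves document order, which matters when the link text of the first anchor is the most descriptive one.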

def _extract_links(self, selector, response_url, response_encoding, base_url):

Undocumented
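The parameters suggest a per-link pipeline: optionally strip whitespace, resolve the value against `base_url`, pass it through the `process` callable, skip links the callable rejects, and resolve once more against `response_url`. The sketch below assumes that order. It takes hypothetical `(text, attr_value)` pairs instead of a selector and omits encoding-aware escaping via `response_encoding`; `extract_links_sketch` and `Link` are illustrative names.

```python
from collections import namedtuple
from urllib.parse import urljoin

Link = namedtuple("Link", ["url", "text"])  # hypothetical stand-in for scrapy.link.Link

# Characters the HTML5 spec treats as whitespace.
HTML5_WHITESPACE = " \t\n\r\x0c"


def extract_links_sketch(raw_links, base_url, response_url, process=None, strip=True):
    """Sketch of the per-link pipeline; raw_links holds (text, attr_value) pairs."""
    process = process or (lambda value: value)
    links = []
    for text, value in raw_links:
        if strip:
            value = value.strip(HTML5_WHITESPACE)
        # Resolve against the document base (e.g. from a <base href> tag).
        url = process(urljoin(base_url, value))
        if url is None:
            continue  # the process callable rejected this link
        # Resolve again in case process() returned a relative URL.
        links.append(Link(urljoin(response_url, url), text))
    return links


links = extract_links_sketch(
    [("Next", "  page/2  "), ("Home", "/")],
    base_url="https://example.com/docs/",
    response_url="https://example.com/docs/index.html",
)
print([link.url for link in links])
# ['https://example.com/docs/page/2', 'https://example.com/']
```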

def _iter_links(self, document):

Undocumented
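A minimal sketch of the scanning step, assuming `_iter_links` walks every element of the parsed document and yields `(element, attribute name, attribute value)` for each attribute accepted by `scan_attr` on a tag accepted by `scan_tag`. It swaps in the standard library's `xml.etree.ElementTree` for lxml, so it only handles well-formed markup; `iter_links` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def iter_links(document, scan_tag, scan_attr):
    """Yield (element, attribute name, attribute value) for matching attributes."""
    for el in document.iter():  # includes the root element itself
        if not scan_tag(el.tag):
            continue
        for name, value in el.attrib.items():
            if scan_attr(name):
                yield el, name, value


root = ET.fromstring(
    '<html><body><a href="/a">A</a><img src="/i.png"/><a name="top">T</a></body></html>'
)
found = [
    (el.tag, name, value)
    for el, name, value in iter_links(root, lambda t: t == "a", lambda a: a == "href")
]
print(found)  # [('a', 'href', '/a')]
```

Using a generator keeps memory flat on large pages and lets the caller stop early.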

def _process_links(self, links):

Normalize and filter extracted links.

Subclasses should override this method if necessary.
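The following shows what such an override might look like. To stay self-contained, it uses a hypothetical stand-in base class (`ParserSketch`) instead of importing Scrapy, and assumes the default hook only deduplicates. The subclass keeps links on a single host and then defers to the base behaviour.

```python
from collections import namedtuple
from urllib.parse import urlsplit

Link = namedtuple("Link", ["url", "text"])  # hypothetical stand-in for scrapy.link.Link


class ParserSketch:
    """Hypothetical stand-in for the base class; not Scrapy's code."""

    def __init__(self, unique=False):
        self.unique = unique

    def _deduplicate_if_needed(self, links):
        if not self.unique:
            return links
        seen, result = set(), []
        for link in links:
            if link.url not in seen:
                seen.add(link.url)
                result.append(link)
        return result

    def _process_links(self, links):
        # Assumed default hook: only deduplicate.
        return self._deduplicate_if_needed(links)


class SameHostParser(ParserSketch):
    """Example override: keep only links on one host, then deduplicate."""

    def __init__(self, host, unique=True):
        super().__init__(unique=unique)
        self.host = host

    def _process_links(self, links):
        links = [link for link in links if urlsplit(link.url).hostname == self.host]
        return super()._process_links(links)


links = [
    Link("https://example.com/a", "a"),
    Link("https://other.org/b", "b"),
    Link("https://example.com/a", "dup"),
]
result = SameHostParser("example.com")._process_links(links)
print([link.text for link in result])  # ['a']
```

Calling `super()._process_links` after filtering keeps the base class's deduplication in effect.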