class HTMLExtractorExtra(HTMLExtractor):

Override HTMLExtractor to create etree Elements for any elements whose content should be parsed as Markdown.
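The core idea can be sketched with only the standard library. This is not the extension's code: the class name `MarkdownBlockCollector` is hypothetical, and the sketch assumes that an element qualifies when it carries a `markdown` attribute and that the input is well formed, with no void tags such as `<br>` inside a marked element. Each marked element and everything nested in it is fed into an `xml.etree.ElementTree.TreeBuilder`, so every top-level match comes out as an `etree` Element.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as etree


class MarkdownBlockCollector(HTMLParser):
    """Hypothetical sketch: turn elements marked with `markdown` into etree Elements.

    Assumes well-formed input with no void tags (such as <br>) inside a
    marked element, because those never produce an end-tag event.
    """

    def reset(self):
        # HTMLParser.__init__ calls reset(), so per-document state lives here.
        super().reset()
        self.depth = 0        # nesting depth inside the element being collected
        self.builder = None   # TreeBuilder for that element, if any
        self.elements = []    # finished top-level Elements

    def handle_starttag(self, tag, attrs):
        # Bare attributes (e.g. <div markdown>) arrive with a value of None.
        attrs = {k: (v if v is not None else k) for k, v in attrs}
        if self.depth == 0 and "markdown" in attrs:
            attrs.pop("markdown")  # the marker attribute is not kept on the Element
            self.builder = etree.TreeBuilder()
        if self.builder is not None:
            self.builder.start(tag, attrs)
            self.depth += 1

    def handle_endtag(self, tag):
        if self.builder is None:
            return                 # outside any marked element: ignore
        self.builder.end(tag)
        self.depth -= 1
        if self.depth == 0:
            # close() hands back the finished root Element.
            self.elements.append(self.builder.close())
            self.builder = None

    def handle_data(self, data):
        if self.builder is not None:
            self.builder.data(data)


collector = MarkdownBlockCollector()
collector.feed('<p>skip</p><div markdown="1"><p>*Hello*</p></div>')
collector.close()
div = collector.elements[0]
print(div.tag, div.attrib, div[0].tag, div[0].text)  # div {} p *Hello*
```

The unmarked `<p>` is skipped entirely, while the marked `<div>` keeps its nested structure and its raw `*Hello*` text, ready to be handed to a Markdown parser.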

Method __init__ Undocumented
Method close Handle any buffered data.
Method get_element Return element from treebuilder and reset treebuilder for later use.
Method get_state Return the parsing state for a tag, derived from the tag name and its `markdown` attribute: one of 'block', 'span', or 'off'.
Method handle_data Undocumented
Method handle_empty_tag Handle empty tags (`<data>`).
Method handle_endtag Undocumented
Method handle_startendtag Undocumented
Method handle_starttag Undocumented
Method parse_html_declaration Undocumented
Method parse_pi Undocumented
Method reset Reset this instance. Loses all unprocessed data.
Instance Variable block_level_tags Undocumented
Instance Variable block_tags Undocumented
Instance Variable intail Undocumented
Instance Variable mdstack Undocumented
Instance Variable mdstate Undocumented
Instance Variable raw_tags Undocumented
Instance Variable span_and_blocks_tags Undocumented
Instance Variable span_tags Undocumented
Instance Variable treebuilder Undocumented

Inherited from HTMLExtractor:

Method at_line_start Returns True if current position is at start of line.
Method get_endtag_text Returns the text of the end tag.
Method get_starttag_text Return full source of start tag: '<...>'.
Method handle_charref Undocumented
Method handle_comment Undocumented
Method handle_decl Undocumented
Method handle_entityref Undocumented
Method handle_pi Undocumented
Method parse_starttag Undocumented
Method unknown_decl Undocumented
Instance Variable cleandoc Undocumented
Instance Variable empty_tags Undocumented
Instance Variable inraw Undocumented
Instance Variable lasttag Undocumented
Instance Variable md Undocumented
Instance Variable stack Undocumented
Property line_offset Returns char index in self.rawdata for the start of the current line.
Instance Variable __starttag_text Undocumented
Instance Variable _cache Undocumented

def __init__(self, md, *args, **kwargs):

def close(self):

Handle any buffered data.
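At end of input, Markdown-enabled tags may still be open, with their content sitting in the tree builder. A minimal sketch of one way `close()` can flush them, assuming (as the `mdstack` attribute suggests) that open tags are tracked on a stack: after the base class flushes its buffered text, `close()` ends the outermost open tag, and the end-tag handler unwinds every unclosed child above it. `ClosingSketch` and its attributes are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as etree


class ClosingSketch(HTMLParser):
    """Hypothetical parser that auto-closes tags left open at end of input."""

    def reset(self):
        super().reset()
        self.stack = []                     # open tags, outermost first
        self.builder = etree.TreeBuilder()
        self.root = None

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.builder.start(tag, dict(attrs))

    def handle_data(self, data):
        if self.stack:
            self.builder.data(data)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return                          # stray end tag: ignore it
        # Unwind any unclosed children, then the tag itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.builder.end(open_tag)
            if open_tag == tag:
                break
        if not self.stack:
            self.root = self.builder.close()
            self.builder = etree.TreeBuilder()

    def close(self):
        # Let the base class flush buffered text, then close the outermost tag.
        super().close()
        if self.stack:
            self.handle_endtag(self.stack[0])


parser = ClosingSketch()
parser.feed("<div><p>unfinished")
parser.close()
print(parser.root.tag, parser.root[0].tag, parser.root[0].text)  # div p unfinished
```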

def get_element(self):

Return element from treebuilder and reset treebuilder for later use.
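A minimal sketch of that pattern with the standard-library `xml.etree.ElementTree.TreeBuilder`, whose `close()` returns the root element of the tree it built. A closed builder is not designed to start a second, independent tree, so the method swaps in a fresh one. `ElementSource` is a hypothetical stand-in for the extractor.

```python
import xml.etree.ElementTree as etree


class ElementSource:
    """Hypothetical holder that mirrors the get_element() pattern."""

    def __init__(self):
        self.treebuilder = etree.TreeBuilder()

    def get_element(self):
        # close() returns the finished root Element.
        element = self.treebuilder.close()
        # A closed builder cannot start a second tree, so replace it.
        self.treebuilder = etree.TreeBuilder()
        return element


source = ElementSource()
built = []
for name in ("div", "section"):
    source.treebuilder.start(name, {})
    source.treebuilder.data(f"inside {name}")
    source.treebuilder.end(name)
    built.append(source.get_element())

print([(e.tag, e.text) for e in built])
# [('div', 'inside div'), ('section', 'inside section')]
```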

def get_state(self, tag, attrs):

Return the parsing state for a tag, derived from the tag name and its `markdown` attribute: one of 'block', 'span', or 'off'.
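The resolution can be sketched as a pure function. The rules below are an assumption consistent with the documented return values, not a copy of the extension: `markdown="1"` (or a bare `markdown`) asks for the tag's natural mode, `markdown="block"` or `markdown="span"` forces a mode, a parent in 'off' disables parsing for its children, and a 'span' parent cannot promote a child to 'block'. The tag sets are small hypothetical samples standing in for `block_tags`, `span_tags`, and `block_level_tags`, and the sketch returns None for tags that are not block-level at all.

```python
# Hypothetical sample tag sets; the real extension derives its own.
BLOCK_TAGS = {"div", "section", "blockquote"}
SPAN_TAGS = {"p", "li", "h1", "td"}
RAW_TAGS = {"pre", "script"}
BLOCK_LEVEL_TAGS = BLOCK_TAGS | SPAN_TAGS | RAW_TAGS
SPAN_AND_BLOCK_TAGS = BLOCK_TAGS | SPAN_TAGS


def get_state(tag, attrs, parent_state=None):
    """Return 'block', 'span', 'off', or None for a start tag (sketch)."""
    md_attr = attrs.get("markdown", "0")
    if md_attr in (None, "markdown"):  # bare `markdown` attribute means "1"
        md_attr = "1"
    # A restrictive parent state overrides the child's own request.
    if parent_state == "off" or (parent_state == "span" and md_attr != "0"):
        md_attr = parent_state
    if (md_attr == "1" and tag in BLOCK_TAGS) or (
        md_attr == "block" and tag in SPAN_AND_BLOCK_TAGS
    ):
        return "block"
    if (md_attr == "1" and tag in SPAN_TAGS) or (
        md_attr == "span" and tag in SPAN_AND_BLOCK_TAGS
    ):
        return "span"
    if tag in BLOCK_LEVEL_TAGS:
        return "off"
    return None  # not a block-level tag at all


print(get_state("div", {"markdown": "1"}))                       # block
print(get_state("p", {"markdown": "1"}))                         # span
print(get_state("div", {"markdown": "1"}, parent_state="span"))  # span
print(get_state("pre", {"markdown": "1"}))                       # off
```

Keeping the parent state as an explicit argument makes the inheritance rule easy to test in isolation; a parser would pass in the top of its own state stack.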

def handle_data(self, data):

def handle_empty_tag(self, data, is_block):

Handle empty tags (`<data>`).
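Empty (void) tags need separate handling because the standard-library `html.parser.HTMLParser` never reports a matching end tag for them: `<hr>` produces only a start-tag event, while the self-closing form `<br/>` is routed to `handle_startendtag`. The recorder below only logs those callbacks to show the difference; it is an illustration, not part of the extension.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Log which HTMLParser callbacks fire for each tag."""

    def reset(self):
        super().reset()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Overriding this replaces the default start-then-end dispatch.
        self.events.append(("startend", tag))


recorder = EventRecorder()
recorder.feed("<hr><br/><p>x</p>")
recorder.close()
print(recorder.events)
# [('start', 'hr'), ('startend', 'br'), ('start', 'p'), ('end', 'p')]
```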

def handle_endtag(self, tag):

def handle_startendtag(self, tag, attrs):

def handle_starttag(self, tag, attrs):

def parse_html_declaration(self, i):

def parse_pi(self, i):

def reset(self):

Reset this instance. Loses all unprocessed data.
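In an `html.parser.HTMLParser` subclass, `reset()` is also called from the base-class `__init__`, so it is a natural home for per-document state; calling it again later discards any half-parsed input. A minimal sketch, with hypothetical attributes that echo `mdstack`, `mdstate`, and `treebuilder` (the fixed 'block' state is a placeholder, not real state logic):

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as etree


class ResettableSketch(HTMLParser):
    """Hypothetical parser keeping per-document state in reset()."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so these exist after construction.
        self.mdstack = []                    # open Markdown-enabled tags
        self.mdstate = []                    # parse state for each open tag
        self.treebuilder = etree.TreeBuilder()
        super().reset()                      # drops buffered input and position

    def handle_starttag(self, tag, attrs):
        self.mdstack.append(tag)
        self.mdstate.append("block")         # placeholder state for the sketch


parser = ResettableSketch()
parser.feed("<div><sec")                     # second tag is still incomplete
print(parser.mdstack)                        # ['div']
parser.reset()
print(parser.mdstack, parser.mdstate, repr(parser.rawdata))  # [] [] ''
```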

block_level_tags =

Undocumented

block_tags =

Undocumented

mdstack: list =

Undocumented

mdstate: list =

Undocumented

raw_tags =

Undocumented

span_and_blocks_tags =

Undocumented

span_tags =

Undocumented
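The tag-category attributes are undocumented here. Judging only from their names and the inherited `empty_tags`, they appear to partition the block-level tags; one plausible way to derive them with set arithmetic is sketched below. Both the sample tag lists and the exact formulas are assumptions, not values read from the extension.

```python
# Hypothetical sample inputs; the real lists come from the extension and Markdown.
block_level_tags = {"div", "section", "p", "li", "td", "pre", "script", "hr"}
span_tags = {"p", "li", "td"}          # content gets inline (span) parsing only
raw_tags = {"pre", "script"}           # content is never parsed
empty_tags = {"hr"}                    # void elements with no content

# Whatever is left over gets full block-level parsing.
block_tags = block_level_tags - (span_tags | raw_tags | empty_tags)
span_and_blocks_tags = block_tags | span_tags

print(sorted(block_tags))            # ['div', 'section']
print(sorted(span_and_blocks_tags))  # ['div', 'li', 'p', 'section', 'td']
```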

treebuilder =

Undocumented