class documentation

class HTMLExtractor(htmlparser.HTMLParser):

Known subclasses: markdown.extensions.md_in_html.HTMLExtractorExtra

Extract raw HTML from text. The raw HTML is stored in the `htmlStash` of the Markdown instance passed to `md` and the remaining text is stored in `cleandoc` as a list of strings.
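The split between stashed HTML and remaining text can be illustrated with a toy parser built only on the standard library's `html.parser.HTMLParser`. This is a minimal sketch, not Python-Markdown's implementation: the class name `ToyExtractor`, the `stash` list, and the `<<stash:N>>` placeholder format are all hypothetical, and the sketch ignores void elements such as `<br>`, entity references, and comments.

```python
from html.parser import HTMLParser


class ToyExtractor(HTMLParser):
    """Hypothetical stand-in showing the stash/cleandoc split."""

    def __init__(self):
        # Keep character references untouched so text passes through as-is.
        super().__init__(convert_charrefs=False)
        self.stash = []      # raw HTML fragments (plays the role of htmlStash)
        self.cleandoc = []   # remaining text, as a list of strings
        self._raw = []       # pieces of the fragment currently being collected
        self._depth = 0      # nesting depth of open tags

    def handle_starttag(self, tag, attrs):
        self._depth += 1
        self._raw.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._raw.append(f"</{tag}>")
        self._depth -= 1
        if self._depth == 0:
            # Outermost element closed: stash it and leave a placeholder.
            self.stash.append("".join(self._raw))
            self._raw = []
            self.cleandoc.append(f"<<stash:{len(self.stash) - 1}>>")

    def handle_data(self, data):
        # Text inside an element belongs to the raw fragment; the rest is kept.
        (self._raw if self._depth else self.cleandoc).append(data)


parser = ToyExtractor()
parser.feed("Hello\n\n<div>raw *html*</div>\n\nWorld")
parser.close()
print(parser.stash)
print("".join(parser.cleandoc))
```

As in the class described here, the caller joins `cleandoc` to recover the document text with the raw HTML replaced by placeholders.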

Method __init__ Undocumented
Method at_line_start Returns True if current position is at start of line.
Method close Handle any buffered data.
Method get_endtag_text Returns the text of the end tag.
Method get_starttag_text Return full source of start tag: '<...>'.
Method handle_charref Undocumented
Method handle_comment Undocumented
Method handle_data Undocumented
Method handle_decl Undocumented
Method handle_empty_tag Handle empty tags (`<data>`).
Method handle_endtag Undocumented
Method handle_entityref Undocumented
Method handle_pi Undocumented
Method handle_startendtag Undocumented
Method handle_starttag Undocumented
Method parse_html_declaration Undocumented
Method parse_pi Undocumented
Method parse_starttag Undocumented
Method reset Reset this instance. Loses all unprocessed data.
Method unknown_decl Undocumented
Instance Variable cleandoc Undocumented
Instance Variable empty_tags Undocumented
Instance Variable inraw Undocumented
Instance Variable intail Undocumented
Instance Variable lasttag Undocumented
Instance Variable md Undocumented
Instance Variable stack Undocumented
Property line_offset Returns char index in self.rawdata for the start of the current line.
Instance Variable __starttag_text Undocumented
Instance Variable _cache Undocumented
def __init__(self, md, *args, **kwargs):

Undocumented

def at_line_start(self):

Returns True if current position is at start of line. Allows for up to three blank spaces at start of line.
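One plausible reading of this rule, written as a standalone function rather than a method: the position counts as the line start when at most three characters precede it on the line and all of them are whitespace. The parameters `rawdata`, `line_offset`, and `offset` are assumptions chosen to mirror the parser's state, and using `str.strip()` means tabs are treated like spaces here.

```python
def at_line_start(rawdata: str, line_offset: int, offset: int) -> bool:
    """Sketch: is column `offset` of the line starting at `line_offset` a line start?"""
    if offset == 0:
        return True
    if offset > 3:
        # More than three characters precede the position.
        return False
    # One to three characters precede the position; all must be whitespace.
    return rawdata[line_offset:line_offset + offset].strip() == ""


text = "intro\n   <div>"
print(at_line_start(text, line_offset=6, offset=3))
```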

def close(self):

Handle any buffered data.

def get_endtag_text(self, tag):

Returns the text of the end tag. If it fails to extract the actual text from the raw data, it builds a closing tag with `tag`.
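The extract-or-rebuild behaviour can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function takes the raw text and a start index explicitly, and the pattern `END_OF_TAG` is a hypothetical stand-in for whatever the parser uses to find the end of a closing tag.

```python
import re

# Hypothetical pattern: the first '>' after the start index ends the tag.
END_OF_TAG = re.compile(r">")


def get_endtag_text(rawdata: str, start: int, tag: str) -> str:
    """Sketch: return the end tag as written in `rawdata`, else build one from `tag`."""
    match = END_OF_TAG.search(rawdata, start)
    if match:
        # Preserve the author's exact spelling, e.g. case and inner whitespace.
        return rawdata[start:match.end()]
    # Nothing to extract: assume a well-formed closing tag.
    return f"</{tag}>"


print(get_endtag_text("<p>x</P >", 4, "p"))
print(get_endtag_text("<p>x</p", 4, "p"))
```

Returning the source text where possible keeps the stashed HTML byte-for-byte faithful to the input; the rebuilt tag is only a fallback.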

def get_starttag_text(self):

Return full source of start tag: '<...>'.

def handle_charref(self, name):

Undocumented

def handle_comment(self, data):

Undocumented

def handle_data(self, data):

Undocumented

def handle_decl(self, data):

Undocumented

def handle_empty_tag(self, data, is_block):

Handle empty tags (`<data>`).

def handle_endtag(self, tag):

Undocumented

def handle_entityref(self, name):

Undocumented

def handle_pi(self, data):

Undocumented

def handle_startendtag(self, tag, attrs):

Undocumented

def handle_starttag(self, tag, attrs):

Undocumented

def parse_html_declaration(self, i):

Undocumented

def parse_pi(self, i):

Undocumented

def parse_starttag(self, i):

Undocumented

def reset(self):

Reset this instance. Loses all unprocessed data.

def unknown_decl(self, data):

Undocumented

cleandoc: list

Undocumented

empty_tags

Undocumented

inraw: bool

Undocumented

intail

Undocumented

lasttag

Undocumented

md

Undocumented

stack: list

Undocumented

@property
line_offset

Returns char index in self.rawdata for the start of the current line.
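The computation can be sketched as a standalone function. It assumes a 1-based line number, as returned by the standard library's `HTMLParser.getpos()`; the function signature and the fallback for an out-of-range line number are choices made for this illustration, not the library's behaviour.

```python
def line_offset(rawdata: str, lineno: int) -> int:
    """Sketch: index in `rawdata` where 1-based line `lineno` begins."""
    index = 0
    for _ in range(lineno - 1):
        newline = rawdata.find("\n", index)
        if newline == -1:
            # Fewer lines than requested: stay at the start of the last line.
            break
        index = newline + 1
    return index


text = "ab\ncd\nef"
print([line_offset(text, n) for n in (1, 2, 3)])
```

Combined with the column offset on the current line, this index gives the absolute position in the raw data.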

__starttag_text

Undocumented

_cache: list

Undocumented