markdown.htmlparser.HTMLExtractor

class documentation

class HTMLExtractor(htmlparser.HTMLParser):

Known subclasses: markdown.extensions.md_in_html.HTMLExtractorExtra

Extract raw HTML from text. The raw HTML is stored in the `htmlStash` of the Markdown instance passed to `md` and the remaining text is stored in `cleandoc` as a list of strings.
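A minimal usage sketch of the behavior described above, assuming Python-Markdown 3.3 or later is installed. The sample `source` string and the variable names `remaining` and `stashed` are illustrative only; `feed` is inherited from the `HTMLParser` base class.

```python
from markdown import Markdown
from markdown.htmlparser import HTMLExtractor

md = Markdown()
# Illustrative input: a block-level HTML element between two paragraphs.
source = "Some *text*.\n\n<div>raw</div>\n\nMore text.\n"

parser = HTMLExtractor(md)
parser.feed(source)  # inherited from the HTMLParser base class
parser.close()       # handle any buffered data

# The remaining text is a list of strings; the raw HTML has been
# replaced by a placeholder that refers to the stash entry.
remaining = "".join(parser.cleandoc)
stashed = md.htmlStash.rawHtmlBlocks
```

After this runs, `stashed` should hold the `<div>` block and `remaining` should hold the Markdown text with a placeholder where the block used to be.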

Method	`__init__`	Undocumented
Method	`at_line_start`	Returns True if current position is at start of line.
Method	`close`	Handle any buffered data.
Method	`get_endtag_text`	Returns the text of the end tag.
Method	`get_starttag_text`	Return full source of start tag: '<...>'.
Method	`handle_charref`	Undocumented
Method	`handle_comment`	Undocumented
Method	`handle_data`	Undocumented
Method	`handle_decl`	Undocumented
Method	`handle_empty_tag`	Handle empty tags (`<data>`).
Method	`handle_endtag`	Undocumented
Method	`handle_entityref`	Undocumented
Method	`handle_pi`	Undocumented
Method	`handle_startendtag`	Undocumented
Method	`handle_starttag`	Undocumented
Method	`parse_html_declaration`	Undocumented
Method	`parse_pi`	Undocumented
Method	`parse_starttag`	Undocumented
Method	`reset`	Reset this instance. Loses all unprocessed data.
Method	`unknown_decl`	Undocumented
Instance Variable	`cleandoc`	The text remaining after raw HTML is extracted, as a list of strings.
Instance Variable	`empty_tags`	Undocumented
Instance Variable	`inraw`	Undocumented
Instance Variable	`intail`	Undocumented
Instance Variable	`lasttag`	Undocumented
Instance Variable	`md`	The Markdown instance whose `htmlStash` receives the extracted raw HTML.
Instance Variable	`stack`	Undocumented
Property	`line_offset`	Returns char index in self.rawdata for the start of the current line.
Instance Variable	`__starttag_text`	Undocumented
Instance Variable	`_cache`	Undocumented

def __init__(self, md, *args, **kwargs):

overridden in markdown.extensions.md_in_html.HTMLExtractorExtra

Undocumented

def at_line_start(self):

Returns True if current position is at start of line. Allows for up to three blank spaces at start of line.
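The rule can be illustrated with a standalone sketch. The helper `is_at_line_start` below is hypothetical and is not the library's method: it takes the line text and a column offset as explicit arguments instead of reading parser state.

```python
def is_at_line_start(line: str, column: int) -> bool:
    """Return True if `column` is at the start of `line`,
    allowing for up to three leading whitespace characters."""
    if column == 0:
        return True
    if column > 3:
        return False
    # Every character before the position must be whitespace.
    return line[:column].strip() == ""


print(is_at_line_start("   <div>", 3))   # three spaces of indent are allowed
print(is_at_line_start("    <div>", 4))  # four spaces are too many
```

The three-space allowance matches the usual Markdown convention that four or more leading spaces start an indented code block.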

def close(self):

overridden in markdown.extensions.md_in_html.HTMLExtractorExtra

Handle any buffered data.

def get_endtag_text(self, tag):

Returns the text of the end tag. If it fails to extract the actual text from the raw data, it builds a closing tag with `tag`.
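A rough sketch of this extract-or-rebuild behavior, written as a standalone function. The name `endtag_text`, the `END_OF_TAG` pattern, and the explicit `rawdata` and `start` parameters are assumptions made for illustration; the actual method reads its position from the parser.

```python
import re

# Assumed pattern for this sketch: the '>' that closes a tag.
END_OF_TAG = re.compile(r">")


def endtag_text(rawdata: str, start: int, tag: str) -> str:
    """Return the end tag as written in `rawdata` beginning at `start`,
    or a rebuilt '</tag>' if no closing '>' can be found."""
    match = END_OF_TAG.search(rawdata, start)
    if match:
        # Keep the author's exact spelling, case, and spacing.
        return rawdata[start:match.end()]
    # Fall back to a well-formed closing tag built from the tag name.
    return "</{}>".format(tag)
```

Returning the original text where possible means an end tag written as `</P >` is kept exactly as written.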

def get_starttag_text(self):

Return full source of start tag: '<...>'.

def handle_charref(self, name):

Undocumented

def handle_comment(self, data):

Undocumented

def handle_data(self, data):

overridden in markdown.extensions.md_in_html.HTMLExtractorExtra

Undocumented

def handle_decl(self, data):

Undocumented

def handle_empty_tag(self, data, is_block):

overridden in markdown.extensions.md_in_html.HTMLExtractorExtra

Handle empty tags (`<data>`).

def handle_endtag(self, tag):

overridden in markdown.extensions.md_in_html.HTMLExtractorExtra

Undocumented

def handle_entityref(self, name):

Undocumented

def handle_pi(self, data):

Undocumented

def handle_startendtag(self, tag, attrs):

overridden in markdown.extensions.md_in_html.HTMLExtractorExtra

Undocumented

def handle_starttag(self, tag, attrs):

overridden in markdown.extensions.md_in_html.HTMLExtractorExtra

Undocumented

def parse_html_declaration(self, i):

overridden in markdown.extensions.md_in_html.HTMLExtractorExtra

Undocumented

def parse_pi(self, i):

overridden in markdown.extensions.md_in_html.HTMLExtractorExtra

Undocumented

def parse_starttag(self, i):

Undocumented

def reset(self):

overridden in markdown.extensions.md_in_html.HTMLExtractorExtra

Reset this instance. Loses all unprocessed data.

def unknown_decl(self, data):

Undocumented

cleandoc: list

The text remaining after raw HTML is extracted, stored as a list of strings.

empty_tags

Undocumented

inraw: bool

Undocumented

intail: bool

overridden in markdown.extensions.md_in_html.HTMLExtractorExtra

Undocumented

lasttag

Undocumented

md

The Markdown instance passed to the constructor. Extracted raw HTML is stored in its `htmlStash`.

stack: list

Undocumented

@property
line_offset

Returns char index in self.rawdata for the start of the current line.
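To show what "start of the current line" means as a character index, here is a small standalone sketch. The function `line_start_index` is hypothetical: it takes the text and a 1-based line number as arguments, whereas the property works from the parser's own state.

```python
def line_start_index(rawdata: str, lineno: int) -> int:
    """Return the index in `rawdata` where the 1-based line `lineno` begins.

    If the text has fewer lines than `lineno`, the length of the text
    is returned (an assumption made for this sketch)."""
    index = 0
    for _ in range(lineno - 1):
        newline = rawdata.find("\n", index)
        if newline == -1:
            return len(rawdata)
        index = newline + 1  # the next line starts just past the newline
    return index


raw = "ab\ncde\nf"
print(line_start_index(raw, 2))  # index of "c"
```

Adding a column offset to this value gives an absolute position in the text, which is one plausible way to turn a (line, column) pair into a slice index.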

__starttag_text

Undocumented

_cache: list

Undocumented