class documentation
class HTMLExtractor(htmlparser.HTMLParser):
Known subclasses: markdown.extensions.md_in_html.HTMLExtractorExtra
Extract raw HTML from text. The raw HTML is stored in the `htmlStash` of the Markdown instance passed to `md` and the remaining text is stored in `cleandoc` as a list of strings.
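The stash-and-placeholder idea described above can be illustrated with a toy model built only on the standard library's `html.parser.HTMLParser`. This is a simplified sketch, not the class documented here: `ToyExtractor`, its `stash` list (standing in for the `htmlStash` of a Markdown instance), and the `{raw:N}` placeholder format are all hypothetical, and empty (void) tags such as `<br>` are not handled.

```python
from html.parser import HTMLParser


class ToyExtractor(HTMLParser):
    """Toy model: move raw HTML elements into a stash, keep the rest as text."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.stash = []     # hypothetical stand-in for md.htmlStash
        self.cleandoc = []  # remaining text, as a list of strings
        self.stack = []     # tags that are currently open
        self._cache = []    # raw HTML of the element being collected

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self._cache.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.stack:
            # Stray end tag outside any element: keep it as ordinary text.
            self.cleandoc.append("</{}>".format(tag))
            return
        self._cache.append("</{}>".format(tag))
        if tag in self.stack:
            # Close everything up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass
        if not self.stack:
            # Outermost element is complete: stash it, leave a placeholder.
            self.stash.append("".join(self._cache))
            self.cleandoc.append("{{raw:{}}}".format(len(self.stash) - 1))
            self._cache = []

    def handle_data(self, data):
        # Text inside an open element belongs to the raw HTML; the rest is kept.
        (self._cache if self.stack else self.cleandoc).append(data)


parser = ToyExtractor()
parser.feed("Hello\n<div><b>x</b></div>\nWorld\n")
parser.close()
remaining = "".join(parser.cleandoc)
```

After the run, `parser.stash` holds the extracted element and `remaining` holds the surrounding text with a placeholder where the element used to be.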
Method | __init__ |
Undocumented |
Method | at |
Returns True if current position is at start of line. |
Method | close |
Handle any buffered data. |
Method | get |
Returns the text of the end tag. |
Method | get |
Return full source of start tag: '<...>'. |
Method | handle |
Undocumented |
Method | handle |
Undocumented |
Method | handle |
Undocumented |
Method | handle |
Undocumented |
Method | handle |
Handle empty tags (`<data>`). |
Method | handle |
Undocumented |
Method | handle |
Undocumented |
Method | handle |
Undocumented |
Method | handle |
Undocumented |
Method | handle |
Undocumented |
Method | parse |
Undocumented |
Method | parse |
Undocumented |
Method | parse |
Undocumented |
Method | reset |
Reset this instance. Loses all unprocessed data. |
Method | unknown |
Undocumented |
Instance Variable | cleandoc |
Undocumented |
Instance Variable | empty |
Undocumented |
Instance Variable | inraw |
Undocumented |
Instance Variable | intail |
Undocumented |
Instance Variable | lasttag |
Undocumented |
Instance Variable | md |
Undocumented |
Instance Variable | stack |
Undocumented |
Property | line |
Returns char index in self.rawdata for the start of the current line. |
Instance Variable | __starttag |
Undocumented |
Instance Variable | _cache |
Undocumented |
Returns True if current position is at start of line. Allows for up to three blank spaces at start of line.
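The line-start rule just described can be sketched as a standalone function. This is a minimal illustration under assumptions, not the method's actual source: the function name `is_at_line_start` and its explicit `rawdata` and `pos` parameters are hypothetical, and only space characters are treated as permitted indentation.

```python
def is_at_line_start(rawdata: str, pos: int) -> bool:
    """Return True if `pos` is at the start of a line in `rawdata`.

    Up to three leading spaces before `pos` are allowed.
    """
    # Index of the first character of the line that contains `pos`.
    line_start = rawdata.rfind("\n", 0, pos) + 1
    prefix = rawdata[line_start:pos]
    # At most three characters may precede `pos`, and all must be spaces.
    return len(prefix) <= 3 and prefix.strip(" ") == ""
```

For example, a tag indented by three spaces counts as being at the start of its line, while one indented by four spaces does not.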
Returns the text of the end tag. If it fails to extract the actual text from the raw data, it builds a closing tag with `tag`.
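The fallback behaviour described above can be sketched as follows. This is an illustration under assumptions rather than the method's actual source: the function name `endtag_text`, the explicit `rawdata` and `start` parameters, and the `END_OF_TAG` pattern are hypothetical.

```python
import re

# Hypothetical pattern: the first '>' after `start` is taken as the tag's end.
END_OF_TAG = re.compile(r">")


def endtag_text(rawdata: str, start: int, tag: str) -> str:
    """Return the end tag exactly as written in `rawdata`, or rebuild it."""
    match = END_OF_TAG.search(rawdata, start)
    if match:
        # Preserve the author's original spelling, e.g. '</P >'.
        return rawdata[start:match.end()]
    # No closing '>' found: fall back to a normalized closing tag.
    return "</{}>".format(tag)
```

Returning the literal source text keeps the stashed HTML identical to the input, and the rebuilt tag is used only when the raw text cannot be recovered.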
Reset this instance. Loses all unprocessed data. Overridden in markdown.extensions.md_in_html.HTMLExtractorExtra.