class documentation
class HTMLExtractorExtra(HTMLExtractor):
Override HTMLExtractor and create etree Elements for any elements which should have content parsed as Markdown.
Method | __init__ | Undocumented
Method | close | Handle any buffered data.
Method | get | Return element from treebuilder and reset treebuilder for later use.
Method | get | Return state from tag and `markdown` attr. One of 'block', 'span', or 'off'.
Method | handle | Undocumented
Method | handle | Handle empty tags (`<data>`).
Method | handle | Undocumented
Method | handle | Undocumented
Method | handle | Undocumented
Method | parse | Undocumented
Method | parse | Undocumented
Method | reset | Reset this instance. Loses all unprocessed data.
Instance Variable | block | Undocumented
Instance Variable | block | Undocumented
Instance Variable | intail | Undocumented
Instance Variable | mdstack | Undocumented
Instance Variable | mdstate | Undocumented
Instance Variable | raw | Undocumented
Instance Variable | span | Undocumented
Instance Variable | span | Undocumented
Instance Variable | treebuilder | Undocumented
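The `close` and `reset` descriptions match the docstrings of the standard library's `html.parser.HTMLParser`, where `close()` forces processing of any buffered input as if end-of-file had been reached and `reset()` discards unprocessed data (`HTMLParser.__init__` also calls `reset()`, so state set up there exists from construction onward). The class keeps stack-like state (`mdstack`, `mdstate`), so one plausible use of `close` is shutting tags still open at end of input. The sketch below is a hypothetical subclass illustrating that idea, not the extension's code; `StackTrackingParser`, `stack`, and `events` are illustrative names:

```python
from html.parser import HTMLParser


class StackTrackingParser(HTMLParser):
    """Hypothetical parser: records events and closes unclosed tags at EOF."""

    def reset(self) -> None:
        # Per-document state lives here; HTMLParser.__init__ calls reset().
        self.stack: list[str] = []
        self.events: list[tuple[str, str]] = []
        super().reset()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Stray end tags with no matching open tag are ignored.
        if tag not in self.stack:
            return
        # Pop back to the matching tag, implicitly closing any open children.
        while self.stack:
            open_tag = self.stack.pop()
            self.events.append(("end", open_tag))
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.events.append(("data", data))

    def close(self):
        # Let the base class flush whatever input it still has buffered...
        super().close()
        # ...then close the outermost open tag, which also closes its children.
        if self.stack:
            self.handle_endtag(self.stack[0])


parser = StackTrackingParser()
parser.feed("<div><p>hello")
parser.close()
print(parser.events)
# [('start', 'div'), ('start', 'p'), ('data', 'hello'), ('end', 'p'), ('end', 'div')]
```

Because the per-document lists are created in `reset()`, calling `parser.reset()` afterwards clears `stack` and `events` together with any unprocessed input.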
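The `get` row described as returning the element from the treebuilder and resetting it follows a common `xml.etree.ElementTree.TreeBuilder` pattern: `TreeBuilder.close()` returns the finished top-level element, and a new builder is then created so the next element gets a root of its own. A minimal standard-library sketch of that pattern; `ElementCollector` and `take_element` are hypothetical names, not the extension's API:

```python
import xml.etree.ElementTree as etree


class ElementCollector:
    """Hypothetical helper showing the close-and-replace TreeBuilder pattern."""

    def __init__(self) -> None:
        self.treebuilder = etree.TreeBuilder()

    def take_element(self) -> etree.Element:
        # close() finishes the tree and returns its top-level element.
        element = self.treebuilder.close()
        # Start a fresh builder so the next element gets its own root.
        self.treebuilder = etree.TreeBuilder()
        return element


collector = ElementCollector()
collector.treebuilder.start("div", {"class": "note"})
collector.treebuilder.data("Some *Markdown* text")
collector.treebuilder.end("div")
first = collector.take_element()
print(first.tag, first.get("class"), first.text)  # div note Some *Markdown* text

# The same collector can now build a second, independent element.
collector.treebuilder.start("p", {})
collector.treebuilder.end("p")
print(collector.take_element().tag)  # p
```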
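The other `get` row maps a tag and its `markdown` attribute to one of `'block'`, `'span'`, or `'off'`. The function below is a simplified, hypothetical sketch of such a mapping, not the extension's implementation: the two tag sets are small illustrative assumptions, and it ignores any state inherited from enclosing elements:

```python
from typing import Literal, Mapping

# Illustrative tag sets (assumptions for this sketch, not the extension's lists).
BLOCK_TAGS = {"div", "section", "article", "blockquote"}
SPAN_TAGS = {"p", "li", "td", "th", "h1", "h2", "dd", "dt"}
ALL_TAGS = BLOCK_TAGS | SPAN_TAGS

State = Literal["block", "span", "off"]


def markdown_state(tag: str, attrs: Mapping[str, str]) -> State:
    """Simplified: pick a parsing state from the tag and its `markdown` attribute."""
    md_attr = attrs.get("markdown", "0")
    # markdown="1" lets the tag's own category decide; "block"/"span" force a mode.
    if (md_attr == "1" and tag in BLOCK_TAGS) or (md_attr == "block" and tag in ALL_TAGS):
        return "block"
    if (md_attr == "1" and tag in SPAN_TAGS) or (md_attr == "span" and tag in ALL_TAGS):
        return "span"
    # Anything else (no attribute, markdown="0", or an unlisted tag) stays raw HTML.
    return "off"


print(markdown_state("div", {"markdown": "1"}))     # block
print(markdown_state("p", {"markdown": "1"}))       # span
print(markdown_state("div", {"markdown": "span"}))  # span
print(markdown_state("div", {}))                    # off
```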
Inherited from HTMLExtractor:
Method | at | Returns True if current position is at start of line.
Method | get | Returns the text of the end tag.
Method | get | Return full source of start tag: '<...>'.
Method | handle | Undocumented
Method | handle | Undocumented
Method | handle | Undocumented
Method | handle | Undocumented
Method | handle | Undocumented
Method | parse | Undocumented
Method | unknown | Undocumented
Instance Variable | cleandoc | Undocumented
Instance Variable | empty | Undocumented
Instance Variable | inraw | Undocumented
Instance Variable | lasttag | Undocumented
Instance Variable | md | Undocumented
Instance Variable | stack | Undocumented
Property | line | Returns char index in self.rawdata for the start of the current line.
Instance Variable | __starttag | Undocumented
Instance Variable | _cache | Undocumented
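The inherited `at` row reports whether the current position is at the start of a line. A hypothetical standalone version over raw text and a character index is sketched below. Tolerating up to three leading spaces is an assumption modeled on Markdown's rule that block-level constructs may be indented by up to three spaces; pass `max_indent=0` for a strict check:

```python
def is_at_line_start(rawdata: str, pos: int, max_indent: int = 3) -> bool:
    """Hypothetical helper: is `pos` at the start of its line?

    Up to `max_indent` characters of leading whitespace are tolerated
    (an assumption modeled on Markdown's three-space indentation rule).
    """
    line_start = rawdata.rfind("\n", 0, pos) + 1  # 0 when on the first line
    offset = pos - line_start
    if offset == 0:
        return True
    if offset > max_indent:
        return False
    # Only whitespace may precede `pos` on this line.
    return rawdata[line_start:pos].strip() == ""


text = "para\n<div>\n   <p>\nx <span>"
print(is_at_line_start(text, text.index("<div>")))   # True
print(is_at_line_start(text, text.index("<p>")))     # True
print(is_at_line_start(text, text.index("<span>")))  # False
```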
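The inherited `get` row that returns the text of the end tag matters because `html.parser.HTMLParser` passes `handle_endtag` a lowercased tag name, so the original spelling of something like `</DIV >` has to be recovered from the raw input. A hypothetical helper, assuming the caller already knows where the end tag starts, with a rebuilt `</tag>` as the fallback:

```python
def endtag_source(rawdata: str, start: int, tag: str) -> str:
    """Hypothetical helper: return the raw end-tag text beginning at `start`.

    Falls back to a rebuilt `</tag>` if no end tag can be read there.
    """
    end = rawdata.find(">", start)
    if rawdata.startswith("</", start) and end != -1:
        # Keep the author's original case and spacing.
        return rawdata[start:end + 1]
    return f"</{tag}>"


source = "<DIV>text</DIV >"
print(endtag_source(source, source.index("</"), "div"))  # </DIV >
print(endtag_source("</div", 0, "div"))                  # </div>
```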
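For the `get` row that returns the full source of a start tag, the standard library's `html.parser.HTMLParser` offers `get_starttag_text()` with the same documented purpose: it returns the raw text of the most recently opened start tag, keeping the case and quoting that the parsed `tag` and `attrs` arguments normalize away. A small demonstration; `StartTagRecorder` is an illustrative name:

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Collects the raw source of every start tag seen."""

    def reset(self) -> None:
        self.raw_starttags: list[str] = []
        super().reset()

    def handle_starttag(self, tag, attrs):
        # `tag` arrives lowercased and `attrs` parsed; this returns the raw '<...>'.
        self.raw_starttags.append(self.get_starttag_text())


recorder = StartTagRecorder()
recorder.feed('<DIV Class="note" markdown="1"><p>x</p></DIV>')
recorder.close()
print(recorder.raw_starttags)  # ['<DIV Class="note" markdown="1">', '<p>']
```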
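The `line` property returns the character index in `self.rawdata` where the current line starts. `html.parser.HTMLParser.getpos()` reports position as a 1-based line number plus a column offset, so turning that into an absolute index first requires locating the start of the line. A hypothetical standalone helper (clamping to the end of the text for out-of-range line numbers is a design assumption):

```python
def line_start_index(rawdata: str, lineno: int) -> int:
    """Hypothetical helper: char index in `rawdata` where 1-based line `lineno` starts."""
    index = 0
    for _ in range(lineno - 1):
        newline = rawdata.find("\n", index)
        if newline == -1:
            # Fewer lines than requested: clamp to the end of the text.
            return len(rawdata)
        index = newline + 1
    return index


text = "first\nsecond\nthird"
print(line_start_index(text, 1))  # 0
print(line_start_index(text, 3))  # 13
```

Adding the column offset from `getpos()` to the result would then give an absolute index into the raw text.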