class BeautifulSoupHTMLParser(HTMLParser, DetectsXMLParsedAsHTML):
A subclass of the Python standard library's HTMLParser class, which listens for HTMLParser events and translates them into calls to Beautiful Soup's tree construction API.
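The event-to-tree translation described above can be sketched with the standard library alone. The `TreeSketchParser` class and its nested-dict tree are hypothetical stand-ins chosen for illustration; they are not Beautiful Soup's tree construction API, and the sketch ignores most real-world recovery rules.

```python
from html.parser import HTMLParser


class TreeSketchParser(HTMLParser):
    """Hypothetical listener that turns HTMLParser events into a nested dict tree."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        # The root node stands in for the document object.
        self.root = {"name": "[document]", "attrs": {}, "children": []}
        self._stack = [self.root]  # elements that are currently open

    def handle_starttag(self, tag, attrs):
        # Create a node, attach it to the innermost open element, then open it.
        node = {"name": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["name"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Text between tags becomes a string child of the innermost open element.
        self._stack[-1]["children"].append(data)


parser = TreeSketchParser()
parser.feed("<p class='x'>Hello <b>world</b></p>")
parser.close()
tree = parser.root
```

Each handler does one small piece of tree bookkeeping, which is the same division of labour the class description implies: the standard-library parser tokenizes, and the subclass decides what each event means for the tree.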
Method | __init__ |
Constructor. |
Method | handle_charref |
Handle a numeric character reference by converting it to the corresponding Unicode character and treating it as textual data. |
Method | handle_comment |
Handle an HTML comment. |
Method | handle_data |
Handle some textual data that shows up between tags. |
Method | handle_decl |
Handle a DOCTYPE declaration. |
Method | handle_endtag |
Handle a closing tag, e.g. '</tag>' |
Method | handle_entityref |
Handle a named entity reference by converting it to the corresponding Unicode character(s) and treating it as textual data. |
Method | handle_pi |
Handle a processing instruction. |
Method | handle_startendtag |
Handle an incoming empty-element tag. |
Method | handle_starttag |
Handle an opening tag, e.g. '<tag>' |
Method | unknown_decl |
Handle a declaration of unknown type -- probably a CDATA block. |
Constant | IGNORE |
Undocumented |
Constant | REPLACE |
Undocumented |
Instance Variable | already_closed_empty_element |
Undocumented |
Instance Variable | on_duplicate_attribute |
Undocumented |
Inherited from DetectsXMLParsedAsHTML:
Class Method | warn_if_markup_looks_like_xml |
Perform a check on some markup to see if it looks like XML that's not XHTML. If so, issue a warning. |
Constant | LOOKS_LIKE_HTML |
Undocumented |
Constant | LOOKS_LIKE_HTML_B |
Undocumented |
Constant | XML_PREFIX |
Undocumented |
Constant | XML_PREFIX_B |
Undocumented |
Class Method | _warn |
Issue a warning about XML being parsed as HTML. |
Method | _document_might_be_xml |
Call this method when encountering an XML declaration, or a "processing instruction" that might be an XML declaration. |
Method | _initialize_xml_detector |
Call this method before parsing a document. |
Method | _root_tag_encountered |
Call this when you encounter the document's root tag. |
Instance Variable | _first_processing_instruction |
Undocumented |
Instance Variable | _root_tag |
Undocumented |
Constructor.
:param on_duplicate_attribute: A strategy for what to do if a tag includes the same attribute more than once. Accepted values are: REPLACE (replace earlier values with later ones, the default), IGNORE (keep the earliest value encountered), or a callable. A callable must take three arguments: the dictionary of attributes already processed, the name of the duplicate attribute, and the most recent value encountered.
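The three strategies for on_duplicate_attribute can be sketched as a standalone function. Everything here is illustrative: `merge_attributes` and `collect` are hypothetical names, and the string values given to `REPLACE` and `IGNORE` are assumptions rather than values confirmed by this page.

```python
REPLACE = "replace"  # assumed value, for illustration only
IGNORE = "ignore"    # assumed value, for illustration only


def merge_attributes(pairs, on_duplicate_attribute=REPLACE):
    """Fold (name, value) pairs into a dict using the chosen duplicate strategy."""
    attrs = {}
    for name, value in pairs:
        if name not in attrs or on_duplicate_attribute == REPLACE:
            # First sighting of the attribute, or the later value wins.
            attrs[name] = value
        elif on_duplicate_attribute == IGNORE:
            # Keep the earliest value encountered.
            pass
        elif callable(on_duplicate_attribute):
            # Callable receives: attributes so far, duplicate name, newest value.
            on_duplicate_attribute(attrs, name, value)
        else:
            raise ValueError(f"unknown strategy: {on_duplicate_attribute!r}")
    return attrs


def collect(attrs, name, value):
    """Example callable: gather every value of a repeated attribute in a list."""
    if not isinstance(attrs[name], list):
        attrs[name] = [attrs[name]]
    attrs[name].append(value)


pairs = [("href", "a"), ("href", "b"), ("href", "c")]
replaced = merge_attributes(pairs)            # later values win
ignored = merge_attributes(pairs, IGNORE)     # earliest value kept
collected = merge_attributes(pairs, collect)  # callable decides
```

In Beautiful Soup itself the strategy is normally supplied as a keyword argument when building the soup with the html.parser builder; consult the library documentation for the exact accepted values.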
Handle a numeric character reference by converting it to the corresponding Unicode character and treating it as textual data.
:param name: Character number, possibly in hexadecimal.
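A minimal sketch of the conversion, assuming the reference arrives in the form the standard library's HTMLParser passes to its character-reference handler: decimal digits such as `65`, or an `x`/`X` prefix followed by hexadecimal digits such as `x41`. The function name `charref_to_text` is hypothetical, and the fallback to U+FFFD for out-of-range numbers is a simplification, not necessarily what this class does.

```python
def charref_to_text(name):
    """Convert the body of a numeric character reference to a Unicode string."""
    if name[:1] in ("x", "X"):
        codepoint = int(name[1:], 16)  # hexadecimal form, e.g. "x41"
    else:
        codepoint = int(name)          # decimal form, e.g. "65"
    try:
        return chr(codepoint)
    except (ValueError, OverflowError):
        # Number is outside the Unicode range; substitute a placeholder.
        return "\N{REPLACEMENT CHARACTER}"


decimal_a = charref_to_text("65")
hex_a = charref_to_text("x41")
too_big = charref_to_text("x110000")
```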
Handle a closing tag, e.g. '</tag>'
:param name: A tag name.
:param check_already_closed: True if this tag is expected to be the closing portion of an empty-element tag, e.g. '<tag></tag>'.
Handle a named entity reference by converting it to the corresponding Unicode character(s) and treating it as textual data.
:param name: Name of the entity reference.
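The lookup can be approximated with the standard library's `html.entities.html5` table, which maps entity names to one or more Unicode characters. This is a sketch under assumptions: `entityref_to_text` is a hypothetical name, and keeping unknown entities as literal text is one plausible fallback, not a documented behavior of this class.

```python
from html.entities import html5


def entityref_to_text(name):
    """Convert a named entity reference (without '&' and ';') to text."""
    # Keys in the html5 table carry the trailing semicolon, e.g. "amp;".
    character = html5.get(name + ";")
    if character is None:
        # Unknown entity: keep it as literal text instead of dropping it.
        return "&" + name
    return character


ampersand = entityref_to_text("amp")
e_acute = entityref_to_text("eacute")
unknown = entityref_to_text("nosuchentity")
```

Because some HTML5 entities expand to more than one code point, the result is treated as a string rather than a single character, which matches the "character(s)" wording above.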
Handle an incoming empty-element tag. This is only called when the markup looks like <tag/>.
:param name: Name of the tag.
:param attrs: Dictionary of the tag's attributes.
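The "only called when the markup looks like <tag/>" rule comes from the standard library's HTMLParser, which routes self-closing syntax to a separate handler. The hypothetical `EventLogger` below records which handler fires for each form; it uses only `html.parser` and says nothing about how this class treats the events afterwards.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper that records which tag handler the parser invokes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_startendtag(self, tag, attrs):
        # Reached only for self-closing syntax such as <br/>.
        self.events.append(("startend", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = EventLogger()
logger.feed("<br/><br><p></p>")
logger.close()
events = logger.events
```

Here `<br/>` reaches `handle_startendtag`, while the bare `<br>` reaches `handle_starttag` and never produces an end event, so a tree builder has to decide on its own that `br` is a void element.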
Handle an opening tag, e.g. '<tag>'
:param name: Name of the tag.
:param attrs: Dictionary of the tag's attributes.
:param handle_empty_element: True if this tag is known to be an empty-element tag (i.e. there is not expected to be any closing tag).