class documentation

class BeautifulSoupHTMLParser(HTMLParser, DetectsXMLParsedAsHTML):

A subclass of the Python standard library's HTMLParser class, which listens for HTMLParser events and translates them into calls to Beautiful Soup's tree construction API.
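To make the notion of "HTMLParser events" concrete, here is a minimal sketch that uses only the standard library. `EventRecorder` is a hypothetical name and is not part of Beautiful Soup; it records each callback in order instead of building a tree. It assumes `convert_charrefs=False`, so that entity references arrive as separate events rather than being folded into the text.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records parser events instead of building a tree."""

    def __init__(self):
        # convert_charrefs=False keeps character and entity references
        # as separate events instead of merging them into handle_data.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_comment(self, data):
        self.events.append(("comment", data))


recorder = EventRecorder()
recorder.feed('<p class="a">Hi &amp; bye<!-- note --></p>')
recorder.close()
for event in recorder.events:
    print(event)
```

The class documented here reacts to the same callbacks, but translates them into tree-construction calls rather than appending them to a list.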

Method __init__ Constructor.
Method handle_charref Handle a numeric character reference by converting it to the corresponding Unicode character and treating it as textual data.
Method handle_comment Handle an HTML comment.
Method handle_data Handle some textual data that shows up between tags.
Method handle_decl Handle a DOCTYPE declaration.
Method handle_endtag Handle a closing tag, e.g. '</tag>'
Method handle_entityref Handle a named entity reference by converting it to the corresponding Unicode character(s) and treating it as textual data.
Method handle_pi Handle a processing instruction.
Method handle_startendtag Handle an incoming empty-element tag.
Method handle_starttag Handle an opening tag, e.g. '<tag>'
Method unknown_decl Handle a declaration of unknown type -- probably a CDATA block.
Constant IGNORE Undocumented
Constant REPLACE Undocumented
Instance Variable already_closed_empty_element Undocumented
Instance Variable on_duplicate_attribute Undocumented

Inherited from DetectsXMLParsedAsHTML:

Class Method warn_if_markup_looks_like_xml Perform a check on some markup to see if it looks like XML that's not XHTML. If so, issue a warning.
Constant LOOKS_LIKE_HTML Undocumented
Constant LOOKS_LIKE_HTML_B Undocumented
Constant XML_PREFIX Undocumented
Constant XML_PREFIX_B Undocumented
Class Method _warn Issue a warning about XML being parsed as HTML.
Method _document_might_be_xml Call this method when encountering an XML declaration, or a "processing instruction" that might be an XML declaration.
Method _initialize_xml_detector Call this method before parsing a document.
Method _root_tag_encountered Call this when you encounter the document's root tag.
Instance Variable _first_processing_instruction Undocumented
Instance Variable _root_tag Undocumented

def __init__(self, *args, **kwargs):

Constructor.

:param on_duplicate_attribute: A strategy for what to do if a tag includes the same attribute more than once. Accepted values are: REPLACE (replace earlier values with later ones, the default), IGNORE (keep the earliest value encountered), or a callable. A callable must take three arguments: the dictionary of attributes already processed, the name of the duplicate attribute, and the most recent value encountered.
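The three strategies described above can be sketched with plain (name, value) pairs. This is an illustration under stated assumptions, not necessarily how the class implements it: `build_attr_dict` and `accumulate` are hypothetical names, and the `REPLACE` and `IGNORE` values simply mirror the constants documented for this class.

```python
REPLACE = "replace"
IGNORE = "ignore"


def build_attr_dict(attrs, on_duplicate_attribute=None):
    """Hypothetical helper: collapse (name, value) pairs into a dict."""
    attr_dict = {}
    for key, value in attrs:
        if key not in attr_dict:
            attr_dict[key] = value
        elif on_duplicate_attribute in (None, REPLACE):
            # Default strategy: the later value replaces the earlier one.
            attr_dict[key] = value
        elif on_duplicate_attribute == IGNORE:
            # The earliest value is kept; the duplicate is dropped.
            pass
        else:
            # Callable strategy: (attributes so far, name, newest value).
            on_duplicate_attribute(attr_dict, key, value)
    return attr_dict


def accumulate(attributes_so_far, key, value):
    """Example callable: gather every value of a repeated attribute in a list."""
    if not isinstance(attributes_so_far[key], list):
        attributes_so_far[key] = [attributes_so_far[key]]
    attributes_so_far[key].append(value)


pairs = [("href", "http://url1/"), ("href", "http://url2/")]
print(build_attr_dict(pairs))              # later value wins
print(build_attr_dict(pairs, IGNORE))      # earliest value wins
print(build_attr_dict(pairs, accumulate))  # both values kept in a list
```

A callable is the most flexible option because it receives the partially built dictionary and may mutate it in any way, as `accumulate` does here.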

def handle_charref(self, name):

Handle a numeric character reference by converting it to the corresponding Unicode character and treating it as textual data.

:param name: Character number, possibly in hexadecimal.
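As a rough illustration of the conversion described above, the sketch below turns the `name` of a numeric character reference (for example `65`, or `x41` for the hexadecimal form) into a character. `charref_to_text` is a hypothetical helper, and falling back to U+FFFD for out-of-range numbers is an assumption of this sketch; the real method may treat edge cases differently.

```python
def charref_to_text(name):
    """Hypothetical helper: '65' or 'x41' -> 'A'."""
    if name[:1] in ("x", "X"):
        # Hexadecimal form, e.g. the 'x41' taken from '&#x41;'.
        codepoint = int(name[1:], 16)
    else:
        # Decimal form, e.g. the '65' taken from '&#65;'.
        codepoint = int(name)
    try:
        return chr(codepoint)
    except (ValueError, OverflowError):
        # Assumption of this sketch: substitute U+FFFD for invalid code points.
        return "\N{REPLACEMENT CHARACTER}"


print(charref_to_text("65"))
print(charref_to_text("x41"))
```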

def handle_comment(self, data):

Handle an HTML comment.

:param data: The text of the comment.

def handle_data(self, data):

Handle some textual data that shows up between tags.

def handle_decl(self, data):

Handle a DOCTYPE declaration.

:param data: The text of the declaration.

def handle_endtag(self, name, check_already_closed=True):

Handle a closing tag, e.g. '</tag>'.

:param name: A tag name.
:param check_already_closed: True if this tag is expected to be the closing portion of an empty-element tag, e.g. '<tag></tag>'.

def handle_entityref(self, name):

Handle a named entity reference by converting it to the corresponding Unicode character(s) and treating it as textual data.

:param name: Name of the entity reference.
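The lookup described above can be approximated with the standard library's `html.entities.html5` table, which maps entity names (with a trailing semicolon) to one or more characters. `entityref_to_text` is a hypothetical helper; Beautiful Soup may use its own lookup table, and leaving unknown names as literal `&name` text is an assumption of this sketch.

```python
from html.entities import html5


def entityref_to_text(name):
    """Hypothetical helper: 'amp' -> '&', 'eacute' -> 'é'."""
    # Keys in the html5 table carry the trailing semicolon, e.g. 'amp;'.
    text = html5.get(name + ";")
    if text is None:
        # Assumption of this sketch: unknown names stay as literal text.
        return "&" + name
    return text


print(entityref_to_text("amp"))
print(entityref_to_text("eacute"))
```

Some names expand to more than one code point, which is why the description says "character(s)".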

def handle_pi(self, data):

Handle a processing instruction.

:param data: The text of the instruction.

def handle_startendtag(self, name, attrs):

Handle an incoming empty-element tag. This is only called when the markup looks like <tag/>.

:param name: Name of the tag.
:param attrs: The tag's attributes, as supplied by HTMLParser (a list of (name, value) pairs).
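The distinction between `<tag/>` and `<tag>` comes from the standard library's HTMLParser, and a small sketch can show which callback fires for each. `TagKindRecorder` is a hypothetical class used only for illustration; it records callback names and does nothing else.

```python
from html.parser import HTMLParser


class TagKindRecorder(HTMLParser):
    """Hypothetical helper: records which tag callback HTMLParser invokes."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        self.calls.append(("handle_starttag", tag))

    def handle_startendtag(self, tag, attrs):
        # Overriding this replaces the default behavior, which would call
        # handle_starttag followed by handle_endtag.
        self.calls.append(("handle_startendtag", tag))

    def handle_endtag(self, tag):
        self.calls.append(("handle_endtag", tag))


recorder = TagKindRecorder()
recorder.feed("<br/><br><p></p>")
recorder.close()
for call in recorder.calls:
    print(call)
```

Only the self-closing `<br/>` reaches `handle_startendtag`; the bare `<br>` is reported as an ordinary start tag, so a handler cannot rely on this callback alone to recognize empty elements.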

def handle_starttag(self, name, attrs, handle_empty_element=True):

Handle an opening tag, e.g. '<tag>'.

:param name: Name of the tag.
:param attrs: The tag's attributes, as supplied by HTMLParser (a list of (name, value) pairs).
:param handle_empty_element: True if this tag is known to be an empty-element tag (i.e. there is not expected to be any closing tag).

def unknown_decl(self, data):

Handle a declaration of unknown type, probably a CDATA block.

:param data: The text of the declaration.

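IGNORE = 'ignore'

Value for on_duplicate_attribute: keep the earliest value encountered (see __init__).

REPLACE = 'replace'

Value for on_duplicate_attribute: replace earlier values with later ones, the default (see __init__).

already_closed_empty_element: list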

Undocumented

on_duplicate_attribute

Undocumented