class HTMLParserTreeBuilder(HTMLTreeBuilder):

A Beautiful Soup `TreeBuilder` that uses the `HTMLParser` parser, found in the Python standard library.
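As a rough illustration of what "a tree builder on top of `HTMLParser`" means, here is a hypothetical, heavily simplified sketch that uses only the standard library. `Node`, `ToyTreeBuilder`, and `build` are invented names, not part of Beautiful Soup, and the real builder also deals with encodings, entities, multi-valued attributes, and much more. The sketch only shows how the parser's start-tag, end-tag, and data callbacks can drive a stack of open elements.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional, Tuple, Union


@dataclass
class Node:
    """Hypothetical, minimal stand-in for a parsed tag."""
    name: str
    attrs: List[Tuple[str, Optional[str]]] = field(default_factory=list)
    children: List[Union["Node", str]] = field(default_factory=list)


class ToyTreeBuilder(HTMLParser):
    """Builds a Node tree from html.parser callbacks (illustration only)."""

    # Hypothetical subset of HTML void elements, which never get an end tag.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("[document]")
        self.stack: List[Node] = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, list(attrs))
        self.stack[-1].children.append(node)
        if tag not in self.VOID:
            self.stack.append(node)  # descend into the new element

    def handle_endtag(self, tag):
        # Close back to the innermost open element with this name, if any;
        # the root is never popped.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def build(markup: str) -> Node:
    builder = ToyTreeBuilder()
    builder.feed(markup)
    builder.close()  # flush any input the parser was still holding
    return builder.root
```

For example, `build("<p class='x'>Hi<br>there</p>").children[0]` is a `p` node whose children are `"Hi"`, an empty `br` node, and `"there"`. Treating `br` as a void element is what keeps `"there"` inside `p` instead of nesting it under `br`.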

Method __init__: Constructor.
Method feed: Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in `self.soup`.
Method prepare_markup: Run any preliminary steps necessary to make incoming markup acceptable to the parser.
Constant TRACKS_LINE_NUMBERS: Undocumented
Class Variable features: Undocumented
Class Variable is_xml: Undocumented
Class Variable picklable: Undocumented
Instance Variable parser_args: Undocumented

Inherited from HTMLTreeBuilder:

Method set_up_substitutions: Replace the declared encoding in a <meta> tag with a placeholder, to be substituted when the tag is output to a string.
Constant DEFAULT_CDATA_LIST_ATTRIBUTES: Undocumented
Constant DEFAULT_PRESERVE_WHITESPACE_TAGS: Undocumented
Constant DEFAULT_STRING_CONTAINERS: Undocumented
Class Variable block_elements: Undocumented
Class Variable empty_element_tags: Undocumented
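The point of the substitution described for `set_up_substitutions` is that a `<meta>` charset declaration should name the encoding the document is finally written out in, not the one it was read in. A string-level sketch of that placeholder technique follows. `PLACEHOLDER`, `CHARSET_RE`, `set_up_placeholder`, and `render` are all hypothetical, and the regular expression is a simplification; Beautiful Soup itself works on parsed tags, not on regexes over raw markup.

```python
import re

# Hypothetical marker that stands in for the declared encoding.
PLACEHOLDER = "%SOUP-ENCODING%"

# Matches a charset value inside a <meta> tag, e.g. <meta charset="utf-8"> or
# <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">.
# Simplified: real-world markup has more variants than this handles.
CHARSET_RE = re.compile(r'(<meta\b[^>]*\bcharset=["\']?)([\w-]+)', re.IGNORECASE)


def set_up_placeholder(markup: str) -> str:
    """At parse time: replace the declared encoding with the placeholder."""
    return CHARSET_RE.sub(lambda m: m.group(1) + PLACEHOLDER, markup)


def render(markup: str, output_encoding: str) -> bytes:
    """At output time: write in the real output encoding, then encode."""
    return markup.replace(PLACEHOLDER, output_encoding).encode(output_encoding)
```

A document read as ISO-8859-1 and written out as UTF-8 then declares `charset="utf-8"`, so the declaration and the actual bytes cannot disagree.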

Inherited from TreeBuilder (via HTMLTreeBuilder):

Method can_be_empty_element: Might a tag with this name be an empty-element tag?
Method initialize_soup: The BeautifulSoup object has been initialized and is now being associated with the TreeBuilder.
Method reset: Do any work necessary to reset the underlying parser for a new document.
Method test_fragment_to_document: Wrap an HTML fragment to make it look like a document.
Constant ALTERNATE_NAMES: Undocumented
Constant NAME: Undocumented
Constant USE_DEFAULT: Undocumented
Instance Variable cdata_list_attributes: Undocumented
Instance Variable preserve_whitespace_tags: Undocumented
Instance Variable soup: Undocumented
Instance Variable store_line_numbers: Undocumented
Instance Variable string_containers: Undocumented
Method _replace_cdata_list_attribute_values: When an attribute value is associated with a tag that can have multiple values for that attribute, convert the string value to a list of strings.
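Attributes such as `class` can hold several whitespace-separated values. A minimal sketch of the conversion described above, assuming a small hand-picked mapping of multi-valued attributes (`CDATA_LIST_ATTRIBUTES` here is a hypothetical subset, not the library's actual default table) and returning a new dictionary rather than modifying the attributes in place; `split_list_attributes` is likewise a hypothetical name.

```python
import re
from typing import Dict, List, Union

# Hypothetical subset of multi-valued attributes; "*" applies to every tag.
CDATA_LIST_ATTRIBUTES: Dict[str, List[str]] = {
    "*": ["class", "accesskey"],
    "a": ["rel", "rev"],
    "td": ["headers"],
}

NON_WHITESPACE = re.compile(r"\S+")


def split_list_attributes(
    tag_name: str, attrs: Dict[str, str]
) -> Dict[str, Union[str, List[str]]]:
    """Return a copy of attrs with multi-valued attributes split into lists."""
    multi = set(CDATA_LIST_ATTRIBUTES.get("*", []))
    multi |= set(CDATA_LIST_ATTRIBUTES.get(tag_name.lower(), []))
    result: Dict[str, Union[str, List[str]]] = {}
    for name, value in attrs.items():
        if name.lower() in multi:
            # Split on any run of whitespace; leading/trailing spaces vanish.
            result[name] = NON_WHITESPACE.findall(value)
        else:
            result[name] = value  # single-valued: keep the string as-is
    return result
```

With this mapping, `rel` is split on `<a>` tags but left as a plain string on `<p>`, and `href` is never split, even if it contains spaces.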
def __init__(self, parser_args=None, parser_kwargs=None, **kwargs):

Constructor.

:param parser_args: Positional arguments to pass into the BeautifulSoupHTMLParser constructor, once it's invoked.
:param parser_kwargs: Keyword arguments to pass into the BeautifulSoupHTMLParser constructor, once it's invoked.
:param kwargs: Keyword arguments for the superclass constructor.
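Because the parser is only constructed "once it's invoked", the builder has to hold on to `parser_args` and `parser_kwargs` until parsing starts. A minimal sketch of that deferred-construction pattern, assuming the standard library's `html.parser.HTMLParser` as the parser class rather than bs4's `BeautifulSoupHTMLParser`; `DeferredBuilder` and `make_parser` are hypothetical names. `HTMLParser` itself takes keyword-only arguments (such as `convert_charrefs`), so with this particular class `parser_args` must stay empty.

```python
from html.parser import HTMLParser
from typing import Any, Dict, Optional, Sequence


class DeferredBuilder:
    """Hypothetical builder that keeps parser arguments until parse time."""

    def __init__(
        self,
        parser_args: Optional[Sequence[Any]] = None,
        parser_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Copy, so later changes to the caller's objects don't leak in.
        self.parser_args = list(parser_args or [])
        self.parser_kwargs = dict(parser_kwargs or {})

    def make_parser(self) -> HTMLParser:
        # The parser is only constructed here, when parsing actually begins.
        return HTMLParser(*self.parser_args, **self.parser_kwargs)
```

For example, `DeferredBuilder(parser_kwargs={"convert_charrefs": False}).make_parser()` yields a parser with `convert_charrefs` turned off. Building a fresh parser on every call means each document starts from a clean parser state.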

def feed(self, markup):

Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.
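With the standard library's parser, running markup "through some parsing process" boils down to calling `feed()` and then `close()`; `close()` forces the parser to process any input it is still holding back because it might be incomplete. The sketch below shows that sequence with a hypothetical `EventRecorder` subclass and `feed_all` helper (not Beautiful Soup names) that log the callbacks a tree builder would turn into nodes in `self.soup`.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class EventRecorder(HTMLParser):
    """Records parser callbacks as (kind, value) pairs; illustration only."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def feed_all(markup: str) -> List[Tuple[str, str]]:
    parser = EventRecorder()
    parser.feed(markup)  # may leave possibly-incomplete input buffered
    parser.close()       # process everything that is still buffered
    return parser.events
```

For `"<b>bold</b> text"` the recorded events are a start tag, the data `"bold"`, an end tag, and the data `" text"`, which is exactly the order in which a builder would open, fill, and close the `b` element.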

def prepare_markup(self, markup, user_specified_encoding=None, document_declared_encoding=None, exclude_encodings=None):

Run any preliminary steps necessary to make incoming markup acceptable to the parser.

:param markup: Some markup, probably a bytestring.
:param user_specified_encoding: The user asked to try this encoding.
:param document_declared_encoding: The markup itself claims to be in this encoding.
:param exclude_encodings: The user asked _not_ to try any of these encodings.
:yield: A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement). Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.
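The "strategies tried in turn" contract can be sketched as a generator of 4-tuples plus a loop that stops at the first strategy the parser accepts. Everything below is hypothetical: `candidate_strategies` and `parse_with_fallback` are invented names, and the candidate order (user-specified, declared, UTF-8, Windows-1252) and the final UTF-8-with-replacement fallback are assumptions chosen for illustration, not Beautiful Soup's actual encoding detection.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

# (markup, encoding, declared encoding, has undergone character replacement)
Strategy = Tuple[str, Optional[str], Optional[str], bool]
T = TypeVar("T")


def candidate_strategies(
    markup: bytes,
    user_specified_encoding: Optional[str] = None,
    document_declared_encoding: Optional[str] = None,
    exclude_encodings: Iterable[str] = (),
) -> Iterator[Strategy]:
    """Yield decoding strategies, most trusted first (hypothetical order)."""
    excluded = {e.lower() for e in exclude_encodings}
    seen = set()
    # Assumed candidates: what the user and document say, then two fallbacks.
    for enc in (user_specified_encoding, document_declared_encoding,
                "utf-8", "windows-1252"):
        if enc is None or enc.lower() in excluded or enc.lower() in seen:
            continue
        seen.add(enc.lower())
        try:
            text = markup.decode(enc)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name, or bytes invalid in this encoding
        yield text, enc, document_declared_encoding, False
    # Last resort (ignores exclusions): UTF-8 with U+FFFD for bad bytes.
    yield (markup.decode("utf-8", errors="replace"), "utf-8",
           document_declared_encoding, True)


def parse_with_fallback(
    markup: bytes, parse: Callable[[str], T], **kwargs
) -> Tuple[T, Strategy]:
    """Try each strategy in turn; return the first successful parse."""
    last_error: Optional[Exception] = None
    for strategy in candidate_strategies(markup, **kwargs):
        try:
            return parse(strategy[0]), strategy
        except Exception as exc:  # this decoding was rejected; try the next
            last_error = exc
    raise ValueError("no strategy succeeded") from last_error
```

For instance, `b"caf\xe9"` is not valid UTF-8, so the first strategy yielded is the Windows-1252 decoding `"café"`. Only bytes that no candidate can decode fall through to the replacement strategy, whose last field is `True`.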

TRACKS_LINE_NUMBERS: bool =
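`TRACKS_LINE_NUMBERS` is undocumented here. Assuming it signals that this builder can record where each tag starts in the source (as the inherited `store_line_numbers` variable suggests), the standard library makes that possible through `HTMLParser.getpos()`, which returns the current `(line, offset)` position. A hypothetical recorder (`PositionRecorder` and `tag_positions` are invented names) that notes the position of every start tag:

```python
from html.parser import HTMLParser
from typing import List, Tuple


class PositionRecorder(HTMLParser):
    """Records (tag, line, column) for each start tag; illustration only."""

    def __init__(self) -> None:
        super().__init__()
        self.positions: List[Tuple[str, int, int]] = []

    def handle_starttag(self, tag, attrs):
        # While this callback runs, getpos() points at the tag's "<".
        line, offset = self.getpos()  # line is 1-based, offset is 0-based
        self.positions.append((tag, line, offset))


def tag_positions(markup: str) -> List[Tuple[str, int, int]]:
    parser = PositionRecorder()
    parser.feed(markup)
    parser.close()
    return parser.positions
```

For `"<html>\n  <p>hi</p>\n</html>"` this records `html` at line 1, column 0 and `p` at line 2, column 2.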
features =

Undocumented

parser_args =

Undocumented