bs4.builder._htmlparser.HTMLParserTreeBuilder

class documentation

class HTMLParserTreeBuilder(HTMLTreeBuilder):

A Beautiful Soup `TreeBuilder` that uses the `HTMLParser` parser from the Python standard library.

Method	`__init__`	Constructor.
Method	`feed`	Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.
Method	`prepare_markup`	Run any preliminary steps necessary to make incoming markup acceptable to the parser.
Constant	`TRACKS_LINE_NUMBERS`	Undocumented
Class Variable	`features`	Undocumented
Class Variable	`is_xml`	Undocumented
Class Variable	`picklable`	Undocumented
Instance Variable	`parser_args`	Undocumented

Inherited from HTMLTreeBuilder:

Method	`set_up_substitutions`	Replace the declared encoding in a <meta> tag with a placeholder, to be substituted when the tag is output to a string.
Constant	`DEFAULT_CDATA_LIST_ATTRIBUTES`	Undocumented
Constant	`DEFAULT_PRESERVE_WHITESPACE_TAGS`	Undocumented
Constant	`DEFAULT_STRING_CONTAINERS`	Undocumented
Class Variable	`block_elements`	Undocumented
Class Variable	`empty_element_tags`	Undocumented

Inherited from TreeBuilder (via HTMLTreeBuilder):

Method	`can_be_empty_element`	Might a tag with this name be an empty-element tag?
Method	`initialize_soup`	The BeautifulSoup object has been initialized and is now being associated with the TreeBuilder.
Method	`reset`	Do any work necessary to reset the underlying parser for a new document.
Method	`test_fragment_to_document`	Wrap an HTML fragment to make it look like a document.
Constant	`ALTERNATE_NAMES`	Undocumented
Constant	`NAME`	Undocumented
Constant	`USE_DEFAULT`	Undocumented
Instance Variable	`cdata_list_attributes`	Undocumented
Instance Variable	`preserve_whitespace_tags`	Undocumented
Instance Variable	`soup`	Undocumented
Instance Variable	`store_line_numbers`	Undocumented
Instance Variable	`string_containers`	Undocumented
Method	`_replace_cdata_list_attribute_values`	When an attribute value is associated with a tag that can have multiple values for that attribute, convert the string value to a list of strings.

def __init__(self, parser_args=None, parser_kwargs=None, **kwargs):

overrides bs4.builder.TreeBuilder.__init__

Constructor.

Parameters
parser_args	Positional arguments to pass into the BeautifulSoupHTMLParser constructor, once it's invoked.
parser_kwargs	Keyword arguments to pass into the BeautifulSoupHTMLParser constructor, once it's invoked.
kwargs	Keyword arguments for the superclass constructor.
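A minimal sketch of forwarding a keyword argument to the underlying parser through `parser_kwargs`. It assumes that `on_duplicate_attribute` (not described on this page) is a keyword accepted by `BeautifulSoupHTMLParser` in recent Beautiful Soup releases, and that a builder instance can be handed to `BeautifulSoup` via its `builder` argument.

```python
from bs4 import BeautifulSoup
from bs4.builder._htmlparser import HTMLParserTreeBuilder

# Assumption: "on_duplicate_attribute" is a keyword that
# BeautifulSoupHTMLParser accepts in recent Beautiful Soup releases.
builder = HTMLParserTreeBuilder(
    parser_kwargs={"on_duplicate_attribute": "ignore"}
)

# Pass the configured builder instance instead of a feature string.
soup = BeautifulSoup('<a href="first" href="second">link</a>', builder=builder)

# With "ignore", the first value of a repeated attribute should be kept.
href = soup.a["href"]
print(href)
```

The stored arguments are also exposed afterwards as the `parser_args` instance variable.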

def feed(self, markup):

overrides bs4.builder.TreeBuilder.feed

Run incoming markup through a BeautifulSoupHTMLParser, populating the `BeautifulSoup` object in `self.soup`.
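`feed` is rarely called by hand. The sketch below assumes the usual flow, in which the `BeautifulSoup` constructor picks this builder for the `"html.parser"` feature string and calls `feed` on it. The `soup.builder` attribute used for the check is an assumption not listed on this page.

```python
from bs4 import BeautifulSoup
from bs4.builder._htmlparser import HTMLParserTreeBuilder

# Asking for "html.parser" should select HTMLParserTreeBuilder.
soup = BeautifulSoup("<p>Hello, <b>world</b></p>", "html.parser")

# Assumption: the soup keeps a reference to its builder as soup.builder.
uses_html_parser = isinstance(soup.builder, HTMLParserTreeBuilder)

# The tree was populated while the constructor ran.
bold_text = soup.b.get_text()
print(uses_html_parser, bold_text)
```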

def prepare_markup(self, markup, user_specified_encoding=None, document_declared_encoding=None, exclude_encodings=None):

overrides bs4.builder.TreeBuilder.prepare_markup

Run any preliminary steps necessary to make incoming markup acceptable to the parser.

Parameters
markup	Some markup, probably a bytestring.
user_specified_encoding	The user asked to try this encoding.
document_declared_encoding	The markup itself claims to be in this encoding.
exclude_encodings	The user asked not to try any of these encodings.

Yields
A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement). Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.
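A short sketch of consuming the generator directly, assuming a UTF-8 bytestring and a caller-supplied encoding. The variable names are illustrative, and the exact encoding label reported may vary between releases, so only the decoded markup is relied on here.

```python
from bs4.builder._htmlparser import HTMLParserTreeBuilder

builder = HTMLParserTreeBuilder()

# UTF-8 bytes for "<p>café</p>", with the encoding supplied by the caller.
raw = "<p>café</p>".encode("utf-8")
strategies = list(builder.prepare_markup(raw, user_specified_encoding="utf-8"))

# Each strategy is a 4-tuple, as described above.
for markup, encoding, declared_encoding, replaced in strategies:
    print(repr(markup), encoding, declared_encoding, replaced)

first_markup = strategies[0][0]
```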

TRACKS_LINE_NUMBERS: bool

overrides bs4.builder.TreeBuilder.TRACKS_LINE_NUMBERS

Undocumented

Value

True
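Because this builder tracks line numbers, parsed tags can report where they started in the source. The sketch below assumes the `Tag.sourceline` attribute from the Beautiful Soup documentation, which is not described on this page, and a release recent enough to provide it.

```python
from bs4 import BeautifulSoup
from bs4.builder._htmlparser import HTMLParserTreeBuilder

markup = "<div>\n<p>first</p>\n<p>second</p>\n</div>"
soup = BeautifulSoup(markup, "html.parser")

# Assumption: each Tag records the 1-based line of its start tag in .sourceline.
lines = [tag.sourceline for tag in soup.find_all("p")]
print(HTMLParserTreeBuilder.TRACKS_LINE_NUMBERS, lines)
```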

features

overrides bs4.builder.TreeBuilder.features

Undocumented

is_xml: bool

overrides bs4.builder.TreeBuilder.is_xml

Undocumented

picklable: bool

overrides bs4.builder.TreeBuilder.picklable

Undocumented

parser_args

Undocumented