bs4.builder._html5lib.HTML5TreeBuilder

class documentation

class HTML5TreeBuilder(HTMLTreeBuilder):

Use html5lib to build a tree.

Note that this TreeBuilder does not support some features common to HTML TreeBuilders. Some of these features could theoretically be implemented, but at the very least it's quite difficult, because html5lib moves the parse tree around as it's being built.

* This TreeBuilder doesn't use different subclasses of NavigableString based on the name of the tag in which the string was found.
* You can't use a SoupStrainer to parse only part of a document.
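As a usage sketch, this builder is normally selected by passing the feature name `"html5lib"` to the `BeautifulSoup` constructor rather than being instantiated directly. The example assumes the third-party packages `bs4` and `html5lib` are installed; the import guard and the sample markup are illustrative only.

```python
# Usage sketch; assumes the third-party packages bs4 and html5lib are installed.
try:
    import html5lib  # noqa: F401  (imported only to check availability)
    from bs4 import BeautifulSoup
except ImportError:
    BeautifulSoup = None  # lets the sketch degrade gracefully without the packages

soup = None
if BeautifulSoup is not None:
    # Passing "html5lib" asks Beautiful Soup to build the tree with html5lib.
    soup = BeautifulSoup("<p>Hello<b>world", "html5lib")
    # html5lib parses the way a browser would, so the fragment comes back
    # inside a full document skeleton with the open tags closed.
    print(soup)
```

Because the whole document is always parsed, any filtering has to happen afterwards, for example with `soup.find_all(...)`, rather than with a SoupStrainer at parse time.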

Method	`create_treebuilder`	Undocumented
Method	`feed`	Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.
Method	`prepare_markup`	Run any preliminary steps necessary to make incoming markup acceptable to the parser.
Method	`test_fragment_to_document`	See `TreeBuilder`.
Constant	`NAME`	Undocumented
Constant	`TRACKS_LINE_NUMBERS`	Undocumented
Class Variable	`features`	Undocumented
Instance Variable	`underlying_builder`	Undocumented
Instance Variable	`user_specified_encoding`	Undocumented

Inherited from HTMLTreeBuilder:

Method	`set_up_substitutions`	Replace the declared encoding in a <meta> tag with a placeholder, to be substituted when the tag is output to a string.
Constant	`DEFAULT_CDATA_LIST_ATTRIBUTES`	Undocumented
Constant	`DEFAULT_PRESERVE_WHITESPACE_TAGS`	Undocumented
Constant	`DEFAULT_STRING_CONTAINERS`	Undocumented
Class Variable	`block_elements`	Undocumented
Class Variable	`empty_element_tags`	Undocumented

Inherited from TreeBuilder (via HTMLTreeBuilder):

Method	`__init__`	Constructor.
Method	`can_be_empty_element`	Might a tag with this name be an empty-element tag?
Method	`initialize_soup`	The BeautifulSoup object has been initialized and is now being associated with the TreeBuilder.
Method	`reset`	Do any work necessary to reset the underlying parser for a new document.
Constant	`ALTERNATE_NAMES`	Undocumented
Constant	`USE_DEFAULT`	Undocumented
Class Variable	`is_xml`	Undocumented
Class Variable	`picklable`	Undocumented
Instance Variable	`cdata_list_attributes`	Undocumented
Instance Variable	`preserve_whitespace_tags`	Undocumented
Instance Variable	`soup`	Undocumented
Instance Variable	`store_line_numbers`	Undocumented
Instance Variable	`string_containers`	Undocumented
Method	`_replace_cdata_list_attribute_values`	When an attribute value is associated with a tag that can have multiple values for that attribute, convert the string value to a list of strings.

def create_treebuilder(self, namespaceHTMLElements):

Undocumented

def feed(self, markup):

overrides bs4.builder.TreeBuilder.feed

Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.

This method is not implemented in TreeBuilder; it must be implemented in subclasses.

:return: None.

def prepare_markup(self, markup, user_specified_encoding, document_declared_encoding=None, exclude_encodings=None):

overrides bs4.builder.TreeBuilder.prepare_markup

Run any preliminary steps necessary to make incoming markup acceptable to the parser.

:param markup: Some markup -- probably a bytestring.
:param user_specified_encoding: The user asked to try this encoding.
:param document_declared_encoding: The markup itself claims to be in this encoding. NOTE: This argument is not used by the calling code and can probably be removed.
:param exclude_encodings: The user asked _not_ to try any of these encodings.
:yield: A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement)

Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.

By default, the only strategy is to parse the markup as-is. See `LXMLTreeBuilderForXML` and `HTMLParserTreeBuilder` for implementations that take into account the quirks of particular parsers.
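The strategy protocol described here can be sketched in plain Python. This is a minimal illustration, not the library's code: `as_is_strategy` and `first_working_strategy` are hypothetical names, and `ValueError` stands in for whatever failure a real parser would signal.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# (markup, encoding, declared encoding, has undergone character replacement)
Strategy = Tuple[object, Optional[str], Optional[str], bool]


def as_is_strategy(markup: object) -> Iterator[Strategy]:
    """Hypothetical stand-in for a prepare_markup whose only strategy is as-is."""
    yield (markup, None, None, False)


def first_working_strategy(strategies: Iterable[Strategy],
                           parse: Callable[[object], object]) -> object:
    """Try each strategy in turn and return the first successful parse result."""
    last_error = None
    for markup, _encoding, _declared, _replaced in strategies:
        try:
            return parse(markup)
        except ValueError as exc:  # placeholder failure type, an assumption
            last_error = exc
    raise ValueError("no strategy succeeded") from last_error


# The identity function plays the role of the parser in this sketch.
result = first_working_strategy(as_is_strategy(b"<p>hi</p>"), parse=lambda m: m)
```

With a single as-is strategy the loop runs once. Builders that yield several candidate encodings would give the caller more attempts before it gives up.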

def test_fragment_to_document(self, fragment):

overrides bs4.builder.TreeBuilder.test_fragment_to_document

See `TreeBuilder`.
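As a rough sketch of what a fragment-to-document helper might look like for an HTML5-style parser: the HTML5 parsing algorithm always produces `html`, `head`, and `body` elements, so a test helper can wrap a fragment in that skeleton before comparing it with parser output. The function name `wrap_fragment` and the exact skeleton are assumptions for illustration, not taken from this page.

```python
def wrap_fragment(fragment: str) -> str:
    """Hypothetical helper: wrap an HTML fragment in a full document skeleton."""
    # Assumed skeleton: an empty <head> followed by a <body> holding the fragment.
    return f"<html><head></head><body>{fragment}</body></html>"


document = wrap_fragment("<p>hi</p>")
```

A helper of this kind lets tests be written against short fragments while still being compared with the full document that the parser emits.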

NAME: str =

overrides bs4.builder.TreeBuilder.NAME

Undocumented

Value