class HTML5TreeBuilder(HTMLTreeBuilder):
Use html5lib to build a tree.

Note that this TreeBuilder does not support some features common to HTML TreeBuilders. Some of these features could theoretically be implemented, but at the very least it's quite difficult, because html5lib moves the parse tree around as it's being built.

* This TreeBuilder doesn't use different subclasses of NavigableString based on the name of the tag in which the string was found.
* You can't use a SoupStrainer to parse only part of a document.
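The SoupStrainer limitation can be seen in a short sketch. It assumes the `bs4` and `html5lib` packages are installed, and `parse_with_strainer` is a hypothetical helper name, not part of the library. The helper passes a `SoupStrainer` through `parse_only` and records any warnings raised while parsing.

```python
import warnings


def parse_with_strainer(markup):
    """Ask html5lib for only the <a> tags and report what actually happens."""
    # Assumption: bs4 and html5lib are installed. Imported lazily so the
    # dependency is explicit at the point of use.
    from bs4 import BeautifulSoup, SoupStrainer

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        soup = BeautifulSoup(markup, "html5lib", parse_only=SoupStrainer("a"))
    return soup, [str(w.message) for w in caught]


if __name__ == "__main__":
    soup, messages = parse_with_strainer('<a href="/x">link</a><b>bold</b>')
    # The <b> tag is expected to still be present, because the strainer
    # does not restrict what this builder parses.
    print(soup.find("b"))
    for message in messages:
        print("warning:", message)
```

With a builder that honors `parse_only`, such as `"html.parser"`, the same strainer would normally keep only the `<a>` tag.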
Method | create | Undocumented
Method | feed | Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.
Method | prepare | Run any preliminary steps necessary to make incoming markup acceptable to the parser.
Method | test | See `TreeBuilder`.
Constant | NAME | Undocumented
Constant | TRACKS | Undocumented
Class Variable | features | Undocumented
Instance Variable | underlying | Undocumented
Instance Variable | user | Undocumented
Inherited from HTMLTreeBuilder:
Method | set | Replace the declared encoding in a <meta> tag with a placeholder, to be substituted when the tag is output to a string.
Constant | DEFAULT | Undocumented
Constant | DEFAULT | Undocumented
Constant | DEFAULT | Undocumented
Class Variable | block | Undocumented
Class Variable | empty | Undocumented
Inherited from TreeBuilder (via HTMLTreeBuilder):
Method | __init__ | Constructor.
Method | can | Might a tag with this name be an empty-element tag?
Method | initialize | The BeautifulSoup object has been initialized and is now being associated with the TreeBuilder.
Method | reset | Do any work necessary to reset the underlying parser for a new document.
Constant | ALTERNATE | Undocumented
Constant | USE | Undocumented
Class Variable | is | Undocumented
Class Variable | picklable | Undocumented
Instance Variable | cdata | Undocumented
Instance Variable | preserve | Undocumented
Instance Variable | soup | Undocumented
Instance Variable | store | Undocumented
Instance Variable | string | Undocumented
Method | _replace | When an attribute value is associated with a tag that can have multiple values for that attribute, convert the string value to a list of strings.
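
The string-to-list conversion described for the last inherited method can be sketched in isolation. This is an illustration only: `MULTI_VALUED` and `split_multi_valued_attributes` are hypothetical names, the mapping is an abbreviated assumption rather than the library's actual table, and splitting on whitespace is assumed to be the separator rule.

```python
# Hypothetical, abbreviated mapping of tag name -> multi-valued attributes.
# "*" applies to every tag. This is an assumption, not bs4's actual table.
MULTI_VALUED = {"*": {"class"}, "a": {"rel", "rev"}}


def split_multi_valued_attributes(tag_name, attrs):
    """Return a copy of attrs with multi-valued strings turned into lists."""
    multi = MULTI_VALUED.get("*", set()) | MULTI_VALUED.get(tag_name.lower(), set())
    converted = {}
    for name, value in attrs.items():
        if name in multi and isinstance(value, str):
            # Assumed rule: values are separated by runs of whitespace.
            converted[name] = value.split()
        else:
            converted[name] = value
    return converted


if __name__ == "__main__":
    attrs = {"class": "btn primary", "rel": "nofollow noopener", "href": "/a b"}
    print(split_multi_valued_attributes("a", attrs))
```

Attributes outside the mapping, such as `href` here, keep their original string value even when it contains spaces.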
bs4.builder.TreeBuilder.feed
Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.

This method is not implemented in TreeBuilder; it must be implemented in subclasses.

:return: None.
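
The contract above (no implementation in the base class, subclasses populate `self.soup` and return nothing) can be sketched with stand-in classes. `SketchTreeBuilder` and `UppercaseBuilder` are hypothetical and are not bs4 classes. A plain list stands in for the `BeautifulSoup` object, and raising `NotImplementedError` in the base class is an assumption about how "not implemented" is signalled.

```python
class SketchTreeBuilder:
    """Hypothetical stand-in for a base tree builder (not bs4's class)."""

    def __init__(self):
        # Set once a document object has been associated with the builder.
        self.soup = None

    def feed(self, markup):
        # The base class offers no parsing process of its own.
        raise NotImplementedError()


class UppercaseBuilder(SketchTreeBuilder):
    """Toy subclass whose 'parsing process' just upper-cases the markup."""

    def feed(self, markup):
        # Populate the associated object. Nothing is returned.
        self.soup.append(markup.upper())


if __name__ == "__main__":
    builder = UppercaseBuilder()
    builder.soup = []  # a list stands in for the BeautifulSoup object
    builder.feed("<p>hi</p>")
    print(builder.soup)
```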
bs4.builder.TreeBuilder.prepare_markup
Run any preliminary steps necessary to make incoming markup acceptable to the parser.

:param markup: Some markup -- probably a bytestring.
:param user_specified_encoding: The user asked to try this encoding.
:param document_declared_encoding: The markup itself claims to be in this encoding. NOTE: This argument is not used by the calling code and can probably be removed.
:param exclude_encodings: The user asked _not_ to try any of these encodings.
:yield: A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement)

Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.

By default, the only strategy is to parse the markup as-is. See `LXMLTreeBuilderForXML` and `HTMLParserTreeBuilder` for implementations that take into account the quirks of particular parsers.
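
The strategy protocol can be sketched without the library. All three function names below are hypothetical, and the fallback encoding list and the decode-based caller are assumptions for illustration, not the behavior of any particular builder. The first generator mirrors the documented default of a single as-is strategy.

```python
def prepare_markup_default(markup, user_specified_encoding=None,
                           document_declared_encoding=None,
                           exclude_encodings=None):
    """Default described above: one strategy, the markup as-is."""
    yield (markup, None, None, False)


def prepare_markup_with_fallbacks(markup, user_specified_encoding=None,
                                  exclude_encodings=None):
    """Hypothetical builder that proposes several encodings in order."""
    excluded = {e.lower() for e in (exclude_encodings or [])}
    seen = set()
    # Assumed candidate order: the user's choice first, then two fallbacks.
    for encoding in (user_specified_encoding, "utf-8", "windows-1252"):
        if encoding is None or encoding.lower() in excluded:
            continue
        if encoding.lower() in seen:
            continue
        seen.add(encoding.lower())
        yield (markup, encoding, None, False)


def first_working_strategy(strategies):
    """Try each strategy in turn and return the first that decodes."""
    for markup, encoding, _declared, _replaced in strategies:
        try:
            if isinstance(markup, bytes) and encoding:
                return markup.decode(encoding), encoding
            return markup, encoding
        except (UnicodeDecodeError, LookupError):
            continue
    raise ValueError("no strategy could decode the markup")


if __name__ == "__main__":
    data = "caf\u00e9".encode("windows-1252")
    print(first_working_strategy(prepare_markup_with_fallbacks(data)))
```

Yielding strategies lazily means later, more expensive attempts are only computed when earlier ones fail.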