class documentation

class HTML5TreeBuilder(HTMLTreeBuilder):

Use html5lib to build a tree.

Note that this TreeBuilder does not support some features common to HTML TreeBuilders. Some of these features could in theory be implemented, but doing so would be difficult at best, because html5lib moves the parse tree around as it's being built.

* This TreeBuilder doesn't use different subclasses of NavigableString based on the name of the tag in which the string was found.
* You can't use a SoupStrainer to parse only part of a document.
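As a minimal sketch of how this tree builder is usually reached: passing the feature string 'html5lib' to the `BeautifulSoup` constructor selects it, assuming both the bs4 and html5lib packages are installed. The helper name `parse_with_html5lib` is hypothetical and not part of the library; it returns None when either package is missing.

```python
def parse_with_html5lib(markup):
    """Parse markup with the html5lib tree builder.

    Hypothetical helper: returns None if bs4 or html5lib is not installed.
    """
    try:
        from bs4 import BeautifulSoup, FeatureNotFound
    except ImportError:
        return None
    try:
        # "html5lib" is this builder's NAME, so it selects HTML5TreeBuilder.
        return BeautifulSoup(markup, "html5lib")
    except FeatureNotFound:
        # bs4 is present but html5lib itself is not installed.
        return None


soup = parse_with_html5lib("<p>Hello")
if soup is not None:
    # html5lib repairs the markup, closing the <p> and wrapping it
    # in <html>, <head>, and <body> elements.
    print(soup.p.string)
    print(type(soup.builder).__name__)
```

Because a SoupStrainer is not supported here, any filtering has to happen after the whole document has been parsed, for example with `find_all`.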

Method create_treebuilder Undocumented
Method feed Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.
Method prepare_markup Run any preliminary steps necessary to make incoming markup acceptable to the parser.
Method test_fragment_to_document See `TreeBuilder`.
Constant NAME Undocumented
Constant TRACKS_LINE_NUMBERS Undocumented
Class Variable features Undocumented
Instance Variable underlying_builder Undocumented
Instance Variable user_specified_encoding Undocumented

Inherited from HTMLTreeBuilder:

Method set_up_substitutions Replace the declared encoding in a <meta> tag with a placeholder, to be substituted when the tag is output to a string.
Constant DEFAULT_CDATA_LIST_ATTRIBUTES Undocumented
Constant DEFAULT_PRESERVE_WHITESPACE_TAGS Undocumented
Constant DEFAULT_STRING_CONTAINERS Undocumented
Class Variable block_elements Undocumented
Class Variable empty_element_tags Undocumented

Inherited from TreeBuilder (via HTMLTreeBuilder):

Method __init__ Constructor.
Method can_be_empty_element Might a tag with this name be an empty-element tag?
Method initialize_soup The BeautifulSoup object has been initialized and is now being associated with the TreeBuilder.
Method reset Do any work necessary to reset the underlying parser for a new document.
Constant ALTERNATE_NAMES Undocumented
Constant USE_DEFAULT Undocumented
Class Variable is_xml Undocumented
Class Variable picklable Undocumented
Instance Variable cdata_list_attributes Undocumented
Instance Variable preserve_whitespace_tags Undocumented
Instance Variable soup Undocumented
Instance Variable store_line_numbers Undocumented
Instance Variable string_containers Undocumented
Method _replace_cdata_list_attribute_values When an attribute value is associated with a tag that can have multiple values for that attribute, convert the string value to a list of strings.

def create_treebuilder(self, namespaceHTMLElements):

Undocumented

def feed(self, markup):

Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.

This method is not implemented in TreeBuilder; it must be implemented in subclasses.

:return: None.
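The contract described here (the base class leaves `feed` unimplemented, and a subclass fills self.soup in place and returns None) can be sketched without the library. Both class names below are hypothetical stand-ins, and a plain list plays the role of the `BeautifulSoup` object; this is not how the real builders parse.

```python
class SketchBuilder:
    """Hypothetical stand-in for a tree builder base class."""

    def __init__(self):
        self.soup = None  # set when a document object is attached

    def feed(self, markup):
        # The base class leaves parsing to subclasses.
        raise NotImplementedError()


class WordListBuilder(SketchBuilder):
    """Toy subclass: 'parses' markup by splitting it into words."""

    def feed(self, markup):
        # Populate the attached document in place; return nothing.
        self.soup.extend(markup.split())


builder = WordListBuilder()
builder.soup = []  # a plain list stands in for the BeautifulSoup object
result = builder.feed("some incoming markup")
print(result, builder.soup)
```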

def prepare_markup(self, markup, user_specified_encoding, document_declared_encoding=None, exclude_encodings=None):

Run any preliminary steps necessary to make incoming markup acceptable to the parser.

:param markup: Some markup -- probably a bytestring.
:param user_specified_encoding: The user asked to try this encoding.
:param document_declared_encoding: The markup itself claims to be in this encoding. NOTE: This argument is not used by the calling code and can probably be removed.
:param exclude_encodings: The user asked _not_ to try any of these encodings.
:yield: A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement)

Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.

By default, the only strategy is to parse the markup as-is. See `LXMLTreeBuilderForXML` and `HTMLParserTreeBuilder` for implementations that take into account the quirks of particular parsers.
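The "series of strategies" idea can be illustrated with a small standalone sketch. The functions `default_strategies`, `parse_with_first_working_strategy`, and `toy_parse` are hypothetical names introduced for illustration, and treating ValueError as the failure signal is an assumption of this sketch rather than the library's behavior.

```python
def default_strategies(markup, user_specified_encoding=None,
                       exclude_encodings=None):
    """Yield (markup, encoding, declared_encoding, replaced) strategies.

    Hypothetical stand-in for prepare_markup: the only strategy is to
    hand the markup to the parser as-is.
    """
    yield (markup, None, None, False)


def parse_with_first_working_strategy(markup, parse):
    """Try each strategy in turn and return the first successful parse."""
    last_error = None
    for text, encoding, declared, replaced in default_strategies(markup):
        try:
            return parse(text)
        except ValueError as error:  # assumed failure signal for this sketch
            last_error = error
    raise last_error or ValueError("no strategy produced a parse")


# A trivial "parser" that just decodes bytes, for demonstration only.
def toy_parse(data):
    return data.decode("utf-8") if isinstance(data, bytes) else data


parsed = parse_with_first_working_strategy(b"<p>caf\xc3\xa9</p>", toy_parse)
print(parsed)
```

A builder that knows more about its parser's quirks would yield several tuples, for example one per candidate encoding, and the caller would move on to the next tuple whenever a parse attempt fails.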

def test_fragment_to_document(self, fragment):

See `TreeBuilder`.

NAME: str

Undocumented

Value
'html5lib'

TRACKS_LINE_NUMBERS: bool

Undocumented

features

Undocumented

underlying_builder

Undocumented

user_specified_encoding

Undocumented