class HTMLParserTreeBuilder(HTMLTreeBuilder):
A Beautiful Soup `TreeBuilder` that uses the `HTMLParser` parser, found in the Python standard library.
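As a rough illustration of how this builder is usually reached, the sketch below asks `BeautifulSoup` for the `"html.parser"` feature and then inspects the builder attached to the resulting object. It assumes the `bs4` package is installed and that `"html.parser"` is the feature name registered for this class; the sample markup is invented for the example.

```python
from bs4 import BeautifulSoup
from bs4.builder import HTMLParserTreeBuilder

# Request the standard-library-backed parser by its feature name
# (assumed here to be "html.parser").
soup = BeautifulSoup("<p>Hello, <b>world</b></p>", "html.parser")

# The tree builder that parsed the markup is kept on the soup object.
print(type(soup.builder).__name__)
print(soup.b.string)
```

Because this builder relies only on the standard library, it is a reasonable choice when no third-party parser is available.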
Method | __init__ |
Constructor. |
Method | feed |
Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup. |
Method | prepare_markup |
Run any preliminary steps necessary to make incoming markup acceptable to the parser. |
Constant | TRACKS |
Undocumented |
Class Variable | features |
Undocumented |
Class Variable | is |
Undocumented |
Class Variable | picklable |
Undocumented |
Instance Variable | parser |
Undocumented |
Inherited from HTMLTreeBuilder:
Method | set |
Replace the declared encoding in a <meta> tag with a placeholder, to be substituted when the tag is output to a string. |
Constant | DEFAULT |
Undocumented |
Constant | DEFAULT |
Undocumented |
Constant | DEFAULT |
Undocumented |
Class Variable | block |
Undocumented |
Class Variable | empty |
Undocumented |
Inherited from TreeBuilder (via HTMLTreeBuilder):
Method | can |
Might a tag with this name be an empty-element tag? |
Method | initialize |
The BeautifulSoup object has been initialized and is now being associated with the TreeBuilder. |
Method | reset |
Do any work necessary to reset the underlying parser for a new document. |
Method | test |
Wrap an HTML fragment to make it look like a document. |
Constant | ALTERNATE |
Undocumented |
Constant | NAME |
Undocumented |
Constant | USE |
Undocumented |
Instance Variable | cdata |
Undocumented |
Instance Variable | preserve |
Undocumented |
Instance Variable | soup |
Undocumented |
Instance Variable | store |
Undocumented |
Instance Variable | string |
Undocumented |
Method | _replace |
When an attribute value is associated with a tag that can have multiple values for that attribute, convert the string value to a list of strings. |
bs4.builder.TreeBuilder.__init__
Constructor.
:param parser_args: Positional arguments to pass into the BeautifulSoupHTMLParser constructor, once it's invoked.
:param parser_kwargs: Keyword arguments to pass into the BeautifulSoupHTMLParser constructor, once it's invoked.
:param kwargs: Keyword arguments for the superclass constructor.
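A minimal sketch of constructing the builder directly and handing it to `BeautifulSoup` follows. It assumes `bs4` is installed and that the superclass constructor accepts a `multi_valued_attributes` keyword, as recent Beautiful Soup releases document; the markup is made up for the example.

```python
from bs4 import BeautifulSoup
from bs4.builder import HTMLParserTreeBuilder

# Extra keyword arguments are forwarded to the superclass constructor.
# multi_valued_attributes=None (an assumption about the installed version)
# asks the builder not to split attributes such as "class" into lists.
builder = HTMLParserTreeBuilder(multi_valued_attributes=None)

# Pass the pre-configured builder instead of a feature name.
soup = BeautifulSoup('<p class="lead intro">Hi</p>', builder=builder)

print(soup.p["class"])
```

With this configuration the `class` value should stay a single string rather than becoming a list of strings.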
bs4.builder.TreeBuilder.feed
Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.
bs4.builder.TreeBuilder.prepare_markup
Run any preliminary steps necessary to make incoming markup acceptable to the parser.
:param markup: Some markup -- probably a bytestring.
:param user_specified_encoding: The user asked to try this encoding.
:param document_declared_encoding: The markup itself claims to be in this encoding.
:param exclude_encodings: The user asked _not_ to try any of these encodings.
:yield: A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement). Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.
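To make the shape of the yielded strategies concrete, the sketch below calls `prepare_markup` on a small UTF-8 bytestring and unpacks each 4-tuple. It assumes `bs4` is installed; the sample markup and the local variable names are illustrative only, and the exact values yielded may differ between Beautiful Soup versions.

```python
from bs4.builder import HTMLParserTreeBuilder

builder = HTMLParserTreeBuilder()

# A bytestring, so the builder has to decide how to decode it.
markup = "<p>café</p>".encode("utf-8")

# Collect every conversion strategy the builder proposes.
strategies = list(
    builder.prepare_markup(markup, user_specified_encoding="utf-8")
)

# Each strategy is (markup, encoding, declared encoding,
# has undergone character replacement).
for text, encoding, declared, replaced in strategies:
    print(repr(text), encoding, declared, replaced)
```

Normally `BeautifulSoup` consumes these strategies itself and tries each in turn, so calling the method by hand is mainly useful for debugging encoding problems.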