class LXMLTreeBuilderForXML(TreeBuilder):
Known subclasses: bs4.builder._lxml.LXMLTreeBuilder
Undocumented
Method | __init__ |
Constructor. |
Method | close |
Undocumented |
Method | comment |
Handle comments as Comment objects. |
Method | data |
Undocumented |
Method | default |
Find the default parser for the given encoding. |
Method | doctype |
Undocumented |
Method | end |
Undocumented |
Method | feed |
Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup. |
Method | initialize_soup |
Let the BeautifulSoup object know about the standard namespace mapping. |
Method | parser |
Instantiate an appropriate parser for the given encoding. |
Method | pi |
Undocumented |
Method | prepare_markup |
Run any preliminary steps necessary to make incoming markup acceptable to the parser. |
Method | start |
Undocumented |
Method | test |
See `TreeBuilder`. |
Constant | ALTERNATE_NAMES |
Undocumented |
Constant | CHUNK |
Undocumented |
Constant | DEFAULT |
Undocumented |
Constant | DEFAULT |
Undocumented |
Constant | NAME |
Undocumented |
Class Variable | features |
Undocumented |
Class Variable | is_xml |
Undocumented |
Instance Variable | active |
Undocumented |
Instance Variable | empty |
Undocumented |
Instance Variable | nsmaps |
Undocumented |
Instance Variable | parser |
Undocumented |
Instance Variable | processing |
Undocumented |
Instance Variable | soup |
Undocumented |
Method | _get |
Undocumented |
Method | _prefix |
Find the currently active prefix for the given namespace. |
Method | _register |
Let the BeautifulSoup object know about namespaces encountered while parsing the document. |
Instance Variable | _default |
Undocumented |
Inherited from TreeBuilder:
Method | can |
Might a tag with this name be an empty-element tag? |
Method | reset |
Do any work necessary to reset the underlying parser for a new document. |
Method | set |
Set up any substitutions that will need to be performed on a `Tag` when it's output as a string. |
Constant | DEFAULT |
Undocumented |
Constant | DEFAULT |
Undocumented |
Constant | DEFAULT |
Undocumented |
Constant | TRACKS |
Undocumented |
Constant | USE |
Undocumented |
Class Variable | picklable |
Undocumented |
Instance Variable | cdata |
Undocumented |
Instance Variable | preserve |
Undocumented |
Instance Variable | store |
Undocumented |
Instance Variable | string |
Undocumented |
Method | _replace |
When an attribute value is associated with a tag that can have multiple values for that attribute, convert the string value to a list of strings. |
bs4.builder.TreeBuilder.__init__
Constructor.
:param multi_valued_attributes: If this is set to None, the TreeBuilder will not turn any values for attributes like 'class' into lists. Setting this to a dictionary will customize this behavior; look at DEFAULT_CDATA_LIST_ATTRIBUTES for an example. Internally, these are called "CDATA list attributes", but that probably doesn't make sense to an end-user, so the argument name is `multi_valued_attributes`.
:param preserve_whitespace_tags: A list of tags to treat the way <pre> tags are treated in HTML. Tags in this list are immune from pretty-printing; their contents will always be output as-is.
:param string_containers: A dictionary mapping tag names to the classes that should be instantiated to contain the textual contents of those tags. The default is to use NavigableString for every tag, no matter what the name. You can override the default by changing DEFAULT_STRING_CONTAINERS.
:param store_line_numbers: If the parser keeps track of the line numbers and positions of the original markup, that information will, by default, be stored in each corresponding `Tag` object. You can turn this off by passing store_line_numbers=False. If the parser you're using doesn't keep track of this information, then setting store_line_numbers=True will do nothing.
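The effect of `multi_valued_attributes` can be illustrated with a small standalone sketch. This is not bs4's code: the function name `split_multi_valued` and the table `MULTI_VALUED` are hypothetical, and the sketch assumes a table keyed by tag name in which "*" means "any tag". It only shows the idea of turning a whitespace-separated string into a list for selected attributes, and of leaving everything alone when the table is None.

```python
import re

# Hypothetical table, loosely in the spirit of DEFAULT_CDATA_LIST_ATTRIBUTES.
# "*" is assumed to mean "applies to every tag".
MULTI_VALUED = {"*": {"class"}, "a": {"rel", "rev"}}


def split_multi_valued(tag_name, attrs, multi_valued=MULTI_VALUED):
    """Return a copy of attrs with multi-valued attributes split into lists."""
    if not multi_valued:
        # None (or an empty table) disables the conversion entirely.
        return dict(attrs)
    names = set(multi_valued.get("*", ())) | set(
        multi_valued.get(tag_name.lower(), ())
    )
    result = {}
    for key, value in attrs.items():
        if key in names and isinstance(value, str):
            # Split on runs of whitespace, dropping empty pieces.
            result[key] = re.findall(r"\S+", value)
        else:
            result[key] = value
    return result


attrs = {"class": "x y", "href": "u", "rel": "nofollow  noopener"}
converted = split_multi_valued("a", attrs)
untouched = split_multi_valued("a", attrs, multi_valued=None)
```

Here `converted` holds lists for `class` and `rel`, while `untouched` keeps every value as the original string.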
bs4.builder._lxml.LXMLTreeBuilder
Find the default parser for the given encoding.
:param encoding: A string.
:return: Either a parser object or a class, which will be instantiated with default arguments.
bs4.builder.TreeBuilder.feed
bs4.builder._lxml.LXMLTreeBuilder
Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup. This method is not implemented in TreeBuilder; it must be implemented in subclasses.
:return: None.
bs4.builder.TreeBuilder.initialize_soup
Let the BeautifulSoup object know about the standard namespace mapping.
:param soup: A `BeautifulSoup`.
Instantiate an appropriate parser for the given encoding.
:param encoding: A string.
:return: A parser object such as an `etree.XMLParser`.
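The relationship between the two methods above (one returns "either a parser object or a class", the other returns a ready parser object) can be sketched as a simple dispatch. This is an assumption-laden illustration, not the builder's implementation: `DummyParser` and `instantiate_parser` are hypothetical names, and the real builder may pass additional arguments when it instantiates a parser class.

```python
import inspect


class DummyParser:
    """Hypothetical stand-in for a parser class such as etree.XMLParser."""

    def __init__(self, encoding=None):
        self.encoding = encoding


def instantiate_parser(default, encoding):
    """Return a parser object, instantiating `default` if it is a class."""
    if inspect.isclass(default):
        # A class was supplied: build an instance for this encoding.
        return default(encoding=encoding)
    # An already-built parser object was supplied: use it unchanged.
    return default


built = instantiate_parser(DummyParser, "utf-8")
existing = DummyParser("latin-1")
reused = instantiate_parser(existing, "utf-8")
```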
bs4.builder.TreeBuilder.prepare_markup
Run any preliminary steps necessary to make incoming markup acceptable to the parser.
lxml really wants to get a bytestring and convert it to Unicode itself. So instead of using UnicodeDammit to convert the bytestring to Unicode using different encodings, this implementation uses EncodingDetector to iterate over the encodings, and tell lxml to try to parse the document as each one in turn.
:param markup: Some markup -- hopefully a bytestring.
:param user_specified_encoding: The user asked to try this encoding.
:param document_declared_encoding: The markup itself claims to be in this encoding.
:param exclude_encodings: The user asked _not_ to try any of these encodings.
:yield: A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement). Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.
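The "try each encoding in turn" idea can be sketched without lxml or bs4. The following is a simplified illustration only: `candidate_strategies` and `parse_with_first_working` are hypothetical names, the fallback list is an assumption, and the real method relies on EncodingDetector, which considers more sources than this sketch does. The 4-tuple shape follows the docstring above.

```python
def candidate_strategies(markup, user_specified_encoding=None,
                         document_declared_encoding=None,
                         exclude_encodings=None,
                         fallbacks=("utf-8", "windows-1252")):
    """Yield (markup, encoding, declared encoding, replaced) 4-tuples."""
    excluded = {e.lower() for e in (exclude_encodings or [])}
    seen = set()
    candidates = (user_specified_encoding, document_declared_encoding,
                  *fallbacks)
    for encoding in candidates:
        if encoding is None:
            continue
        key = encoding.lower()
        if key in excluded or key in seen:
            continue  # skip encodings the user ruled out, and repeats
        seen.add(key)
        # The bytes are passed through untouched; the parser decodes them.
        yield (markup, encoding, document_declared_encoding, False)


def parse_with_first_working(markup, try_parse, **kwargs):
    """Try each strategy in turn and return (result, encoding used)."""
    for data, encoding, _declared, _replaced in candidate_strategies(
            markup, **kwargs):
        try:
            return try_parse(data, encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue  # this strategy failed; move on to the next one
    raise ValueError("no candidate encoding could parse the markup")


# Stand-in "parser" that simply decodes the bytes.
text, used = parse_with_first_working(
    "café".encode("latin-1"), lambda data, enc: data.decode(enc))
```

In this run the UTF-8 strategy fails on the lone 0xE9 byte, so the second fallback is the one that succeeds.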
bs4.builder.TreeBuilder.ALTERNATE_NAMES
bs4.builder._lxml.LXMLTreeBuilder
Undocumented
bs4.builder.TreeBuilder.features
bs4.builder._lxml.LXMLTreeBuilder
Undocumented
bs4.builder.TreeBuilder.is_xml
bs4.builder._lxml.LXMLTreeBuilder
Undocumented
Let the BeautifulSoup object know about namespaces encountered while parsing the document. This might be useful later on when creating CSS selectors.
This will track (almost) all namespaces, even ones that were only in scope for part of the document. If two namespaces have the same prefix, only the first one encountered will be tracked. Un-prefixed namespaces are not tracked.
:param mapping: A dictionary mapping namespace prefixes to URIs.
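The tracking rules described above (first prefix wins, un-prefixed namespaces are skipped) can be shown with a minimal sketch. The function name `register_namespaces` and the plain dictionary used as a registry are hypothetical; the actual method stores its result on the BeautifulSoup object, which this sketch does not model.

```python
def register_namespaces(known, mapping):
    """Merge `mapping` (prefix -> URI) into `known`, following the rules above."""
    for prefix, uri in mapping.items():
        if not prefix:
            # Un-prefixed (default) namespaces are not tracked.
            continue
        if prefix not in known:
            # Only the first namespace seen for a given prefix is kept.
            known[prefix] = uri
    return known


known = {}
register_namespaces(known, {None: "http://example.com/default",
                            "a": "http://example.com/one"})
register_namespaces(known, {"a": "http://example.com/two",
                            "b": "http://example.com/three"})
```

After both calls, `known` maps `a` to the first URI seen for it and `b` to its only URI; the default namespace is absent.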