bs4.builder._lxml.LXMLTreeBuilderForXML

class documentation

class LXMLTreeBuilderForXML(TreeBuilder):

Known subclasses: bs4.builder._lxml.LXMLTreeBuilder

Undocumented

Method	`__init__`	Constructor.
Method	`close`	Undocumented
Method	`comment`	Handle comments as Comment objects.
Method	`data`	Undocumented
Method	`default_parser`	Find the default parser for the given encoding.
Method	`doctype`	Undocumented
Method	`end`	Undocumented
Method	`feed`	Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.
Method	`initialize_soup`	Let the BeautifulSoup object know about the standard namespace mapping.
Method	`parser_for`	Instantiate an appropriate parser for the given encoding.
Method	`pi`	Undocumented
Method	`prepare_markup`	Run any preliminary steps necessary to make incoming markup acceptable to the parser.
Method	`start`	Undocumented
Method	`test_fragment_to_document`	See `TreeBuilder`.
Constant	`ALTERNATE_NAMES`	Undocumented
Constant	`CHUNK_SIZE`	Undocumented
Constant	`DEFAULT_NSMAPS`	Undocumented
Constant	`DEFAULT_NSMAPS_INVERTED`	Undocumented
Constant	`NAME`	Undocumented
Class Variable	`features`	Undocumented
Class Variable	`is_xml`	Undocumented
Instance Variable	`active_namespace_prefixes`	Undocumented
Instance Variable	`empty_element_tags`	Undocumented
Instance Variable	`nsmaps`	Undocumented
Instance Variable	`parser`	Undocumented
Instance Variable	`processing_instruction_class`	Undocumented
Instance Variable	`soup`	Undocumented
Method	`_getNsTag`	Undocumented
Method	`_prefix_for_namespace`	Find the currently active prefix for the given namespace.
Method	`_register_namespaces`	Let the BeautifulSoup object know about namespaces encountered while parsing the document.
Instance Variable	`_default_parser`	Undocumented

Inherited from TreeBuilder:

Method	`can_be_empty_element`	Might a tag with this name be an empty-element tag?
Method	`reset`	Do any work necessary to reset the underlying parser for a new document.
Method	`set_up_substitutions`	Set up any substitutions that will need to be performed on a `Tag` when it's output as a string.
Constant	`DEFAULT_CDATA_LIST_ATTRIBUTES`	Undocumented
Constant	`DEFAULT_PRESERVE_WHITESPACE_TAGS`	Undocumented
Constant	`DEFAULT_STRING_CONTAINERS`	Undocumented
Constant	`TRACKS_LINE_NUMBERS`	Undocumented
Constant	`USE_DEFAULT`	Undocumented
Class Variable	`picklable`	Undocumented
Instance Variable	`cdata_list_attributes`	Undocumented
Instance Variable	`preserve_whitespace_tags`	Undocumented
Instance Variable	`store_line_numbers`	Undocumented
Instance Variable	`string_containers`	Undocumented
Method	`_replace_cdata_list_attribute_values`	When an attribute value is associated with a tag that can have multiple values for that attribute, convert the string value to a list of strings.

def __init__(self, parser=None, empty_element_tags=None, **kwargs):

overrides bs4.builder.TreeBuilder.__init__

Constructor.

:param multi_valued_attributes: If this is set to None, the TreeBuilder will not turn any values for attributes like 'class' into lists. Setting this to a dictionary will customize this behavior; look at DEFAULT_CDATA_LIST_ATTRIBUTES for an example. Internally, these are called "CDATA list attributes", but that probably doesn't make sense to an end-user, so the argument name is `multi_valued_attributes`.
:param preserve_whitespace_tags: A list of tags to treat the way <pre> tags are treated in HTML. Tags in this list are immune from pretty-printing; their contents will always be output as-is.
:param string_containers: A dictionary mapping tag names to the classes that should be instantiated to contain the textual contents of those tags. The default is to use NavigableString for every tag, no matter what the name. You can override the default by changing DEFAULT_STRING_CONTAINERS.
:param store_line_numbers: If the parser keeps track of the line numbers and positions of the original markup, that information will, by default, be stored in each corresponding `Tag` object. You can turn this off by passing store_line_numbers=False. If the parser you're using doesn't keep track of this information, then setting store_line_numbers=True will do nothing.
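To make the `multi_valued_attributes` behavior more concrete, here is a minimal sketch of the idea of turning whitespace-separated attribute values into lists. The helper `split_multi_valued` and the mapping `SAMPLE_MULTI_VALUED` are hypothetical names invented for this illustration, not part of bs4. The sketch assumes the dictionary maps a tag name (or `"*"` for every tag) to the attribute names that should be split.

```python
import re

# Hypothetical mapping in the spirit of DEFAULT_CDATA_LIST_ATTRIBUTES:
# "*" applies to every tag, any other key applies to that tag name only.
SAMPLE_MULTI_VALUED = {"*": ["class"], "a": ["rel"]}


def split_multi_valued(tag_name, attrs, multi_valued_attributes):
    """Return a copy of attrs with multi-valued attributes split into lists."""
    if not multi_valued_attributes:
        # None (or an empty mapping) disables the conversion entirely.
        return dict(attrs)
    universal = multi_valued_attributes.get("*", [])
    tag_specific = multi_valued_attributes.get(tag_name.lower(), [])
    result = {}
    for name, value in attrs.items():
        if isinstance(value, str) and (name in universal or name in tag_specific):
            # Split on runs of whitespace, dropping empty pieces.
            result[name] = re.findall(r"\S+", value)
        else:
            result[name] = value
    return result
```

Passing `None` as the mapping leaves every value as a plain string, which matches the documented effect of `multi_valued_attributes=None`.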

def close(self):

Undocumented

def comment(self, content):

Handle comments as Comment objects.

def data(self, content):

Undocumented

def default_parser(self, encoding):

overridden in bs4.builder._lxml.LXMLTreeBuilder

Find the default parser for the given encoding.

:param encoding: A string.
:return: Either a parser object or a class, which will be instantiated with default arguments.

def doctype(self, name, pubid, system):

Undocumented

def end(self, name):

Undocumented

def feed(self, markup):

overrides bs4.builder.TreeBuilder.feed

overridden in bs4.builder._lxml.LXMLTreeBuilder

Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.

This method is not implemented in TreeBuilder; it must be implemented in subclasses.

:return: None.
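The callback-style method names on this class (`start`, `end`, `data`, `comment`, `close`) mirror the parser-target interface that lxml shares with the standard library's `xml.etree.ElementTree.XMLParser`. The sketch below uses only the standard library to show how such a target receives events while markup is fed in pieces. `EventRecorder`, `feed_in_chunks`, and the chunk size of 8 are hypothetical choices for this illustration; it is not bs4's implementation.

```python
import xml.etree.ElementTree as ET


class EventRecorder:
    """A parser target that records every callback it receives."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(("start", tag, dict(attrib)))

    def end(self, tag):
        self.events.append(("end", tag))

    def data(self, text):
        # Text may arrive in several calls, especially with chunked input.
        self.events.append(("data", text))

    def comment(self, text):
        self.events.append(("comment", text))

    def close(self):
        return self.events


def feed_in_chunks(markup, chunk_size=8):
    """Feed bytes to the parser a few at a time and return recorded events."""
    recorder = EventRecorder()
    parser = ET.XMLParser(target=recorder)
    for offset in range(0, len(markup), chunk_size):
        parser.feed(markup[offset:offset + chunk_size])
    parser.close()
    return recorder.events
```

Because text can be split across several `data` calls, a consumer should join the pieces rather than assume one call per text node.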

def initialize_soup(self, soup):

overrides bs4.builder.TreeBuilder.initialize_soup

Let the BeautifulSoup object know about the standard namespace mapping.

:param soup: A `BeautifulSoup`.

def parser_for(self, encoding):

Instantiate an appropriate parser for the given encoding.

:param encoding: A string.
:return: A parser object such as an `etree.XMLParser`.
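Since `default_parser` may return either a parser object or a class, `parser_for` has to turn either one into a usable parser. A minimal sketch of that pattern follows. The function `resolve_parser` is a hypothetical name, and the standard library's `xml.etree.ElementTree.XMLParser` stands in for lxml's parser so the example has no third-party dependency; the arguments bs4 actually passes when instantiating are not shown on this page and are not reproduced here.

```python
import inspect
import xml.etree.ElementTree as ET


def resolve_parser(parser_or_class, encoding=None):
    """Return a ready-to-use parser from either an instance or a class."""
    if inspect.isclass(parser_or_class):
        # A class is instantiated here; this sketch forwards only the
        # encoding, which is an assumption made for illustration.
        return parser_or_class(encoding=encoding)
    # An existing parser object is returned unchanged.
    return parser_or_class
```

For example, `resolve_parser(ET.XMLParser, encoding="utf-8")` builds a new parser, while passing an existing instance returns that same object.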

def pi(self, target, data):

Undocumented

def prepare_markup(self, markup, user_specified_encoding=None, exclude_encodings=None, document_declared_encoding=None):

overrides bs4.builder.TreeBuilder.prepare_markup

Run any preliminary steps necessary to make incoming markup acceptable to the parser.

lxml really wants to get a bytestring and convert it to Unicode itself. So instead of using UnicodeDammit to convert the bytestring to Unicode using different encodings, this implementation uses EncodingDetector to iterate over the encodings and tells lxml to try to parse the document as each one in turn.

:param markup: Some markup, ideally a bytestring.
:param user_specified_encoding: The user asked to try this encoding.
:param document_declared_encoding: The markup itself claims to be in this encoding.
:param exclude_encodings: The user asked not to try any of these encodings.
:yield: A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement). Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.
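The "one strategy per candidate encoding" idea can be sketched as a generator. This is a simplified illustration under stated assumptions, not the library's code: `encoding_strategies` and its `fallbacks` parameter are hypothetical, the ordering of candidates is assumed, and the byte-level sniffing that EncodingDetector performs is left out.

```python
def encoding_strategies(markup, user_specified_encoding=None,
                        exclude_encodings=None,
                        document_declared_encoding=None,
                        fallbacks=("utf-8", "windows-1252")):
    """Yield (markup, encoding, declared_encoding, replaced) 4-tuples."""
    excluded = {name.lower() for name in (exclude_encodings or [])}
    seen = set()
    # Assumed order: the user's choice, then the document's claim, then fallbacks.
    candidates = [user_specified_encoding, document_declared_encoding, *fallbacks]
    for encoding in candidates:
        if encoding is None:
            continue
        key = encoding.lower()
        if key in excluded or key in seen:
            continue
        seen.add(key)
        # The bytes are passed through untouched; the parser does the decoding,
        # so no character replacement has happened at this point.
        yield (markup, encoding, document_declared_encoding, False)
```

A caller would try each yielded strategy in turn and stop at the first one the parser accepts.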

def start(self, name, attrs, nsmap={}):

Undocumented

def test_fragment_to_document(self, fragment):

overrides bs4.builder.TreeBuilder.test_fragment_to_document

overridden in bs4.builder._lxml.LXMLTreeBuilder

See `TreeBuilder`.

ALTERNATE_NAMES: list[str] =

overrides bs4.builder.TreeBuilder.ALTERNATE_NAMES

overridden in bs4.builder._lxml.LXMLTreeBuilder

Undocumented

Value

['xml']

CHUNK_SIZE: int =

Undocumented

Value

DEFAULT_NSMAPS =

Undocumented

Value

dict(xml='http://www.w3.org/XML/1998/namespace')

DEFAULT_NSMAPS_INVERTED =

Undocumented

Value

_invert(DEFAULT_NSMAPS)
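The `_invert` helper is not documented on this page. Assuming it simply swaps keys and values, the two constants relate as in the sketch below, where `invert` is a hypothetical stand-in for it.

```python
DEFAULT_NSMAPS = dict(xml="http://www.w3.org/XML/1998/namespace")


def invert(mapping):
    """Swap keys and values (assumes the values are unique and hashable)."""
    return {value: key for key, value in mapping.items()}


# prefix -> URI becomes URI -> prefix
DEFAULT_NSMAPS_INVERTED = invert(DEFAULT_NSMAPS)
```

The inverted form makes it cheap to answer "which prefix belongs to this namespace URI?".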

NAME: str =

overrides bs4.builder.TreeBuilder.NAME

Undocumented

Value

'lxml-xml'

features

overrides bs4.builder.TreeBuilder.features

overridden in bs4.builder._lxml.LXMLTreeBuilder

Undocumented

is_xml: bool

overrides bs4.builder.TreeBuilder.is_xml

overridden in bs4.builder._lxml.LXMLTreeBuilder

Undocumented

active_namespace_prefixes

Undocumented

empty_element_tags

overrides bs4.builder.TreeBuilder.empty_element_tags

Undocumented

nsmaps

Undocumented

parser

overridden in bs4.builder._lxml.LXMLTreeBuilder

Undocumented

processing_instruction_class

Undocumented

soup

overrides bs4.builder.TreeBuilder.soup

Undocumented

def _getNsTag(self, tag):

Undocumented

def _prefix_for_namespace(self, namespace):

Find the currently active prefix for the given namespace.
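One way to read "currently active" is that the innermost scope wins. The sketch below assumes a stack of inverted namespace maps (URI to prefix), one entry per open element, where an entry may be `None` if that element declared nothing. The function name `prefix_for_namespace` and this data layout are assumptions made for illustration, not a description of the private method's actual code.

```python
def prefix_for_namespace(namespace, inverted_nsmaps):
    """Look up a prefix, preferring the innermost (most recent) scope."""
    if namespace is None:
        return None
    # Walk from the most recently opened scope back to the outermost one.
    for inverted in reversed(inverted_nsmaps):
        if inverted is not None and namespace in inverted:
            return inverted[namespace]
    return None
```

With a stack such as `[{"urn:a": "a"}, None, {"urn:a": "alt"}]`, the lookup for `"urn:a"` returns the innermost prefix, `"alt"`.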

def _register_namespaces(self, mapping):

Let the BeautifulSoup object know about namespaces encountered while parsing the document.

This might be useful later on when creating CSS selectors.

This will track (almost) all namespaces, even ones that were only in scope for part of the document. If two namespaces have the same prefix, only the first one encountered will be tracked. Un-prefixed namespaces are not tracked.

:param mapping: A dictionary mapping namespace prefixes to URIs.
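The two tracking rules described above (first prefix wins, un-prefixed namespaces are skipped) can be sketched as follows. `register_namespaces` and the plain `known` dictionary are hypothetical; where the real method stores the mapping on the `BeautifulSoup` object is not specified here.

```python
def register_namespaces(known, mapping):
    """Merge prefix -> URI pairs into known, keeping the first URI per prefix."""
    for prefix, uri in mapping.items():
        if not prefix:
            # Un-prefixed (default) namespaces are not tracked.
            continue
        if prefix not in known:
            # Only the first namespace seen for a given prefix is kept.
            known[prefix] = uri
    return known
```

Calling it first with `{"a": "urn:first", None: "urn:default"}` and then with `{"a": "urn:second", "b": "urn:b"}` leaves `a` bound to `urn:first` and adds `b`.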

_default_parser

Undocumented