class documentation

class LXMLTreeBuilderForXML(TreeBuilder): (source)

Known subclasses: bs4.builder._lxml.LXMLTreeBuilder

Undocumented

Method __init__ Constructor.
Method close Undocumented
Method comment Handle comments as Comment objects.
Method data Undocumented
Method default_parser Find the default parser for the given encoding.
Method doctype Undocumented
Method end Undocumented
Method feed Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.
Method initialize_soup Let the BeautifulSoup object know about the standard namespace mapping.
Method parser_for Instantiate an appropriate parser for the given encoding.
Method pi Undocumented
Method prepare_markup Run any preliminary steps necessary to make incoming markup acceptable to the parser.
Method start Undocumented
Method test_fragment_to_document See `TreeBuilder`.
Constant ALTERNATE_NAMES Undocumented
Constant CHUNK_SIZE Undocumented
Constant DEFAULT_NSMAPS Undocumented
Constant DEFAULT_NSMAPS_INVERTED Undocumented
Constant NAME Undocumented
Class Variable features Undocumented
Class Variable is_xml Undocumented
Instance Variable active_namespace_prefixes Undocumented
Instance Variable empty_element_tags Undocumented
Instance Variable nsmaps Undocumented
Instance Variable parser Undocumented
Instance Variable processing_instruction_class Undocumented
Instance Variable soup Undocumented
Method _getNsTag Undocumented
Method _prefix_for_namespace Find the currently active prefix for the given namespace.
Method _register_namespaces Let the BeautifulSoup object know about namespaces encountered while parsing the document.
Instance Variable _default_parser Undocumented

Inherited from TreeBuilder:

Method can_be_empty_element Might a tag with this name be an empty-element tag?
Method reset Do any work necessary to reset the underlying parser for a new document.
Method set_up_substitutions Set up any substitutions that will need to be performed on a `Tag` when it's output as a string.
Constant DEFAULT_CDATA_LIST_ATTRIBUTES Undocumented
Constant DEFAULT_PRESERVE_WHITESPACE_TAGS Undocumented
Constant DEFAULT_STRING_CONTAINERS Undocumented
Constant TRACKS_LINE_NUMBERS Undocumented
Constant USE_DEFAULT Undocumented
Class Variable picklable Undocumented
Instance Variable cdata_list_attributes Undocumented
Instance Variable preserve_whitespace_tags Undocumented
Instance Variable store_line_numbers Undocumented
Instance Variable string_containers Undocumented
Method _replace_cdata_list_attribute_values When an attribute value is associated with a tag that can have multiple values for that attribute, convert the string value to a list of strings.

def __init__(self, parser=None, empty_element_tags=None, **kwargs): (source)

Constructor.

:param multi_valued_attributes: If this is set to None, the TreeBuilder will not turn any values for attributes like 'class' into lists. Setting this to a dictionary will customize this behavior; look at DEFAULT_CDATA_LIST_ATTRIBUTES for an example. Internally, these are called "CDATA list attributes", but that probably doesn't make sense to an end-user, so the argument name is `multi_valued_attributes`.
:param preserve_whitespace_tags: A list of tags to treat the way <pre> tags are treated in HTML. Tags in this list are immune from pretty-printing; their contents will always be output as-is.
:param string_containers: A dictionary mapping tag names to the classes that should be instantiated to contain the textual contents of those tags. The default is to use NavigableString for every tag, no matter what the name. You can override the default by changing DEFAULT_STRING_CONTAINERS.
:param store_line_numbers: If the parser keeps track of the line numbers and positions of the original markup, that information will, by default, be stored in each corresponding `Tag` object. You can turn this off by passing store_line_numbers=False. If the parser you're using doesn't keep track of this information, then setting store_line_numbers=True will do nothing.

def close(self): (source)

Undocumented

def comment(self, content): (source)

Handle comments as Comment objects.

def data(self, content): (source)

Undocumented

def default_parser(self, encoding): (source)

Find the default parser for the given encoding.

:param encoding: A string.
:return: Either a parser object or a class, which will be instantiated with default arguments.

def doctype(self, name, pubid, system): (source)

Undocumented

def end(self, name): (source)

Undocumented

def feed(self, markup): (source)

Run some incoming markup through some parsing process, populating the `BeautifulSoup` object in self.soup.

This method is not implemented in TreeBuilder; it must be implemented in subclasses.

:return: None.
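The callback-style methods on this class (`start`, `end`, `data`, `close`, and so on) have the shape of a parser "target" interface, and `CHUNK_SIZE` suggests the markup is fed to the parser in pieces. The following is a minimal sketch of that pattern, not this class's implementation: it uses the standard library's `xml.etree.ElementTree.XMLParser` in place of lxml, and `EventRecorder` and `feed_in_chunks` are hypothetical names introduced for illustration.

```python
import io
import xml.etree.ElementTree as ET

CHUNK_SIZE = 512  # same value as the constant documented on this page


class EventRecorder:
    """Hypothetical parser target: records events instead of building a soup."""

    def __init__(self):
        self.events = []

    def start(self, name, attrs):
        # Called when an opening tag is seen.
        self.events.append(("start", name, dict(attrs)))

    def end(self, name):
        # Called when a closing tag is seen.
        self.events.append(("end", name))

    def data(self, content):
        # Called with character data; a text run may arrive in several calls.
        self.events.append(("data", content))

    def close(self):
        # Called once parsing finishes; the return value is passed back
        # to the caller of parser.close().
        return self.events


def feed_in_chunks(markup: bytes, target) -> list:
    """Feed markup to a target-driven parser CHUNK_SIZE bytes at a time."""
    parser = ET.XMLParser(target=target)
    stream = io.BytesIO(markup)
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        parser.feed(chunk)
    return parser.close()


events = feed_in_chunks(b"<root><item id='1'>hi</item></root>", EventRecorder())
for event in events:
    print(event)
```

Feeding in fixed-size chunks keeps memory use bounded for large documents, at the cost of the target having to cope with text arriving in more than one `data` call.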

def initialize_soup(self, soup): (source)

Let the BeautifulSoup object know about the standard namespace mapping.

:param soup: A `BeautifulSoup`.

def parser_for(self, encoding): (source)

Instantiate an appropriate parser for the given encoding.

:param encoding: A string.
:return: A parser object such as an `etree.XMLParser`.
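Since `default_parser` may return either a ready-made parser object or a class, `parser_for` has to handle both cases. A minimal sketch of that decision follows, assuming a class is instantiated with the encoding as a keyword argument. The standalone `parser_for` function here is hypothetical, and the standard library's `ET.XMLParser` stands in for lxml's `etree.XMLParser`.

```python
import inspect
import xml.etree.ElementTree as ET


def parser_for(default_parser, encoding):
    """Return a usable parser given either a parser class or an instance."""
    if inspect.isclass(default_parser):
        # A class was supplied: instantiate it for this encoding.
        return default_parser(encoding=encoding)
    # An already-built parser object was supplied: use it as-is.
    return default_parser


from_class = parser_for(ET.XMLParser, "utf-8")
existing = ET.XMLParser()
from_instance = parser_for(existing, "utf-8")
```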

def pi(self, target, data): (source)

Undocumented

def prepare_markup(self, markup, user_specified_encoding=None, exclude_encodings=None, document_declared_encoding=None): (source)

Run any preliminary steps necessary to make incoming markup acceptable to the parser.

lxml really wants to get a bytestring and convert it to Unicode itself. So instead of using UnicodeDammit to convert the bytestring to Unicode using different encodings, this implementation uses EncodingDetector to iterate over the encodings, and tells lxml to try to parse the document as each one in turn.

:param markup: Some markup, ideally a bytestring.
:param user_specified_encoding: The user asked to try this encoding.
:param document_declared_encoding: The markup itself claims to be in this encoding.
:param exclude_encodings: The user asked _not_ to try any of these encodings.
:yield: A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement). Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.
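The "series of strategies" idea can be sketched without lxml or EncodingDetector. In the illustration below, `candidate_strategies` and its `fallbacks` parameter are hypothetical, and the candidate order (user-specified, then declared, then fallbacks) is an assumption rather than EncodingDetector's actual ordering. `bytes.decode` stands in for "ask the parser to try this encoding".

```python
from typing import Iterable, Iterator, Optional, Tuple

# (markup, encoding, declared encoding, has undergone character replacement)
Strategy = Tuple[bytes, Optional[str], Optional[str], bool]


def candidate_strategies(
    markup: bytes,
    user_specified_encoding: Optional[str] = None,
    document_declared_encoding: Optional[str] = None,
    exclude_encodings: Optional[Iterable[str]] = None,
    fallbacks: Tuple[str, ...] = ("utf-8", "windows-1252"),  # hypothetical
) -> Iterator[Strategy]:
    """Yield one parsing strategy per candidate encoding, skipping exclusions."""
    excluded = {name.lower() for name in (exclude_encodings or [])}
    seen = set()
    candidates = (user_specified_encoding, document_declared_encoding, *fallbacks)
    for encoding in candidates:
        if encoding is None:
            continue
        key = encoding.lower()
        if key in excluded or key in seen:
            continue
        seen.add(key)
        # The bytes are passed through unchanged; decoding is left to the parser.
        yield (markup, encoding, document_declared_encoding, False)


def first_working_encoding(markup: bytes, **kwargs) -> Optional[str]:
    """Try each strategy in turn; decoding stands in for a real parse attempt."""
    for data, encoding, _declared, _replaced in candidate_strategies(markup, **kwargs):
        try:
            data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # this strategy failed; move on to the next one
        return encoding
    return None


tried = [s[1] for s in candidate_strategies(
    b"<a/>",
    user_specified_encoding="latin-1",
    document_declared_encoding="utf-8",
    exclude_encodings=["utf-8"],
)]
chosen = first_working_encoding(b"caf\xe9", user_specified_encoding="utf-8")
```

Here `b"caf\xe9"` is not valid UTF-8, so the first strategy fails and the caller falls through to the next candidate.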

def start(self, name, attrs, nsmap={}): (source)

Undocumented

def test_fragment_to_document(self, fragment): (source)

See `TreeBuilder`.

ALTERNATE_NAMES: list[str] = (source)

Undocumented

Value
['xml']
CHUNK_SIZE: int = (source)

Undocumented

Value
512
DEFAULT_NSMAPS = (source)

Undocumented

Value
dict(xml='http://www.w3.org/XML/1998/namespace')
DEFAULT_NSMAPS_INVERTED = (source)

Undocumented

Value
_invert(DEFAULT_NSMAPS)
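
The body of `_invert` is not shown on this page. Assuming it simply swaps keys and values, the inverted constant would map namespace URIs back to prefixes, as in this sketch (the function name `invert` is a stand-in, not the library's helper):

```python
DEFAULT_NSMAPS = dict(xml="http://www.w3.org/XML/1998/namespace")


def invert(mapping: dict) -> dict:
    """Swap keys and values; assumes the values are unique and hashable."""
    return {value: key for key, value in mapping.items()}


DEFAULT_NSMAPS_INVERTED = invert(DEFAULT_NSMAPS)
print(DEFAULT_NSMAPS_INVERTED)
```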

NAME: str = (source)

Undocumented

Value
'lxml-xml'
active_namespace_prefixes = (source)

Undocumented

empty_element_tags = (source)

Undocumented

Undocumented

processing_instruction_class = (source)

Undocumented

def _getNsTag(self, tag): (source)

Undocumented

def _prefix_for_namespace(self, namespace): (source)

Find the currently active prefix for the given namespace.
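
One way to read "currently active" is that the most recently opened scope wins. The sketch below assumes the builder keeps a stack of inverted mappings (URI to prefix) with the innermost scope last, and that a scope with no new declarations may be recorded as `None`. Neither assumption is stated on this page, and `prefix_for_namespace` is a hypothetical standalone function.

```python
from typing import Dict, List, Optional


def prefix_for_namespace(
    inverted_nsmaps: List[Optional[Dict[str, str]]],
    namespace: Optional[str],
) -> Optional[str]:
    """Search scopes from innermost to outermost for a prefix bound to namespace."""
    if namespace is None:
        return None
    for inverted in reversed(inverted_nsmaps):
        if inverted is not None and namespace in inverted:
            return inverted[namespace]
    return None


stack = [
    {"http://www.w3.org/XML/1998/namespace": "xml"},  # default scope
    {"http://example.com/ns": "a"},                   # outer element
    None,                                             # element with no declarations
    {"http://example.com/ns": "b"},                   # inner element rebinds the URI
]
active = prefix_for_namespace(stack, "http://example.com/ns")
```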

def _register_namespaces(self, mapping): (source)

Let the BeautifulSoup object know about namespaces encountered while parsing the document.

This might be useful later on when creating CSS selectors.

This will track (almost) all namespaces, even ones that were only in scope for part of the document. If two namespaces have the same prefix, only the first one encountered will be tracked. Un-prefixed namespaces are not tracked.

:param mapping: A dictionary mapping namespace prefixes to URIs.
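
The two tracking rules described above (first prefix wins, un-prefixed namespaces are skipped) can be sketched as follows. The `register_namespaces` function and the plain `known` dictionary are hypothetical stand-ins for whatever the BeautifulSoup object actually stores.

```python
from typing import Dict, Optional


def register_namespaces(
    known: Dict[str, str],
    mapping: Dict[Optional[str], str],
) -> Dict[str, str]:
    """Merge a prefix-to-URI mapping into known, keeping the first URI per prefix."""
    for prefix, uri in mapping.items():
        if not prefix:
            # Un-prefixed (default) namespace: not tracked.
            continue
        if prefix not in known:
            # Only the first namespace seen for a given prefix is kept.
            known[prefix] = uri
    return known


known: Dict[str, str] = {}
register_namespaces(known, {"a": "http://example.com/first", None: "http://example.com/default"})
register_namespaces(known, {"a": "http://example.com/second", "b": "http://example.com/b"})
print(known)
```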

_default_parser = (source)

Undocumented