class documentation

class DetectsXMLParsedAsHTML(object):

Known subclasses: bs4.builder._htmlparser.BeautifulSoupHTMLParser

A mixin class for any class (a TreeBuilder, or some class used by a TreeBuilder) that's in a position to detect whether an XML document is being incorrectly parsed as HTML, and to issue an appropriate warning.

This requires being able to observe an incoming processing instruction that might be an XML declaration, and being able to observe tags as they're opened. If you can't do that for a given TreeBuilder, there's a less reliable implementation based on examining the raw markup.

Class Method warn_if_markup_looks_like_xml Perform a check on some markup to see if it looks like XML that's not XHTML. If so, issue a warning.
Constant LOOKS_LIKE_HTML Undocumented
Constant LOOKS_LIKE_HTML_B Undocumented
Constant XML_PREFIX Undocumented
Constant XML_PREFIX_B Undocumented
Class Method _warn Issue a warning about XML being parsed as HTML.
Method _document_might_be_xml Call this method when encountering an XML declaration, or a "processing instruction" that might be an XML declaration.
Method _initialize_xml_detector Call this method before parsing a document.
Method _root_tag_encountered Call this when you encounter the document's root tag.
Instance Variable _first_processing_instruction Undocumented
Instance Variable _root_tag Undocumented
@classmethod
def warn_if_markup_looks_like_xml(cls, markup):

Perform a check on some markup to see if it looks like XML that's not XHTML. If so, issue a warning.

This is much less reliable than doing the check while parsing, but some of the tree builders can't do that.

Returns: True if the markup looks like non-XHTML XML, False otherwise.
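The raw-markup check can be pictured with a small standalone sketch. It is not the library's implementation: the way the prefix and the pattern are combined, the helper name `looks_like_non_xhtml_xml`, and the `window` parameter (how much of the markup is inspected) are all assumptions made for illustration. Only the four constant values are taken from this page, and the sketch returns a boolean without issuing any warning.

```python
import re

# Constant values copied from this class's documentation.
XML_PREFIX = '<?xml'
XML_PREFIX_B = b'<?xml'
LOOKS_LIKE_HTML = re.compile(r'<[^ \+]html', re.I)
LOOKS_LIKE_HTML_B = re.compile(rb'<[^ \+]html', re.I)


def looks_like_non_xhtml_xml(markup, window=500):
    """Hypothetical helper: guess whether markup is XML that is not XHTML.

    Assumption: markup counts as suspicious when it starts with an XML
    declaration prefix and the HTML pattern is not found in the first
    `window` characters. The value of `window` is illustrative.
    """
    if markup is None:
        return False
    # Pick the str or bytes variants so the types always match.
    if isinstance(markup, bytes):
        prefix, html_pattern = XML_PREFIX_B, LOOKS_LIKE_HTML_B
    else:
        prefix, html_pattern = XML_PREFIX, LOOKS_LIKE_HTML
    return markup.startswith(prefix) and html_pattern.search(markup[:window]) is None


if __name__ == "__main__":
    feed = '<?xml version="1.0"?><rss version="2.0"></rss>'
    xhtml = '<?xml version="1.0"?><html><body/></html>'
    print(looks_like_non_xhtml_xml(feed))   # flagged: XML prefix, no HTML match
    print(looks_like_non_xhtml_xml(xhtml))  # not flagged: pattern finds "</html"
```

Because this kind of check only sees the start of the raw text, it can misjudge documents that a parse-time check would classify correctly, which is in line with the "much less reliable" caveat above.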

LOOKS_LIKE_HTML =

Undocumented

Value
re.compile(r'<[^ \+]html',
           re.I)
LOOKS_LIKE_HTML_B =

Undocumented

Value
re.compile(rb'<[^ \+]html',
           re.I)
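Since both patterns are undocumented, it may help to see which strings they match. The snippet below only exercises the two regular expressions exactly as shown above; the helper name `mentions_html` is hypothetical, and nothing here claims how the class itself applies the patterns.

```python
import re

# Pattern values copied from the constants documented above.
LOOKS_LIKE_HTML = re.compile(r'<[^ \+]html', re.I)
LOOKS_LIKE_HTML_B = re.compile(rb'<[^ \+]html', re.I)


def mentions_html(markup):
    """Hypothetical helper: True if the matching pattern is found in markup."""
    pattern = LOOKS_LIKE_HTML_B if isinstance(markup, bytes) else LOOKS_LIKE_HTML
    return pattern.search(markup) is not None


if __name__ == "__main__":
    # The pattern is "<", then exactly one character that is neither a
    # space nor "+", then "html" (case-insensitive).
    for sample in ["</html>", "</HTML>", "<xhtml>", "<html>", "< html>", b"</html>"]:
        print(repr(sample), mentions_html(sample))
```

Read literally, the pattern needs one character between `<` and `html`, so a closing `</html>` tag matches while a bare `<html>` start tag does not.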
XML_PREFIX: str =

Undocumented

Value
'<?xml'
XML_PREFIX_B: bytes =

Undocumented

Value
b'<?xml'
@classmethod
def _warn(cls):

Issue a warning about XML being parsed as HTML.

def _document_might_be_xml(self, processing_instruction):

Call this method when encountering an XML declaration, or a "processing instruction" that might be an XML declaration.

def _initialize_xml_detector(self):

Call this method before parsing a document.

def _root_tag_encountered(self, name):

Call this when you encounter the document's root tag. This is where we actually check whether an XML document is being incorrectly parsed as HTML, and issue the warning.
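The three hooks above are meant to be called in a fixed order: initialize, record a possible XML declaration, then check at the root tag. The sketch below shows one way that flow could look on top of the standard library's `html.parser.HTMLParser`. It is an illustration under stated assumptions, not the library's code: the classes `XMLDetectorSketch`, `SketchParser`, and `XMLParsedAsHTMLSketchWarning`, the methods' bodies, the warning text, and the helpers `feed_document` and `warnings_for` are all hypothetical. Only the three hook names and the two instance variable names come from this page.

```python
import warnings
from html.parser import HTMLParser


class XMLParsedAsHTMLSketchWarning(UserWarning):
    """Hypothetical warning category used only by this sketch."""


class XMLDetectorSketch:
    """Assumed logic for the three hooks described above."""

    def _initialize_xml_detector(self):
        # Reset state before each document.
        self._first_processing_instruction = None
        self._root_tag = None

    def _document_might_be_xml(self, processing_instruction):
        # Only the first processing instruction before the root tag matters.
        if self._first_processing_instruction is not None or self._root_tag is not None:
            return
        self._first_processing_instruction = processing_instruction

    def _root_tag_encountered(self, name):
        if self._root_tag is not None:
            return  # already checked; later tags are ignored
        self._root_tag = name
        pi = self._first_processing_instruction
        # An XML declaration followed by a non-<html> root suggests XML.
        if name != "html" and pi is not None and pi.lower().startswith("xml "):
            warnings.warn(
                "Document looks like XML but is being parsed as HTML.",  # illustrative text
                XMLParsedAsHTMLSketchWarning,
            )


class SketchParser(HTMLParser, XMLDetectorSketch):
    """Hypothetical parser wiring HTMLParser callbacks to the hooks."""

    def feed_document(self, markup):
        self._initialize_xml_detector()
        self.reset()
        self.feed(markup)
        self.close()

    def handle_pi(self, data):
        self._document_might_be_xml(data)

    def handle_starttag(self, tag, attrs):
        # Called for every start tag; the hook itself ignores all but the first.
        self._root_tag_encountered(tag)


def warnings_for(markup):
    """Parse markup and return the sketch warnings that were issued."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        SketchParser().feed_document(markup)
    return [w for w in caught if issubclass(w.category, XMLParsedAsHTMLSketchWarning)]


if __name__ == "__main__":
    print(len(warnings_for('<?xml version="1.0"?><rss version="2.0"><channel></channel></rss>')))
    print(len(warnings_for('<?xml version="1.0"?><html><body></body></html>')))
```

Deferring the decision until the root tag is seen is what lets an XHTML document, which also begins with an XML declaration, pass without a warning.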

_first_processing_instruction =

Undocumented

_root_tag =

Undocumented