class documentation

class EntitySubstitution(object):

Known subclasses: bs4.formatter.Formatter

The ability to substitute XML or HTML entities for certain characters.

Class Method quoted_attribute_value Make a value into a quoted XML attribute, possibly escaping it.
Class Method substitute_html Replace certain Unicode characters with named HTML entities.
Class Method substitute_xml Substitute XML entities for special XML characters.
Class Method substitute_xml_containing_entities Substitute XML entities for special XML characters, leaving ampersands that already belong to an entity alone.
Constant AMPERSAND_OR_BRACKET Undocumented
Constant BARE_AMPERSAND_OR_BRACKET Undocumented
Constant CHARACTER_TO_HTML_ENTITY Undocumented
Constant CHARACTER_TO_HTML_ENTITY_RE Undocumented
Constant CHARACTER_TO_XML_ENTITY Undocumented
Constant HTML_ENTITY_TO_CHARACTER Undocumented
Class Method _substitute_html_entity Used with a regular expression to substitute the appropriate HTML entity for a special character string.
Class Method _substitute_xml_entity Used with a regular expression to substitute the appropriate XML entity for a special character string.
Method _populate_class_variables Initialize variables used by this class to manage the plethora of HTML5 named entities.
@classmethod
def quoted_attribute_value(self, value):

Make a value into a quoted XML attribute, possibly escaping it.

Most strings will be quoted using double quotes.

    Bob's Bar -> "Bob's Bar"

If a string contains double quotes, it will be quoted using single quotes.

    Welcome to "my bar" -> 'Welcome to "my bar"'

If a string contains both single and double quotes, the double quotes will be escaped, and the string will be quoted using double quotes.

    Welcome to "Bob's Bar" -> "Welcome to &quot;Bob's Bar&quot;"
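The three quoting rules above can be sketched in a few lines. This is a minimal stand-alone illustration, not the library's code: `quote_attribute_value` is a hypothetical helper name, and it assumes only the behavior described in the docstring.

```python
def quote_attribute_value(value: str) -> str:
    """Hypothetical helper that follows the documented quoting rules."""
    quote_with = '"'
    if '"' in value:
        if "'" in value:
            # Both quote styles present: escape the double quotes
            # and keep double quotes as the delimiter.
            value = value.replace('"', "&quot;")
        else:
            # Only double quotes present: delimit with single quotes.
            quote_with = "'"
    return quote_with + value + quote_with


for sample in ["Bob's Bar", 'Welcome to "my bar"', 'Welcome to "Bob\'s Bar"']:
    print(sample, "->", quote_attribute_value(sample))
```

Note that single quotes are never escaped in this sketch; only the mixed-quote case changes the value itself.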

@classmethod
def substitute_html(cls, s):

Replace certain Unicode characters with named HTML entities.

This differs from data.encode(encoding, 'xmlcharrefreplace') in that the goal is to make the result more readable (to those with ASCII displays) rather than to recover from errors. There's absolutely nothing wrong with a UTF-8 string containing a LATIN SMALL LETTER E WITH ACUTE, but replacing that character with "&eacute;" will make it more readable to some people.

:param s: A Unicode string.
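To make the contrast with 'xmlcharrefreplace' concrete, the sketch below uses only the standard library. `to_named_entities` is a hypothetical helper that handles non-ASCII characters listed in `html.entities.codepoint2name`; it is a simplification for illustration and does not reproduce the entity table this class actually uses.

```python
from html.entities import codepoint2name

text = "caf\u00e9"  # "café", ending in LATIN SMALL LETTER E WITH ACUTE

# Error-recovery route: unencodable characters become numeric references.
numeric = text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_named_entities(s: str) -> str:
    """Hypothetical helper: swap non-ASCII characters for named entities."""
    pieces = []
    for ch in s:
        name = codepoint2name.get(ord(ch))
        if name is not None and ord(ch) > 127:
            pieces.append("&" + name + ";")
        else:
            # ASCII characters and characters without a name pass through.
            pieces.append(ch)
    return "".join(pieces)


named = to_named_entities(text)
print(numeric)  # caf&#233;
print(named)    # caf&eacute;
```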

@classmethod
def substitute_xml(cls, value, make_quoted_attribute=False):

Substitute XML entities for special XML characters.

:param value: A string to be substituted. The less-than sign will become &lt;, the greater-than sign will become &gt;, and any ampersands will become &amp;. If you want ampersands that appear to be part of an entity definition to be left alone, use substitute_xml_containing_entities() instead.

:param make_quoted_attribute: If True, then the string will be quoted, as befits an attribute value.
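A rough sketch of this substitution, built from the AMPERSAND_OR_BRACKET pattern and the relevant entries of CHARACTER_TO_XML_ENTITY listed on this page. The function name `escape_xml` is hypothetical, and the optional quoting step is left out.

```python
import re

# Pattern and mapping copied from the constants documented on this page.
AMPERSAND_OR_BRACKET = re.compile(r"([<>&])")
XML_ENTITIES = {"&": "amp", "<": "lt", ">": "gt"}


def escape_xml(value: str) -> str:
    """Hypothetical helper: escape every '<', '>' and '&' in value."""
    return AMPERSAND_OR_BRACKET.sub(
        lambda match: "&" + XML_ENTITIES[match.group(0)] + ";", value
    )


print(escape_xml("AT&T <b>&amp;</b>"))
# AT&amp;T &lt;b&gt;&amp;amp;&lt;/b&gt;
```

The ampersand of the existing `&amp;` is escaped again, which is the double-escaping that substitute_xml_containing_entities() is meant to avoid.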

@classmethod
def substitute_xml_containing_entities(cls, value, make_quoted_attribute=False):

Substitute XML entities for special XML characters, leaving existing entities alone.

:param value: A string to be substituted. The less-than sign will become &lt;, the greater-than sign will become &gt;, and any ampersands that are not part of an entity definition will become &amp;.

:param make_quoted_attribute: If True, then the string will be quoted, as befits an attribute value.
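The entity-preserving variant can be sketched the same way, using the BARE_AMPERSAND_OR_BRACKET pattern listed on this page. `escape_bare_xml` is a hypothetical name, and the quoting option is again omitted.

```python
import re

# Pattern copied from the BARE_AMPERSAND_OR_BRACKET constant on this page.
# The negative lookahead skips ampersands that begin a decimal, hexadecimal,
# or named entity reference.
BARE_AMPERSAND_OR_BRACKET = re.compile(
    r"([<>]|&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;))"
)
XML_ENTITIES = {"&": "amp", "<": "lt", ">": "gt"}


def escape_bare_xml(value: str) -> str:
    """Hypothetical helper: escape '<', '>' and bare '&' only."""
    return BARE_AMPERSAND_OR_BRACKET.sub(
        lambda match: "&" + XML_ENTITIES[match.group(0)] + ";", value
    )


print(escape_bare_xml("AT&T &amp; <b> &#233; &#xE9;"))
# AT&amp;T &amp; &lt;b&gt; &#233; &#xE9;
```

The lookahead only checks the shape of the reference, so any `&word;` sequence is left alone whether or not `word` is a real entity name.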

AMPERSAND_OR_BRACKET =

Undocumented

Value
re.compile(r'([<>&])')
BARE_AMPERSAND_OR_BRACKET =

Undocumented

Value
re.compile(r'([<>]|&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;))')
CHARACTER_TO_HTML_ENTITY =

Undocumented

CHARACTER_TO_HTML_ENTITY_RE =

Undocumented

CHARACTER_TO_XML_ENTITY: dict[str, str] =

Undocumented

Value
{'\'': 'apos', '"': 'quot', '&': 'amp', '<': 'lt', '>': 'gt'}
HTML_ENTITY_TO_CHARACTER =

Undocumented

@classmethod
def _substitute_html_entity(cls, matchobj):

Used with a regular expression to substitute the appropriate HTML entity for a special character string.

@classmethod
def _substitute_xml_entity(cls, matchobj):

Used with a regular expression to substitute the appropriate XML entity for a special character string.

def _populate_class_variables():

Initialize variables used by this class to manage the plethora of HTML5 named entities.

This function returns a 3-tuple containing two dictionaries and a regular expression:

unicode_to_name: A mapping of Unicode strings like "⦨" to entity names like "angmsdaa". When a single Unicode string has multiple entity names, we try to choose the most commonly-used name.

name_to_unicode: A mapping of entity names like "angmsdaa" to Unicode strings like "⦨".

named_entity_re: A regular expression matching (almost) any Unicode string that corresponds to an HTML5 named entity.
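One way such a 3-tuple could be assembled from the standard library's `html.entities.html5` table is sketched below. `build_entity_tables` is a hypothetical function, and the tie-breaking rule (shortest name, then lowercase, then alphabetical) is an assumption standing in for "most commonly-used"; the real method may choose names differently.

```python
import re
from html.entities import html5


def build_entity_tables():
    """Hypothetical sketch: build two lookup tables and a matching regex."""
    unicode_to_name = {}
    name_to_unicode = {}
    for key, characters in html5.items():
        if not key.endswith(";"):
            continue  # skip the legacy spellings that lack a semicolon
        name = key[:-1]
        name_to_unicode[name] = characters
        # Assumed heuristic: prefer shorter, then lowercase, then
        # alphabetically earlier names when several map to one string.
        rank = (len(name), name != name.lower(), name)
        current = unicode_to_name.get(characters)
        if current is None or rank < (len(current), current != current.lower(), current):
            unicode_to_name[characters] = name
    # Try longer strings first so multi-character sequences win over
    # their single-character prefixes.
    alternatives = sorted(unicode_to_name, key=len, reverse=True)
    named_entity_re = re.compile("|".join(re.escape(s) for s in alternatives))
    return unicode_to_name, name_to_unicode, named_entity_re


unicode_to_name, name_to_unicode, named_entity_re = build_entity_tables()
print(name_to_unicode["angmsdaa"] == "\u29a8")
```

This regex also matches ASCII characters that have entity names, so a real implementation would likely filter the table before using it for output.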