class documentation

Describes a strategy to use when outputting a parse tree to a string.

Some parts of this strategy come from the distinction between HTML4, HTML5, and XML. Others are configurable by the user.

Formatters are passed in as the `formatter` argument to methods like `PageElement.encode`. Most people won't need to think about formatters, and most people who need to think about them can pass in one of these predefined strings as `formatter` rather than making a new Formatter object:

For HTML documents:

* 'html' - HTML entity substitution for generic HTML documents. (default)
* 'html5' - HTML entity substitution for HTML5 documents, as well as some optimizations in the way tags are rendered.
* 'minimal' - Only make the substitutions necessary to guarantee valid HTML.
* None - Do not perform any substitution. This will be faster but may result in invalid markup.

For XML documents:

* 'html' - Entity substitution for XHTML documents.
* 'minimal' - Only make the substitutions necessary to guarantee valid XML. (default)
* None - Do not perform any substitution. This will be faster but may result in invalid markup.
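The predefined strings can be compared side by side on a small fragment. This is a minimal sketch, assuming Beautiful Soup 4 is installed and using the standard library's `html.parser`; the sample markup and variable names are illustrative only, and `decode` is used here as one of the output methods that accept `formatter`.

```python
from bs4 import BeautifulSoup

# Illustrative input: one non-ASCII letter pair and one ampersand.
soup = BeautifulSoup("<p>caf\u00e9 &amp; cr\u00e8me</p>", "html.parser")
p = soup.p

as_minimal = p.decode(formatter="minimal")  # escape only what validity requires
as_html = p.decode(formatter="html")        # also use named HTML entities
as_raw = p.decode(formatter=None)           # no substitution at all

print(as_minimal)
print(as_html)
print(as_raw)
```

With None the bare ampersand is written back unescaped, which is the "may result in invalid markup" case described above.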

Method __init__ Constructor.
Method attribute_value Process the value of an attribute.
Method attributes Reorder a tag's attributes however you want.
Method substitute Process a string that needs to undergo entity substitution. This may be a string encountered in an attribute value or as text.
Constant HTML Undocumented
Constant HTML_DEFAULTS Undocumented
Constant HTML_FORMATTERS Undocumented
Constant XML Undocumented
Constant XML_FORMATTERS Undocumented
Instance Variable cdata_containing_tags Undocumented
Instance Variable empty_attributes_are_booleans Undocumented
Instance Variable entity_substitution Undocumented
Instance Variable indent Undocumented
Instance Variable language Undocumented
Instance Variable void_element_close_prefix Undocumented
Method _default Undocumented

Inherited from EntitySubstitution:

Class Method quoted_attribute_value Make a value into a quoted XML attribute, possibly escaping it.
Class Method substitute_html Replace certain Unicode characters with named HTML entities.
Class Method substitute_xml Substitute XML entities for special XML characters.
Class Method substitute_xml_containing_entities Substitute XML entities for special XML characters.
Constant AMPERSAND_OR_BRACKET Undocumented
Constant BARE_AMPERSAND_OR_BRACKET Undocumented
Constant CHARACTER_TO_HTML_ENTITY Undocumented
Constant CHARACTER_TO_HTML_ENTITY_RE Undocumented
Constant CHARACTER_TO_XML_ENTITY Undocumented
Constant HTML_ENTITY_TO_CHARACTER Undocumented
Class Method _substitute_html_entity Used with a regular expression to substitute the appropriate HTML entity for a special character string.
Class Method _substitute_xml_entity Used with a regular expression to substitute the appropriate XML entity for a special character string.
Method _populate_class_variables Initialize variables used by this class to manage the plethora of HTML5 named entities.
def __init__(self, language=None, entity_substitution=None, void_element_close_prefix='/', cdata_containing_tags=None, empty_attributes_are_booleans=False, indent=1): (source)

Constructor.

:param language: This should be Formatter.XML if you are formatting XML markup and Formatter.HTML if you are formatting HTML markup.
:param entity_substitution: A function to call to replace special characters with XML/HTML entities. For examples, see bs4.dammit.EntitySubstitution.substitute_html and substitute_xml.
:param void_element_close_prefix: By default, void elements are represented as <tag/> (XML rules) rather than <tag> (HTML rules). To get <tag>, pass in the empty string.
:param cdata_containing_tags: The list of tags that are defined as containing CDATA in this dialect. For example, in HTML, <script> and <style> tags are defined as containing CDATA, and their contents should not be formatted.
:param empty_attributes_are_booleans: Render attributes whose value is the empty string as HTML-style boolean attributes. (Attributes whose value is None are always rendered this way.)
:param indent: If indent is a non-negative integer or string, then the contents of elements will be indented appropriately when pretty-printing. An indent level of 0, negative, or "" will only insert newlines. Using a positive integer indent indents that many spaces per level. If indent is a string (such as " "), that string is used to indent each level. The default behavior is to indent one space per level.

def attribute_value(self, value): (source)

Process the value of an attribute.

:param value: A string.
:return: A string with certain characters replaced by named or numeric entities.

def attributes(self, tag): (source)

Reorder a tag's attributes however you want.

By default, attributes are sorted alphabetically. This makes behavior consistent between Python 2 and Python 3, and preserves backwards compatibility with older versions of Beautiful Soup.

If `empty_attributes_are_booleans` is True, then attributes whose values are set to the empty string will be treated as boolean attributes.
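Overriding `attributes` in a subclass is one way to change the ordering. The sketch below assumes Beautiful Soup 4 with `html.parser`; `SourceOrderAttributes` is a hypothetical class name invented for this example, and it relies on the parser storing attributes in source order. Note that an override like this one bypasses the default handling of `empty_attributes_are_booleans`, so the boolean case is shown with an unmodified `Formatter`.

```python
from bs4 import BeautifulSoup
from bs4.formatter import Formatter


class SourceOrderAttributes(Formatter):
    """Hypothetical subclass: keep attributes in the order they were parsed."""

    def attributes(self, tag):
        # Yield (name, value) pairs without sorting them.
        for name, value in tag.attrs.items():
            yield name, value


soup = BeautifulSoup('<option value="1" selected="">One</option>', "html.parser")
tag = soup.option

# Default behavior: attributes sorted alphabetically.
sorted_out = tag.decode(formatter=Formatter(Formatter.HTML))
# Overridden behavior: attributes in parsed order.
source_out = tag.decode(formatter=SourceOrderAttributes(Formatter.HTML))
# Empty-string values rendered as HTML-style boolean attributes.
boolean_out = tag.decode(
    formatter=Formatter(Formatter.HTML, empty_attributes_are_booleans=True)
)

print(sorted_out)
print(source_out)
print(boolean_out)
```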

def substitute(self, ns): (source)

Process a string that needs to undergo entity substitution. This may be a string encountered in an attribute value or as text.

:param ns: A string.
:return: A string with certain characters replaced by named or numeric entities.
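`substitute` can also be called directly on a plain string, which is a convenient way to see what a given `entity_substitution` function does. This is a minimal sketch assuming Beautiful Soup 4; the variable names are illustrative, `str.upper` stands in for any user-supplied callable, and the pass-through behavior when no substitution function is set is an observation about the current implementation rather than something this page promises.

```python
from bs4.formatter import Formatter

# Escapes &, < and > using the inherited substitute_xml class method.
escaping = Formatter(Formatter.HTML, entity_substitution=Formatter.substitute_xml)
# No substitution function: strings are expected to come back unchanged.
passthrough = Formatter(Formatter.HTML)
# Any callable taking and returning a string can be plugged in.
shouting = Formatter(Formatter.HTML, entity_substitution=str.upper)

text = "AT&T <3"
escaped = escaping.substitute(text)
untouched = passthrough.substitute(text)
upper = shouting.substitute("hello")
attr = escaping.attribute_value("a&b")  # attribute values get the same treatment

print(escaped, untouched, upper, attr)
```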

HTML = (source)

Undocumented

Value
'html'
HTML_DEFAULTS = (source)

Undocumented

Value
dict(cdata_containing_tags=set(['script', 'style']))
HTML_FORMATTERS: dict = (source)

Undocumented

Value
{}

XML = (source)

Undocumented

Value
'xml'
XML_FORMATTERS: dict = (source)

Undocumented

Value
{}
cdata_containing_tags = (source)

Undocumented

empty_attributes_are_booleans = (source)

Undocumented

entity_substitution = (source)

Undocumented

indent = (source)

Undocumented

language = (source)

Undocumented

void_element_close_prefix = (source)

Undocumented

def _default(self, language, value, kwarg): (source)

Undocumented