bs4.tests.HTMLTreeBuilderSmokeTest

class documentation

class HTMLTreeBuilderSmokeTest(TreeBuilderSmokeTest): (source)

Known subclasses: bs4.tests.HTML5TreeBuilderSmokeTest, bs4.tests.test_htmlparser.TestHTMLParserTreeBuilder, bs4.tests.test_lxml.TestLXMLTreeBuilder

View In Hierarchy

A basic test of a treebuilder's competence. Any HTML treebuilder, present or future, should be able to pass these tests. With invalid markup, there's room for interpretation, and different parsers can handle it differently. But with the markup in these tests, there's not much room for interpretation.

Method	`assertDoctypeHandled`	Assert that a given doctype string is handled correctly.
Method	`test_ampersand_in_attribute_value_gets_escaped`	Undocumented
Method	`test_angle_brackets_in_attribute_values_are_escaped`	Undocumented
Method	`test_apos_entity`	Undocumented
Method	`test_attribute_values_with_double_nested_quotes_get_quoted`	Undocumented
Method	`test_attribute_values_with_nested_quotes_are_left_alone`	Undocumented
Method	`test_basic_namespaces`	Parsers don't need to understand namespaces, but at the very least they should not choke on namespaces or lose data.
Method	`test_br_is_always_empty_element_tag`	A <br> tag is designated as an empty-element tag.
Method	`test_can_parse_unicode_document`	Undocumented
Method	`test_closing_tag_with_no_opening_tag`	Undocumented
Method	`test_comment`	Undocumented
Method	`test_correctly_nested_tables`	One table can go inside another one.
Method	`test_deepcopy`	Make sure you can copy the tree builder.
Method	`test_deeply_nested_multivalued_attribute`	Undocumented
Method	`test_detect_xml_parsed_as_html`	Undocumented
Method	`test_double_head`	Undocumented
Method	`test_empty_doctype`	Undocumented
Method	`test_empty_element_tags`	Verify consistent handling of empty-element tags, no matter how they come in through the markup.
Method	`test_entities_converted_on_the_way_out`	Undocumented
Method	`test_entities_in_attributes_converted_to_unicode`	Undocumented
Method	`test_entities_in_foreign_document_encoding`	Undocumented
Method	`test_entities_in_strings_converted_during_parsing`	Undocumented
Method	`test_entities_in_text_converted_to_unicode`	Undocumented
Method	`test_escaped_ampersand_in_attribute_value_is_left_alone`	Undocumented
Method	`test_head_tag_between_head_and_body`	Prevent recurrence of a bug in the html5lib treebuilder.
Method	`test_html5_style_meta_tag_reflects_current_encoding`	Undocumented
Method	`test_meta_tag_reflects_current_encoding`	Undocumented
Method	`test_mixed_case_doctype`	Undocumented
Method	`test_multipart_strings`	Mostly to prevent a recurrence of a bug in the html5lib treebuilder.
Method	`test_multiple_copies_of_a_tag`	Prevent recurrence of a bug in the html5lib treebuilder.
Method	`test_multivalued_attribute_on_html`	Undocumented
Method	`test_multivalued_attribute_value_becomes_list`	Undocumented
Method	`test_multivalued_attribute_with_whitespace`	Undocumented
Method	`test_namespaced_html`	Undocumented
Method	`test_namespaced_public_doctype`	Undocumented
Method	`test_namespaced_system_doctype`	Undocumented
Method	`test_nested_block_level_elements`	Block elements can be nested.
Method	`test_nested_formatting_elements`	Undocumented
Method	`test_nested_inline_elements`	Inline elements can be nested indefinitely.
Method	`test_non_breaking_spaces_converted_on_the_way_in`	Undocumented
Method	`test_normal_doctypes`	Make sure normal, everyday HTML doctypes are handled correctly.
Method	`test_out_of_range_entity`	Undocumented
Method	`test_p_tag_is_never_empty_element`	A <p> tag is never designated as an empty-element tag.
Method	`test_pickle_and_unpickle_identity`	Undocumented
Method	`test_preserved_whitespace_in_pre_and_textarea`	Whitespace must be preserved in <pre> and <textarea> tags, even if that would mean not prettifying the markup.
Method	`test_processing_instruction`	Undocumented
Method	`test_public_doctype_with_url`	Undocumented
Method	`test_python_specific_encodings_not_used_in_charset`	Undocumented
Method	`test_quot_entity_converted_to_quotation_mark`	Undocumented
Method	`test_real_hebrew_document`	Undocumented
Method	`test_real_iso_8859_document`	Undocumented
Method	`test_real_shift_jis_document`	Undocumented
Method	`test_real_xhtml_document`	A real XHTML document should come out more or less the same as it went in.
Method	`test_single_quote_attribute_values_become_double_quotes`	Undocumented
Method	`test_smart_quotes_converted_on_the_way_in`	Undocumented
Method	`test_soupstrainer`	Parsers should be able to work with SoupStrainers.
Method	`test_special_string_containers`	Undocumented
Method	`test_strings_resembling_character_entity_references`	Undocumented
Method	`test_system_doctype`	Undocumented
Method	`test_tag_with_no_attributes_can_have_attributes_added`	Undocumented
Method	`test_unclosed_tags_get_closed`	A tag that's not closed by the end of the document should be closed.
Method	`test_worst_case`	Test the worst case (currently) for linking issues.
Method	`_document_with_doctype`	Generate and parse a document with the given doctype.

Inherited from TreeBuilderSmokeTest:

Method	`test_attribute_multi_valued`	Undocumented
Method	`test_attribute_not_multi_valued`	Undocumented
Method	`test_fuzzed_input`	Undocumented

def assertDoctypeHandled(self, doctype_fragment): (source) ¶

Assert that a given doctype string is handled correctly.

def test_ampersand_in_attribute_value_gets_escaped(self): (source) ¶

Undocumented

def test_angle_brackets_in_attribute_values_are_escaped(self): (source) ¶

Undocumented

def test_apos_entity(self): (source) ¶

Undocumented

def test_attribute_values_with_double_nested_quotes_get_quoted(self): (source) ¶

Undocumented

def test_attribute_values_with_nested_quotes_are_left_alone(self): (source) ¶

Undocumented

def test_basic_namespaces(self): (source) ¶

Parsers don't need to *understand* namespaces, but at the very least they should not choke on namespaces or lose data.

def test_br_is_always_empty_element_tag(self): (source) ¶

A tag is designated as an empty-element tag. Some parsers treat as one tag, some parsers as two tags, but it should always be an empty-element tag.

def test_can_parse_unicode_document(self): (source) ¶

Undocumented

def test_closing_tag_with_no_opening_tag(self): (source) ¶

Undocumented

def test_comment(self): (source) ¶

Undocumented

def test_correctly_nested_tables(self): (source) ¶

overridden in bs4.tests.test_html5lib.TestHTML5LibBuilder

One table can go inside another one.

def test_deepcopy(self): (source) ¶

Make sure you can copy the tree builder. This is important because the builder is part of a BeautifulSoup object, and we want to be able to copy that.

def test_deeply_nested_multivalued_attribute(self): (source) ¶

Undocumented

def test_detect_xml_parsed_as_html(self): (source) ¶

Undocumented

def test_double_head(self): (source) ¶

Undocumented

def test_empty_doctype(self): (source) ¶

overridden in bs4.tests.test_lxml.TestLXMLTreeBuilder

Undocumented

def test_empty_element_tags(self): (source) ¶

Verify consistent handling of empty-element tags, no matter how they come in through the markup.

def test_entities_converted_on_the_way_out(self): (source) ¶

Undocumented

def test_entities_in_attributes_converted_to_unicode(self): (source) ¶

Undocumented

def test_entities_in_foreign_document_encoding(self): (source) ¶

overridden in bs4.tests.test_lxml.TestLXMLTreeBuilder

Undocumented

def test_entities_in_strings_converted_during_parsing(self): (source) ¶

Undocumented

def test_entities_in_text_converted_to_unicode(self): (source) ¶

Undocumented

def test_escaped_ampersand_in_attribute_value_is_left_alone(self): (source) ¶

Undocumented

def test_head_tag_between_head_and_body(self): (source) ¶

Prevent recurrence of a bug in the html5lib treebuilder.

def test_html5_style_meta_tag_reflects_current_encoding(self): (source) ¶

Undocumented

def test_meta_tag_reflects_current_encoding(self): (source) ¶

Undocumented

def test_mixed_case_doctype(self): (source) ¶

Undocumented

def test_multipart_strings(self): (source) ¶

Mostly to prevent a recurrence of a bug in the html5lib treebuilder.

def test_multiple_copies_of_a_tag(self): (source) ¶

Prevent recurrence of a bug in the html5lib treebuilder.

def test_multivalued_attribute_on_html(self): (source) ¶

Undocumented

def test_multivalued_attribute_value_becomes_list(self): (source) ¶

Undocumented

def test_multivalued_attribute_with_whitespace(self): (source) ¶

Undocumented

def test_namespaced_html(self): (source) ¶

Undocumented

def test_namespaced_public_doctype(self): (source) ¶

overridden in bs4.tests.test_htmlparser.TestHTMLParserTreeBuilder

Undocumented

def test_namespaced_system_doctype(self): (source) ¶

overridden in bs4.tests.test_htmlparser.TestHTMLParserTreeBuilder

Undocumented

def test_nested_block_level_elements(self): (source) ¶

Block elements can be nested.

def test_nested_formatting_elements(self): (source) ¶

Undocumented

def test_nested_inline_elements(self): (source) ¶

Inline elements can be nested indefinitely.

def test_non_breaking_spaces_converted_on_the_way_in(self): (source) ¶

Undocumented

def test_normal_doctypes(self): (source) ¶

Make sure normal, everyday HTML doctypes are handled correctly.

def test_out_of_range_entity(self): (source) ¶

overridden in bs4.tests.test_lxml.TestLXMLTreeBuilder

Undocumented

def test_p_tag_is_never_empty_element(self): (source) ¶

A tag is never designated as an empty-element tag. Even if the markup shows it as an empty-element tag, it shouldn't be presented that way.

def test_pickle_and_unpickle_identity(self): (source) ¶

Undocumented

def test_preserved_whitespace_in_pre_and_textarea(self): (source) ¶

Whitespace must be preserved in <pre> and <textarea> tags, even if that would mean not prettifying the markup.

def test_processing_instruction(self): (source) ¶

overridden in bs4.tests.test_html5lib.TestHTML5LibBuilder

Undocumented

def test_public_doctype_with_url(self): (source) ¶

Undocumented

def test_python_specific_encodings_not_used_in_charset(self): (source) ¶

Undocumented

def test_quot_entity_converted_to_quotation_mark(self): (source) ¶

Undocumented

def test_real_hebrew_document(self): (source) ¶

Undocumented

def test_real_iso_8859_document(self): (source) ¶

Undocumented

def test_real_shift_jis_document(self): (source) ¶

Undocumented

def test_real_xhtml_document(self): (source) ¶

overridden in bs4.tests.HTML5TreeBuilderSmokeTest

A real XHTML document should come out more or less the same as it went in.

def test_single_quote_attribute_values_become_double_quotes(self): (source) ¶

Undocumented

def test_smart_quotes_converted_on_the_way_in(self): (source) ¶

Undocumented

def test_soupstrainer(self): (source) ¶

overridden in bs4.tests.test_html5lib.TestHTML5LibBuilder

Parsers should be able to work with SoupStrainers.

def test_special_string_containers(self): (source) ¶

overridden in bs4.tests.test_html5lib.TestHTML5LibBuilder

Undocumented

def test_strings_resembling_character_entity_references(self): (source) ¶

Undocumented

def test_system_doctype(self): (source) ¶

Undocumented

def test_tag_with_no_attributes_can_have_attributes_added(self): (source) ¶

Undocumented

def test_unclosed_tags_get_closed(self): (source) ¶

A tag that's not closed by the end of the document should be closed. This applies to all tags except empty-element tags.

def test_worst_case(self): (source) ¶

Test the worst case (currently) for linking issues.

def _document_with_doctype(self, doctype_fragment, doctype_string='DOCTYPE'): (source) ¶

Generate and parse a document with the given doctype.