class documentation

class BeautifulStoneSoup(BeautifulSoup):

Deprecated interface to an XML parser. Instead of using it, pass features="xml" into the BeautifulSoup constructor.

Method __init__ Constructor.

Inherited from BeautifulSoup:

Method __copy__ Copy a BeautifulSoup object by converting the document to a string and parsing it again.
Method __getstate__ Undocumented
Method decode Returns a string or Unicode representation of the parse tree as an HTML or XML document.
Method endData Method called by the TreeBuilder when the end of a data segment occurs.
Method handle_data Called by the tree builder when a chunk of textual data is encountered.
Method handle_endtag Called by the tree builder when an ending tag is encountered.
Method handle_starttag Called by the tree builder when a new tag is encountered.
Method insert_after This method is part of the PageElement API, but `BeautifulSoup` doesn't implement it because there is nothing before or after it in the parse tree.
Method insert_before This method is part of the PageElement API, but `BeautifulSoup` doesn't implement it because there is nothing before or after it in the parse tree.
Method new_string Create a new NavigableString associated with this BeautifulSoup object.
Method new_tag Create a new Tag associated with this BeautifulSoup object.
Method object_was_parsed Method called by the TreeBuilder to integrate an object into the parse tree.
Method popTag Internal method called by _popToTag when a tag is closed.
Method pushTag Internal method called by handle_starttag when a tag is opened.
Method reset Reset this object to a state as though it had never parsed any markup.
Method string_container Undocumented
Constant ASCII_SPACES Undocumented
Constant DEFAULT_BUILDER_FEATURES Undocumented
Constant NO_PARSER_SPECIFIED_WARNING Undocumented
Constant ROOT_TAG_NAME Undocumented
Instance Variable builder Undocumented
Instance Variable current_data Undocumented
Instance Variable currentTag Undocumented
Instance Variable element_classes Undocumented
Instance Variable hidden Undocumented
Instance Variable is_xml Undocumented
Instance Variable known_xml Undocumented
Instance Variable markup Undocumented
Instance Variable open_tag_counter Undocumented
Instance Variable parse_only Undocumented
Instance Variable preserve_whitespace_tag_stack Undocumented
Instance Variable string_container_stack Undocumented
Instance Variable tagStack Undocumented
Class Method _decode_markup Ensure `markup` is a Unicode string (decoding bytes as UTF-8 if necessary) so it's safe to send into warnings.warn.
Class Method _markup_is_url Error-handling method to raise a warning if incoming markup looks like a URL.
Class Method _markup_resembles_filename Error-handling method to raise a warning if incoming markup resembles a filename.
Method _feed Internal method that parses previously set markup, creating a large number of Tag and NavigableString objects.
Method _linkage_fixer Make sure linkage of this fragment is sound.
Method _popToTag Pops the tag stack up to and including the most recent instance of the given tag.
Instance Variable _most_recent_element Undocumented
Instance Variable _namespaces Undocumented
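A minimal sketch of a few of the inherited BeautifulSoup methods listed above (`new_tag`, `new_string`, `__copy__`, `decode`). It assumes bs4 is installed and uses Python's built-in "html.parser" so no third-party parser is needed; the markup and URLs are illustrative only.

```python
import copy

from bs4 import BeautifulSoup

# "html.parser" ships with Python, so no extra parser library is needed.
soup = BeautifulSoup("<p>Hello</p>", "html.parser")

# new_tag() and new_string() create objects associated with this soup;
# append() then attaches them to the tree.
link = soup.new_tag("a", href="https://example.com")
link.append(soup.new_string("a link"))
soup.p.append(link)

# copy.copy() calls __copy__, which serializes and re-parses the document,
# so the copy starts out equal to the original but shares no objects with it.
clone = copy.copy(soup)
equal_at_first = clone == soup
clone.a["href"] = "https://example.org"

# decode() renders each tree as a Unicode string; the original is unchanged.
print(soup.decode())
print(clone.decode())
```

Because the copy is built by re-parsing, editing `clone` never affects `soup`.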

Inherited from Tag (via BeautifulSoup):

Method __bool__ A Tag is always truthy, even if it has no contents.
Method __call__ Calling a Tag like a function is the same as calling its find_all() method. E.g., tag('a') returns a list of all the <a> tags found within this tag.
Method __contains__ Undocumented
Method __delitem__ Deleting tag[key] deletes all 'key' attributes for the tag.
Method __eq__ Returns true iff this Tag has the same name, the same attributes, and the same contents (recursively) as `other`.
Method __getattr__ Accessing tag.subtag is the same as calling tag.find(name="subtag").
Method __getitem__ tag[key] returns the value of the 'key' attribute for the Tag, and raises a KeyError if it's not there.
Method __hash__ Undocumented
Method __iter__ Iterating over a Tag iterates over its contents.
Method __len__ The length of a Tag is the length of its list of contents.
Method __ne__ Returns true iff this Tag is not equal to `other`, as defined in __eq__.
Method __repr__ Renders this PageElement as a string.
Method __setitem__ Setting tag[key] sets the value of the 'key' attribute for the tag.
Method __unicode__ Renders this PageElement as a Unicode string.
Method childGenerator Deprecated generator.
Method clear Wipe out all children of this PageElement by calling extract() on them.
Method decode_contents Renders the contents of this tag as a Unicode string.
Method decompose Recursively destroys this PageElement and its children.
Method encode Render a bytestring representation of this PageElement and its contents.
Method encode_contents Renders the contents of this PageElement as a bytestring.
Method find Look in the children of this PageElement and find the first PageElement that matches the given criteria.
Method find_all Look in the children of this PageElement and find all PageElements that match the given criteria.
Method get Returns the value of the 'key' attribute for the tag, or the value given for 'default' if it doesn't have that attribute.
Method get_attribute_list The same as get(), but always returns a list.
Method has_attr Does this PageElement have an attribute with the given name?
Method has_key Deprecated method. This was misleading because has_key() checks attributes, while __contains__ (the `in` operator) checks contents.
Method index Find the index of a child by identity, not value.
Method prettify Pretty-print this PageElement as a string.
Method recursiveChildGenerator Deprecated generator.
Method renderContents Deprecated method for BS3 compatibility.
Method select Perform a CSS selection operation on the current element and return all matching elements.
Method select_one Perform a CSS selection operation on the current element and return the first matching element, or None if nothing matches.
Method smooth Smooth out this element's children by consolidating consecutive strings.
Method string.setter Replace this PageElement's contents with `string`.
Constant DEFAULT_INTERESTING_STRING_TYPES Undocumented
Class Variable parserClass Undocumented
Class Variable strings Undocumented
Instance Variable attrs Undocumented
Instance Variable can_be_empty_element Undocumented
Instance Variable cdata_list_attributes Undocumented
Instance Variable contents Undocumented
Instance Variable interesting_string_types Undocumented
Instance Variable name Undocumented
Instance Variable namespace Undocumented
Instance Variable parser_class Undocumented
Instance Variable prefix Undocumented
Instance Variable preserve_whitespace_tags Undocumented
Instance Variable sourceline Undocumented
Instance Variable sourcepos Undocumented
Property children Iterate over all direct children of this PageElement.
Property descendants Iterate over all descendants of this PageElement (children, grandchildren, and so on) in depth-first order, which is the order in which they were parsed.
Property is_empty_element Is this tag an empty-element tag? (aka a self-closing tag)
Property string Convenience property to get the single string within this PageElement.
Method _all_strings Yield all strings of certain classes, possibly stripping them.
Method _should_pretty_print Should this tag be pretty-printed?
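The following sketch exercises several of the Tag methods and properties above on a small HTML fragment, again assuming bs4 with "html.parser" (CSS selection relies on the soupsieve package that bs4 installs as a dependency). The nested `<section>` is there to show that `descendants` visits elements in document (depth-first) order rather than level by level.

```python
from bs4 import BeautifulSoup, Tag

html = (
    '<div id="nav">'
    '<a class="btn primary" href="/home">Home</a>'
    '<a href="/about">About</a>'
    "</div>"
    "<section><p><b>1</b></p><p>2</p></section>"
)
soup = BeautifulSoup(html, "html.parser")

# __call__: soup("a") is shorthand for soup.find_all("a").
links = soup("a")
home, about = links

# __getitem__ raises KeyError for a missing attribute; get() returns a default.
missing = False
try:
    home["title"]
except KeyError:
    missing = True
title = home.get("title", "no title")

# "class" is multi-valued in HTML, so it comes back as a list;
# get_attribute_list() always returns a list, even for single-valued attributes.
classes = home["class"]
hrefs = about.get_attribute_list("href")

# select() returns every match; select_one() returns the first match or None.
nav_links = soup.select("div#nav > a")
primary = soup.select_one("a.primary")
nothing = soup.select_one("span")

# descendants yields tags and strings in depth-first (document) order.
order = [d.name if isinstance(d, Tag) else str(d) for d in soup.section.descendants]
print(order)
```

A breadth-first walk of the same `<section>` would visit both `<p>` tags before the `<b>` tag; `descendants` does not.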

Inherited from PageElement (via BeautifulSoup, Tag):

Method append Appends the given PageElement to the contents of this one.
Method extend Appends the given PageElements to this one's contents.
Method extract Destructively rips this element out of the tree.
Method find_all_next Find all PageElements that match the given criteria and appear later in the document than this PageElement.
Method find_all_previous Look backwards in the document from this PageElement and find all PageElements that match the given criteria.
Method find_next Find the first PageElement that matches the given criteria and appears later in the document than this PageElement.
Method find_next_sibling Find the closest sibling to this PageElement that matches the given criteria and appears later in the document.
Method find_next_siblings Find all siblings of this PageElement that match the given criteria and appear later in the document.
Method find_parent Find the closest parent of this PageElement that matches the given criteria.
Method find_parents Find all parents of this PageElement that match the given criteria.
Method find_previous Look backwards in the document from this PageElement and find the first PageElement that matches the given criteria.
Method find_previous_sibling Returns the closest sibling to this PageElement that matches the given criteria and appears earlier in the document.
Method find_previous_siblings Returns all siblings to this PageElement that match the given criteria and appear earlier in the document.
Method format_string Format the given string using the given formatter.
Method formatter_for_name Look up or create a Formatter for the given identifier, if necessary.
Method get_text Get all child strings of this PageElement, concatenated using the given separator.
Method insert Insert a new PageElement in the list of this PageElement's children.
Method nextGenerator Undocumented
Method nextSiblingGenerator Undocumented
Method parentGenerator Undocumented
Method previousGenerator Undocumented
Method previousSiblingGenerator Undocumented
Method replace_with Replace this PageElement with one or more PageElements, keeping the rest of the tree the same.
Method setup Sets up the initial relations between this element and other elements.
Method unwrap Replace this PageElement with its contents.
Method wrap Wrap this PageElement inside another one.
Class Variable default Undocumented
Class Variable nextSibling Undocumented
Class Variable previousSibling Undocumented
Class Variable text Undocumented
Instance Variable next_element Undocumented
Instance Variable next_sibling Undocumented
Instance Variable parent Undocumented
Instance Variable previous_element Undocumented
Instance Variable previous_sibling Undocumented
Property decomposed Check whether a PageElement has been decomposed.
Property next The PageElement, if any, that was parsed just after this one.
Property next_elements All PageElements that were parsed after this one.
Property next_siblings All PageElements that are siblings of this one but were parsed later.
Property parents All PageElements that are parents of this PageElement.
Property previous The PageElement, if any, that was parsed just before this one.
Property previous_elements All PageElements that were parsed before this one.
Property previous_siblings All PageElements that are siblings of this one but were parsed earlier.
Property stripped_strings Yield all strings in this PageElement, stripping them first.
Method _find_all Iterates over a generator looking for things that match.
Method _find_one Undocumented
Method _last_descendant Finds the last element beneath this object to be parsed.
Property _is_xml Is this element part of an XML tree or an HTML tree?
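A short sketch of the PageElement navigation and tree-modification methods above, assuming bs4 with "html.parser"; the list markup is illustrative. It contrasts `extract()`, which detaches an element and hands it back intact, with `decompose()`, which destroys it.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<ul><li>one</li><li class='x'>two</li><li>three</li></ul>", "html.parser"
)
two = soup.find("li", class_="x")

# Navigate relative to one element: siblings and the enclosing parent.
after = two.find_next_sibling("li")
before = two.find_previous_sibling("li")
before_text = before.get_text()
container = two.find_parent("ul")

# wrap() places an element inside a new tag; unwrap() removes that tag
# again and puts its contents back in its place.
two.string.wrap(soup.new_tag("b"))
wrapped = str(two)
two.b.unwrap()

# extract() detaches an element and returns it intact;
# decompose() destroys the element and everything inside it.
removed = after.extract()
before.decompose()

print(soup)                # what remains of the list
print(soup.get_text("|"))
```

After `decompose()`, the `decomposed` property reports True, which is a cheap way to check whether a stale reference still points into a live tree.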
def __init__(self, *args, **kwargs):

Constructor.

:param markup: A string or a file-like object representing markup to be parsed.
:param features: Desirable features of the parser to be used. This may be the name of a specific parser ("lxml", "lxml-xml", "html.parser", or "html5lib") or it may be the type of markup to be used ("html", "html5", "xml"). It's recommended that you name a specific parser, so that Beautiful Soup gives you the same results across platforms and virtual environments.
:param builder: A TreeBuilder subclass to instantiate (or instance to use) instead of looking one up based on `features`. You only need to use this if you've implemented a custom TreeBuilder.
:param parse_only: A SoupStrainer. Only parts of the document matching the SoupStrainer will be considered. This is useful when parsing part of a document that would otherwise be too large to fit into memory.
:param from_encoding: A string indicating the encoding of the document to be parsed. Pass this in if Beautiful Soup is guessing wrongly about the document's encoding.
:param exclude_encodings: A list of strings indicating encodings known to be wrong. Pass this in if you don't know the document's encoding but you know Beautiful Soup's guess is wrong.
:param element_classes: A dictionary mapping BeautifulSoup classes like Tag and NavigableString to other classes you'd like to be instantiated instead as the parse tree is built. This is useful for subclassing Tag or NavigableString to modify default behavior.
:param kwargs: For backwards compatibility purposes, the constructor accepts certain keyword arguments used in Beautiful Soup 3. None of these arguments do anything in Beautiful Soup 4; they will result in a warning and then be ignored. Apart from this, any keyword arguments passed into the BeautifulSoup constructor are propagated to the TreeBuilder constructor. This makes it possible to configure a TreeBuilder by passing in arguments, not just by saying which one to use.
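A sketch of a few of these constructor arguments, assuming bs4 with "html.parser" so it runs without lxml. `multi_valued_attributes` is used here as an example of a TreeBuilder keyword argument that the constructor forwards, as the `kwargs` description says; `MyTag` is a hypothetical subclass for illustration only. To parse XML the way BeautifulStoneSoup does, pass `features="xml"` instead, which requires lxml.

```python
from bs4 import BeautifulSoup, SoupStrainer, Tag

markup = "<p>intro</p><a href='/a'>A</a><p>more</p><a href='/b'>B</a>"

# parse_only: only the parts of the document matching the SoupStrainer are built.
links_only = BeautifulSoup(markup, "html.parser", parse_only=SoupStrainer("a"))

# from_encoding: decode these bytes as Latin-1 instead of letting Beautiful Soup guess.
latin1_bytes = "<p>caf\u00e9</p>".encode("latin-1")
decoded = BeautifulSoup(latin1_bytes, "html.parser", from_encoding="latin-1")

# Extra keyword arguments go to the TreeBuilder. With multi_valued_attributes=None,
# "class" is kept as a plain string instead of being split into a list.
flat = BeautifulSoup('<a class="x y">z</a>', "html.parser", multi_valued_attributes=None)


# element_classes: build the tree with a hypothetical Tag subclass.
class MyTag(Tag):
    """Illustrative subclass; adds no behavior."""


custom = BeautifulSoup("<p>x</p>", "html.parser", element_classes={Tag: MyTag})

print([a["href"] for a in links_only("a")])
print(decoded.p.string, flat.a["class"], type(custom.p).__name__)
```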