class BeautifulStoneSoup(BeautifulSoup):
Deprecated interface to an XML parser.
Method | __init__ |
Constructor. |
Inherited from BeautifulSoup:
Method | __copy__ |
Copy a BeautifulSoup object by converting the document to a string and parsing it again. |
Method | __getstate__ |
Undocumented |
Method | decode |
Returns a string or Unicode representation of the parse tree as an HTML or XML document. |
Method | end |
Method called by the TreeBuilder when the end of a data segment occurs. |
Method | handle |
Called by the tree builder when a chunk of textual data is encountered. |
Method | handle |
Called by the tree builder when an ending tag is encountered. |
Method | handle |
Called by the tree builder when a new tag is encountered. |
Method | insert |
This method is part of the PageElement API, but `BeautifulSoup` doesn't implement it because there is nothing before or after it in the parse tree. |
Method | insert |
This method is part of the PageElement API, but `BeautifulSoup` doesn't implement it because there is nothing before or after it in the parse tree. |
Method | new |
Create a new NavigableString associated with this BeautifulSoup object. |
Method | new |
Create a new Tag associated with this BeautifulSoup object. |
Method | object |
Method called by the TreeBuilder to integrate an object into the parse tree. |
Method | pop |
Internal method called by _popToTag when a tag is closed. |
Method | push |
Internal method called by handle_starttag when a tag is opened. |
Method | reset |
Reset this object to a state as though it had never parsed any markup. |
Method | string |
Undocumented |
Constant | ASCII |
Undocumented |
Constant | DEFAULT |
Undocumented |
Constant | NO |
Undocumented |
Constant | ROOT |
Undocumented |
Instance Variable | builder |
Undocumented |
Instance Variable | current |
Undocumented |
Instance Variable | current |
Undocumented |
Instance Variable | element |
Undocumented |
Instance Variable | hidden |
Undocumented |
Instance Variable | is |
Undocumented |
Instance Variable | known |
Undocumented |
Instance Variable | markup |
Undocumented |
Instance Variable | open |
Undocumented |
Instance Variable | parse |
Undocumented |
Instance Variable | preserve |
Undocumented |
Instance Variable | string |
Undocumented |
Instance Variable | tag |
Undocumented |
Class Method | _decode |
Ensure `markup` is bytes so it's safe to send into warnings.warn. |
Class Method | _markup |
Error-handling method to raise a warning if incoming markup looks like a URL. |
Class Method | _markup |
Error-handling method to raise a warning if incoming markup resembles a filename. |
Method | _feed |
Internal method that parses previously set markup, creating a large number of Tag and NavigableString objects. |
Method | _linkage |
Make sure linkage of this fragment is sound. |
Method | _pop |
Pops the tag stack up to and including the most recent instance of the given tag. |
Instance Variable | _most |
Undocumented |
Instance Variable | _namespaces |
Undocumented |
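As a minimal sketch of how a few of the members above fit together, the following builds a small tree, creates new nodes tied to the soup, and renders the result with `decode`. The member names in the table are cut off at the underscore, so treating the two `new` entries as `new_tag` and `new_string` from the public bs4 API is an assumption. The markup and URL are illustrative only.

```python
from bs4 import BeautifulSoup

# Parse a tiny document with the standard-library parser.
soup = BeautifulSoup("<p>Hello</p>", "html.parser")

# Create a Tag and a NavigableString associated with this soup object.
link = soup.new_tag("a", href="https://example.com/")
link.string = "a link"
bang = soup.new_string("!")

# Attach the new nodes to the existing <p> element.
soup.p.append(link)
soup.p.append(bang)

# Render the whole parse tree back to a string.
rendered = soup.decode()
print(rendered)
```

Nodes created this way are not part of the tree until they are appended or inserted somewhere.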
Inherited from Tag (via BeautifulSoup):
Method | __bool__ |
A tag is always truthy, even if it has no contents. |
Method | __call__ |
Calling a Tag like a function is the same as calling its find_all() method. E.g. tag('a') returns a list of all the A tags found within this tag. |
Method | __contains__ |
Undocumented |
Method | __delitem__ |
Deleting tag[key] deletes all 'key' attributes for the tag. |
Method | __eq__ |
Returns true iff this Tag has the same name, the same attributes, and the same contents (recursively) as `other`. |
Method | __getattr__ |
Calling tag.subtag is the same as calling tag.find(name="subtag") |
Method | __getitem__ |
tag[key] returns the value of the 'key' attribute for the Tag, and raises a KeyError if it's not there. |
Method | __hash__ |
Undocumented |
Method | __iter__ |
Iterating over a Tag iterates over its contents. |
Method | __len__ |
The length of a Tag is the length of its list of contents. |
Method | __ne__ |
Returns true iff this Tag is not identical to `other`, as defined in __eq__. |
Method | __repr__ |
Renders this PageElement as a string. |
Method | __setitem__ |
Setting tag[key] sets the value of the 'key' attribute for the tag. |
Method | __unicode__ |
Renders this PageElement as a Unicode string. |
Method | child |
Deprecated generator. |
Method | clear |
Wipe out all children of this PageElement by calling extract() on them. |
Method | decode |
Renders the contents of this tag as a Unicode string. |
Method | decompose |
Recursively destroys this PageElement and its children. |
Method | encode |
Render a bytestring representation of this PageElement and its contents. |
Method | encode |
Renders the contents of this PageElement as a bytestring. |
Method | find |
Look in the children of this PageElement and find the first PageElement that matches the given criteria. |
Method | find |
Look in the children of this PageElement and find all PageElements that match the given criteria. |
Method | get |
Returns the value of the 'key' attribute for the tag, or the value given for 'default' if it doesn't have that attribute. |
Method | get |
The same as get(), but always returns a list. |
Method | has |
Does this PageElement have an attribute with the given name? |
Method | has |
Deprecated method. This was kind of misleading because has_key() (attributes) was different from __contains__, i.e. the `in` operator (contents). |
Method | index |
Find the index of a child by identity, not value. |
Method | prettify |
Pretty-print this PageElement as a string. |
Method | recursive |
Deprecated generator. |
Method | render |
Deprecated method for BS3 compatibility. |
Method | select |
Perform a CSS selection operation on the current element. |
Method | select |
Perform a CSS selection operation on the current element. |
Method | smooth |
Smooth out this element's children by consolidating consecutive strings. |
Method | string |
Replace this PageElement's contents with `string`. |
Constant | DEFAULT |
Undocumented |
Class Variable | parser |
Undocumented |
Class Variable | strings |
Undocumented |
Instance Variable | attrs |
Undocumented |
Instance Variable | can |
Undocumented |
Instance Variable | cdata |
Undocumented |
Instance Variable | contents |
Undocumented |
Instance Variable | interesting |
Undocumented |
Instance Variable | name |
Undocumented |
Instance Variable | namespace |
Undocumented |
Instance Variable | parser |
Undocumented |
Instance Variable | prefix |
Undocumented |
Instance Variable | preserve |
Undocumented |
Instance Variable | sourceline |
Undocumented |
Instance Variable | sourcepos |
Undocumented |
Property | children |
Iterate over all direct children of this PageElement. |
Property | descendants |
Iterate over all descendants of this PageElement in document order (a depth-first traversal). |
Property | is |
Is this tag an empty-element tag? (aka a self-closing tag) |
Property | string |
Convenience property to get the single string within this PageElement. |
Method | _all |
Yield all strings of certain classes, possibly stripping them. |
Method | _should |
Should this tag be pretty-printed? |
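The special methods inherited from Tag are easiest to read side by side. The sketch below, which assumes the `html.parser` backend and made-up markup, exercises attribute access, the call shorthand for `find_all()`, and the difference between `children` and `descendants`.

```python
from bs4 import BeautifulSoup, Tag

markup = '<div id="box"><p>one <b>two</b></p><p>three</p></div>'
soup = BeautifulSoup(markup, "html.parser")

div = soup.div                      # __getattr__: same as soup.find("div")
box_id = div["id"]                  # __getitem__: attribute lookup
title = div.get("title", "none")    # get(): falls back to the default
paragraphs = div("p")               # __call__: same as div.find_all("p")
child_count = len(div)              # __len__: number of direct children

# children yields only the direct children of the tag.
direct = [child.name for child in div.children]

# descendants walks the whole subtree in document order.
in_order = [
    node.name if isinstance(node, Tag) else str(node)
    for node in div.descendants
]
print(direct, in_order)
```

Strings are listed by value here rather than by name because only Tag objects are guaranteed to carry a meaningful `name`.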
Inherited from PageElement (via BeautifulSoup, Tag):
Method | append |
Appends the given PageElement to the contents of this one. |
Method | extend |
Appends the given PageElements to this one's contents. |
Method | extract |
Destructively rips this element out of the tree. |
Method | find |
Find all PageElements that match the given criteria and appear later in the document than this PageElement. |
Method | find |
Look backwards in the document from this PageElement and find all PageElements that match the given criteria. |
Method | find |
Find the first PageElement that matches the given criteria and appears later in the document than this PageElement. |
Method | find |
Find the closest sibling to this PageElement that matches the given criteria and appears later in the document. |
Method | find |
Find all siblings of this PageElement that match the given criteria and appear later in the document. |
Method | find |
Find the closest parent of this PageElement that matches the given criteria. |
Method | find |
Find all parents of this PageElement that match the given criteria. |
Method | find |
Look backwards in the document from this PageElement and find the first PageElement that matches the given criteria. |
Method | find |
Returns the closest sibling to this PageElement that matches the given criteria and appears earlier in the document. |
Method | find |
Returns all siblings to this PageElement that match the given criteria and appear earlier in the document. |
Method | format |
Format the given string using the given formatter. |
Method | formatter |
Look up or create a Formatter for the given identifier, if necessary. |
Method | get |
Get all child strings of this PageElement, concatenated using the given separator. |
Method | insert |
Insert a new PageElement in the list of this PageElement's children. |
Method | next |
Undocumented |
Method | next |
Undocumented |
Method | parent |
Undocumented |
Method | previous |
Undocumented |
Method | previous |
Undocumented |
Method | replace |
Replace this PageElement with one or more PageElements, keeping the rest of the tree the same. |
Method | setup |
Sets up the initial relations between this element and other elements. |
Method | unwrap |
Replace this PageElement with its contents. |
Method | wrap |
Wrap this PageElement inside another one. |
Class Variable | default |
Undocumented |
Class Variable | next |
Undocumented |
Class Variable | previous |
Undocumented |
Class Variable | text |
Undocumented |
Instance Variable | next |
Undocumented |
Instance Variable | next |
Undocumented |
Instance Variable | parent |
Undocumented |
Instance Variable | previous |
Undocumented |
Instance Variable | previous |
Undocumented |
Property | decomposed |
Check whether a PageElement has been decomposed. |
Property | next |
The PageElement, if any, that was parsed just after this one. |
Property | next |
All PageElements that were parsed after this one. |
Property | next |
All PageElements that are siblings of this one but were parsed later. |
Property | parents |
All PageElements that are parents of this PageElement. |
Property | previous |
The PageElement, if any, that was parsed just before this one. |
Property | previous |
All PageElements that were parsed before this one. |
Property | previous |
All PageElements that are siblings of this one but were parsed earlier. |
Property | stripped |
Yield all strings in this PageElement, stripping them first. |
Method | _find |
Iterates over a generator looking for things that match. |
Method | _find |
Undocumented |
Method | _last |
Finds the last element beneath this object to be parsed. |
Property | _is |
Is this element part of an XML tree or an HTML tree? |
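The many `find` entries above differ only in the direction they search. As a rough illustration, assuming the full method names `find_next_sibling`, `find_next_siblings`, `find_previous_sibling`, and `find_parent` from the public bs4 API (the table shows them truncated), the sketch below navigates a short list and then removes one item with `extract`.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>a</li><li>b</li><li>c</li></ul>", "html.parser")
first = soup.li
last = soup.find_all("li")[-1]

# Search forward among siblings.
next_one = first.find_next_sibling("li").get_text()
all_later = [li.get_text() for li in first.find_next_siblings("li")]

# Search backward among siblings, and upward through parents.
before_last = last.find_previous_sibling("li").get_text()
container = first.find_parent("ul").name

# extract() removes the element from the tree and returns it.
removed = first.extract()
remaining = soup.decode()
print(next_one, all_later, before_last, container, remaining)
```

After `extract`, the removed element is detached, so its `parent` is `None`.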
bs4.BeautifulSoup.__init__
Constructor.
:param markup: A string or a file-like object representing markup to be parsed.
:param features: Desirable features of the parser to be used. This may be the name of a specific parser ("lxml", "lxml-xml", "html.parser", or "html5lib") or it may be the type of markup to be used ("html", "html5", "xml"). It's recommended that you name a specific parser, so that Beautiful Soup gives you the same results across platforms and virtual environments.
:param builder: A TreeBuilder subclass to instantiate (or instance to use) instead of looking one up based on `features`. You only need to use this if you've implemented a custom TreeBuilder.
:param parse_only: A SoupStrainer. Only parts of the document matching the SoupStrainer will be considered. This is useful when parsing part of a document that would otherwise be too large to fit into memory.
:param from_encoding: A string indicating the encoding of the document to be parsed. Pass this in if Beautiful Soup is guessing wrongly about the document's encoding.
:param exclude_encodings: A list of strings indicating encodings known to be wrong. Pass this in if you don't know the document's encoding but you know Beautiful Soup's guess is wrong.
:param element_classes: A dictionary mapping BeautifulSoup classes like Tag and NavigableString, to other classes you'd like to be instantiated instead as the parse tree is built. This is useful for subclassing Tag or NavigableString to modify default behavior.
:param kwargs: For backwards compatibility purposes, the constructor accepts certain keyword arguments used in Beautiful Soup 3. None of these arguments do anything in Beautiful Soup 4; they will result in a warning and then be ignored. Apart from this, any keyword arguments passed into the BeautifulSoup constructor are propagated to the TreeBuilder constructor. This makes it possible to configure a TreeBuilder by passing in arguments, not just by saying which one to use.
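The constructor parameters are easier to follow with a concrete call. The sketch below names a specific parser through `features`, limits parsing with a `SoupStrainer` passed as `parse_only`, and supplies `from_encoding` for a byte string. The markup, the variable names, and the choice of `html.parser` are illustrative assumptions, not something this page prescribes.

```python
from bs4 import BeautifulSoup, SoupStrainer

markup = (
    '<html><body><a href="/one">1</a>'
    '<p>skip</p><a href="/two">2</a></body></html>'
)

# parse_only: keep only <a> tags while the tree is being built.
only_links = SoupStrainer("a")
links_soup = BeautifulSoup(markup, "html.parser", parse_only=only_links)
hrefs = [a["href"] for a in links_soup.find_all("a")]
skipped = links_soup.find("p")  # the <p> never makes it into the tree

# from_encoding: state the encoding of the bytes instead of letting it be guessed.
raw = b"<p>caf\xe9</p>"
text_soup = BeautifulSoup(raw, "html.parser", from_encoding="latin-1")
word = text_soup.p.string
print(hrefs, skipped, word)
```

Naming the parser explicitly, as recommended above, keeps the result from depending on which optional parsers happen to be installed.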