class documentation

Contains the navigational information for some part of the page: that is, its current location in the parse tree. NavigableString, Tag, etc. are all subclasses of PageElement.

Method append Appends the given PageElement to the contents of this one.
Method extend Appends the given PageElements to this one's contents.
Method extract Destructively rips this element out of the tree.
Method find_all_next Find all PageElements that match the given criteria and appear later in the document than this PageElement.
Method find_all_previous Look backwards in the document from this PageElement and find all PageElements that match the given criteria.
Method find_next Find the first PageElement that matches the given criteria and appears later in the document than this PageElement.
Method find_next_sibling Find the closest sibling to this PageElement that matches the given criteria and appears later in the document.
Method find_next_siblings Find all siblings of this PageElement that match the given criteria and appear later in the document.
Method find_parent Find the closest parent of this PageElement that matches the given criteria.
Method find_parents Find all parents of this PageElement that match the given criteria.
Method find_previous Look backwards in the document from this PageElement and find the first PageElement that matches the given criteria.
Method find_previous_sibling Returns the closest sibling to this PageElement that matches the given criteria and appears earlier in the document.
Method find_previous_siblings Returns all siblings to this PageElement that match the given criteria and appear earlier in the document.
Method format_string Format the given string using the given formatter.
Method formatter_for_name Look up or create a Formatter for the given identifier, if necessary.
Method get_text Get all child strings of this PageElement, concatenated using the given separator.
Method insert Insert a new PageElement in the list of this PageElement's children.
Method insert_after Makes the given element(s) the immediate successor of this one.
Method insert_before Makes the given element(s) the immediate predecessor of this one.
Method nextGenerator Undocumented
Method nextSiblingGenerator Undocumented
Method parentGenerator Undocumented
Method previousGenerator Undocumented
Method previousSiblingGenerator Undocumented
Method replace_with Replace this PageElement with one or more PageElements, keeping the rest of the tree the same.
Method setup Sets up the initial relations between this element and other elements.
Method unwrap Replace this PageElement with its contents.
Method wrap Wrap this PageElement inside another one.
Class Variable default Undocumented
Class Variable nextSibling Undocumented
Class Variable previousSibling Undocumented
Class Variable text Undocumented
Instance Variable next_element Undocumented
Instance Variable next_sibling Undocumented
Instance Variable parent Undocumented
Instance Variable previous_element Undocumented
Instance Variable previous_sibling Undocumented
Property decomposed Check whether a PageElement has been decomposed.
Property next The PageElement, if any, that was parsed just after this one.
Property next_elements All PageElements that were parsed after this one.
Property next_siblings All PageElements that are siblings of this one but were parsed later.
Property parents All PageElements that are parents of this PageElement.
Property previous The PageElement, if any, that was parsed just before this one.
Property previous_elements All PageElements that were parsed before this one.
Property previous_siblings All PageElements that are siblings of this one but were parsed earlier.
Property stripped_strings Yield all strings in this PageElement, stripping them first.
Method _all_strings Yield all strings of certain classes, possibly stripping them.
Method _find_all Iterates over a generator looking for things that match.
Method _find_one Undocumented
Method _last_descendant Finds the last element beneath this object to be parsed.
Property _is_xml Is this element part of an XML tree or an HTML tree?
def append(self, tag): (source)

Appends the given PageElement to the contents of this one. :param tag: A PageElement.

def extend(self, tags): (source)

Appends the given PageElements to this one's contents. :param tags: A list of PageElements. If a single Tag is provided instead, this PageElement's contents will be extended with that Tag's contents.
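A minimal sketch of `append` and `extend`, assuming `bs4` is installed and the standard-library `html.parser` is used. The sample markup and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>one</li></ul>", "html.parser")
ul = soup.ul

# append() adds one PageElement at the end of ul.contents.
li_two = soup.new_tag("li")
li_two.string = "two"
ul.append(li_two)

# extend() adds several PageElements, preserving their order.
extra = []
for label in ("three", "four"):
    li = soup.new_tag("li")
    li.string = label
    extra.append(li)
ul.extend(extra)

items = [li.get_text() for li in ul.find_all("li")]
markup = str(ul)
print(items)
```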

def extract(self, _self_index=None): (source)

Destructively rips this element out of the tree. :param _self_index: The location of this element in its parent's .contents, if known. Passing this in allows for a performance optimization. :return: `self`, no longer part of the tree.
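As an illustration, the sketch below removes a tag with `extract` and inspects both the tree and the returned element. The markup is an invented example; `html.parser` is assumed as the parser.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>bold</b> world</p>", "html.parser")
b = soup.b

# extract() detaches the element and returns it.
removed = b.extract()

tree_after = str(soup)        # the <b> tag is gone from the tree
detached = str(removed)       # the extracted element is still usable
print(tree_after, detached, removed.parent)
```

The two strings that surrounded the removed tag stay separate `NavigableString` objects, which is why two spaces appear in the remaining markup.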

def find_all_next(self, name=None, attrs={}, string=None, limit=None, **kwargs): (source)

Find all PageElements that match the given criteria and appear later in the document than this PageElement. All find_* methods take a common set of arguments. See the online documentation for detailed explanations. :param name: A filter on tag name. :param attrs: A dictionary of filters on attribute values. :param string: A filter for a NavigableString with specific text. :param limit: Stop looking after finding this many results. :kwargs: A dictionary of filters on attribute values. :return: A ResultSet containing PageElements.
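A short sketch of `find_all_next`, assuming `html.parser` and an invented document. It shows that matches nested inside later elements are found too, and that `limit` and `string` narrow the result.

```python
from bs4 import BeautifulSoup

html = "<h1>Title</h1><p>first</p><div><p>second</p></div><p>third</p>"
soup = BeautifulSoup(html, "html.parser")
first = soup.find("p")

# Every <p> that appears later in the document, at any depth.
later = [p.get_text() for p in first.find_all_next("p")]

# limit stops the search after that many results.
limited = [p.get_text() for p in first.find_all_next("p", limit=1)]

# string matches NavigableString objects instead of tags.
strings = [str(s) for s in first.find_all_next(string="third")]
print(later, limited, strings)
```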

def find_all_previous(self, name=None, attrs={}, string=None, limit=None, **kwargs): (source)

Look backwards in the document from this PageElement and find all PageElements that match the given criteria. All find_* methods take a common set of arguments. See the online documentation for detailed explanations. :param name: A filter on tag name. :param attrs: A dictionary of filters on attribute values. :param string: A filter for a NavigableString with specific text. :param limit: Stop looking after finding this many results. :kwargs: A dictionary of filters on attribute values. :return: A ResultSet of PageElements. :rtype: bs4.element.ResultSet
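The sketch below, using an invented document and `html.parser`, illustrates that `find_all_previous` returns matches in reverse document order, nearest first.

```python
from bs4 import BeautifulSoup

html = "<h2>A</h2><p>a1</p><h2>B</h2><p>b1</p>"
soup = BeautifulSoup(html, "html.parser")
last_p = soup.find_all("p")[-1]

# Walks backwards from last_p, so the closest heading comes first.
headings = [h.get_text() for h in last_p.find_all_previous("h2")]
print(headings)
```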

def find_next(self, name=None, attrs={}, string=None, **kwargs): (source)

Find the first PageElement that matches the given criteria and appears later in the document than this PageElement. All find_* methods take a common set of arguments. See the online documentation for detailed explanations. :param name: A filter on tag name. :param attrs: A dictionary of filters on attribute values. :param string: A filter for a NavigableString with specific text. :kwargs: A dictionary of filters on attribute values. :return: A PageElement. :rtype: bs4.element.Tag | bs4.element.NavigableString
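A minimal sketch of `find_next` on an invented document, assuming `html.parser`. Note that the search follows parse order, so the element's own children count as "later" elements.

```python
from bs4 import BeautifulSoup

html = "<h2>A</h2><p>a1</p><h2>B</h2><p>b1</p>"
soup = BeautifulSoup(html, "html.parser")
heading = soup.find("h2")

next_p = heading.find_next("p").get_text()
next_h2 = heading.find_next("h2").get_text()

# The first string parsed after the <h2> tag is its own text.
next_string = str(heading.find_next(string=True))

# No match yields None rather than raising.
missing = heading.find_next("table")
print(next_p, next_h2, next_string, missing)
```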

def find_next_sibling(self, name=None, attrs={}, string=None, **kwargs): (source)

Find the closest sibling to this PageElement that matches the given criteria and appears later in the document. All find_* methods take a common set of arguments. See the online documentation for detailed explanations. :param name: A filter on tag name. :param attrs: A dictionary of filters on attribute values. :param string: A filter for a NavigableString with specific text. :kwargs: A dictionary of filters on attribute values. :return: A PageElement. :rtype: bs4.element.Tag | bs4.element.NavigableString

def find_next_siblings(self, name=None, attrs={}, string=None, limit=None, **kwargs): (source)

Find all siblings of this PageElement that match the given criteria and appear later in the document. All find_* methods take a common set of arguments. See the online documentation for detailed explanations. :param name: A filter on tag name. :param attrs: A dictionary of filters on attribute values. :param string: A filter for a NavigableString with specific text. :param limit: Stop looking after finding this many results. :kwargs: A dictionary of filters on attribute values. :return: A ResultSet of PageElements. :rtype: bs4.element.ResultSet
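To illustrate the sibling searches, here is a small sketch on an invented list, assuming `html.parser`. The `class_` keyword is the usual bs4 spelling for filtering on the `class` attribute.

```python
from bs4 import BeautifulSoup

html = ("<ul><li>a</li><li class='x'>b</li>"
        "<li>c</li><li class='x'>d</li></ul>")
soup = BeautifulSoup(html, "html.parser")
first = soup.li

all_later = [li.get_text() for li in first.find_next_siblings("li")]
only_x = [li.get_text() for li in first.find_next_siblings("li", class_="x")]
first_two = [li.get_text() for li in first.find_next_siblings("li", limit=2)]

# The singular form returns just the closest match.
closest_x = first.find_next_sibling("li", class_="x").get_text()
print(all_later, only_x, first_two, closest_x)
```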

def find_parent(self, name=None, attrs={}, **kwargs): (source)

Find the closest parent of this PageElement that matches the given criteria. All find_* methods take a common set of arguments. See the online documentation for detailed explanations. :param name: A filter on tag name. :param attrs: A dictionary of filters on attribute values. :kwargs: A dictionary of filters on attribute values. :return: A PageElement. :rtype: bs4.element.Tag | bs4.element.NavigableString

def find_parents(self, name=None, attrs={}, limit=None, **kwargs): (source)

Find all parents of this PageElement that match the given criteria. All find_* methods take a common set of arguments. See the online documentation for detailed explanations. :param name: A filter on tag name. :param attrs: A dictionary of filters on attribute values. :param limit: Stop looking after finding this many results. :kwargs: A dictionary of filters on attribute values. :return: A ResultSet of PageElements. :rtype: bs4.element.ResultSet
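A brief sketch contrasting `find_parent` and `find_parents`, using invented nested markup and `html.parser`. Ancestors are visited from the innermost outward.

```python
from bs4 import BeautifulSoup

html = ("<html><body><div id='outer'><div id='inner'>"
        "<span>x</span></div></div></body></html>")
soup = BeautifulSoup(html, "html.parser")
span = soup.span

# Closest matching ancestor only.
closest = span.find_parent("div")["id"]

# All matching ancestors, nearest first.
all_divs = [d["id"] for d in span.find_parents("div")]

# No matching ancestor gives None.
missing = span.find_parent("table")
print(closest, all_divs, missing)
```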

def find_previous(self, name=None, attrs={}, string=None, **kwargs): (source)

Look backwards in the document from this PageElement and find the first PageElement that matches the given criteria. All find_* methods take a common set of arguments. See the online documentation for detailed explanations. :param name: A filter on tag name. :param attrs: A dictionary of filters on attribute values. :param string: A filter for a NavigableString with specific text. :kwargs: A dictionary of filters on attribute values. :return: A PageElement. :rtype: bs4.element.Tag | bs4.element.NavigableString
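One consequence of searching in parse order is that an enclosing tag counts as a "previous" element, because it was parsed before its children. The sketch below shows this on invented markup, assuming `html.parser`.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div id='box'><p>inside</p></div>", "html.parser")
p = soup.p

# The enclosing <div> was parsed just before <p>, so it matches.
container = p.find_previous("div")["id"]

# It is a parent, not a sibling, so the sibling search finds nothing.
sibling = p.find_previous_sibling("div")
print(container, sibling)
```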

def find_previous_sibling(self, name=None, attrs={}, string=None, **kwargs): (source)

Returns the closest sibling to this PageElement that matches the given criteria and appears earlier in the document. All find_* methods take a common set of arguments. See the online documentation for detailed explanations. :param name: A filter on tag name. :param attrs: A dictionary of filters on attribute values. :param string: A filter for a NavigableString with specific text. :kwargs: A dictionary of filters on attribute values. :return: A PageElement. :rtype: bs4.element.Tag | bs4.element.NavigableString

def find_previous_siblings(self, name=None, attrs={}, string=None, limit=None, **kwargs): (source)

Returns all siblings to this PageElement that match the given criteria and appear earlier in the document. All find_* methods take a common set of arguments. See the online documentation for detailed explanations. :param name: A filter on tag name. :param attrs: A dictionary of filters on attribute values. :param string: A filter for a NavigableString with specific text. :param limit: Stop looking after finding this many results. :kwargs: A dictionary of filters on attribute values. :return: A ResultSet of PageElements. :rtype: bs4.element.ResultSet
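A minimal sketch of the backward sibling searches on invented markup, assuming `html.parser`. Results come back nearest first, that is, in reverse document order.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>one</p><p>two</p><p>three</p>", "html.parser")
last = soup.find_all("p")[-1]

earlier = [p.get_text() for p in last.find_previous_siblings("p")]
closest = last.find_previous_sibling("p").get_text()
print(earlier, closest)
```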

def format_string(self, s, formatter): (source)

Format the given string using the given formatter. :param s: A string. :param formatter: A Formatter object, or a string naming one of the standard formatters.
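As a rough illustration, the sketch below calls `format_string` with the name of the standard `"minimal"` formatter and with `None`. It assumes an HTML tree built with `html.parser`; the input string is invented.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>x</p>", "html.parser")
raw = "a < b & c"

# "minimal" escapes only the characters needed for valid markup.
escaped = soup.p.format_string(raw, "minimal")

# Passing None returns the string unchanged.
untouched = soup.p.format_string(raw, None)
print(escaped, untouched)
```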

def formatter_for_name(self, formatter): (source)

Look up or create a Formatter for the given identifier, if necessary. :param formatter: Can be a Formatter object (used as-is), a function (used as the entity substitution hook for an XMLFormatter or HTMLFormatter), or a string (used to look up an XMLFormatter or HTMLFormatter in the appropriate registry).

def get_text(self, separator='', strip=False, types=default): (source)

Get all child strings of this PageElement, concatenated using the given separator. :param separator: Strings will be concatenated using this separator. :param strip: If True, strings will be stripped before being concatenated. :param types: A tuple of NavigableString subclasses. Any strings of a subclass not found in this list will be ignored. Although there are exceptions, the default behavior in most cases is to consider only NavigableString and CData objects. That means no comments, processing instructions, etc. :return: A string.
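The effect of `separator`, `strip`, and `types` can be sketched as follows, assuming `html.parser` and an invented fragment that contains a comment.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

html = "<div><p> Hello </p><!-- note --><p>world</p></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# Default: plain concatenation, comments are skipped.
plain = div.get_text()

# strip=True trims each string before joining with the separator.
joined = div.get_text("|", strip=True)

# types restricts the output to the given string classes.
comments = div.get_text(types=(Comment,))
print(repr(plain), joined, repr(comments))
```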

def insert(self, position, new_child): (source)

Insert a new PageElement in the list of this PageElement's children. This works the same way as `list.insert`. :param position: The numeric position that should be occupied in `self.children` by the new PageElement. :param new_child: A PageElement.
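A minimal sketch of `insert`, assuming `html.parser` and an invented list. The position works like an index into the parent's children.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>a</li><li>c</li></ul>", "html.parser")

li_b = soup.new_tag("li")
li_b.string = "b"

# Position 1 places the new element between the two existing items.
soup.ul.insert(1, li_b)

markup = str(soup.ul)
print(markup)
```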

def insert_after(self, *args): (source)
overridden in bs4.BeautifulSoup

Makes the given element(s) the immediate successor of this one. The elements will have the same parent, and the given elements will be immediately after this one. :param args: One or more PageElements.

def insert_before(self, *args): (source)
overridden in bs4.BeautifulSoup

Makes the given element(s) the immediate predecessor of this one. All the elements will have the same parent, and the given elements will be immediately before this one. :param args: One or more PageElements.
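The sketch below places new strings around an existing tag with `insert_before` and `insert_after`. It assumes `html.parser`; the markup is invented, and `new_string` is used so that the arguments are PageElements as the parameter description requires.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>middle</b></p>", "html.parser")
b = soup.b

# Both new strings become siblings of <b> under the same parent.
b.insert_before(soup.new_string("start "))
b.insert_after(soup.new_string(" end"))

markup = str(soup.p)
print(markup)
```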

def nextGenerator(self): (source)

Undocumented

def nextSiblingGenerator(self): (source)

Undocumented

def parentGenerator(self): (source)

Undocumented

def previousGenerator(self): (source)

Undocumented

def previousSiblingGenerator(self): (source)

Undocumented

def replace_with(self, *args): (source)

Replace this PageElement with one or more PageElements, keeping the rest of the tree the same. :param args: One or more PageElements. :return: `self`, no longer part of the tree.
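A short sketch of `replace_with`, assuming `html.parser` and invented markup. The return value is the element that was replaced, now detached from the tree.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>I like <b>tea</b>.</p>", "html.parser")
old = soup.b

new = soup.new_tag("i")
new.string = "coffee"

returned = old.replace_with(new)

markup = str(soup.p)
print(markup, returned)
```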

def setup(self, parent=None, previous_element=None, next_element=None, previous_sibling=None, next_sibling=None): (source)

Sets up the initial relations between this element and other elements. :param parent: The parent of this element. :param previous_element: The element parsed immediately before this one. :param next_element: The element parsed immediately after this one. :param previous_sibling: The most recently encountered element on the same level of the parse tree as this one. :param next_sibling: The next element to be encountered on the same level of the parse tree as this one.

def unwrap(self): (source)

Replace this PageElement with its contents. :return: `self`, no longer part of the tree.
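As an illustration, `unwrap` can strip a link while keeping what it contained. The sketch assumes `html.parser`; the markup is invented.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<p>Click <a href='/x'><b>here</b></a> now</p>", "html.parser"
)
a = soup.a

# The children of <a> move up to take its place in the tree.
returned = a.unwrap()

markup = str(soup.p)
leftover = str(returned)   # the detached, now empty, <a> tag
print(markup, leftover)
```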

def wrap(self, wrap_inside): (source)

Wrap this PageElement inside another one. :param wrap_inside: A PageElement. :return: `wrap_inside`, occupying the position in the tree that used to be occupied by `self`, and with `self` inside it.
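A minimal sketch of `wrap`, assuming `html.parser` and invented markup. The wrapper is created with `new_tag` and ends up where the wrapped element used to be.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>text</p>", "html.parser")
p = soup.p

wrapper = p.wrap(soup.new_tag("div"))

markup = str(soup)
print(markup)
```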

default = (source)

Undocumented

nextSibling = (source)

Undocumented

previousSibling = (source)

Undocumented

text = (source)

Undocumented

next_element = (source)

Undocumented

next_sibling = (source)

Undocumented

parent = (source)

Undocumented

previous_element = (source)

Undocumented

previous_sibling = (source)

Undocumented

@property
decomposed = (source)

Check whether a PageElement has been decomposed. :rtype: bool
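A small sketch of the `decomposed` flag, assuming `html.parser`, invented markup, and a bs4 release recent enough to provide the property. It uses `Tag.decompose()`, which destroys an element and its contents.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>gone</p><p>kept</p></div>", "html.parser")
doomed = soup.p

before = doomed.decomposed
doomed.decompose()
after = doomed.decomposed

remaining = str(soup.div)
print(before, after, remaining)
```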

@property
next = (source)

The PageElement, if any, that was parsed just after this one. :return: A PageElement. :rtype: bs4.element.Tag | bs4.element.NavigableString

@property
next_elements = (source)

All PageElements that were parsed after this one. :yield: A sequence of PageElements.
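To make parse order concrete, the sketch below walks `next_elements` from a tag and labels each item. It assumes `html.parser`; the markup and the labelling scheme are invented for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup("<p>one<b>two</b>three</p>", "html.parser")
p = soup.p

# Tags are shown as "<name>", strings as their text.
order = [
    f"<{el.name}>" if isinstance(el, Tag) else str(el)
    for el in p.next_elements
]
print(order)
```

The walk descends into `<b>` before reaching the trailing string, which is the depth-first order in which the parser saw the elements.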

@property
next_siblings = (source)

All PageElements that are siblings of this one but were parsed later. :yield: A sequence of PageElements.

@property
parents = (source)

All PageElements that are parents of this PageElement. :yield: A sequence of PageElements.

@property
previous = (source)

The PageElement, if any, that was parsed just before this one. :return: A PageElement. :rtype: bs4.element.Tag | bs4.element.NavigableString

@property
previous_elements = (source)

All PageElements that were parsed before this one. :yield: A sequence of PageElements.

@property
previous_siblings = (source)

All PageElements that are siblings of this one but were parsed earlier. :yield: A sequence of PageElements.

@property
stripped_strings = (source)

Yield all strings in this PageElement, stripping them first. :yield: A sequence of stripped strings.
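A minimal sketch of `stripped_strings` on invented, whitespace-heavy markup, assuming `html.parser`. Strings that are empty after stripping are left out.

```python
from bs4 import BeautifulSoup

html = "<div>\n  <p> Hello </p>\n  <p>\n</p>\n  <p>world </p>\n</div>"
soup = BeautifulSoup(html, "html.parser")

# Whitespace-only strings between and inside tags are dropped.
words = list(soup.div.stripped_strings)
print(words)
```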

def _all_strings(self, strip=False, types=default): (source)

Yield all strings of certain classes, possibly stripping them. This is implemented differently in Tag and NavigableString.

def _find_all(self, name, attrs, string, limit, generator, **kwargs): (source)

Iterates over a generator looking for things that match.

def _find_one(self, method, name, attrs, string, **kwargs): (source)

Undocumented

def _last_descendant(self, is_initialized=True, accept_self=True): (source)

Finds the last element beneath this object to be parsed. :param is_initialized: Has `setup` been called on this PageElement yet? :param accept_self: Is `self` an acceptable answer to the question?

@property
_is_xml = (source)

Is this element part of an XML tree or an HTML tree? This is used in formatter_for_name, when deciding whether an XMLFormatter or HTMLFormatter is more appropriate. It can be inefficient, but it should be called very rarely.