class documentation

class XMLFeedSpider(Spider): (source)

This class is intended to be the base class for spiders that scrape XML feeds. You can choose whether to parse the feed using the 'iternodes' iterator, an 'xml' selector, or an 'html' selector. In most cases it's convenient to use iternodes, since it's faster and cleaner.

Method adapt_response You can override this method to make any changes you want to the feed before parsing it. This method must return a response.
Method parse_node This method must be overridden with your custom spider functionality.
Method parse_nodes This method is called for the nodes matching the provided tag name (itertag). Receives the response and a Selector for each node. Overriding this method is mandatory. Otherwise, your spider won't work. ...
Method process_results This overridable method is called for each result (item or request) returned by the spider, and it's intended to perform any last-minute processing required before returning the results to the framework core, for example setting the item GUIDs...
Class Variable iterator Undocumented
Class Variable itertag Undocumented
Class Variable namespaces Undocumented
Method _iternodes Undocumented
Method _parse Undocumented
Method _register_namespaces Undocumented

Inherited from Spider:

Class Method from_crawler Undocumented
Class Method handles_request Undocumented
Class Method update_settings Undocumented
Static Method close Undocumented
Method __init__ Undocumented
Method __repr__ Undocumented
Method log Log the given message at the given log level
Method parse Undocumented
Method start_requests Undocumented
Class Variable custom_settings Undocumented
Instance Variable crawler Undocumented
Instance Variable name Undocumented
Instance Variable settings Undocumented
Instance Variable start_urls Undocumented
Property logger Undocumented
Method _set_crawler Undocumented

Inherited from object_ref (via Spider):

Method __new__ Undocumented
Class Variable __slots__ Undocumented
def adapt_response(self, response): (source)

You can override this method to make any changes you want to the feed before parsing it. This method must return a response.

def parse_node(self, response, selector): (source)

This method must be overridden with your custom spider functionality.

def parse_nodes(self, response, nodes): (source)

This method is called for the nodes matching the provided tag name (itertag). Receives the response and a Selector for each node. Overriding this method is mandatory. Otherwise, your spider won't work. This method must return either an item, a request, or a list containing any of them.

def process_results(self, response, results): (source)

This overridable method is called for each result (item or request) returned by the spider, and it's intended to perform any last-minute processing required before returning the results to the framework core, for example setting the item GUIDs. It receives a list of results and the response that produced those results. It must return a list of results (items or requests).

iterator: str = (source)

Undocumented

itertag: str = (source)

Undocumented

namespaces: tuple = (source)

Undocumented

def _iternodes(self, response): (source)

Undocumented

def _parse(self, response, **kwargs): (source)

Undocumented

def _register_namespaces(self, selector): (source)

Undocumented