class XMLFeedSpider(Spider):
This class is intended as the base class for spiders that scrape XML feeds. You can choose whether to parse the file using the 'iternodes' iterator, an 'xml' selector, or an 'html' selector. In most cases, it's convenient to use 'iternodes', since it's faster and cleaner.
Method | adapt |
You can override this method to make any changes you want to the feed before it is parsed. This method must return a response. |
Method | parse |
This method must be overridden with your custom spider functionality. |
Method | parse |
This method is called for the nodes matching the provided tag name (itertag). It receives the response and a Selector for each node. Overriding this method is mandatory; otherwise, your spider won't work. ... |
Method | process |
This overridable method is called for each result (item or request) returned by the spider, and is intended to perform any last-minute processing required before the results are returned to the framework core, for example setting the item GUIDs... |
Class Variable | iterator |
Undocumented |
Class Variable | itertag |
Undocumented |
Class Variable | namespaces |
Undocumented |
Method | _iternodes |
Undocumented |
Method | _parse |
Undocumented |
Method | _register |
Undocumented |
Inherited from Spider:
Class Method | from |
Undocumented |
Class Method | handles |
Undocumented |
Class Method | update |
Undocumented |
Static Method | close |
Undocumented |
Method | __init__ |
Undocumented |
Method | __repr__ |
Undocumented |
Method | log |
Log the given message at the given log level |
Method | parse |
Undocumented |
Method | start |
Undocumented |
Class Variable | custom |
Undocumented |
Instance Variable | crawler |
Undocumented |
Instance Variable | name |
Undocumented |
Instance Variable | settings |
Undocumented |
Instance Variable | start |
Undocumented |
Property | logger |
Undocumented |
Method | _set |
Undocumented |
Inherited from object_ref (via Spider):
Method | __new__ |
Undocumented |
Class Variable | __slots__ |
Undocumented |
You can override this method to make any changes you want to the feed before it is parsed. This method must return a response.
This method is called for the nodes matching the provided tag name (itertag). It receives the response and a Selector for each node. Overriding this method is mandatory; otherwise, your spider won't work. This method must return either an item, a request, or a list containing any of them.
This overridable method is called for each result (item or request) returned by the spider, and is intended to perform any last-minute processing required before the results are returned to the framework core, for example setting the item GUIDs. It receives a list of results and the response that originated those results. It must return a list of results (items or requests).