scrapy.utils.request

module documentation

This module provides helper functions for working with `scrapy.http.Request` objects.

Class	`RequestFingerprinter`	Default fingerprinter.
Function	`fingerprint`	Return the request fingerprint.
Function	`referer_str`	Return Referer HTTP header suitable for logging.
Function	`request_authenticate`	Authenticate the given request (in place) using the HTTP basic access authentication mechanism (RFC 2617) and the given username and password.
Function	`request_fingerprint`	Return the request fingerprint as a hexadecimal string.
Function	`request_from_dict`	Create a `scrapy.Request` object from a dict.
Function	`request_httprepr`	Return the raw HTTP representation (as bytes) of the given request. This is provided only for reference, since it is not the actual stream of bytes that will be sent when performing the request (that is controlled by Twisted).
Function	`_get_method`	Helper function for `request_from_dict`.
Function	`_request_fingerprint_as_bytes`	Undocumented
Function	`_serialize_headers`	Undocumented
Variable	`_deprecated_fingerprint_cache`	Undocumented
Variable	`_fingerprint_cache`	Undocumented

def fingerprint(request: Request, *, include_headers: Optional[Iterable[Union[bytes, str]]] = None, keep_fragments: bool = False) -> bytes:

Return the request fingerprint.

The request fingerprint is a hash that uniquely identifies the resource the request points to. For example, take the following two URLs:

http://www.example.com/query?id=111&cat=222
http://www.example.com/query?cat=222&id=111

Even though they are two different URLs, both point to the same resource and are equivalent (i.e. they should return the same response).

Another example is cookies used to store session IDs. Suppose the following page is only accessible to authenticated users:

http://www.example.com/members/offers.html

Many sites use a cookie to store the session ID, which adds a random component to the HTTP request and thus should be ignored when calculating the fingerprint.

For this reason, request headers are ignored by default when calculating the fingerprint. To include specific headers, use the include_headers argument, which is a list of request headers to include.

Servers also usually ignore fragments in URLs when handling requests, so fragments are likewise ignored by default when calculating the fingerprint. To include them, set the keep_fragments argument to True (for instance, when handling requests with a headless browser).
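The equivalence described above can be illustrated with a standard-library sketch. This is not Scrapy's implementation: `sketch_fingerprint` is a hypothetical helper, and the choice of SHA-1, the field separator, and the exact canonical form are assumptions made for illustration only.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sketch_fingerprint(method, url, body=b"", *, keep_fragments=False):
    """Hypothetical helper, not Scrapy's code: hash a canonical form of a request."""
    parts = urlsplit(url)
    # Sort query parameters so their order does not affect the result.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Drop the fragment by default, since servers usually ignore it.
    fragment = parts.fragment if keep_fragments else ""
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, query, fragment))
    digest = hashlib.sha1()
    for piece in (method.upper().encode(), canonical.encode(), body):
        digest.update(piece)
        digest.update(b"\x00")  # separator so adjacent fields cannot run together
    return digest.digest()


base = "http://www.example.com/query"
a = sketch_fingerprint("GET", base + "?id=111&cat=222")
b = sketch_fingerprint("GET", base + "?cat=222&id=111")
c = sketch_fingerprint("GET", base + "?cat=222&id=111#top")
d = sketch_fingerprint("GET", base + "?cat=222&id=111#top", keep_fragments=True)
```

In this sketch `a`, `b`, and `c` are equal, because parameter order and the fragment are normalized away, while `d` differs because the fragment is kept. Headers never enter the hash, which mirrors the default behavior described above.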

def referer_str(request: Request) -> Optional[str]:

Return Referer HTTP header suitable for logging.

def request_authenticate(request: Request, username: str, password: str):

Authenticate the given request (in place) using the HTTP basic access authentication mechanism (RFC 2617) and the given username and password.
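As a rough illustration of the mechanism, HTTP basic authentication sends an `Authorization` header whose value is `Basic ` followed by the base64 encoding of `username:password`. The sketch below uses only the standard library; `basic_auth_value` is a hypothetical helper, the plain dict stands in for a request's headers, and the UTF-8 encoding is an assumption.

```python
import base64


def basic_auth_value(username, password, encoding="utf-8"):
    """Hypothetical helper: build the value of a Basic Authorization header."""
    credentials = f"{username}:{password}".encode(encoding)
    return b"Basic " + base64.b64encode(credentials)


# A plain dict stands in for the request headers that are modified in place.
headers = {}
headers["Authorization"] = basic_auth_value("user", "pass")
```

Base64 is an encoding, not encryption, so credentials sent this way are only as private as the underlying connection.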

def request_fingerprint(request: Request, include_headers: Optional[Iterable[Union[bytes, str]]] = None, keep_fragments: bool = False) -> str:

Return the request fingerprint as a hexadecimal string.

The request fingerprint is a hash that uniquely identifies the resource the request points to. For example, take the following two URLs:

http://www.example.com/query?id=111&cat=222
http://www.example.com/query?cat=222&id=111

Even though they are two different URLs, both point to the same resource and are equivalent (i.e. they should return the same response).

Another example is cookies used to store session IDs. Suppose the following page is only accessible to authenticated users:

http://www.example.com/members/offers.html

Many sites use a cookie to store the session ID, which adds a random component to the HTTP request and thus should be ignored when calculating the fingerprint.

For this reason, request headers are ignored by default when calculating the fingerprint. To include specific headers, use the include_headers argument, which is a list of request headers to include.

Servers also usually ignore fragments in URLs when handling requests, so fragments are likewise ignored by default when calculating the fingerprint. To include them, set the keep_fragments argument to True (for instance, when handling requests with a headless browser).

def request_from_dict(d: dict, *, spider: Optional[Spider] = None) -> Request:

Create a `scrapy.Request` object from a dict. If a spider is given, the function tries to resolve callbacks by looking up spider methods with the same name.
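The name-based callback lookup can be sketched without Scrapy. Everything here is hypothetical and for illustration only: `DemoSpider` is a plain class rather than a real spider, `resolve_callback` is not a Scrapy function, and the sketch assumes the dict stores the callback as a string method name.

```python
class DemoSpider:
    """Hypothetical stand-in for a spider; not a scrapy.Spider subclass."""

    def parse_item(self, response):
        return {"parsed": response}


def resolve_callback(spider, name):
    """Hypothetical helper: look up a callable attribute of the spider by name."""
    method = getattr(spider, name, None)
    if not callable(method):
        raise ValueError(f"Method {name!r} not found in: {spider!r}")
    return method


# Assumption: the serialized form holds the callback's name, not a function object.
serialized = {
    "url": "http://www.example.com/members/offers.html",
    "callback": "parse_item",
}
spider = DemoSpider()
callback = resolve_callback(spider, serialized["callback"])
```

The resolved `callback` is a method bound to the given spider instance, which is why a spider has to be supplied before callbacks can be restored.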

def request_httprepr(request: Request) -> bytes:

Return the raw HTTP representation (as bytes) of the given request. This is provided only for reference, since it is not the actual stream of bytes that will be sent when performing the request (that is controlled by Twisted).
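To give a sense of what a raw HTTP representation looks like, the following standard-library sketch assembles a request line, a `Host` header, any extra headers, and the body. `sketch_httprepr` is a hypothetical helper and does not reproduce the exact output of `request_httprepr`; the header order and the use of HTTP/1.1 are assumptions.

```python
from urllib.parse import urlsplit


def sketch_httprepr(method, url, headers=None, body=b""):
    """Hypothetical helper: assemble an HTTP/1.1-style request as bytes."""
    parts = urlsplit(url)
    # The request target is the path plus the query string, if any.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"{method} {target} HTTP/1.1".encode(),
        b"Host: " + parts.netloc.encode(),
    ]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}".encode())
    # A blank line separates the header section from the body.
    return b"\r\n".join(lines) + b"\r\n\r\n" + body


raw = sketch_httprepr(
    "GET",
    "http://www.example.com/query?id=111",
    {"Accept": "text/html"},
)
```

As with the real function, such a representation is useful for logging and debugging rather than as a record of what actually went over the wire.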

def _get_method(obj, name):

Helper function for `request_from_dict`.

def _request_fingerprint_as_bytes(*args, **kwargs):

Undocumented

def _serialize_headers(headers, request):

Undocumented

_deprecated_fingerprint_cache

Undocumented

_fingerprint_cache

Undocumented