module documentation

Helper functions which don't fit anywhere else

Function arg_to_iter Convert an argument to an iterable. The argument can be None, a single value, or an iterable.
Function create_instance Construct a class instance using its ``from_crawler`` or ``from_settings`` constructors, if available.
Function extract_regex Extract a list of unicode strings from the given text/encoding, choosing what to return based on the groups the regex defines.
Function is_generator_with_return_value Returns True if a callable is a generator function which includes a 'return' statement with a value other than None, False otherwise.
Function load_object Load an object given its absolute object path, and return it.
Function md5sum Calculate the md5 checksum of a file-like object without reading its whole content in memory.
Function rel_has_nofollow Return True if the link's rel attribute includes the nofollow type.
Function set_environ Temporarily set environment variables inside the context manager and fully restore previous environment afterwards
Function walk_callable Similar to ``ast.walk``, but walks only function body and skips nested functions defined within the node.
Function walk_modules Loads a module and all its submodules from the given module path and returns them. If *any* module throws an exception while importing, that exception is thrown back.
Function warn_on_generator_with_return_value Logs a warning if a callable is a generator function and includes a 'return' statement with a value other than None.
Constant _ITERABLE_SINGLE_VALUES Undocumented
Variable _generator_callbacks_cache Undocumented
def arg_to_iter(arg): (source)

Convert an argument to an iterable. The argument can be None, a single value, or an iterable. Exception: if arg is a dict, [arg] will be returned.
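The conversion rules can be illustrated with a minimal sketch. ``arg_to_iter_sketch`` is a hypothetical stand-in written for this page, not the library function. It assumes that ``str`` and ``bytes`` are wrapped like ``dict`` (they appear in ``_ITERABLE_SINGLE_VALUES``), and it leaves out Scrapy's ``Item`` so that it stays self-contained.

```python
def arg_to_iter_sketch(arg):
    """Hypothetical re-implementation, for illustration only."""
    # Iterable types that should still count as a single value.
    # Assumption: mirrors _ITERABLE_SINGLE_VALUES minus scrapy's Item.
    single_values = (dict, str, bytes)
    if arg is None:
        return []
    if not isinstance(arg, single_values) and hasattr(arg, "__iter__"):
        return arg  # already an iterable: returned unchanged
    return [arg]  # single value: wrapped in a list


print(arg_to_iter_sketch(None))      # []
print(arg_to_iter_sketch("abc"))     # ['abc']
print(arg_to_iter_sketch({"a": 1}))  # [{'a': 1}]
print(arg_to_iter_sketch((1, 2)))    # (1, 2)
```

Callers can then loop over the result without checking for ``None`` or for a bare value first.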

def create_instance(objcls, settings, crawler, *args, **kwargs): (source)

Construct a class instance using its ``from_crawler`` or ``from_settings`` constructors, if available.

At least one of ``settings`` and ``crawler`` needs to be different from ``None``. If ``settings`` is ``None``, ``crawler.settings`` will be used. If ``crawler`` is ``None``, only the ``from_settings`` constructor will be tried.

``*args`` and ``**kwargs`` are forwarded to the constructors.

Raises ``ValueError`` if both ``settings`` and ``crawler`` are ``None``.

.. versionchanged:: 2.2
   Raises ``TypeError`` if the resulting instance is ``None`` (e.g. if an extension has not been implemented correctly).
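The selection order described above might look roughly like the following sketch. ``create_instance_sketch`` and ``DemoExtension`` are hypothetical names, a plain dict stands in for a settings object, and the fallback to the ordinary constructor when neither class method exists is an assumption inferred from "if available".

```python
def create_instance_sketch(objcls, settings, crawler, *args, **kwargs):
    """Hypothetical re-implementation, for illustration only."""
    if settings is None:
        if crawler is None:
            raise ValueError("Specify at least one of settings and crawler.")
        settings = crawler.settings
    if crawler is not None and hasattr(objcls, "from_crawler"):
        instance = objcls.from_crawler(crawler, *args, **kwargs)
        used = "from_crawler"
    elif hasattr(objcls, "from_settings"):
        instance = objcls.from_settings(settings, *args, **kwargs)
        used = "from_settings"
    else:
        # Assumption: fall back to the plain constructor.
        instance = objcls(*args, **kwargs)
        used = "__init__"
    if instance is None:
        raise TypeError(f"{objcls.__qualname__}.{used} returned None")
    return instance


class DemoExtension:
    """Hypothetical component that only defines from_settings."""

    def __init__(self, delay):
        self.delay = delay

    @classmethod
    def from_settings(cls, settings):
        return cls(settings["DELAY"])


ext = create_instance_sketch(DemoExtension, {"DELAY": 3}, None)
print(ext.delay)  # 3
```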

def extract_regex(regex, text, encoding='utf-8'): (source)

Extract a list of unicode strings from the given text/encoding using the following policies:

* if the regex contains a named group called "extract" that will be returned
* if the regex contains multiple numbered groups, all those will be returned (flattened)
* if the regex doesn't contain any group the entire regex matching is returned
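The three policies can be sketched with the standard ``re`` module. ``extract_regex_sketch`` is a hypothetical helper: it ignores the ``encoding`` argument and any further post-processing, and it assumes that only the first match is used when the "extract" group is present.

```python
import re


def extract_regex_sketch(regex, text):
    """Hypothetical illustration of the three group policies."""
    pattern = re.compile(regex)
    if "extract" in pattern.groupindex:
        # Policy 1: a named group called "extract" wins.
        match = pattern.search(text)
        return [match.group("extract")] if match else []
    results = []
    for found in pattern.findall(text):
        if isinstance(found, tuple):
            # Policy 2: several numbered groups, flattened.
            results.extend(found)
        else:
            # Policy 3 (no group): the whole match. With exactly one
            # group, findall yields that group's text instead.
            results.append(found)
    return results


print(extract_regex_sketch(r"id=(?P<extract>\d+)", "id=42 id=7"))  # ['42']
print(extract_regex_sketch(r"(\w+)=(\d+)", "a=1 b=2"))  # ['a', '1', 'b', '2']
print(extract_regex_sketch(r"\d+", "a=1 b=22"))         # ['1', '22']
```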

def is_generator_with_return_value(callable): (source)

Returns True if a callable is a generator function which includes a 'return' statement with a value other than None, False otherwise.

def load_object(path): (source)

Load an object given its absolute object path, and return it. The object can be the import path of a class, function, variable or an instance, e.g. 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware'. If ``path`` is not a string, but is a callable object, such as a class or a function, then return it as is.
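A dotted path can be resolved with ``importlib``, as in the sketch below. ``load_object_sketch`` is a hypothetical name, and the exception types it raises are illustrative choices rather than documented behaviour.

```python
import collections
from importlib import import_module


def load_object_sketch(path):
    """Hypothetical re-implementation, for illustration only."""
    if not isinstance(path, str):
        if callable(path):
            return path  # classes and functions are returned as is
        raise TypeError(f"Unexpected argument type: {type(path)!r}")
    # Split 'package.module.Name' into 'package.module' and 'Name'.
    module_path, _, name = path.rpartition(".")
    if not module_path:
        raise ValueError(f"{path!r} is not an absolute object path")
    module = import_module(module_path)
    return getattr(module, name)


print(load_object_sketch("collections.OrderedDict") is collections.OrderedDict)  # True
print(load_object_sketch(len) is len)  # True
```

Accepting either a path or an already imported object lets the same setting hold a string in a configuration file and a class in code.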

def md5sum(file): (source)

Calculate the md5 checksum of a file-like object without reading its whole content in memory.

>>> from io import BytesIO
>>> md5sum(BytesIO(b'file content to hash'))
'784406af91dd5a54fbb9c84c2236595a'
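Hashing "without reading its whole content in memory" usually means feeding the digest fixed-size chunks. The sketch below shows that pattern with ``hashlib``. ``md5sum_sketch`` and its ``chunk_size`` parameter are hypothetical, and the chunk size is an arbitrary choice.

```python
import hashlib
from io import BytesIO


def md5sum_sketch(file, chunk_size=8192):
    """Hypothetical chunked MD5, for illustration only."""
    digest = hashlib.md5()
    while True:
        chunk = file.read(chunk_size)  # at most chunk_size bytes at a time
        if not chunk:
            break  # empty read means end of file
        digest.update(chunk)
    return digest.hexdigest()


data = b"x" * 20000  # larger than one chunk
print(md5sum_sketch(BytesIO(data)) == hashlib.md5(data).hexdigest())  # True
```

Memory use stays bounded by ``chunk_size`` regardless of the file size.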

def rel_has_nofollow(rel): (source)

Return True if the link's rel attribute includes the nofollow type.
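A ``rel`` attribute holds a list of link types, so the check is a token lookup rather than a substring test. The sketch below uses a hypothetical ``rel_has_nofollow_sketch``. Treating commas as separators in addition to whitespace, and returning False for ``None``, are assumptions.

```python
def rel_has_nofollow_sketch(rel):
    """Hypothetical token-based check, for illustration only."""
    if rel is None:
        return False
    # Assumption: commas are accepted as separators besides whitespace.
    return "nofollow" in rel.replace(",", " ").split()


print(rel_has_nofollow_sketch("noopener nofollow"))  # True
print(rel_has_nofollow_sketch("noopener,nofollow"))  # True
print(rel_has_nofollow_sketch("nofollowing"))        # False
print(rel_has_nofollow_sketch(None))                 # False
```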

@contextmanager
def set_environ(**kwargs): (source)

Temporarily set environment variables inside the context manager and fully restore previous environment afterwards
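The save, set and restore pattern can be sketched as follows. ``set_environ_sketch`` and the ``SKETCH_DEMO_VAR`` variable are hypothetical. The sketch restores only the variables it was asked to set, which is an assumption about what "fully restore" covers.

```python
import os
from contextlib import contextmanager


@contextmanager
def set_environ_sketch(**kwargs):
    """Hypothetical re-implementation, for illustration only."""
    # Remember the previous value of each variable (None if unset).
    original = {name: os.environ.get(name) for name in kwargs}
    os.environ.update(kwargs)
    try:
        yield
    finally:
        # Restore even if the body raised an exception.
        for name, value in original.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


os.environ.pop("SKETCH_DEMO_VAR", None)
with set_environ_sketch(SKETCH_DEMO_VAR="1"):
    print(os.environ["SKETCH_DEMO_VAR"])  # 1
print("SKETCH_DEMO_VAR" in os.environ)    # False
```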

def walk_callable(node): (source)

Similar to ``ast.walk``, but walks only function body and skips nested functions defined within the node.
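The difference from ``ast.walk`` shows up when looking for ``return`` statements: a ``return`` inside a nested helper should not be attributed to the outer function. The sketch below is one way to get that behaviour. ``walk_callable_sketch`` is a hypothetical name, and skipping ``async def`` functions as well is an assumption.

```python
import ast
from collections import deque
from textwrap import dedent


def walk_callable_sketch(node):
    """Hypothetical breadth-first walk that skips nested functions."""
    todo = deque([node])
    seen_function = False
    while todo:
        current = todo.popleft()
        if isinstance(current, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if seen_function:
                continue  # nested definition: do not descend into it
            seen_function = True  # the first function found is the target
        todo.extend(ast.iter_child_nodes(current))
        yield current


SOURCE = dedent('''
    def outer():
        def inner():
            return 1
        yield inner()
''')

tree = ast.parse(SOURCE)
with_nested = [n for n in ast.walk(tree) if isinstance(n, ast.Return)]
body_only = [n for n in walk_callable_sketch(tree) if isinstance(n, ast.Return)]
print(len(with_nested), len(body_only))  # 1 0
```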

def walk_modules(path): (source)

Loads a module and all its submodules from the given module path and returns them. If *any* module throws an exception while importing, that exception is thrown back.

For example: walk_modules('scrapy.utils')
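A recursive walk of this kind can be built on ``importlib`` and ``pkgutil``. The sketch below uses a hypothetical ``walk_modules_sketch`` and runs it on the standard-library package ``email.mime`` so that it works without Scrapy installed. Import errors are deliberately not caught, matching the description above.

```python
from importlib import import_module
from pkgutil import iter_modules


def walk_modules_sketch(path):
    """Hypothetical re-implementation, for illustration only."""
    module = import_module(path)  # any import error propagates
    modules = [module]
    # Only packages have __path__; plain modules have no submodules.
    if hasattr(module, "__path__"):
        for _, name, is_pkg in iter_modules(module.__path__):
            full_path = f"{path}.{name}"
            if is_pkg:
                modules += walk_modules_sketch(full_path)
            else:
                modules.append(import_module(full_path))
    return modules


found = [m.__name__ for m in walk_modules_sketch("email.mime")]
print(found[0])                    # email.mime
print("email.mime.text" in found)  # True
```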

def warn_on_generator_with_return_value(spider, callable): (source)

Logs a warning if a callable is a generator function and includes a 'return' statement with a value other than None.

_ITERABLE_SINGLE_VALUES = (source)

Undocumented

Value
(dict, Item, str, bytes)
_generator_callbacks_cache = (source)

Undocumented