module documentation

This module contains general-purpose URL functions not found in the standard library. Some functions that used to be importable from this module have moved to the w3lib.url module; always import those from w3lib.url instead.

Function add_http_if_no_scheme Add http as the default scheme if it is missing from the url.
Function escape_ajax Return the crawlable URL according to: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
Function guess_scheme Add a URL scheme if missing: file:// for filepath-like input or http:// otherwise.
Function parse_url Return the urlparse result for the given argument (which may already be a parsed URL)
Function strip_url Strip some components from a URL string
Function url_has_any_extension Return True if the url ends with one of the extensions provided
Function url_is_from_any_domain Return True if the url belongs to any of the given domains
Function url_is_from_spider Return True if the url belongs to the given spider
Function _is_filesystem_path Undocumented
Function _is_posix_path Undocumented
Function _is_windows_path Undocumented
def add_http_if_no_scheme(url):

Add http as the default scheme if it is missing from the url.
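
As a rough illustration of the behaviour described above, the following standard-library sketch prepends http:// when no scheme is present. The name add_http_sketch and the regular expression are assumptions made for this example; the module's own implementation may differ.

```python
import re
from urllib.parse import urlparse


def add_http_sketch(url):
    """Hypothetical stand-in for add_http_if_no_scheme (illustration only)."""
    # Leave the URL alone if it already starts with something like "scheme://".
    if re.match(r"^\w+://", url):
        return url
    # Protocol-relative input ("//host/path") already carries the slashes.
    prefix = "http:" if urlparse(url).netloc else "http://"
    return prefix + url


print(add_http_sketch("www.example.com"))          # http://www.example.com
print(add_http_sketch("//www.example.com"))        # http://www.example.com
print(add_http_sketch("https://www.example.com"))  # https://www.example.com
```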

def escape_ajax(url):

Return the crawlable URL according to: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started

>>> escape_ajax("www.example.com/ajax.html#!key=value")
'www.example.com/ajax.html?_escaped_fragment_=key%3Dvalue'
>>> escape_ajax("www.example.com/ajax.html?k1=v1&k2=v2#!key=value")
'www.example.com/ajax.html?k1=v1&k2=v2&_escaped_fragment_=key%3Dvalue'
>>> escape_ajax("www.example.com/ajax.html?#!key=value")
'www.example.com/ajax.html?_escaped_fragment_=key%3Dvalue'
>>> escape_ajax("www.example.com/ajax.html#!")
'www.example.com/ajax.html?_escaped_fragment_='

URLs that are not "AJAX crawlable" (according to Google) are returned as-is:

>>> escape_ajax("www.example.com/ajax.html#key=value")
'www.example.com/ajax.html#key=value'
>>> escape_ajax("www.example.com/ajax.html#")
'www.example.com/ajax.html#'
>>> escape_ajax("www.example.com/ajax.html")
'www.example.com/ajax.html'
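
The rewriting rule shown by these examples can be sketched with the standard library alone. The function name escape_ajax_sketch is hypothetical, and this version simply appends the parameter; it makes no attempt to replace an existing _escaped_fragment_ parameter, which the real function may handle differently.

```python
from urllib.parse import quote


def escape_ajax_sketch(url):
    """Hypothetical stand-in for escape_ajax (illustration only)."""
    base, hash_sign, fragment = url.partition("#")
    # Only "#!" fragments are treated as AJAX crawlable; the rest pass through.
    if not hash_sign or not fragment.startswith("!"):
        return url
    # Percent-encode everything after the "!" so "=" becomes "%3D".
    escaped = "_escaped_fragment_=" + quote(fragment[1:], safe="")
    if "?" not in base:
        return base + "?" + escaped
    if base.endswith("?"):
        return base + escaped
    return base + "&" + escaped


print(escape_ajax_sketch("www.example.com/ajax.html#!key=value"))
```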

def guess_scheme(url):

Add a URL scheme if missing: file:// for filepath-like input or http:// otherwise.
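
A minimal sketch of this decision follows, using only the standard library. The name guess_scheme_sketch and, in particular, the rule used to recognise "filepath-like" input (a leading "/", "./", "../" or "~") are assumptions for illustration; the module's own path detection is not documented here and is likely more thorough, for example for Windows paths.

```python
import os
import re
from pathlib import Path


def guess_scheme_sketch(url):
    """Hypothetical stand-in for guess_scheme (illustration only)."""
    # Input that already has a scheme (http://, file://, ...) is kept as-is.
    if re.match(r"^\w+://", url):
        return url
    # Assumption: these prefixes mark the input as a filesystem path.
    if url.startswith(("/", "./", "../", "~")):
        absolute = os.path.abspath(os.path.expanduser(url))
        return Path(absolute).as_uri()
    return "http://" + url


print(guess_scheme_sketch("example.com/index.html"))  # http://example.com/index.html
print(guess_scheme_sketch("/tmp/page.html"))          # a file:// URI
```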

def parse_url(url, encoding=None):

Return the urlparse result for the given argument (which may already be a parsed URL).
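
The "accept either form" behaviour can be sketched as below. The name parse_url_sketch is hypothetical, and the handling of bytes input (decoding with the given encoding, falling back to UTF-8) is an assumption about what the encoding parameter is for.

```python
from urllib.parse import ParseResult, urlparse


def parse_url_sketch(url, encoding=None):
    """Hypothetical stand-in for parse_url (illustration only)."""
    # An already parsed URL is returned unchanged.
    if isinstance(url, ParseResult):
        return url
    # Assumption: bytes are decoded before parsing, UTF-8 by default.
    if isinstance(url, bytes):
        url = url.decode(encoding or "utf-8")
    return urlparse(url)


parsed = parse_url_sketch("http://www.example.com/a?b=1")
print(parsed.netloc)                       # www.example.com
print(parse_url_sketch(parsed) is parsed)  # True
```

Callers that sometimes hold a string and sometimes a parsed result can then use a single code path.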

def strip_url(url, strip_credentials=True, strip_default_port=True, origin_only=False, strip_fragment=True):

Strip some components from a URL string:

- ``strip_credentials`` removes "user:password@"
- ``strip_default_port`` removes ":80" (resp. ":443", ":21") from http:// (resp. https://, ftp://) URLs
- ``origin_only`` replaces the path component with "/", also dropping the query and fragment components; it also strips credentials
- ``strip_fragment`` drops any #fragment component
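
The four options can be illustrated with a standard-library sketch. The name strip_url_sketch and the DEFAULT_PORTS constant are hypothetical, and the code is one plausible reading of the option list above rather than the module's confirmed implementation.

```python
from urllib.parse import urlparse, urlunparse

# (scheme, port) pairs treated as defaults, taken from the option list above.
DEFAULT_PORTS = {("http", 80), ("https", 443), ("ftp", 21)}


def strip_url_sketch(url, strip_credentials=True, strip_default_port=True,
                     origin_only=False, strip_fragment=True):
    """Hypothetical stand-in for strip_url (illustration only)."""
    parsed = urlparse(url)
    netloc = parsed.netloc
    # origin_only implies credential stripping, as the option list states.
    if (strip_credentials or origin_only) and "@" in netloc:
        netloc = netloc.rpartition("@")[2]
    port = parsed.port
    if strip_default_port and port and (parsed.scheme, port) in DEFAULT_PORTS:
        suffix = f":{port}"
        if netloc.endswith(suffix):
            netloc = netloc[: -len(suffix)]
    return urlunparse((
        parsed.scheme,
        netloc,
        "/" if origin_only else parsed.path,
        "" if origin_only else parsed.params,
        "" if origin_only else parsed.query,
        "" if strip_fragment else parsed.fragment,
    ))


url = "http://user:pass@www.example.com:80/index.html?a=1#top"
print(strip_url_sketch(url))                    # http://www.example.com/index.html?a=1
print(strip_url_sketch(url, origin_only=True))  # http://www.example.com/
```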

def url_has_any_extension(url, extensions):

Return True if the url ends with one of the extensions provided
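
One way to read "ends with one of the extensions" is sketched below. The name url_has_any_extension_sketch is hypothetical; checking only the path component (so the query string is ignored) and comparing case-insensitively are assumptions of this example.

```python
from urllib.parse import urlparse


def url_has_any_extension_sketch(url, extensions):
    """Hypothetical stand-in for url_has_any_extension (illustration only)."""
    # Assumption: only the path is inspected, compared in lowercase.
    path = urlparse(url).path.lower()
    return any(path.endswith(ext.lower()) for ext in extensions)


print(url_has_any_extension_sketch(
    "http://www.example.com/report.PDF?download=1", [".pdf"]))  # True
print(url_has_any_extension_sketch(
    "http://www.example.com/page.html", [".pdf", ".zip"]))      # False
```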

def url_is_from_any_domain(url, domains):

Return True if the url belongs to any of the given domains
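
A sketch of what "belongs to" could mean follows: the host either equals a listed domain or is a subdomain of it. The name url_is_from_any_domain_sketch is hypothetical, and the sketch compares the raw network location, so it does not account for ports or credentials in the URL.

```python
from urllib.parse import urlparse


def url_is_from_any_domain_sketch(url, domains):
    """Hypothetical stand-in for url_is_from_any_domain (illustration only)."""
    host = urlparse(url).netloc.lower()
    if not host:
        # Relative URLs have no host, so they match no domain.
        return False
    lowered = [domain.lower() for domain in domains]
    # Match the domain itself or any subdomain, but not a mere suffix overlap.
    return any(host == d or host.endswith("." + d) for d in lowered)


print(url_is_from_any_domain_sketch(
    "http://Sub.Example.com/page", ["example.com"]))  # True
print(url_is_from_any_domain_sketch(
    "http://notexample.com/", ["example.com"]))       # False
```

Requiring a leading dot before the domain is what keeps notexample.com from matching example.com.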

def url_is_from_spider(url, spider):

Return True if the url belongs to the given spider
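
The sketch below shows one plausible interpretation: a URL belongs to a spider when its host matches the spider's name or one of its allowed_domains. DemoSpider, host_matches and url_is_from_spider_sketch are hypothetical names for this example, and treating the spider name as a candidate domain is an assumption.

```python
from urllib.parse import urlparse


class DemoSpider:
    """Hypothetical stand-in for a spider object (illustration only)."""
    name = "example.com"
    allowed_domains = ["example.org"]


def host_matches(url, domains):
    # True if the URL's host equals a domain or is one of its subdomains.
    host = urlparse(url).netloc.lower()
    return bool(host) and any(
        host == d.lower() or host.endswith("." + d.lower()) for d in domains
    )


def url_is_from_spider_sketch(url, spider):
    """Hypothetical stand-in for url_is_from_spider (illustration only)."""
    # Assumption: both the spider name and allowed_domains count as domains.
    domains = [spider.name] + list(getattr(spider, "allowed_domains", []))
    return host_matches(url, domains)


spider = DemoSpider()
print(url_is_from_spider_sketch("http://blog.example.org/post", spider))  # True
print(url_is_from_spider_sketch("http://other.net/", spider))             # False
```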

def _is_filesystem_path(string):

Undocumented

def _is_posix_path(string):

Undocumented

def _is_windows_path(string):

Undocumented