https://github.com/stef/urlclean

module that resolves and cleans up urls

Welcome to urlclean's documentation!
************************************

urlclean provides:

* a function to follow an HTTP redirect,

* a function to follow an HTML META redirect,

* a function to remove Urchin and Facebook tracker URL parameters,

* plugins for further cleaning power,

* a function that combines all of these to unshorten and resolve various URLs.

Try it out from the command line:

python -m urlclean

Documentation
=============

urlclean is a module that resolves redirected URLs and removes tracking
URL parameters.

urlclean.weedparams(url)

Removes Urchin Tracker and Facebook surveillance parameters from URLs.

Args:

url (str): The URL to scrub

Returns:

(str). The cleaned URL
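
To illustrate the kind of scrubbing described above, here is a minimal sketch using only the standard library. It is not urlclean's implementation: the function name ``strip_tracking_params`` and the parameter lists (the ``utm_`` prefix for Urchin-style parameters and ``fbclid`` for Facebook) are assumptions chosen for the example, and the set that ``weedparams`` actually removes may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: illustrative tracker names only; the list used by
# urlclean.weedparams is not documented here and may differ.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid"}


def strip_tracking_params(url):
    """Return url with query parameters that look like trackers removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(TRACKING_PREFIXES)
        and name.lower() not in TRACKING_NAMES
    ]
    # Rebuild the URL with only the surviving parameters; scheme, host,
    # path and fragment are left untouched.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Parsing the query string rather than applying a regular expression to the whole URL keeps the path and fragment out of reach of the filter, at the cost of re-encoding the remaining parameters.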

urlclean.httpresolve(url, ua=None, proxyhost='', proxyport='')

Resolves one redirection of an HTTP request.

Args:

url (str): The URL for which to follow one redirect

ua (fn): A function returning a User-Agent string (optional)

proxyhost (str): HTTP proxy server (optional)

proxyport (int): HTTP proxy server port (optional)

Returns:

(str, http.client.response). The resolved URL and the response from
the HTTP query
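
Following a single redirect can be sketched with ``http.client`` from the standard library. This is an assumption-laden illustration and not the code behind ``httpresolve``: the names ``redirect_target`` and ``follow_one_redirect`` are hypothetical, the user agent is passed as a plain string instead of a function, and proxy support is left out.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(url, status, location):
    """Return the URL a redirect response points to, or url unchanged."""
    if status in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the request URL.
        return urljoin(url, location)
    return url


def follow_one_redirect(url, user_agent="example-agent/0.1", timeout=10):
    """Issue one GET request and return (resolved_url, response).

    Hypothetical helper: it performs exactly one request and never
    follows the redirect it finds.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn.request("GET", path, headers={"User-Agent": user_agent})
    res = conn.getresponse()
    return redirect_target(url, res.status, res.getheader("Location")), res
```

Keeping the ``Location`` handling in its own function means the URL arithmetic can be checked without network access, for example ``redirect_target("http://example.com/a/b", 301, "/c")``.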

urlclean.unmeta(url, res)

Finds any meta redirects in a http.client.response object that has
text/html as its content type.

Args:

url (str): The URL to follow one redirect

res (http.client.response): an HTTP response object

Returns:

(str). The resolved URL
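
A META redirect is a ``<meta http-equiv="refresh" content="0; url=...">`` tag in the page itself. The sketch below shows one way to extract its target with ``html.parser``. It works on an already decoded HTML string and not on a response object, the names ``MetaRefreshParser`` and ``meta_redirect_target`` are hypothetical, and it should not be read as how ``unmeta`` is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Remember the target of the first meta refresh tag seen."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=http://example.com/next".
        _, _, rest = (attrs.get("content") or "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value:
            self.target = value.strip().strip("'\"")


def meta_redirect_target(base_url, html_text):
    """Return the meta refresh target, or base_url if there is none."""
    parser = MetaRefreshParser()
    parser.feed(html_text)
    if parser.target is None:
        return base_url
    # The target may be relative, so resolve it against the page URL.
    return urljoin(base_url, parser.target)
```

Returning the input URL unchanged when no redirect is found lets a caller detect the end of a redirect chain by comparing the input and output.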

urlclean.unshorten(url, cache=None, ua=None, >>**<>**<