https://github.com/stef/urlclean
module that resolves and cleans up urls
- Host: GitHub
- URL: https://github.com/stef/urlclean
- Owner: stef
- Created: 2012-01-24T01:12:51.000Z (over 14 years ago)
- Default Branch: master
- Last Pushed: 2021-07-20T22:10:45.000Z (almost 5 years ago)
- Last Synced: 2025-04-16T00:11:27.276Z (about 1 year ago)
- Language: Python
- Homepage: http://pypi.python.org/pypi/urlclean/
- Size: 26.4 KB
- Stars: 22
- Watchers: 4
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.rst
README
Welcome to urlclean's documentation!
************************************
urlclean provides:
* a function to follow an HTTP redirect,
* a function to follow an HTML META redirect,
* a function to remove Urchin and Facebook tracker URL parameters,
* plugins for further cleaning power,
* a function that combines all of these to unshorten and resolve various URLs.
Try it out from the command line:
python -m urlclean
Documentation
=============
urlclean is a module that resolves redirected urls and removes tracking
url params
urlclean.weedparams(url)
removes Urchin Tracker and Facebook surveillance params from urls.
Args:
url (str): The url to scrub
Returns:
(str). The cleaned url
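To illustrate the kind of scrubbing weedparams describes, here is a minimal sketch built only on the standard library. It is not urlclean's implementation: the helper name strip_tracking_params and the parameter names it filters (the utm_ prefix and fbclid) are assumptions chosen for the example, and the list urlclean actually removes may differ.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracker parameter names; the set urlclean really removes may differ.
TRACKER_PREFIXES = ("utm_",)   # Urchin / Google Analytics style parameters
TRACKER_NAMES = {"fbclid"}     # a Facebook click identifier


def strip_tracking_params(url):
    """Return url without the assumed tracker query parameters (hypothetical helper)."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKER_NAMES and not name.startswith(TRACKER_PREFIXES)
    ]
    # Rebuild the url with only the parameters that were kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking_params("http://example.com/a?id=1&utm_source=x&fbclid=abc"))
```

Parameters that are kept stay in their original order, and the fragment, if any, is left untouched.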
urlclean.httpresolve(url, ua=None, proxyhost='', proxyport='')
resolve one redirection of a http request.
Args:
url (str): The url to follow one redirect
ua (fn): A function returning a User Agent string (optional)
proxyhost (str): http proxy server (optional)
proxyport (int): http proxy server port (optional)
Returns: (str, http.client.response). The resolved url and
the response from the http query
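Following a single redirect hop, as httpresolve is described above, can be sketched with http.client, which does not follow redirects on its own. This is only an illustration under stated assumptions: the helper names redirect_target and resolve_once are hypothetical, the user agent and proxy arguments are left out, and the set of status codes treated as redirects is a guess rather than what urlclean checks.

```python
import http.client
from urllib.parse import urlsplit, urljoin

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(url, status, location):
    """Return the url one redirect hop leads to, or url itself if there is none."""
    if status in REDIRECT_STATUSES and location:
        # Location may be relative, so resolve it against the requested url.
        return urljoin(url, location)
    return url


def resolve_once(url, timeout=10):
    """Send one GET request and return (resolved_url, response). Hypothetical helper."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn.request("GET", path)
    # The connection is left open so the caller can still read the body.
    res = conn.getresponse()
    return redirect_target(url, res.status, res.getheader("Location")), res
```

Keeping the redirect decision in a pure function means it can be checked without network access, for example by calling redirect_target("http://example.com/a/b", 301, "/c").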
urlclean.unmeta(url, res)
Finds any meta redirects in a http.client.response object that has
text/html as content-type.
Args:
url (str): The url to follow one redirect
res (http.client.response): a http response object
Returns: (str). The resolved url
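The META redirect lookup that unmeta describes could look roughly like the sketch below, which scans HTML text for a meta refresh tag using html.parser. It is an assumption-laden illustration and not urlclean's code: the helper meta_refresh_target takes the already decoded HTML as a string instead of a response object, and the regular expression for the content attribute covers only the common "0; url=..." form.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _MetaRefreshFinder(HTMLParser):
    """Remembers the target of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # content usually looks like "0; url=http://example.com/new"
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)", attrs.get("content") or "", re.I)
        if match:
            self.target = match.group(1).strip()


def meta_refresh_target(base_url, html):
    """Return the meta refresh target resolved against base_url, or base_url if none."""
    finder = _MetaRefreshFinder()
    finder.feed(html)
    return urljoin(base_url, finder.target) if finder.target else base_url


page = '<html><head><meta http-equiv="Refresh" content="0; URL=/landing"></head></html>'
print(meta_refresh_target("http://a.example/x", page))
```

Returning the base url unchanged when no refresh tag is found lets a caller detect "no further redirect" by comparing the input and output.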
urlclean.unshorten(url, cache=None, ua=None, >>**<>**<