https://github.com/chfoo/furlat
Find URL Archiving Tool. Furlat is a tool and library that discovers URL shortcodes generated by URL shorteners.
- Host: GitHub
- URL: https://github.com/chfoo/furlat
- Owner: chfoo
- License: gpl-3.0
- Created: 2013-07-28T06:54:11.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2013-10-19T23:24:29.000Z (over 12 years ago)
- Last Synced: 2023-03-23T04:57:44.277Z (almost 3 years ago)
- Language: Python
- Size: 180 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: COPYING.txt
README
===============================
FURLAT: Find URL Archiving Tool
===============================
Furlat is a tool and library that discovers and analyzes URL shortcodes generated by URL shorteners.
Quick Start
===========
Installation
++++++++++++
You will need:
* Python 3.2 or greater
* Firefox
* Selenium (Python 3 Package)
You can install the required Python packages using ``pip``. For example, on Ubuntu::

    pip3 install selenium

Running
+++++++
You can run the package as a script::

    python3 -m furlat find bit.ly --verbose

To search only Twitter::

    python3 -m furlat find bit.ly --verbose --source twitter

Use ``--help`` to see details about the arguments.
Results are currently stored in text files. For example, if you run a search for bit.ly, a folder called ``bitly`` is created, and the text files inside it contain the discovered URLs.
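A minimal sketch of reading the results back in Python. It assumes that each result file is UTF-8 text with one URL per line, which is not stated above; the helper name ``load_urls`` is hypothetical and is not part of Furlat.

```python
import os


def load_urls(folder):
    """Collect the unique, non-empty lines from every file in ``folder``."""
    urls = set()
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            # Skip subfolders; only plain files are read.
            continue
        # Assumption: UTF-8 text, one discovered URL per line.
        with open(path, encoding='utf-8') as handle:
            for line in handle:
                line = line.strip()
                if line:
                    urls.add(line)
    return urls
```

For the example above, this would be called as ``load_urls('bitly')``.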
Commands that run indefinitely check for a sentinel file called ``STOP``. If the file is created or modified after the command starts, the command stops gracefully::

    touch STOP

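The sentinel check can be sketched as follows. This illustrates the general technique, not Furlat's actual implementation; the function names, the ``poll_interval`` parameter, and the polling loop are assumptions made for the example.

```python
import os
import time


def stop_requested(start_time, path='STOP'):
    """Return True if ``path`` exists and was modified after ``start_time``."""
    try:
        return os.path.getmtime(path) > start_time
    except OSError:
        # The sentinel file does not exist or cannot be read.
        return False


def run_until_stopped(do_work, poll_interval=1.0, path='STOP'):
    """Call ``do_work`` repeatedly until the sentinel file is touched."""
    # A STOP file left over from an earlier run is older than start_time,
    # so it does not stop the new run.
    start_time = time.time()
    while not stop_requested(start_time, path):
        do_work()
        time.sleep(poll_interval)
```

Comparing the modification time against the start time is what lets an old ``STOP`` file stay in place without blocking later runs.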
Commands
--------
analyze
    Print statistics about the URL shortcodes

find
    Launch a find URL project

sort
    Sort the URLs by length, then value

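The ordering used by ``sort`` can be sketched in a few lines. This assumes that "value" means plain string comparison; the function name ``sort_urls`` is hypothetical and is not taken from the Furlat source.

```python
def sort_urls(urls):
    """Sort URLs by length first, then by string value."""
    # The key is a tuple, so ties on length fall back to the URL itself.
    return sorted(urls, key=lambda url: (len(url), url))


print(sort_urls(['http://bit.ly/ab', 'http://bit.ly/b', 'http://bit.ly/aa']))
# ['http://bit.ly/b', 'http://bit.ly/aa', 'http://bit.ly/ab']
```

Sorting by length first groups shortcodes of the same size together, which makes gaps in a shortener's sequence easier to spot.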
Library
+++++++
The library is not yet stable as an API, but you can read the ``__main__.py`` file to get an overview of how it works.
About
=====
The goal of Furlat is to find as many valid shortcodes as possible without brute-force discovery, using third-party sources such as search engines and microblogs.
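To illustrate the idea, here is a rough sketch of pulling shortcodes out of text obtained from a third-party source. It is not Furlat's code: the function name ``find_shortcodes`` is hypothetical, and the pattern assumes that a shortcode consists only of letters, digits, ``-`` and ``_``.

```python
import re


def find_shortcodes(text, host):
    """Return the distinct shortcodes for ``host`` found in ``text``."""
    # Assumption: a shortcode is a run of letters, digits, '-' or '_'
    # directly after "host/". Real shorteners may allow other characters.
    # The lookbehind rejects hosts that merely end with ``host``.
    pattern = re.compile(
        r'(?<![\w.-])' + re.escape(host) + r'/([A-Za-z0-9_-]+)',
        re.IGNORECASE)
    seen = []
    for match in pattern.finditer(text):
        code = match.group(1)
        if code not in seen:
            seen.append(code)
    return seen


sample = 'See http://bit.ly/abc123 and bit.ly/XyZ, not http://notbit.ly/zzz'
print(find_shortcodes(sample, 'bit.ly'))
# ['abc123', 'XyZ']
```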
Links
+++++
* Homepage: https://github.com/chfoo/furlat
.. * Questions?: https://answers.launchpad.net/furlat
.. * Bugs?: https://github.com/chfoo/furlat/issues
.. * PyPI: https://pypi.python.org/pypi/furlat/
* Chat: irc://irc.efnet.org/archiveteam-bs (I'll be on #archiveteam-bs on EFnet)
Testing
+++++++
The unit tests can be run with ``nosetests``::

    nosetests3

Roadmap
+++++++
This software is currently in an **experimental-but-could-be-useful** state.
What's Available
----------------
* Launching a real web browser.
* Searching through Google, Yahoo, Bing, and Twitter.
* Random keyword search term generation using word lists and MediaWiki page title dump files.
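As an illustration of the last item, random search terms can be built from a word list roughly as follows. This is a sketch under assumptions: the function name, the ``words_per_term`` parameter, and the way words are joined are hypothetical and may differ from what Furlat does.

```python
import random


def random_search_terms(words, count, words_per_term=2, rng=None):
    """Yield ``count`` search terms built from randomly chosen words."""
    # An explicit random.Random instance can be passed in to make the
    # output reproducible.
    rng = rng or random.Random()
    for _ in range(count):
        chosen = [rng.choice(words) for _ in range(words_per_term)]
        yield ' '.join(chosen)


# Hypothetical word list; it could equally be read from a word list file
# or from a MediaWiki page title dump.
for term in random_search_terms(['archive', 'python', 'river'], 3):
    print(term)
```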
What's To-Do
------------
* Searching Identica
* Nicer result output options
* Configurable options such as fetch rate and number of jobs run concurrently
* Travis CI setup
* PyPI and other websites setup
* Inline documentation
* Launching a fake web browser
See also
--------
* https://github.com/chfoo/rdai
* https://github.com/chfoo/cloaked-octo-nemesis