===============================
FURLAT: Find URL Archiving Tool
===============================

Furlat is a tool and library that discovers and analyzes URL shortcodes generated by URL shorteners.

Quick Start
===========

Installation
++++++++++++

You will need:

* Python 3.2 or greater
* Firefox
* Selenium (Python 3 Package)

You can install the required Python packages using ``pip``. For example, on Ubuntu::

    pip3 install selenium

Running
+++++++

You can run the package as a script::

    python3 -m furlat find bit.ly --verbose

To search only Twitter::

    python3 -m furlat find bit.ly --verbose --source twitter

Use ``--help`` to see details about the arguments.

Results are currently stored in text files. For example, if you run ``find`` against bit.ly, a folder called ``bitly`` is created, and the text files inside it contain the discovered URLs.

Commands that run indefinitely check for a sentinel file called ``STOP``. If the file is created or modified after the command starts, the command stops gracefully::

    touch STOP
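
The sentinel check can be sketched in Python as follows. This is a minimal illustration, not Furlat's actual implementation: the function name ``stop_requested`` and the use of the file's modification time are assumptions::

    import os
    import time


    def stop_requested(start_time, path='STOP'):
        """Return True if the sentinel was created or modified after start_time."""
        try:
            return os.path.getmtime(path) > start_time
        except OSError:
            # The sentinel does not exist (or cannot be read): keep running.
            return False


    start_time = time.time()
    # A long-running loop would poll the sentinel between units of work:
    # while not stop_requested(start_time):
    #     do_one_unit_of_work()  # hypothetical placeholder

Comparing against the start time means that a ``STOP`` file left over from an earlier run does not halt a new run.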

Commands
--------

analyze
    Print statistics about the URL shortcodes

find
    Launch a find URL project

sort
    Sort the URLs by length, then value
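
The ordering used by ``sort`` (length first, then value) can be illustrated with a short sketch. The function name ``sort_urls`` is hypothetical, and "value" is assumed here to mean plain string comparison::

    def sort_urls(urls):
        """Sort by length first, then by plain string comparison."""
        return sorted(urls, key=lambda url: (len(url), url))


    urls = ['http://bit.ly/b', 'http://bit.ly/aa', 'http://bit.ly/a', 'http://bit.ly/ab']
    print(sort_urls(urls))
    # prints ['http://bit.ly/a', 'http://bit.ly/b', 'http://bit.ly/aa', 'http://bit.ly/ab']

A purely lexicographic sort would place ``http://bit.ly/aa`` before ``http://bit.ly/b``. Sorting by length first keeps shorter shortcodes together.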

Library
+++++++

The library is not yet stable as an API, but you can read the ``__main__.py`` file to get an overview of how it works.

About
=====

The goal of Furlat is to find as many valid shortcodes as possible without brute-force discovery, using third-party sources such as search engines and microblogs.

Links
+++++

* Homepage: https://github.com/chfoo/furlat

.. * Questions?: https://answers.launchpad.net/furlat

.. * Bugs?: https://github.com/chfoo/furlat/issues

.. * PyPI: https://pypi.python.org/pypi/furlat/

* Chat: irc://irc.efnet.org/archiveteam-bs (I'll be on #archiveteam-bs on EFnet)

Testing
+++++++

The unit tests can be run with ``nosetests``::

    nosetests3

Roadmap
+++++++

This software is currently in an **experimental-but-could-be-useful** state.

What's Available
----------------

* Launching a real web browser.
* Searching through Google, Yahoo, Bing, and Twitter.
* Random keyword search term generation using word lists and MediaWiki page title dump files.
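
As a rough illustration of random search term generation from a word list, consider the sketch below. It does not reflect Furlat's actual code: the function ``random_search_terms``, its parameters, and the sample word list are all assumptions made for the example::

    import random


    def random_search_terms(words, count=2, rng=random):
        """Pick `count` distinct words at random and join them into one query."""
        return ' '.join(rng.sample(words, count))


    # Hypothetical word list; a real run would load words from a file.
    words = ['archive', 'browser', 'python', 'shortcode', 'wiki']
    print(random_search_terms(words))

Passing a seeded ``random.Random`` instance as ``rng`` makes the generated terms reproducible.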

What's To-Do
------------

* Searching Identica
* Nicer result output options
* Configurable options such as fetch rate and number of jobs run concurrently
* Travis CI setup
* PyPI and other websites setup
* Inline documentation
* Launching a fake web browser.

See also
--------

* https://github.com/chfoo/rdai
* https://github.com/chfoo/cloaked-octo-nemesis