https://github.com/dnephin/threaded-crawler


Threaded Crawler

This web crawler is designed to be generic and highly configurable. It can
quickly traverse sites and pull content based on regex and other selection criteria.
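
As a rough illustration of regex-based selection, the sketch below collects the links on a page and keeps those whose href matches a pattern. It uses only the Python 3 standard library rather than BeautifulSoup or this project's own classes; `LinkCollector`, `select_links`, and the sample page are hypothetical names invented for the example, not part of the crawler's API.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def select_links(html, pattern):
    """Return the links in `html` whose href matches the regex `pattern`."""
    collector = LinkCollector()
    collector.feed(html)
    regex = re.compile(pattern)
    return [link for link in collector.links if regex.search(link)]


# Sample input, assumed for illustration only.
page = (
    '<a href="/jobs/123">Job</a> '
    '<a href="/about">About</a> '
    '<a href="/jobs/456">Job</a>'
)
print(select_links(page, r"^/jobs/\d+$"))  # prints ['/jobs/123', '/jobs/456']
```

The crawler itself may combine patterns like this with other selection criteria; see the documentation in doc/API for the actual configuration options.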

__Requirements__

Uses BeautifulSoup to parse HTML pages (http://www.crummy.com/software/BeautifulSoup/)
Uses epydoc for documentation
Uses JobSite common package

python-psycopg2 2.0.8

__Development__

The 'cmd' script can be used to clean and build docs.
Documentation is in doc/API.

__Install__

python setup.py install

__Running__

Set the $COMMON environment variable to the path of the common/patterns.py
lib, or install the lib on the default Python path.
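
One way a script could honor $COMMON is sketched below. This is an assumption about the mechanism, not the project's actual code: it treats $COMMON as the directory that makes the common lib importable, and `add_common_to_path` is a hypothetical helper written for Python 3.

```python
import os
import sys


def add_common_to_path(environ=None, path=None):
    """Prepend the directory named by $COMMON to the import path, if it is set.

    `environ` and `path` default to os.environ and sys.path; they are
    parameters only so the behavior can be checked without side effects.
    """
    environ = os.environ if environ is None else environ
    path = sys.path if path is None else path
    common = environ.get("COMMON")
    if common and common not in path:
        path.insert(0, common)
    return path


# With $COMMON unset, the path is left untouched and the lib must
# already be installed on the default Python path.
add_common_to_path()
```

If $COMMON is unset, nothing changes, which matches the second option above of installing the lib on the default Python path.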