https://github.com/sdiehl/pycraig
Python library for scraping data from Craigslist
- Host: GitHub
- URL: https://github.com/sdiehl/pycraig
- Owner: sdiehl
- License: mit
- Created: 2010-11-08T23:27:50.000Z (almost 15 years ago)
- Default Branch: master
- Last Pushed: 2011-08-26T12:42:17.000Z (about 14 years ago)
- Last Synced: 2025-04-05T19:41:31.450Z (6 months ago)
- Language: C
- Homepage:
- Size: 724 KB
- Stars: 6
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
PyCraig
=========

Written by Stephen Diehl.

PyCraig is a Python library for scraping small amounts of data
off Craigslist.

PyCraig is for personal use only: making too many requests to
Craigslist will get your IP address banned.

All code is released under the MIT license; see LICENSE.txt for details.
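Heavy request rates are what get an IP banned, so it helps to space requests out. Below is a hypothetical helper that is not part of PyCraig. `Throttle` sleeps just long enough to keep at least `delay` seconds between calls to `wait()`. It is written for Python 3. No particular delay is known to be safe for Craigslist, so the value you pass is your own choice.

```python
import time


class Throttle:
    """Hypothetical helper (not part of PyCraig): keep at least `delay`
    seconds between successive calls to wait()."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock    # injectable so the timing can be tested
        self.sleep = sleep
        self.last = None      # time of the previous request, if any

    def wait(self):
        """Block until `delay` seconds have passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

Call `throttle.wait()` immediately before each page fetch, for example inside a loop over pages. Because `clock` and `sleep` are injected, you can test the timing without actually waiting.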
Dependencies
============

PyCraig depends on BeautifulSoup, which you can install with:

    pip install BeautifulSoup
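BeautifulSoup handles the HTML parsing. To show what that step involves without the extra dependency, this sketch uses the standard library's `html.parser` instead. That is a swapped-in technique, not what PyCraig uses. It collects the `href` and visible text of every `<a>` tag. The names `LinkExtractor` and `extract_links` are hypothetical, and any markup you feed it in testing is invented, not Craigslist's real page structure.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) pairs found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

Text inside nested tags such as `<b>` is still gathered into the link's text, which is roughly what BeautifulSoup's `get_text()` gives you for an anchor.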

It also uses curl to fetch web pages. If you are running Linux, BSD,
or OS X, you probably already have it installed.

jellyfish (https://github.com/sunlightlabs/jellyfish) is optionally
included for approximate string matching. It is written in C and is
very fast. To use jellyfish as a local module, run:

    cd pycraig/jellyfish
    make

Or install it globally with:

    python pycraig/jellyfish/setup.py install
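Approximate string matching usually means comparing strings by edit distance. jellyfish provides fast implementations of metrics such as Levenshtein distance. The pure-Python sketch below computes the same metric with the standard two-row dynamic program. It is not jellyfish's API. It also includes a hypothetical `fuzzy_contains` helper that reports whether any word in a piece of text is within `max_distance` edits of a keyword.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] is the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_contains(text, word, max_distance=1):
    """True if any whitespace-separated token of text is within
    max_distance edits of word (case-insensitive)."""
    word = word.lower()
    return any(levenshtein(token, word) <= max_distance
               for token in text.lower().split())
```

For instance, `fuzzy_contains('2004 Hondda Civic', 'honda')` is `True` because `hondda` is one deletion away from `honda`. An exact substring test would miss that listing.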
Example
=======

    >>> from pycraig import *

    # Get 3 pages of listings for "cars & trucks" for sale "by owner"
    # in the "San Francisco Bay" area
    >>> listings = get_listings(url='sfbay.craigslist.org',
    ...                         cat='cars & trucks - by owner',
    ...                         pages=3)

    # Create a table with our car listings
    >>> cars = Table()
    >>> extract_rows(listings, cars)

    # Show all Hondas under $15,000 (case-insensitive match)
    >>> for car in cars:
    ...     if car.price < 15000 and 'honda' in car.desc.lower():
    ...         print car.link, car.desc
    ...
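The example relies on each row exposing `link`, `desc`, and a numeric `price`. PyCraig's `Table` and `extract_rows` aren't documented here, so this sketch uses stand-ins. A hypothetical `Listing` namedtuple holds those three fields. A hypothetical `parse_price` pulls the first dollar amount out of a string. A `filter_listings` helper then reproduces the Honda query. The price parsing assumes prices appear as `$14,500`-style text, which is a guess about the listing format.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a row of PyCraig's Table: the example only
# tells us that rows expose .link, .desc and .price.
Listing = namedtuple("Listing", ["link", "desc", "price"])

# A dollar sign followed by digits, optionally with thousands separators.
_PRICE_RE = re.compile(r"\$\s*(\d[\d,]*)")


def parse_price(text):
    """Return the first dollar amount in text as an int, or None."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def filter_listings(rows, keyword, max_price):
    """Rows priced below max_price whose description mentions keyword
    (case-insensitive); rows without a parsed price are skipped."""
    keyword = keyword.lower()
    return [row for row in rows
            if row.price is not None
            and row.price < max_price
            and keyword in row.desc.lower()]
```

`filter_listings(rows, 'honda', 15000)` keeps only rows with a known price below $15,000 whose description mentions Honda. Rows whose price couldn't be parsed are dropped rather than compared against `None`, because that comparison raises a `TypeError` in Python 3.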