https://github.com/codelucas/yelpcrawl
Crawl and scrape Yelp's restaurant data for every zip code in the United States (or a specified zipcode). Yelp Crawler.
- Host: GitHub
- URL: https://github.com/codelucas/yelpcrawl
- Owner: codelucas
- Created: 2014-01-03T00:08:42.000Z (about 12 years ago)
- Default Branch: master
- Last Pushed: 2017-05-12T04:49:17.000Z (over 8 years ago)
- Last Synced: 2025-03-27T08:45:07.432Z (10 months ago)
- Language: Python
- Homepage:
- Size: 200 KB
- Stars: 55
- Watchers: 10
- Forks: 41
- Open Issues: 2
Metadata Files:
- Readme: README.rst
YelpCrawl: Exhaustive Yelp! Scraper
===================================
Example usage for `yelp`_ extraction.
Extract all restaurant data from a specific zipcode.
::

    $ python2.7 crawler.py -z 98029
    ===== Attempting extraction for zipcode < 98029 >=====
    title: Issaquah Coffee Company
    categories: Coffee & Tea
    rating: 4.0 star rating
    ...

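If you want to post-process the printed results, the output shown above can be grouped into one record per business. The following is a minimal sketch, not part of the project: ``parse_crawl_output`` is a hypothetical helper, it is written for Python 3, and it assumes every record starts with a ``title:`` line and that the zipcode banner looks exactly like the one in the sample output.

```python
import re

# Matches the banner line shown in the sample output, e.g.
# "===== Attempting extraction for zipcode < 98029 >====="
HEADER_RE = re.compile(
    r"^=+\s*Attempting extraction for zipcode\s*<\s*(\d{5})\s*>\s*=+$"
)


def parse_crawl_output(text):
    """Group 'key: value' lines into one dict per business.

    Assumption: a 'title:' line marks the start of a new record.
    """
    records = []
    current = None
    zipcode = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            zipcode = header.group(1)
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # skip "..." and any other line that is not key/value
        if key == "title":
            current = {"zipcode": zipcode}
            records.append(current)
        if current is not None:
            current[key] = value
    return records
```

Calling ``parse_crawl_output`` on the captured stdout of a run returns a list of dictionaries with ``zipcode``, ``title``, ``categories`` and ``rating`` keys, which can then be written to JSON or CSV.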
Extract all restaurant data from America (all American zipcodes).
::

    $ python2.7 crawler.py
    **We are attempting to extract all zipcodes in Amerrica!**
    ===== Attempting extraction for zipcode < 35004 >=====
    title: Brasher Sam Tire & Auto Service Inc
    categories: Tires
    rating: 5.0 star rating
    ...

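The two invocations above differ only in whether ``-z`` is given. How ``crawler.py`` actually reads its arguments is not shown here, so the following is only a sketch of one way such a switch could be written with the standard library's ``argparse`` (Python 3). The names ``parse_args`` and ``zipcodes_to_crawl`` are hypothetical, and the list of all zipcodes is assumed to come from elsewhere.

```python
import argparse


def parse_args(argv=None):
    """Parse an optional -z flag; without it, no single zipcode is selected."""
    parser = argparse.ArgumentParser(description="Sketch of a zipcode crawler CLI")
    parser.add_argument(
        "-z",
        dest="zipcode",
        default=None,
        help="crawl only this zipcode; omit to crawl every known zipcode",
    )
    return parser.parse_args(argv)


def zipcodes_to_crawl(args, all_zipcodes):
    """Return the single requested zipcode, or every zipcode if none was given."""
    if args.zipcode is not None:
        return [args.zipcode]
    return list(all_zipcodes)
```

With this layout, ``zipcodes_to_crawl(parse_args(["-z", "98029"]), all_zipcodes)`` yields a one-element list, while an empty argument list falls back to the full set.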
Installation:
-------------
::

    $ git clone https://github.com/codelucas/yelpcrawl
    $ cd yelpcrawl
    $ pip install -r requirements.txt

And now you can begin!
::

    $ python2.7 crawler.py -z 98029

Feel free to send in pull requests. We need some test cases please :)
This code was written when the two of us were still relatively new to Python,
so excuse the shittiness. It was open sourced just as a keepsake; it's nothing
fancy, and there are definitely better scraping solutions out there.
We used slower parsers like `beautifulsoup`_ and no multithreading
because `yelp`_ would've rate-limited us anyway :)
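To illustrate the single-threaded approach described above, here is a minimal sketch of a sequential crawl loop that pauses between requests. It is not the project's code: ``crawl_sequentially``, the ``fetch`` callable and the two-second default delay are all assumptions chosen for the example, and it targets Python 3.

```python
import time


def crawl_sequentially(zipcodes, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(zipcode) one zipcode at a time, pausing between requests.

    `fetch` is any callable that takes a zipcode and returns its result;
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    results = {}
    for index, zipcode in enumerate(zipcodes):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results[zipcode] = fetch(zipcode)
    return results
```

Keeping the loop sequential means the delay is the only knob that controls request rate, which is simpler to reason about than coordinating several threads against a rate limit.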
By: `Lucas`_, `Mathew`_
.. _`yelp`: http://www.yelp.com
.. _`beautifulsoup`: http://www.crummy.com/software/BeautifulSoup/
.. _`Lucas`: http://codelucas.com
.. _`Mathew`: https://www.facebook.com/matsprehn