Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/farukalamai/yelp-scraper-scrapy-python
Yelp Restaurant data scraping using python, scrapy spider
- Host: GitHub
- URL: https://github.com/farukalamai/yelp-scraper-scrapy-python
- Owner: farukalamai
- License: mit
- Created: 2023-07-09T04:58:24.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-09T16:49:33.000Z (over 1 year ago)
- Last Synced: 2024-11-07T20:20:01.317Z (about 2 months ago)
- Topics: ai-bot, data-extraction, data-mining, data-scraper, data-scraping, python, python-scraper, scrapy, scrapy-crawler, scrapy-spider, web-scraper, web-scraping, web-scraping-python, web-scraping-software, yelp, yelp-api, yelp-restaurants, yelp-resturant-data-scraping, yelp-scraper
- Language: Python
- Homepage: https://www.linkedin.com/in/farukalamai/
- Size: 23.4 KB
- Stars: 3
- Watchers: 2
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# Yelp restaurant data scraping using Python and a Scrapy spider
![Top-10-Best-Restaurants-in-San-Francisco-CA-July-2023-Yelp](https://github.com/farukalampro/yelp-webscraper-using-scrapy-python-spider/assets/92469073/e3b0e25f-d55b-44b5-b496-828832240397)

## Deployment
#### 1. Clone Repository
```bash
git clone https://github.com/farukalampro/yelp-webscraper-using-scrapy-python.git
```
```bash
cd yelp-webscraper-using-scrapy-python
```
#### 2. Create Virtual Environment
```bash
python -m venv env
```
- For Windows:
```bash
.\env\Scripts\activate
```
- For macOS/Linux:
```bash
source env/bin/activate
```
#### 3. Install the required packages
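Before installing, you can confirm that the virtual environment from step 2 is active, so the packages do not end up in your global Python installation. A small standard-library check (save it under any name you like, for example a hypothetical `check_env.py`; it is not part of this repository):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a virtual environment.

    A venv created with `python -m venv` sets sys.prefix to the environment
    directory, while sys.base_prefix still points at the base Python install.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```

If it prints `False`, run the activation command for your platform again before `pip install`.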
```bash
pip install -r requirements.txt
```
#### 4. Add your own Yelp search links
- Open the **data.py** file and add your Yelp search link(s) to the `start_urls` list.
- One sample link is already included in data.py. You can add as many links as you want.
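The sample link stores the search term in the `find_desc` query parameter and the location in `find_loc`. If you need links for many locations, a minimal sketch like the following can generate them with the standard library. The helper `yelp_search_url` and the example cities are illustrative assumptions, not part of the repository, and the generated links are worth checking against a real Yelp search before crawling.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://www.yelp.com/search"


def yelp_search_url(term: str, location: str) -> str:
    """Build a Yelp search link (hypothetical helper, not from this repo).

    urlencode uses quote_plus, so spaces become "+" and "," becomes "%2C",
    which matches the format of the sample link in data.py.
    """
    query = urlencode({"find_desc": term, "find_loc": location})
    return f"{SEARCH_BASE}?{query}"


# Example locations only; replace them with the searches you need.
start_urls = [
    yelp_search_url("Restaurants", city)
    for city in ["San Francisco, CA", "Oakland, CA"]
]

for url in start_urls:
    print(url)
```

You can paste the printed links into the `start_urls` list in data.py.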
```python
start_urls = [
    # This is the sample URL.
    # Replace it with your own Yelp search link(s).
    'https://www.yelp.com/search?find_desc=Restaurants&find_loc=San+Francisco%2C+CA',
]
```
#### 5. Run the spider from the terminal
```bash
scrapy crawl data -o sample_file.csv
```
- You can also export the data as JSON or XML. Scrapy picks the output format from the file extension. The general form of the command is:
```bash
# scrapy crawl <spider_name> -o <file_name>.<csv|json|xml>
scrapy crawl data -o file_name.json
```
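Once exported, the data can be loaded back for analysis. A minimal standard-library sketch, assuming a `.csv` or `.json` export (XML is not handled here). The function name `load_scraped` is hypothetical, and the column names you get depend on the fields the spider yields, which this README does not list.

```python
import csv
import json
from pathlib import Path


def load_scraped(path):
    """Load items exported with `scrapy crawl <spider> -o <path>`.

    Supports .csv (a header row plus one row per item) and .json
    (a single JSON array of objects). XML is not handled here.
    """
    p = Path(path)
    if p.suffix == ".csv":
        # CSV values come back as strings, e.g. a rating as "4.5".
        with p.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if p.suffix == ".json":
        # JSON keeps numbers as numbers.
        with p.open(encoding="utf-8") as f:
            return json.load(f)
    raise ValueError(f"unsupported export format: {p.suffix!r}")
```

For example, `load_scraped("sample_file.csv")` returns one dictionary per scraped row. Note that `-o` appends to an existing file, so running the crawl twice into the same `.json` file can leave it as invalid JSON; Scrapy 2.4 and later also accept `-O` to overwrite the file instead.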
- Sample restaurant data scraped with this spider is in the **Sample File** folder.

## Important Note
- Yelp updates its website frequently, so you may need to update the **XPath** selectors in the spider whenever the page structure changes.
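Because the selectors are what break when Yelp changes its pages, one option is to keep them in one place and list fallbacks in priority order. The sketch below shows that idea using the standard library's `xml.etree.ElementTree`, whose `find` supports only a small XPath subset, on tiny well-formed snippets. It is a stand-in, not this spider's code: in a Scrapy spider you would run each path through `response.xpath(...).get()` instead. The paths, the `business-name` class, and the `first_text` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical selectors for a business name, newest layout first.
# Yelp's real markup is different and changes over time; update these
# paths (not the parsing code) when the site changes.
NAME_PATHS = [
    ".//h3/a",                       # assumed current layout
    ".//a[@class='business-name']",  # assumed older layout
]


def first_text(root, paths):
    """Return the stripped text of the first element any path matches, else None."""
    for path in paths:
        el = root.find(path)
        if el is not None and el.text and el.text.strip():
            return el.text.strip()
    return None


new_layout = ET.fromstring("<div><h3><a> New Place </a></h3></div>")
old_layout = ET.fromstring("<div><a class='business-name'>Example Bistro</a></div>")
print(first_text(new_layout, NAME_PATHS))  # New Place
print(first_text(old_layout, NAME_PATHS))  # Example Bistro
```

Returning `None` when nothing matches makes a broken selector easy to spot in the exported data instead of failing silently with an empty string.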