https://github.com/buihdk/scrapy-books
A demo of scraping book data from the website https://books.toscrape.com using Scrapy
- Host: GitHub
- URL: https://github.com/buihdk/scrapy-books
- Owner: buihdk
- Created: 2023-05-16T06:50:55.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-05-18T04:52:28.000Z (over 1 year ago)
- Last Synced: 2024-10-13T11:21:59.017Z (2 months ago)
- Topics: ipython, scrapy, scrapy-spider
- Language: Python
- Homepage:
- Size: 24.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
This Python tool crawls book data from the website https://books.toscrape.com.
## Steps to crawl
- run `python -m venv venv` at the project root to create a virtual environment
- run `source venv/bin/activate` at the project root to activate the newly created virtual environment
- run `pip install -r requirements.txt` to install the modules this project requires
- run `scrapy crawl bookspider` inside web-scrapy/bookscraper/bookscraper to start crawling; a sketch of what such a spider might look like follows this list
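
The repository's actual spider isn't reproduced here, but a minimal `bookspider` for books.toscrape.com might look roughly like the following; the CSS selectors and field names are assumptions for illustration, not the project's code:

```python
import scrapy


class BookSpider(scrapy.Spider):
    # name must match the argument passed to `scrapy crawl`
    name = "bookspider"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com"]

    def parse(self, response):
        # Each book on a listing page is wrapped in <article class="product_pod">
        for book in response.css("article.product_pod"):
            yield {
                "title": book.css("h3 a::attr(title)").get(),
                "price": book.css("p.price_color::text").get(),
                "url": response.urljoin(book.css("h3 a::attr(href)").get()),
            }

        # Follow pagination until the "next" link disappears on the last page
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
```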
### A few useful commands
- run `scrapy startproject bookscraper` inside web-scrapy to create a bookscraper project
- run `scrapy genspider bookspider books.toscrape.com` inside bookscraper/spiders to generate a spider named bookspider
- run `pip3 install ipython` to install IPython
- add `shell = ipython` under `[settings]` in scrapy.cfg
- run `scrapy shell` to test Scrapy selectors and commands interactively; see the example session after this list
- run `scrapy crawl bookspider -o bookdata.csv` to crawl and write the scraped data to bookdata.csv
- run `scrapy crawl bookspider -o bookdata.json` to crawl and write the scraped data to bookdata.json
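
With `shell = ipython` configured, running `scrapy shell https://books.toscrape.com` opens an IPython session with a ready-made `response` object. A few selector expressions worth trying there, assuming the same page structure as the spider sketch above:

```python
# Inside `scrapy shell https://books.toscrape.com` (IPython prompt)
response.css("article.product_pod h3 a::attr(title)").getall()   # titles on the current page
response.css("article.product_pod p.price_color::text").get()    # first price on the page
response.css("li.next a::attr(href)").get()                      # relative link to the next page
fetch("https://books.toscrape.com/catalogue/page-2.html")         # load another page into `response`
```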
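
As an alternative to passing `-o` on every run (on recent Scrapy versions `-o` appends to an existing file while `-O` overwrites it), feed exports can be configured once in the project settings. This is only a sketch of a hypothetical addition to `bookscraper/settings.py`, not something the repository necessarily contains:

```python
# bookscraper/settings.py (hypothetical addition): export every crawl to CSV and JSON
FEEDS = {
    "bookdata.csv": {"format": "csv", "overwrite": True},
    "bookdata.json": {"format": "json", "overwrite": True},
}
```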