https://github.com/buihdk/scrapy-books
A demo of scraping book data from the website https://books.toscrape.com using Scrapy
- Host: GitHub
- URL: https://github.com/buihdk/scrapy-books
- Owner: buihdk
- Created: 2023-05-16T06:50:55.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-05-18T04:52:28.000Z (over 1 year ago)
- Last Synced: 2024-10-13T11:21:59.017Z (2 months ago)
- Topics: ipython, scrapy, scrapy-spider
- Language: Python
- Homepage:
- Size: 24.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
This Python tool crawls book data from the website https://books.toscrape.com.
## Steps to crawl
- run `python -m venv venv` at the project root to create a virtual environment
- run `source venv/bin/activate` at the project root to activate the newly created virtual environment
- run `pip install -r requirements.txt` to install the modules this project requires
- run `scrapy crawl bookspider` inside web-scrapy/bookscraper/bookscraper to start crawling; a sketch of what such a spider might look like follows this list
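
The repository's actual spider isn't reproduced here, but a minimal `bookspider` for books.toscrape.com might look roughly like the following; the CSS selectors and field names are assumptions for illustration, not the project's code:

```python
import scrapy


class BookSpider(scrapy.Spider):
    # name must match the argument passed to `scrapy crawl`
    name = "bookspider"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com"]

    def parse(self, response):
        # Each book on a listing page is wrapped in <article class="product_pod">
        for book in response.css("article.product_pod"):
            yield {
                "title": book.css("h3 a::attr(title)").get(),
                "price": book.css("p.price_color::text").get(),
                "url": response.urljoin(book.css("h3 a::attr(href)").get()),
            }

        # Follow pagination until the "next" link disappears on the last page
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
```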
### A few useful commands
- run `scrapy startproject bookscraper` inside web-scrapy to create a bookscraper project
- run `scrapy genspider bookspider books.toscrape.com` inside bookscraper/spiders to generate a spider named bookspider
- run `pip3 install ipython` to install IPython
- add `shell = ipython` under `[settings]` in scrapy.cfg
- run `scrapy shell` to test Scrapy selectors and commands interactively; see the example session after this list
- run `scrapy crawl bookspider -o bookdata.csv` to crawl and write the scraped data to bookdata.csv
- run `scrapy crawl bookspider -o bookdata.json` to crawl and write the scraped data to bookdata.json
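
With `shell = ipython` configured, running `scrapy shell https://books.toscrape.com` opens an IPython session with a ready-made `response` object. A few selector expressions worth trying there, assuming the same page structure as the spider sketch above:

```python
# Inside `scrapy shell https://books.toscrape.com` (IPython prompt)
response.css("article.product_pod h3 a::attr(title)").getall()   # titles on the current page
response.css("article.product_pod p.price_color::text").get()    # first price on the page
response.css("li.next a::attr(href)").get()                      # relative link to the next page
fetch("https://books.toscrape.com/catalogue/page-2.html")         # load another page into `response`
```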
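
As an alternative to passing `-o` on every run (on recent Scrapy versions `-o` appends to an existing file while `-O` overwrites it), feed exports can be configured once in the project settings. This is only a sketch of a hypothetical addition to `bookscraper/settings.py`, not something the repository necessarily contains:

```python
# bookscraper/settings.py (hypothetical addition): export every crawl to CSV and JSON
FEEDS = {
    "bookdata.csv": {"format": "csv", "overwrite": True},
    "bookdata.json": {"format": "json", "overwrite": True},
}
```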