https://github.com/addono/bookbe-spider
Spider for book titles at book.be.
- Host: GitHub
- URL: https://github.com/addono/bookbe-spider
- Owner: Addono
- Created: 2019-12-23T12:45:42.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-05-14T22:16:33.000Z (about 2 years ago)
- Last Synced: 2024-12-27T00:27:29.489Z (over 1 year ago)
- Language: Python
- Homepage: https://gist.github.com/Addono/de7b0633d7faa1da3aeaf1f43985b163
- Size: 4.88 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Book.be Scraper
## About
Collects book titles from [book.be](https://book.be). See [this example](https://gist.github.com/Addono/de7b0633d7faa1da3aeaf1f43985b163) of a scrape that collects the titles of all books released in 2019.
## Installation
Create a Python 3 virtual environment.
```bash
virtualenv venv
```
Activate the environment.
```bash
source venv/bin/activate
```
Install all requirements.
```bash
pip install -r requirements.txt
```
## Usage
Run the spider.
```bash
scrapy runspider spider.py
```
To save the output gathered by the spider, e.g. as CSV:
```bash
scrapy runspider spider.py --output=res.csv -t csv
```
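Once the CSV file exists, the collected titles can be loaded back into Python for further processing. The following is a minimal sketch, not part of this repository: the helper name `load_rows` is hypothetical, and it assumes the export starts with a header row (Scrapy's CSV export writes one by default). The column names depend on the fields yielded by `spider.py`, so the sketch does not assume any particular field.

```python
import csv
from pathlib import Path


def load_rows(path):
    """Return the rows of a CSV export as a list of dicts.

    Hypothetical helper: assumes the first line of the file is a header
    row, which is then used as the dict keys for every following row.
    """
    # newline="" lets the csv module handle line endings itself.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `load_rows("res.csv")` returns one dict per scraped item, keyed by whatever column names appear in the header of the export.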
To filter the collected titles, change the `URL` value in `spider.py` so that it includes the desired filters.
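As a rough illustration of what a filtered `URL` could look like, the sketch below builds a URL with query parameters. Everything in it is an assumption: the helper `build_url`, the base URL, and the `year` parameter are hypothetical placeholders, not the actual filters used by book.be or the actual contents of `spider.py`. Check the address bar on book.be after applying a filter to find the real parameter names.

```python
from urllib.parse import urlencode


def build_url(base_url, filters):
    """Append filters to base_url as a query string (hypothetical helper)."""
    query = urlencode(filters)
    # Return the bare base URL when no filters are given.
    return f"{base_url}?{query}" if query else base_url


# Hypothetical example: the "year" parameter is a placeholder, not a
# confirmed book.be filter name.
URL = build_url("https://book.be/", {"year": 2019})
```

The resulting string would then replace the existing `URL` value in `spider.py`.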