https://github.com/scrapy/quotesbot
This is a sample Scrapy project for educational purposes
- Host: GitHub
- URL: https://github.com/scrapy/quotesbot
- Owner: scrapy
- License: mit
- Created: 2016-09-27T13:55:40.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2023-11-29T22:31:31.000Z (over 2 years ago)
- Last Synced: 2025-04-08T02:38:36.224Z (11 months ago)
- Language: Python
- Homepage: http://doc.scrapy.org/en/latest/intro/tutorial.html
- Size: 5.86 KB
- Stars: 1,316
- Watchers: 70
- Forks: 783
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-security-collection - **813** stars
README
# QuotesBot
This is a Scrapy project that scrapes quotes by famous people from http://quotes.toscrape.com ([GitHub repo](https://github.com/scrapinghub/spidyquotes)).
This project is only meant for educational purposes.
## Extracted data
This project extracts quotes, combined with the respective author names and tags.
The extracted data looks like this sample:

    {
        'author': 'Douglas Adams',
        'text': '“I may not have gone where I intended to go, but I think I ...”',
        'tags': ['life', 'navigation']
    }

## Spiders
This project contains two spiders and you can list them using the `list`
command:

    $ scrapy list
    toscrape-css
    toscrape-xpath

Both spiders extract the same data from the same website, but `toscrape-css`
employs CSS selectors, while `toscrape-xpath` employs XPath expressions.
You can learn more about the spiders by going through the
[Scrapy Tutorial](http://doc.scrapy.org/en/latest/intro/tutorial.html).
## Running the spiders
You can run a spider using the `scrapy crawl` command, such as:

    $ scrapy crawl toscrape-css

If you want to save the scraped data to a file, you can pass the `-o` option:

    $ scrapy crawl toscrape-css -o quotes.json
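
Once the data is saved, it can be loaded like any other JSON file. The sketch below assumes `quotes.json` holds a JSON array of items with the `author`, `text` and `tags` keys shown under "Extracted data". To stay runnable without a crawl it parses an inline stand-in string instead of the real file. The helper name `group_authors_by_tag`, the second sample item, and the `"..."` text values are placeholders for illustration.

```python
import json
from collections import defaultdict


def group_authors_by_tag(items):
    """Map each tag to the sorted list of authors quoted under it."""
    authors_by_tag = defaultdict(set)
    for item in items:
        for tag in item.get("tags", []):
            authors_by_tag[tag].add(item["author"])
    return {tag: sorted(authors) for tag, authors in authors_by_tag.items()}


# Stand-in for the contents of quotes.json. After a real run you would use:
#     with open("quotes.json", encoding="utf-8") as f:
#         items = json.load(f)
SAMPLE = """[
  {"author": "Douglas Adams", "text": "...", "tags": ["life", "navigation"]},
  {"author": "Example Author", "text": "...", "tags": ["life"]}
]"""

items = json.loads(SAMPLE)
by_tag = group_authors_by_tag(items)
print(by_tag)
```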