https://github.com/stummjr/scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
- Host: GitHub
- URL: https://github.com/stummjr/scrapy-fieldstats
- Owner: stummjr
- License: mit
- Created: 2017-10-11T01:34:41.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2020-04-11T15:31:53.000Z (over 4 years ago)
- Last Synced: 2024-11-01T08:37:23.826Z (about 1 month ago)
- Topics: crawling, extension, scraping, scrapy, scrapy-extension, scrapy-plugin
- Language: Python
- Size: 42 KB
- Stars: 18
- Watchers: 5
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-scrapy - scrapy-fieldstats
Scrapy FieldStats
=================

![](https://github.com/stummjr/scrapy-fieldstats/workflows/CI/badge.svg)
[![Downloads](https://pepy.tech/badge/scrapy-fieldstats)](https://pepy.tech/project/scrapy-fieldstats)

A Scrapy extension that generates a summary of field coverage from your scraped data.
## What?
Upon finishing a job, Scrapy prints some useful stats about that job, such as: number of requests, responses, scraped items, etc.

However, it's often useful to have an overview of the field coverage in such scraped items. Let's say you want to know the percentage of items missing the `price` field. That's when this extension comes into play!
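To make the idea concrete: field coverage is simply the share of scraped items in which a given field is present. The snippet below is a minimal sketch of that calculation in plain Python, not the extension's actual implementation. The `count_fields` and `field_coverage` helpers and the sample items are hypothetical, and nested fields are flattened into dotted paths for brevity, whereas the extension reports them as a nested structure.

```python
from collections import Counter


def count_fields(item, counts, prefix=""):
    """Count every field present in `item`, recursing into nested dicts."""
    for key, value in item.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested item: count its fields under a dotted path.
            count_fields(value, counts, prefix=f"{path}.")
        else:
            counts[path] += 1


def field_coverage(items):
    """Return {field_path: percentage of items that contain the field}."""
    counts = Counter()
    for item in items:
        count_fields(item, counts)
    total = len(items)
    return {
        field: f"{100.0 * n / total:.1f}%"
        for field, n in sorted(counts.items())
    }


# Hypothetical sample data: four items, one of them missing `price`.
items = [
    {"title": "A", "price": 10, "author": {"name": "Ann", "age": 30}},
    {"title": "B", "price": 12, "author": {"name": "Bob"}},
    {"title": "C", "author": {"name": "Cy", "age": 41}},
    {"title": "D", "price": 8, "author": {"name": "Di"}},
]

coverage = field_coverage(items)
print(coverage)
```

With this sample data, `price` comes out at 75.0% because one of the four items lacks it.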
Check out an example:
```bash
$ scrapy crawl example
2017-10-12 11:10:10 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: examplebot)
...
2017-10-12 11:10:20 [scrapy_fieldstats.fieldstats] INFO: Field stats:
{
    'author': {
        'name': '100.0%',
        'age': '52.0%'
    },
    'image': '97.0%',
    'title': '100.0%',
    'price': '92.0%',
    'stars': '47.5%'
}
2017-10-12 11:10:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
...
```

## Installation
First, pip install this package:

```bash
$ pip install scrapy-fieldstats
```

## Usage
Enable the extension in your project's `settings.py` file, by adding the following lines:

```python
EXTENSIONS = {
    'scrapy_fieldstats.fieldstats.FieldStatsExtension': 10,
}
FIELDSTATS_ENABLED = True
```
That's all! Now run your job and have a look at the field stats.

## Settings
The settings below can be defined like any other Scrapy setting, as described in the [Scrapy docs](https://doc.scrapy.org/en/latest/topics/settings.html#populating-the-settings).

* `FIELDSTATS_ENABLED`: to enable/disable the extension.
* `FIELDSTATS_COUNTS_ONLY`: when `True`, the extension will output absolute counts, instead of percentages.
* `FIELDSTATS_SKIP_NONE`: when `True`, `None` values won't be counted as existing values for fields.
* `FIELDSTATS_ADD_TO_STATS`: when `True`, the extension will add the field coverage report to the job stats.

## Contributing
If you spot a bug or want to propose a new feature, please create an issue in this project's
[issue tracker](https://github.com/stummjr/scrapy-fieldstats/issues).