Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/zachgoldstein/scrapy-statsd

Statsd integration middleware for scrapy
https://github.com/zachgoldstein/scrapy-statsd

Last synced: 2 months ago
JSON representation

Statsd integration middleware for scrapy

Awesome Lists containing this project

README

        

# Scrapy Statsd Middleware

## Usage

```
pip install scrapy-statsd-middleware
```

```
DOWNLOADER_MIDDLEWARES = {
'statsd_middleware.StatsdMiddleware': 543,
}

SPIDER_MIDDLEWARES = {
'statsd_middleware.StatsdMiddleware': 543,
}
```

There's also a few settings that you can use:

- `STATSD_HOSTNAME` - Defaults to the current machine's hostname
- `STATSD_PREFIX` - Defaults to "hostname.spider-name."
- `STATSD_HOST_IP` - Defaults to "0.0.0.0"

This will increment `statsd` with the following:
- requests (`spider_reqs_issued`)
- response (`spider_resps_received`)
- errors (`error_KeyError`, where KeyError is whatever the error name is)
- items processed (`processed_product`, where Product is whatever the item class name is)

## Example Implementation

An example implementation of this middleware is in /example.
It includes a docker-compose file that describes how to use this middleware with statsd & graphite

## Example Installation & Usage

- Build the docker images: `docker-compose build`
- Start the statsd container: `docker-compose up -d`
- Run the example spider: `docker-compose -f ./example/docker-compose.yml run spider bash -c "cd ./opt/scrapy/dirbot/ && scrapy crawl dmoz"`

You can see a live graphite dashboard at You should see stats show up under something like "stats.Z-MacBook-Pro.local.dmoz.spider_reqs_issued"

## Development
You can run the tests via make test