https://github.com/scrapy-plugins/scrapy-statsd
https://github.com/scrapy-plugins/scrapy-statsd
Last synced: about 1 year ago
JSON representation
- Host: GitHub
- URL: https://github.com/scrapy-plugins/scrapy-statsd
- Owner: scrapy-plugins
- License: bsd-3-clause
- Created: 2019-05-08T13:53:43.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2022-10-19T19:42:23.000Z (over 3 years ago)
- Last Synced: 2025-04-07T08:02:04.457Z (about 1 year ago)
- Language: Python
- Size: 42 KB
- Stars: 6
- Watchers: 5
- Forks: 4
- Open Issues: 4
-
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
README
scrapy-statsd-extension
=======================
The scrapy-statsd-extension is a scrapy extension to send scrapy stats
to statsd host.
Naming
------
Scrapy stats names will be mapped to names supported by statsd. Some
examples:
::
downloader/request_bytes -> downloader.request_bytes
downloader/request_method_count/GET -> downloader.request_method_count.GET
finish_reason -> finish_reason
robotstxt/exception_count/ -> robotstxt.exception_count.class_PermissionError
Installation
------------
Install scrapy-statsd-extension using ``pip``:
::
$ pip install scrapy-statsd-extension
Configuration
-------------
First, you need to include the extension to your ``EXTENSIONS`` dict in
``settings.py``, like so:
::
STATSD_ENABLED = True
EXTENSIONS = {
...
'scrapy_statsd_extension.StatsdExtension': 123,
...
}
Settings
--------
To configure host settings, you can adjust the following:
::
STATSD_HOST = 'localhost'
STATSD_PORT = 8125
The default endpoint for sending stats will be:
::
http://localhost:8125
To enable the extension you must set:
::
STATSD_ENABLED = False
Periodic logging is enabled by default but you can disable it, in which
case the metrics will be logged only once, when a spider is closed:
::
STATSD_LOG_PERIODIC = True
By default, stats are logged every 5 seconds, you can adjust that using
``STATSD_LOG_EVERY`` which is the number of seconds between logging
operations:
::
STATSD_LOG_EVERY = 5
Set the desired prefix:
::
STATSD_PREFIX = 'scrapy'
You can specify which prefixes you want logged if you don't want all
scrapy stats. The default is an empty list which indicates that all
stats should be logged. You can, for example, log only downloader and
robotstxt exception stats by setting ``STATSD_LOG_ONLY`` to
``['downloader', 'robotstxt.exception_count']``.
::
STATSD_LOG_ONLY = []
You can also specify prefixes to ignore the same way using
``STATSD_IGNORE``:
::
STATSD_IGNORE = []
Tags
----
Certain platforms such as datadog and influxdb offer tagging options.
To enable tagging set ``STATSD_TAGGING`` to ``True``, it is disabled by
default:
::
STATSD_TAGGING = False
Then, you can set custom tags using ``STATSD_TAGS``. Currently, only
``spider_name_tag`` is supported and setting it to True will add the spider's
as a tag on all metrics:
::
STATSD_TAGS = {
'spider_name_tag': True
}
You can also set custom tags by setting ``statsd_tags`` attribute on each
spider. This must be a dictionary containing tag names as keys and tag values as
dictionary values.