Statsite [![Build Status](https://travis-ci.org/statsite/statsite.png)](https://travis-ci.org/statsite/statsite)
========

Statsite is a metrics aggregation server. It is based heavily
on Etsy's StatsD <https://github.com/etsy/statsd> and is wire-compatible with it.

Features
--------

* Multiple metric types:
  - Key / Value
  - Gauges
  - Counters
  - Timers
  - Sets
* Efficient summary metrics for timer data:
  - Mean
  - Min/Max
  - Standard deviation
  - Median, Percentile 95, Percentile 99
  - Histograms
* Dynamic set implementation:
  - Exact counts for small sets
  - HyperLogLog for large sets
* Included sinks:
  - Graphite
  - InfluxDB
  - Ganglia
  - Librato
  - CloudWatch
  - OpenTSDB
  - HTTP
* Binary protocol
* TCP, UDP, and STDIN
* Fast


Architecture
-------------

Statsite is designed to be both highly performant
and very flexible. To achieve this, it implements the stats
collection and aggregation in pure C, using an event loop to be
extremely fast. This allows it to handle hundreds of connections
and millions of metrics. After each flush interval expires,
statsite performs a fork/exec to start a new stream handler
invoking a specified application. Statsite then streams the
aggregated metrics over stdin to the application, which is
free to handle the metrics as it sees fit.

This allows statsite to aggregate metrics and then ship them
to any number of sinks (Graphite, SQL databases, etc.). There
is an included Python script that ships metrics to Graphite.

Statsite tries to minimize memory usage by not
storing all the metrics that are received. Counter values are
aggregated as they are received, and timer values are stored
and aggregated using the Cormode-Muthukrishnan algorithm from
"Effective Computation of Biased Quantiles over Data Streams".
This means that the percentile values are not perfectly accurate;
they are subject to a configurable error bound epsilon (`timer_eps`).
In exchange, only a fraction of the samples needs to be stored.

Histograms can also be optionally maintained for timer values.
The minimum and maximum values, along with the bin widths, must
be specified in advance, and the bins are updated as samples
are received. Statsite supports multiple histogram configurations
and uses a longest-prefix match policy to choose one for each key.

Handling of sets in statsite depends on the number of
entries received. For small cardinalities (<64 currently),
statsite counts the unique items exactly. For
larger sets, it switches to a HyperLogLog to estimate
cardinality with high accuracy and low space utilization.
This allows statsite to estimate huge set sizes without
retaining all the values. The parameters of the HyperLogLog
can be tuned (via `set_eps`) to provide greater accuracy at the cost of memory.

The HyperLogLog is based on the Google paper "HyperLogLog in
Practice: Algorithmic Engineering of a State of The Art Cardinality
Estimation Algorithm".

Install
-------

The following quickstart will probably work. If not, see INSTALL.md for detailed information.

Download and build from source. This requires `autoconf`, `automake` and `libtool`,
which are usually available through a system package manager. Steps:

    $ git clone https://github.com/statsite/statsite.git
    $ cd statsite
    $ ./autogen.sh
    $ ./configure
    $ make
    $ ./statsite

If you get any errors, check that all dependencies are installed; see INSTALL.md.

Building the test code may generate errors if libcheck is not available.
To build the test code successfully, do the following:

    $ cd deps/check-0.10.0/
    $ ./configure
    $ make
    # make install
    # ldconfig (necessary on some Linux distros)
    $ cd ../../
    $ make test

At this point, the test code should build successfully.

Docker
------

You can build your own Docker image using the included Dockerfile:

    $ git clone https://github.com/statsite/statsite.git
    $ cd statsite
    $ docker build -t statsite/statsite:latest .
    $ docker run statsite/statsite:latest

You can override the configuration via a mount that provides a `statsite.conf`:

    $ docker run -v /config/statsite:/etc/statsite statsite/statsite:latest

Or override the configuration with a different path by passing it in the `CMD`:

    $ docker run -v /config/statsite:/tmp statsite/statsite:latest -f /tmp/statsite.docker.example

See [statsite.docker.example](https://github.com/statsite/statsite/blob/master/statsite.docker.example) for a starting point.

Usage
-----

Statsite is configured using a simple INI file.
Here is an example configuration file:

    [statsite]
    port = 8125
    udp_port = 8125
    log_level = INFO
    log_facility = local0
    flush_interval = 10
    timer_eps = 0.01
    set_eps = 0.02
    stream_cmd = python sinks/graphite.py localhost 2003 statsite

    [histogram_api]
    prefix=api
    min=0
    max=100
    width=5

    [histogram_default]
    prefix=
    min=0
    max=200
    width=20

Then run statsite, pointing it to that file:

    statsite -f /etc/statsite.conf

A full list of configuration options is below.

Configuration Options
---------------------

Each statsite configuration option is documented below. Statsite configuration
options must exist in the `statsite` section of the INI file:

* tcp\_port : Integer, sets the TCP port to listen on. Default 8125. 0 to disable.

* port : Same as above; kept for compatibility.

* udp\_port : Integer, sets the UDP port. Default 8125. 0 to disable.

* udp\_rcvbuf : Integer, sets the `SO_RCVBUF` socket buffer size in bytes on the UDP port.
  Defaults to 0, which leaves the OS default setting unchanged.

* bind\_address : The address to bind on. Defaults to 0.0.0.0.

* parse\_stdin : Enables parsing stdin as an input stream. Defaults to 0.

* log\_level : The logging level that statsite should use. One of:
  DEBUG, INFO, WARN, ERROR, or CRITICAL. All logs go to syslog,
  and also to stderr when not daemonizing. Default is DEBUG.

* log\_facility : The syslog logging facility that statsite should use.
  One of: user, daemon, local0, local1, local2, local3, local4, local5,
  local6, local7. All logs go to syslog.

* flush\_interval : How often, in seconds, the metrics should be flushed to the
  sink. Defaults to 10 seconds.

* timer\_eps : The upper bound on error for timer estimates. Defaults
  to 1%. Decreasing this value increases memory utilization per timer.

* set\_eps : The upper bound on error for unique set estimates. Defaults
  to 2%. Decreasing this value increases memory utilization per set.

* stream\_cmd : The command that statsite invokes every
  `flush_interval` seconds to handle the metrics. It can be any executable.
  It should read its input over stdin and exit with status code 0 on success.

* aligned\_flush : If set, flushes will be aligned on `flush_interval` boundaries; e.g.,
  for a 15 second flush interval the flushes would be aligned to the (0, 15, 30, 45) second
  boundaries of every minute. This means the first flush period might be shorter than the flush
  interval, depending on the start time of statsite.

* input\_counter : If set, statsite will count how many commands it received
  in the flush interval, and the count will be emitted under this name. For
  example, if set to "numStats", then statsite will emit "counter.numStats" with
  the number of samples it has received.

* daemonize : Whether statsite should daemonize. Defaults to 0.

* pid\_file : When daemonizing, where to put the pid file. Defaults
  to /var/run/statsite.pid.

* binary\_stream : Whether data should be streamed to the stream\_cmd in
  binary form instead of ASCII form. Defaults to 0.

* use\_type\_prefix : Whether prefixes with the message type should be added to the messages.
  Does not affect global\_prefix. Defaults to 1.

* global\_prefix : Prefix that will be added to all messages.
  Defaults to an empty string.

* kv\_prefix, gauges\_prefix, counts\_prefix, sets\_prefix, timers\_prefix : Prefix for
  each message type. Defaults, respectively, to "kv.", "gauges.", "counts.",
  "sets.", and "timers.". These values are ignored if use\_type\_prefix is set to 0.

* extended\_counters : If enabled, the counter output will be extended to include the rate.
  Defaults to false.

* legacy\_extended\_counters : If enabled, the "count" metric generated for
  counters is the number of metrics received; if disabled, it is the sum of the values.
  This option exists for backwards compatibility. Defaults to true.

* timers\_include : Allows you to configure which timer metrics to include
  through a comma-separated list of values. Supported values are `count`, `mean`, `stdev`, `sum`, `sum_sq`,
  `lower`, `upper`, `rate`, `median` and `sample_rate`. If this option is not specified, all values except `median` are included by default;
  `median` is included if `quantiles` includes 0.5.

* prefix\_binary\_stream : If enabled, the keys streamed to the stream\_cmd
  when using binary\_stream mode are also prefixed. By default, this is false,
  and keys do not get the prefix.

* quantiles : A comma-separated list of quantiles to calculate for timers.
  Defaults to `0.5, 0.95, 0.99`.

In addition to global configurations, statsite supports histograms
as well. Histograms are configured one per section, and the INI
section name must start with the word `histogram`. These are the recognized
options:

* prefix : The key prefix to match on. The longest matching prefix
  is used. If the prefix is blank, it is the default for all keys.

* min : Floating value. The minimum bound on the histogram. Values below
  this go into a special bucket containing everything less than this value.

* max : Floating value. The maximum bound on the histogram. Values above
  this go into a special bucket containing everything more than this value.

* width : Floating value. The width of each bucket between the min and max.

Each histogram section must specify all options to be valid.


Protocol
--------

By default, Statsite will listen for TCP and UDP connections. A message
looks like the following (where the flag is optional):

    key:value|type[|@flag]

Messages must be terminated by newlines (`\n`).

Currently supported message types:

* `kv` - Simple key/value.
* `g`  - Gauge, similar to `kv` but only the last value per key is retained.
* `ms` - Timer.
* `h`  - Alias for timer.
* `c`  - Counter.
* `s`  - Unique set.

After the flush interval, the counters and timers of the same key are
aggregated, and the result is sent to the sink.

Gauges also support "delta" updates, indicated by prefixing the
value with either a `+` or a `-`. This implies you can't explicitly set a gauge to a negative number without first setting it to zero.

Multiple metrics may be batched together in one UDP packet, separated by a
newline (`\n`) character. Care must be taken to keep the UDP data size smaller
than the network MTU minus 28 bytes for IP/UDP headers. Statsite supports
a maximum UDP data length of 1500 bytes.

Examples:

The following is a simple key/value pair, in this case reporting how many
queries we've seen in the last second on MySQL:

    mysql.queries:1381|kv

The following is a timer, timing the response speed of an API call:

    api.session_created:114|ms

The next example increments the "rewards" counter by 1:

    rewards:1|c

Here we initialize a gauge and then modify its value:

    inventory:100|g
    inventory:-5|g
    inventory:+2|g

Sets count the unique items, so if statsite gets:

    users:abe|s
    users:zoe|s
    users:bob|s
    users:abe|s

then it will emit a count of 3 for the number of unique items it has seen.

Writing Statsite Sinks
----------------------

Statsite ships with Graphite, Librato, gmetric, and InfluxDB sinks, but ANY executable
or script can be used as a sink. The sink should read its inputs from stdin, where
each metric is in the form:

    key|val|timestamp\n

Each metric is separated by a newline. The process should terminate with
an exit code of 0 to indicate success.

Here is an example of the simplest possible Python sink:

    #!/usr/bin/env python3
    import sys

    # Each non-empty line on stdin has the form key|value|timestamp
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        key, value, timestamp = line.split("|")
        print(key, value, timestamp)


Binary Protocol
---------------

In addition to the statsd-compatible ASCII protocol, statsite includes
a lightweight binary protocol. This can be used if you want to make use
of special characters such as the colon, pipe character, or newlines. It
is also marginally faster to process, and may provide 10-20% more throughput.

Each command is sent to statsite over the same ports with this header:

    <Magic Byte><Metric Type><Key Length>

Then, depending on the metric type, it is followed by either:

    <Value><Key>
    <Set Length><Key><Set Key>

The "Magic Byte" is the value 0xaa (170). This switches the internal
processing from ASCII mode to binary. The metric type is one of:

* 0x1 : Key value / Gauge
* 0x2 : Counter
* 0x3 : Timer
* 0x4 : Set
* 0x5 : Gauge
* 0x6 : Gauge Delta update

The key length is a 2-byte unsigned integer giving the length of the
key, INCLUDING the NULL terminator. The key must be NULL-terminated,
and its length must count that byte.

If the metric type is K/V, Counter or Timer, then we expect a value and
a key. The value is a standard IEEE 754 double, which is 8 bytes in length.
The key is provided as a byte stream which is `Key Length` long,
terminated by a NULL (0) byte.

If the metric type is Set, then we expect the length of a set key,
provided like the key length. The key should then be followed by
an additional Set Key, which is `Set Length` long, terminated
by a NULL (0) byte.

All of these values must be transmitted in little-endian order.

Here is an example of sending ("Conns", "c", 200) as hex:

    0xaa 0x02 0x0600 0x0000000000006940 0x436f6e6e7300

Here `0x0600` is the key length 6 (five characters plus the NULL terminator)
and `0x0000000000006940` is 200.0 as a double, both in little-endian byte order.

Note: The binary protocol does not include support for "flags" and consequently
cannot be used for transmitting sampled counters.


Binary Sink Protocol
--------------------

It is also possible to have the streamed data represented
in a binary format. Again, this is useful if you want to use the reserved
characters. It may also be faster.

Each command is sent to the sink in the following manner:

    <Timestamp><Metric Type><Value Type><Key Length><Value><Key>[<Count>]

Most of these are the same as in the binary protocol. There are a few
changes, however. The Timestamp is sent as an 8-byte unsigned integer,
which is the current Unix timestamp. The Metric Type is one of:

* 0x1 : Key value
* 0x2 : Counter
* 0x3 : Timer
* 0x4 : Set
* 0x5 : Gauge

The value type is one of:

* 0x0 : No type (Key/Value)
* 0x1 : Sum (Also used for Sets)
* 0x2 : Sum Squared
* 0x3 : Mean
* 0x4 : Count
* 0x5 : Standard deviation
* 0x6 : Minimum Value
* 0x7 : Maximum Value
* 0x8 : Histogram Floor Value
* 0x9 : Histogram Bin Value
* 0xa : Histogram Ceiling Value
* 0xb : Count Rate (Sum / Flush Interval)
* 0xc : Sample Rate (Count / Flush Interval)
* 0x80 OR `percentile` : If the high bit (0x80) of the type is set, then it is a
    percentile amount. The amount is OR'd with 0x80 to provide the type. For
    example, (0x80 | 0x32) = 0xb2 is the 50th percentile, or median. The 95th
    percentile is (0x80 | 0x5f) = 0xdf.

The key length is a 2-byte unsigned integer representing the length of the
NULL-terminated key. The Value is an IEEE 754 double. Lastly,
the key is a NULL-terminated character stream.

The final `<Count>` field is only set for histogram values.
It is always provided as an unsigned 32-bit integer value. Histograms use the
value field to specify the bin, and the count field for the entries in that
bin. The special values for histogram floor and ceiling indicate values that
were outside the specified histogram range. For example, if the min value was
50 and the max 200, then HISTOGRAM\_FLOOR will have value 50, and the count is
the number of entries which were below this minimum value. The ceiling is the same,
but vice versa. For bin values, the value is the lower bound of the bin; the bin
covers values up to, but not including, the start of the next bin.

To enable the binary sink protocol, add a configuration variable `binary_stream`
to the configuration file with the value `yes`. An example sink is provided in
`sinks/binary_sink.py`.