{"id":13514286,"url":"https://github.com/stripe/veneur","last_synced_at":"2025-05-14T00:05:03.534Z","repository":{"id":37502535,"uuid":"55277533","full_name":"stripe/veneur","owner":"stripe","description":"A distributed, fault-tolerant pipeline for observability data","archived":false,"fork":false,"pushed_at":"2024-03-20T18:01:51.000Z","size":69150,"stargazers_count":1741,"open_issues_count":62,"forks_count":175,"subscribers_count":39,"default_branch":"master","last_synced_at":"2025-04-10T02:15:18.714Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stripe.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-04-02T04:24:34.000Z","updated_at":"2025-03-23T13:47:40.000Z","dependencies_parsed_at":"2023-02-19T13:15:45.654Z","dependency_job_id":"7fcba140-ae11-4fbb-8e6b-6a80a056af80","html_url":"https://github.com/stripe/veneur","commit_stats":{"total_commits":1706,"total_committers":91,"mean_commits":"18.747252747252748","dds":0.8241500586166471,"last_synced_commit":"0db558ae5941fd953166648123faf5949f74688b"},"previous_names":[],"tags_count":31,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stripe%2Fveneur","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stripe%2Fveneur/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stripe%2Fveneur/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/stripe%2Fveneur/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stripe","download_url":"https://codeload.github.com/stripe/veneur/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254043328,"owners_count":22004927,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T05:00:51.720Z","updated_at":"2025-05-14T00:05:03.506Z","avatar_url":"https://github.com/stripe.png","language":"Go","funding_links":[],"categories":["开源类库","Go","Open source library"],"sub_categories":["统计分析","Statistical Analysis"],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/stripe/veneur/gh-pages/veneur_logo.svg?sanitize=true\"\u003e\n\u003c/div\u003e\n\n[![Build Status](https://circleci.com/gh/stripe/veneur.svg?style=svg)](https://app.circleci.com/pipelines/github/stripe/veneur)\n[![GoDoc](https://godoc.org/github.com/stripe/veneur?status.svg)](http://godoc.org/github.com/stripe/veneur)\n\n# Table of Contents\n\n   * [What Is Veneur?](#what-is-veneur)\n      * [Use Case](#use-case)\n      * [See Also](#see-also)\n   * [Status](#status)\n   * [Features](#features)\n      * [Vendor And Backend Agnostic](#vendor-and-backend-agnostic)\n      * [Modern Metrics Format (Or Others!)](#modern-metrics-format-or-others)\n      * [Global Aggregation](#global-aggregation)\n      * [Approximate Histograms](#approximate-histograms)\n      * [Approximate Sets](#approximate-sets)\n      * [Global Counters](#global-counters)\n      * [Sink 
Routing](#sink-routing)\n   * [Concepts](#concepts)\n      * [By Metric Type Behavior](#by-metric-type-behavior)\n      * [Expiration](#expiration)\n      * [Other Notes](#other-notes)\n   * [Usage](#usage)\n   * [Setup](#setup)\n      * [Clients](#clients)\n      * [Einhorn Usage](#einhorn-usage)\n      * [Forwarding](#forwarding)\n         * [Proxy](#proxy)\n         * [Static Configuration](#static-configuration)\n         * [Magic Tag](#magic-tag)\n            * [Global Counters And Gauges](#global-counters-and-gauges)\n            * [Routing metrics](#routing-metrics)\n   * [Configuration](#configuration)\n      * [Configuration via Environment Variables](#configuration-via-environment-variables)\n   * [Monitoring](#monitoring)\n      * [At Local Node](#at-local-node)\n         * [Forwarding](#forwarding-1)\n      * [At Global Node](#at-global-node)\n      * [Metrics](#metrics)\n      * [Error Handling](#error-handling)\n   * [Performance](#performance)\n      * [Benchmarks](#benchmarks)\n      * [SO_REUSEPORT](#so_reuseport)\n      * [TCP connections](#tcp-connections)\n      * [TLS encryption and authentication](#tls-encryption-and-authentication)\n         * [Performance implications of TLS](#performance-implications-of-tls)\n   * [Name](#name)\n\n# What Is Veneur?\n\nVeneur (`/vɛnˈʊr/`, rhymes with “assure”) is a distributed, fault-tolerant pipeline for runtime data. It provides a server implementation of the [DogStatsD protocol](http://docs.datadoghq.com/guides/dogstatsd/#datagram-format) or [SSF](https://github.com/stripe/veneur/tree/master/ssf) for aggregating metrics and sending them to one or more supported downstream sinks. It can also act as a [global aggregator](#global-aggregation) for histograms, sets and counters.\n\nMore generically, Veneur is a convenient sink for various observability primitives with lots of outputs!\n\n## Use Case\n\nOnce you cross a threshold into dozens, hundreds or (gasp!) 
thousands of machines emitting metric data for an application, you've moved into that world where data about individual hosts is uninteresting except in aggregate form. Instead of paying to store tons of data points and aggregating them later at read time, you can have Veneur calculate global aggregates, such as percentiles, and forward those to your time series database.\n\nVeneur is also a StatsD or [DogStatsD](https://docs.datadoghq.com/developers/dogstatsd/) protocol transport, forwarding the locally collected metrics over more reliable TCP\nimplementations.\n\nHere are some examples of why Stripe and other companies are using Veneur today:\n* reducing cost by pre-aggregating metrics such as timers into percentiles\n* creating a vendor-agnostic metric collection pipeline\n* consolidating disparate observability data (from trace spans to metrics, and more!)\n* improving efficiency over other metric aggregator implementations\n* improving reliability by building a more resilient forwarding system over single points of failure\n\n## See Also\n\n* A unified, standard format for observability primitives, the [SSF](https://github.com/stripe/veneur/tree/master/ssf/#readme)\n* A proxy for resilient distributed aggregation, [veneur-proxy](https://github.com/stripe/veneur/tree/master/cmd/veneur-proxy/#readme)\n* A command line tool for emitting metrics, [veneur-emit](https://github.com/stripe/veneur/tree/master/cmd/veneur-emit/#readme)\n* A poller for scraping Prometheus metrics, [veneur-prometheus](https://github.com/stripe/veneur/tree/master/cmd/veneur-prometheus/#readme)\n* The [sinks supported by Veneur](https://github.com/stripe/veneur/tree/master/sinks#readme)\n\nWe wanted percentiles, histograms and sets to be global. We wanted to unify our observability clients, be vendor agnostic and build automatic features like SLI measurement. 
Veneur helps us do all this and more!\n\n# Status\n\nVeneur is currently handling all metrics for Stripe and is considered production ready. It is under active development and maintenance! Starting with v1.6, Veneur operates on a six-week release cycle, and all releases are tagged in git. If you'd like to contribute, see [CONTRIBUTING](https://github.com/stripe/veneur/blob/master/CONTRIBUTING.md)!\n\nBuilding Veneur requires Go 1.11 or later.\n\n# Features\n\n## Vendor And Backend Agnostic\n\nVeneur has many [sinks](https://github.com/stripe/veneur/tree/master/sinks#readme) so that your data can be sent to one or more vendors, TSDBs or tracing stores!\n\n## Modern Metrics Format (Or Others!)\n\nUnify metrics, spans and logs via the [Sensor Sensibility Format](https://github.com/stripe/veneur/tree/master/ssf). Also works with [DogStatsD](https://github.com/stripe/veneur/tree/master/sinks/datadog#readme), StatsD and [Prometheus](https://github.com/stripe/veneur/tree/master/cmd/veneur-prometheus/#readme).\n\n## Global Aggregation\n\nIf configured to do so, Veneur can selectively aggregate global metrics to be cumulative across all instances that report to a central Veneur, allowing global percentile calculation, global counters or global sets.\n\nFor example, say you emit a timer `foo.bar.call_duration_ms` from 20 hosts that are configured to forward to a central Veneur. 
You'll see the following:\n\n* Metrics that have been \"globalized\"\n  * `foo.bar.call_duration_ms.50percentile`: the p50 across all hosts, by tag\n  * `foo.bar.call_duration_ms.90percentile`: the p90 across all hosts, by tag\n  * `foo.bar.call_duration_ms.95percentile`: the p95 across all hosts, by tag\n  * `foo.bar.call_duration_ms.99percentile`: the p99 across all hosts, by tag\n* Metrics that remain host-local\n  * `foo.bar.call_duration_ms.avg`: by-host tagged average\n  * `foo.bar.call_duration_ms.count`: by-host tagged count which (when summed) shows the total count of times this metric was emitted\n  * `foo.bar.call_duration_ms.max`: by-host tagged maximum value\n  * `foo.bar.call_duration_ms.median`: by-host tagged median value\n  * `foo.bar.call_duration_ms.min`: by-host tagged minimum value\n  * `foo.bar.call_duration_ms.sum`: by-host tagged sum value representing the total time\n\nClients can choose to override this behavior by [including the tag `veneurlocalonly`](#magic-tag).\n\n## Approximate Histograms\n\nBecause Veneur is built to handle lots and lots of data, it uses approximate histograms. We have our own implementation of [Dunning's t-digest](tdigest/merging_digest.go), which has bounded memory consumption and reduced error at extreme quantiles. Metrics are consistently routed to the same worker to distribute load and to be added to the same histogram.\n\nDatadog's DogStatsD — and StatsD — uses an exact histogram which retains all samples and is reset every flush period. This means that there is a loss of precision when using Veneur, but the resulting percentile values are meant to be more representative of a global view.\n\n### Datadog Distributions\n\nBecause Veneur already handles \"global\" histograms, any DogStatsD packets received with type `d` — [Datadog's distribution type](https://docs.datadoghq.com/developers/metrics/distributions/) — will be considered a histogram and therefore compatible with all sinks. 
Veneur does **not** send any metrics to Datadog typed as a Datadog-native distribution.\n\n## Approximate Sets\n\nVeneur uses [HyperLogLogs](https://github.com/axiomhq/hyperloglog) for approximate unique sets. These are [a very efficient unique counter with fixed memory consumption](https://djhworld.github.io/hyperloglog/).\n\n## Global Counters\n\nVia an optional [magic tag](#magic-tag) Veneur will forward counters to a global host for accumulation. This feature was primarily developed to control tag cardinality. Some counters are valuable but do not require per-host tagging.\n\n## Sink Routing\n\nVeneur supports routing metrics to specific sinks using the\n`metric_sink_routing` configuration field with the structure:\n\n```\nmetric_sink_routing:  # or\n  - name: string\n    match:  # or\n      - name:\n          kind: any | exact | prefix | regex\n          value: string\n        tags:  # and\n          - kind: exact | prefix | regex\n            unset: bool\n            value: string\n    sinks:\n      matched:  # and\n        - string\n      not_matched:  # and\n        - string\n```\n\nThe `metric_sink_routing` field contains a list of routing rules, containing a\nname field for identifying the rule in logs, a list of matchers, and sinks. A\nmatcher contains a name matcher for matching the name of a metric, and a list of\ntag matchers for matching tags the metric has; the name matcher and all of the\ntag matchers must match in order for the matcher to match a given metric.\n\nThe `kind` for the name matcher can be one of `any`, `exact`, `prefix`, or\n`regex`. 
The name matcher matches a metric name:\n  - `any`: always; the name of the metric is ignored, and the `value` field is\n    unused.\n  - `exact`: if the metric name equals the value field.\n  - `prefix`: if the name starts with the value field.\n  - `regex`: if the metric name matches the regular expression\n    specified in the value field.\n\nThe `kind` of the tag matcher can be one of `exact`, `prefix` or `regex`. The\ntag matcher matches a metric tag:\n  - `exact`: if the tag equals the value field.\n  - `prefix`: if the tag starts with the value field.\n  - `regex`: if the tag matches the regular expression specified in the value\n    field.\n\nFor a tag matcher to match a given metric, if the `unset` field is not set or is\n`false`, the tag matcher must match at least one tag in the metric; if the\n`unset` field is `true`, the tag matcher must match none of the tags in the\nmetric.\n\nIf a metric matches any of the entries in the `match` field of a given rule, it\nis flushed to all of the sinks listed in the `matched` field; if a metric\nmatches none of the matchers in a given rule, it is sent to all of the sinks\nlisted in the `not_matched` section.\n\n# Concepts\n\n* Global metrics are those that benefit from being aggregated for chunks (or all) of your infrastructure. These are histograms (including the percentiles generated by timers) and sets.\n* Metrics that are sent to another Veneur instance for aggregation are said to be \"forwarded\". 
This terminology helps to decipher configuration and metric options below.\n* Flushed, in Veneur, means metrics or spans processed by a sink.\n\n## By Metric Type Behavior\n\nTo clarify how each metric type behaves in Veneur, please use the following:\n* Counters: Locally accrued, flushed to sinks (see [magic tags](#magic-tag) for global version)\n* Gauges: Locally accrued, flushed to sinks  (see [magic tags](#magic-tag) for global version)\n* Histograms: Locally accrued, count, max and min flushed to sinks, percentiles forwarded to `forward_address` for global aggregation when set.\n* Timers: Locally accrued, count, max and min flushed to sinks, percentiles forwarded to `forward_address` for global aggregation when set.\n* Sets: Locally accrued, forwarded to `forward_address` for sinks aggregation when set.\n\n## Expiration\n\nVeneur expires all metrics on each flush. If a metric is no longer being sent (or is sent sparsely) Veneur will not send it as zeros! This was chosen because the combination of the approximation's features and the additional hysteresis imposed by *retaining* these approximations over time was deemed more complex than desirable.\n\n## Other Notes\n\n* Veneur aligns its flush timing with the local clock. For the default interval of `10s` Veneur will generally emit metrics at 00, 10, 20, 30, … seconds after the minute.\n* Veneur will delay its first metric emission to align with the clock as stated above. This may result in a brief quiet period after a restart, at worst \u003c `interval` seconds long.\n\n# Usage\n\n```\nveneur -f example.yaml\n```\n\nSee example.yaml for a sample config. 
Be sure to set the appropriate `*_api_key`!\n\n# Setup\n\nHere we'll document some explanations of setup choices you may make when using Veneur.\n\n## Clients\n\nVeneur is capable of ingesting:\n\n* [DogStatsD](https://docs.datadoghq.com/guides/dogstatsd/) including events and service checks\n* [SSF](https://github.com/stripe/veneur/tree/master/ssf)\n* StatsD as a subset of DogStatsD, but this may cause trouble depending on where you store your metrics.\n\nTo use clients with Veneur you need only configure your client of choice to the proper host and port combination. This port should match one of:\n\n* `statsd_listen_addresses` for UDP- and TCP-based clients\n* `ssf_listen_addresses` for SSF-based clients using UDP or UNIX domain sockets.\n* `grpc_listen_addresses` for both SSF and dogstatsd based clients using GRPC (over TCP).\n\n## Einhorn Usage\n\nWhen you upgrade Veneur (deploy, stop, start with new binary) there will be a\nbrief period where Veneur will not be able to handle HTTP requests. At Stripe\nwe use [Einhorn](https://github.com/stripe/einhorn) as a shared socket manager to\nbridge the gap until Veneur is ready to handle HTTP requests again.\n\nYou'll need to consult Einhorn's documentation for installation, setup and usage.\nBut once you've done that you can tell Veneur to use Einhorn by setting `http_address`\nto `einhorn@0`. This informs [goji/bind](https://github.com/zenazn/goji/tree/master/bind) to use its\nEinhorn handling code to bind to the file descriptor for HTTP.\n\n## Forwarding\n\nVeneur instances can be configured to forward their global metrics to another Veneur instance. You can use this feature to get the best of both worlds: metrics that benefit from global aggregation can be passed up to a single global Veneur, but other metrics can be published locally with host-scoped information. 
Note: **Forwarding adds an additional delay to metric availability corresponding to the value of the `interval` configuration option**, as the local Veneur will flush the metric to its configured upstream, which will then flush any received metrics when its interval expires.\n\nIf a local instance receives a histogram or set, it will publish the local parts of that metric (the count, min and max) directly to sinks, but instead of publishing percentiles, it will package the entire histogram and send it to the global instance. The global instance will aggregate all the histograms together and publish their percentiles to sinks.\n\nNote that the global instance can also receive metrics over UDP. It will publish a count, min and max for the samples that were sent directly to it, but not counting any samples from other Veneur instances (this ensures that things don't get double-counted). You can even chain multiple levels of forwarding together if you want. This might be useful if, for example, your global Veneur is under too much load. The root of the tree will be the Veneur instance that has an empty `forward_address`. (Do not tell a Veneur instance to forward metrics to itself. We don't support that and it doesn't really make sense in the first place.)\n\nWith respect to the `tags` configuration option, the tags that will be added are those of the Veneur that actually publishes to a sink. If a local instance forwards its histograms and sets to a global instance, the local instance's tags will not be attached to the forwarded structures. 
It will still use its own tags for the other metrics it publishes, but the percentiles will get extra tags only from the global instance.\n\n### Proxy\n\nTo improve availability, you can [leverage veneur-proxy](https://github.com/stripe/veneur/tree/master/cmd/veneur-proxy/#readme) in conjunction with [Consul](https://www.consul.io) service discovery.\n\nThe proxy can be configured to query the Consul API for instances of a service using `consul_forward_service_name`. Each **healthy** instance is then added to a hash ring. When choosing which host to forward to, Veneur will use a combination of metric name and tags to _consistently_ choose the same host for forwarding.\n\nSee [more documentation for Proxy Veneur](https://github.com/stripe/veneur/tree/master/cmd/veneur-proxy/#readme).\n\n### Static Configuration\n\nFor static configuration you need one Veneur, which we'll call the _global_ instance, and one or more other Veneurs, which we'll call _local_ instances. The local instances should have their `forward_address` configured to the global instance's `http_address`. The global instance should have an empty `forward_address` (i.e., just don't set it). You can then report metrics to any Veneur's `statsd_listen_addresses` as usual.\n\n### Magic Tag\n\nIf you want a metric to be strictly host-local, you can tell Veneur not to forward it by including a `veneurlocalonly` tag in the metric packet, e.g., `foo:1|h|#veneurlocalonly`. This tag will not actually appear in storage; Veneur removes it.\n\n#### Global Counters And Gauges\n\nRelatedly, if you want to forward a counter or gauge to the global Veneur instance to reduce tag cardinality, you can tell Veneur to flush it to the global instance by including a `veneurglobalonly` tag in the metric's packet. 
This `veneurglobalonly` tag is stripped and will not be passed on to sinks.\n\n**Note**: For global counters to report correctly, the local and global Veneur instances should be configured to have the same flush interval.\n\n**Note**: Global gauges are \"random write wins\" since they are merged in a non-deterministic order at the global Veneur.\n\n# Configuration\n\nVeneur expects to have a config file supplied via `-f PATH`. The included [example.yaml](https://github.com/stripe/veneur/blob/master/example.yaml) explains all the options!\n\nThe config file can be validated using a pair of flags:\n\n* `-validate-config`: checks that the config file specified via `-f` is valid YAML, and has correct datatypes for all fields.\n* `-validate-config-strict`: checks the above, and also that there are no unknown fields.\n\n## Configuration via Environment Variables\n\nVeneur and veneur-proxy each allow configuration via environment variables using [envconfig](https://github.com/kelseyhightower/envconfig). Options provided via environment variables take precedence over those in the config file. This allows invocations like:\n\n```\nVENEUR_DEBUG=true veneur -f someconfig.yml\n```\n\n**Note**: The environment variables used for configuration map to the field names in [config.go](https://github.com/stripe/veneur/blob/master/config.go), capitalized, with the prefix `VENEUR_`. For example, the environment variable equivalent of `datadog_api_hostname` is `VENEUR_DATADOGAPIHOSTNAME`.\n\nYou may specify configurations that are arrays by separating them with a comma, for example `VENEUR_AGGREGATES=\"min,max\"`\n\n# Monitoring\n\nHere are the important things to monitor with Veneur:\n\n## At Local Node\n\nWhen running as a local instance, you will be primarily concerned with the following metrics:\n* `veneur.flush*.error_total` as a count of errors when flushing metrics. This should rarely happen. 
Occasional errors are fine, but sustained is bad.\n\n### Forwarding\n\nIf you are forwarding metrics to central Veneur, you'll want to monitor these:\n* `veneur.forward.error_total` and the `cause` tag. This should pretty much never happen and definitely not be sustained.\n* `veneur.forward.duration_ns` and `veneur.forward.duration_ns.count`. These metrics track the per-host time spent performing a forward. The time should be minimal!\n\n## At Global Node\n\nWhen forwarding you'll want to also monitor the global nodes you're using for aggregation:\n* `veneur.import.request_error_total` and the `cause` tag. This should pretty much never happen and definitely not be sustained.\n* `veneur.import.response_duration_ns` and `veneur.import.response_duration_ns.count` to monitor duration and number of received forwards. This should not fail and not take very long. How long it takes will depend on how many metrics you're forwarding.\n* And the same `veneur.flush.*` metrics from the \"At Local Node\" section.\n\n## Metrics\n\nVeneur will emit metrics to the `stats_address` configured above in DogStatsD form. Those metrics are:\n\n* `veneur.sink.metric_flush_total_duration_ns.*` - Duration of flushes *per-sink*, tagged by `sink`.\n* `veneur.packet.error_total` - Number of packets that Veneur could not parse due to some sort of formatting error by the client. Tagged by `packet_type` and `reason`.\n* `veneur.forward.post_metrics_total` - Indicates how many metrics are being forwarded in a given POST request. A \"metric\", in this context, refers to a unique combination of name, tags and metric type.\n* `veneur.*.content_length_bytes.*` - The number of bytes in a single POST body. Remember that Veneur POSTs large sets of metrics in multiple separate bodies in parallel. 
Uses a histogram, so there are multiple metrics generated depending on your local DogStatsD config.\n* `veneur.forward.duration_ns` - Same as `flush.duration_ns`, but for forwarding requests.\n* `veneur.flush.error_total` - Number of errors received POSTing via sinks.\n* `veneur.forward.error_total` - Number of errors received POSTing to an upstream Veneur. See also `import.request_error_total` below.\n* `veneur.gc.number` - Number of completed GC cycles.\n* `veneur.gc.pause_total_ns` - Total nanoseconds of stop-the-world (STW) GC pauses since the program started.\n* `veneur.mem.heap_alloc_bytes` - Total bytes of heap objects that are reachable, or unreachable but not yet collected.\n* `veneur.worker.metrics_processed_total` - Total number of metric packets processed between flushes by workers, tagged by `worker`. This helps you find hot spots where a single worker is handling a lot of metrics. The sum across all workers should be approximately proportional to the number of packets received.\n* `veneur.worker.metrics_flushed_total` - Total number of metrics flushed at each flush time, tagged by `metric_type`. A \"metric\", in this context, refers to a unique combination of name, tags and metric type. You can use this metric to detect when your clients are introducing new instrumentation, or when you acquire new clients.\n* `veneur.worker.metrics_imported_total` - Total number of metrics received via the importing endpoint. A \"metric\", in this context, refers to a unique combination of name, tags, type _and originating host_. This metric indicates how much of a Veneur instance's load is coming from imports.\n* `veneur.import.response_duration_ns` - Time spent responding to import HTTP requests. This metric is broken into `part` tags for `request` (time spent blocking the client) and `merge` (time spent sending metrics to workers).\n* `veneur.import.request_error_total` - A counter for the number of import requests that have errored out. 
You can use this for monitoring and alerting when imports fail.\n* `veneur.listen.received_per_protocol_total` - A counter for the number of metrics/spans/etc. received by direct listening on global Veneur instances. This can be used to observe metrics that were received from direct emits as opposed to imports. Tagged by `protocol`.\n\n## Error Handling\n\nIn addition to logging, Veneur will dutifully send any errors it generates to a [Sentry](https://sentry.io/) instance. This will occur if you set the `sentry_dsn` configuration option. Not setting the option will disable Sentry reporting.\n\n# Performance\n\nProcessing packets quickly is the name of the game.\n\n## Benchmarks\n\nThe common use case for Veneur is as an aggregator and host-local replacement for DogStatsD, therefore processing UDP fast is no longer the priority. That said,\nwe were processing \u003e 60k packets/second in production before shifting to the current local aggregation method. This outperformed both the Datadog-provided DogStatsD\nand StatsD in our infrastructure.\n\n## SO_REUSEPORT\n\nAs [other implementations](http://githubengineering.com/brubeck/) have observed, there's a limit to how many UDP packets a single kernel thread can consume before it starts to fall over. Veneur supports the `SO_REUSEPORT` socket option on Linux, allowing multiple threads to share the UDP socket with kernel-space balancing between them. If you've tried throwing more cores at Veneur and it's just not going fast enough, this feature can probably help by allowing more of those cores to work on the socket (which is Veneur's hottest code path by far). Note that this is only supported on Linux (right now). We have not added support for other platforms, like darwin and BSDs.\n\n## TCP connections\n\nVeneur supports reading the statsd protocol from TCP connections. This is mostly to support TLS encryption and authentication, but might be useful on its own. 
Since TCP is a continuous stream of bytes, this requires each stat to be terminated by a new line character ('\\n'). Most statsd clients only add new lines between stats within a single UDP packet, and omit the final trailing new line. This means you will likely need to modify your client to use this feature.\n\n## TLS encryption and authentication\n\nIf you specify the `tls_key` and `tls_certificate` options, Veneur will only accept TLS connections on its TCP port. This allows the metrics sent to Veneur to be encrypted.\n\nIf you specify the `tls_authority_certificate` option, Veneur will require clients to present a client certificate, signed by this authority. This ensures that only authenticated clients can connect.\n\nYou can generate your own set of keys using openssl:\n\n```\n# Generate the authority key and certificate (2048-bit RSA signed using SHA-256)\nopenssl genrsa -out cakey.pem 2048\nopenssl req -new -x509 -sha256 -key cakey.pem -out cacert.pem -days 1095 -subj \"/O=Example Inc/CN=Example Certificate Authority\"\n\n# Generate the server key and certificate, signed by the authority\nopenssl genrsa -out serverkey.pem 2048\nopenssl req -new -sha256 -key serverkey.pem -out serverkey.csr -days 1095 -subj \"/O=Example Inc/CN=veneur.example.com\"\nopenssl x509 -sha256 -req -in serverkey.csr -CA cacert.pem -CAkey cakey.pem -CAcreateserial -out servercert.pem -days 1095\n\n# Generate a client key and certificate, signed by the authority\nopenssl genrsa -out clientkey.pem 2048\nopenssl req -new -sha256 -key clientkey.pem -out clientkey.csr -days 1095 -subj \"/O=Example Inc/CN=Veneur client key\"\nopenssl x509 -req -in clientkey.csr -CA cacert.pem -CAkey cakey.pem -CAcreateserial -out clientcert.pem -days 1095\n```\n\nSet `statsd_listen_addresses`, `tls_key`, `tls_certificate`, and `tls_authority_certificate`:\n\n```\nstatsd_listen_addresses:\n  - \"tcp://localhost:8129\"\ntls_certificate: |\n  -----BEGIN CERTIFICATE-----\n  
MIIC8TCCAdkCCQDc2V7P5nCDLjANBgkqhkiG9w0BAQsFADBAMRUwEwYDVQQKEwxC\n  ...\n  -----END CERTIFICATE-----\ntls_key: |\n  -----BEGIN RSA PRIVATE KEY-----\n  MIIEpAIBAAKCAQEA7Sntp4BpEYGzgwQR8byGK99YOIV2z88HHtPDwdvSP0j5ZKdg\n  ...\n  -----END RSA PRIVATE KEY-----\ntls_authority_certificate: |\n  -----BEGIN CERTIFICATE-----\n  ...\n  -----END CERTIFICATE-----\n```\n\n### Performance implications of TLS\n\nEstablishing a TLS connection is fairly expensive, so you should reuse connections as much as possible. RSA keys are also far more expensive than ECDH keys. Using localhost on a machine with one CPU, Veneur was able to establish ~700 connections/second using ECDH `prime256v1` keys, but only ~110 connections/second using RSA 2048-bit keys. According to the Go profiling for a Veneur instance using TLS with RSA keys, approximately 25% of the CPU time was in the TLS handshake, and 13% was decrypting data.\n\n# Name\n\nThe [veneur](https://en.wikipedia.org/wiki/Grand_Huntsman_of_France) is a person acting as superintendent of the chase and especially\nof hounds in French medieval venery and being an important officer of the royal household. In other words, it is the master of dogs. :)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstripe%2Fveneur","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstripe%2Fveneur","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstripe%2Fveneur/lists"}