{"id":13616365,"url":"https://github.com/infinityworks/prometheus-example-queries","last_synced_at":"2026-01-17T17:19:22.217Z","repository":{"id":43810931,"uuid":"77074596","full_name":"infinityworks/prometheus-example-queries","owner":"infinityworks","description":"Simple place for people to provide examples of queries they've found useful.","archived":false,"fork":false,"pushed_at":"2020-10-05T07:02:15.000Z","size":18,"stargazers_count":860,"open_issues_count":2,"forks_count":91,"subscribers_count":47,"default_branch":"master","last_synced_at":"2024-08-02T20:48:09.676Z","etag":null,"topics":["monitoring","prometheus"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/infinityworks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-12-21T18:00:18.000Z","updated_at":"2024-06-28T03:18:05.000Z","dependencies_parsed_at":"2022-08-25T19:01:42.409Z","dependency_job_id":null,"html_url":"https://github.com/infinityworks/prometheus-example-queries","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinityworks%2Fprometheus-example-queries","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinityworks%2Fprometheus-example-queries/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinityworks%2Fprometheus-example-queries/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/infinityworks%2Fprometheus-example-queries/manifests","owner_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/owners/infinityworks","download_url":"https://codeload.github.com/infinityworks/prometheus-example-queries/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223611945,"owners_count":17173541,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["monitoring","prometheus"],"created_at":"2024-08-01T20:01:27.533Z","updated_at":"2026-01-17T17:19:22.206Z","avatar_url":"https://github.com/infinityworks.png","language":null,"funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# Purpose\n\nPrometheus is awesome, but the human mind doesn't work in PromQL. 
This repository aims to be a simple place for people to provide examples of queries they've found useful.\nWe encourage all to contribute so that this can become something valuable to the community.\n\nSimple or complex, all input is welcome.\n\n## Further Reading\n\n* [Prometheus Main Site](https://prometheus.io/)\n* [Prometheus Docs](https://prometheus.io/docs/introduction/overview/)\n* [Prometheus Alert Rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)\n* [Robust Perception](https://www.robustperception.io/blog/)\n\n\n\n# PromQL Examples\n\nThese examples are formatted as [recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/), but can be used as normal expressions.\n\nPlease ensure all examples are submitted in the same format; we'd like to keep this nice and easy to read and maintain.\nThe examples may contain some metric names and labels that aren't present on your system. If you're looking to reuse them, make sure to validate that the labels and metric names match your system.\n\n---\n\n#### Show Overall CPU usage for a server\n```yaml\n- record: instance:node_cpu_utilization_percent:rate5m\n  expr: 100 * (1 - avg by(instance)(irate(node_cpu{mode='idle'}[5m])))\n```\n*Summary:* Often useful to newcomers to Prometheus looking to replicate common host CPU checks. This query ultimately provides an overall metric for CPU usage, per instance. 
It does this by taking the `idle` mode of the CPU metric, working out the combined percentage of all other CPU states over a 5-minute window, and presenting that data per `instance`.\n\n---\n\n#### Track HTTP error rates as a proportion of total traffic\n```yaml\n- record: job_instance_method_path:demo_api_request_errors_50x_requests:rate5m\n  expr: \u003e\n    rate(demo_api_request_duration_seconds_count{status=\"500\",job=\"demo\"}[5m]) * 50\n      \u003e on(job, instance, method, path)\n    rate(demo_api_request_duration_seconds_count{status=\"200\",job=\"demo\"}[5m])\n```\n*Summary:* This query selects the 500-status rate (scaled by 50, because a comparison filter returns its left-hand side) for any job, instance, method, and path combination for which the 200-status rate is not at least 50 times higher than the 500-status rate. `rate()` is used here because it is designed for counters such as the ones in this query.\n\n*Link:* [Julius Volz - Tutorial](https://www.digitalocean.com/community/tutorials/how-to-query-prometheus-on-ubuntu-14-04-part-2)\n\n---\n\n#### 90th Percentile latency\n```yaml\n- record: instance:demo_api_90th_over_50ms_and_requests_over_1:rate5m\n  expr: \u003e\n    histogram_quantile(0.9, rate(demo_api_request_duration_seconds_bucket{job=\"demo\"}[5m])) \u003e 0.05\n      and\n    rate(demo_api_request_duration_seconds_count{job=\"demo\"}[5m]) \u003e 1\n```\n*Summary:* Select any HTTP endpoints that have a 90th percentile latency higher than 50ms (0.05s), but only for the dimensional combinations that receive more than one request per second. We use the `histogram_quantile()` function for the percentile calculation here. It calculates the 90th percentile latency for each sub-dimension. The `and` operator then filters the resulting bad latencies, retaining only those that receive more than one request per second. 
`histogram_quantile` is only suitable for use with a Histogram metric.\n\n*Link:* [Julius Volz - Tutorial](https://www.digitalocean.com/community/tutorials/how-to-query-prometheus-on-ubuntu-14-04-part-2)\n\n---\n\n#### HTTP request rate, per second, an hour ago\n```yaml\n- record: instance:api_http_requests_total:offset_1h_rate5m\n  expr: rate(api_http_requests_total{status=\"500\"}[5m] offset 1h)\n```\n\n*Summary:* The `rate()` function calculates the per-second average rate of time series in a range vector. Combined with the `offset` modifier, it gives the rate of HTTP requests for a specific past timeframe. The query calculates the per-second rate of HTTP requests with status 500 over a 5-minute window ending one hour ago. Suitable for use on a `counter` metric.\n\n*Link:* [Tom Verelst - Ordina](https://ordina-jworks.github.io/monitoring/2016/09/23/Monitoring-with-Prometheus.html)\n\n---\n\n#### Kubernetes Container Memory Usage\n```yaml\n- record: kubernetes_pod_name:container_memory_usage_bytes:sum\n  expr: sum by(kubernetes_pod_name) (container_memory_usage_bytes{kubernetes_namespace=\"kube-system\"})\n```\n\n*Summary:* How much memory are the tools in the kube-system namespace using? This breaks it down by Pod.\n\n*Link:* [Joe Bowers - CoreOS](https://coreos.com/blog/monitoring-kubernetes-with-prometheus.html)\n\n---\n\n#### Most expensive time series\n```yaml\n- record: metric_name:metrics:top_ten_count\n  expr: topk(10, count by (__name__)({__name__=~\".+\"}))\n```\n\n*Summary:* Which are your most expensive time series to store? When tuning Prometheus, these queries can help you monitor your most expensive metrics. 
Be cautious: this query is expensive to run.\n\n*Link:* [Brian Brazil - Robust Perception](https://www.robustperception.io/which-are-my-biggest-metrics/)\n\n---\n\n#### Jobs with the most time series\n```yaml\n- record: job:metrics:top_ten_count\n  expr: topk(10, count by (job)({__name__=~\".+\"}))\n```\n\n*Summary:* Which of your jobs have the most time series? Be cautious: this query is expensive to run.\n\n*Link:* [Brian Brazil - Robust Perception](https://www.robustperception.io/which-are-my-biggest-metrics/)\n\n---\n\n#### Which Alerts have been firing?\n```yaml\n- record: alerts_fired:24h\n  expr: sort_desc(sum(sum_over_time(ALERTS{alertstate=`firing`}[24h])) by (alertname))\n```\n\n*Summary:* Which of your Alerts have been firing the most? Useful to track alert trends.\n\n---\n\n\n# Alert Rules Examples\n\nThese are examples of rules you can use with Prometheus to trigger the firing of an event, usually to the Prometheus Alertmanager application. You can refer to the [official documentation](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) for more information.\n\n```yaml\n- alert: \u003calert name\u003e\n  expr: \u003cexpression\u003e\n  for: \u003cduration\u003e\n  labels:\n    label_name: \u003clabel value\u003e\n  annotations:\n    annotation_name: \u003cannotation value\u003e\n```\n\n#### Disk Will Fill in 4 Hours\n```yaml\n- alert: PredictiveHostDiskSpace\n  expr: predict_linear(node_filesystem_free{mountpoint=\"/\"}[4h], 4 * 3600) \u003c 0\n  for: 30m\n  labels:\n    severity: warning\n  annotations:\n    description: 'Based on recent sampling, the disk is likely to fill on volume\n      {{ $labels.mountpoint }} within the next 4 hours for instance: {{ $labels.instance_id\n      }} tagged as: {{ $labels.instance_name_tag }}'\n    summary: Predictive Disk Space Utilisation Alert\n```\n*Summary:* Asks Prometheus to predict whether the host's disk will fill within four hours, based upon the last four hours of sampled data. 
In this example, we are returning AWS EC2-specific labels to make the alert more readable.\n\n---\n\n#### Alert on High Memory Load\n```yaml\n- expr: (sum(node_memory_MemTotal) - sum(node_memory_MemFree + node_memory_Buffers + node_memory_Cached) ) / sum(node_memory_MemTotal) * 100 \u003e 85\n```\n*Summary:* Trigger an alert if memory is almost full. This is done by subtracting the free, buffered and cached memory from the total memory and dividing the result by the total again to obtain a percentage. The `\u003e 85` comparison returns a result only when the value is above 85. Note that `sum()` without a `by` clause aggregates across all scraped hosts.\n\n*Link:* [Stefan Prodan - Blog](https://stefanprodan.com/2016/a-monitoring-solution-for-docker-hosts-containers-and-containerized-services/)\n\n---\n\n#### Alert on High CPU utilisation\n```yaml\n- alert: HostCPUUtilisation\n  expr: 100 - (avg by(instance) (irate(node_cpu{mode=\"idle\"}[5m])) * 100) \u003e 70\n  for: 20m\n  labels:\n    severity: warning\n  annotations:\n    description: 'High CPU utilisation detected for instance {{ $labels.instance_id\n      }} tagged as: {{ $labels.instance_name_tag }}, the utilisation is currently:\n      {{ $value }}%'\n    summary: CPU Utilisation Alert\n```\n*Summary:* Trigger an alert if a host's CPU stays above 70% utilisation for 20 minutes or more.\n\n---\n\n#### Alert if Prometheus is throttling\n```yaml\n- alert: PrometheusIngestionThrottling\n  expr: prometheus_local_storage_persistence_urgency_score \u003e 0.95\n  for: 1m\n  labels:\n    severity: warning\n  annotations:\n    description: Prometheus cannot persist chunks to disk fast enough. Its urgency\n      value is {{$value}}.\n    summary: Prometheus is throttling (or close to throttling) ingestion of metrics\n```\n*Summary:* Trigger an alert if Prometheus begins to throttle its ingestion. 
If you see this, some TLC is required.\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinityworks%2Fprometheus-example-queries","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finfinityworks%2Fprometheus-example-queries","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinityworks%2Fprometheus-example-queries/lists"}