{"id":16355295,"url":"https://github.com/mikewolfd/raprom","last_synced_at":"2026-01-28T23:01:48.480Z","repository":{"id":203584420,"uuid":"129416669","full_name":"mikewolfd/raprom","owner":"mikewolfd","description":null,"archived":false,"fork":false,"pushed_at":"2018-04-14T18:51:01.000Z","size":1431,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"nvidia-smi","last_synced_at":"2025-05-28T12:42:44.034Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mikewolfd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-13T14:54:51.000Z","updated_at":"2018-04-14T18:51:02.000Z","dependencies_parsed_at":null,"dependency_job_id":"44e1825f-707d-4ca7-8203-85da1b5355c8","html_url":"https://github.com/mikewolfd/raprom","commit_stats":null,"previous_names":["mikewolfd/raprom"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mikewolfd/raprom","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikewolfd%2Fraprom","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikewolfd%2Fraprom/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikewolfd%2Fraprom/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikewolfd%2Fraprom/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mikewolfd","download_url":"https://codeload.github.com/mikewolfd/raprom/tar.gz/refs/hea
ds/nvidia-smi","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikewolfd%2Fraprom/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28854425,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-28T22:56:21.783Z","status":"ssl_error","status_checked_at":"2026-01-28T22:56:00.861Z","response_time":57,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T01:40:25.555Z","updated_at":"2026-01-28T23:01:48.460Z","avatar_url":"https://github.com/mikewolfd.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"dockprom\n========\n\nA monitoring solution for Docker hosts and containers with [Prometheus](https://prometheus.io/), [Grafana](http://grafana.org/), [cAdvisor](https://github.com/google/cadvisor), \n[NodeExporter](https://github.com/prometheus/node_exporter) and alerting with [AlertManager](https://github.com/prometheus/alertmanager).\n\n***If you're looking for the Docker Swarm version please go to [stefanprodan/swarmprom](https://github.com/stefanprodan/swarmprom)***\n\n## Install\n\nClone this repository on your Docker host, cd into dockprom directory and run compose up:\n\n```bash\ngit clone https://github.com/stefanprodan/dockprom\ncd dockprom\n\nADMIN_USER=admin ADMIN_PASSWORD=admin docker-compose up 
-d\n```\n\nPrerequisites:\n\n* Docker Engine \u003e= 1.13\n* Docker Compose \u003e= 1.11\n\nContainers:\n\n* Prometheus (metrics database) `http://\u003chost-ip\u003e:9090`\n* AlertManager (alerts management) `http://\u003chost-ip\u003e:9093`\n* Grafana (visualize metrics) `http://\u003chost-ip\u003e:3000`\n* NodeExporter (host metrics collector)\n* cAdvisor (containers metrics collector)\n* Caddy (reverse proxy and basic auth provider for prometheus and alertmanager) \n\n## Setup Grafana\n\nNavigate to `http://\u003chost-ip\u003e:3000` and log in with user ***admin*** and password ***admin***. You can change the credentials in the compose file or by supplying the `ADMIN_USER` and `ADMIN_PASSWORD` environment variables on compose up.\n\nGrafana is preconfigured with dashboards and Prometheus as the default data source:\n\n* Name: Prometheus\n* Type: Prometheus\n* Url: http://prometheus:9090\n* Access: proxy\n\n***Docker Host Dashboard***\n\n![Host](https://raw.githubusercontent.com/stefanprodan/dockprom/master/screens/Grafana_Docker_Host.png)\n\nThe Docker Host Dashboard shows key metrics for monitoring the resource usage of your server:\n\n* Server uptime, CPU idle percent, number of CPU cores, available memory, swap and storage\n* System load average graph, running and blocked by IO processes graph, interrupts graph\n* CPU usage graph by mode (guest, idle, iowait, irq, nice, softirq, steal, system, user)\n* Memory usage graph by distribution (used, free, buffers, cached)\n* IO usage graph (read Bps, write Bps and IO time)\n* Network usage graph by device (inbound Bps, outbound Bps)\n* Swap usage and activity graphs\n\nFor storage, and the Free Storage graph in particular, you have to specify the fstype in the Grafana graph query.\nYou can find it in `grafana/dashboards/docker_host.json`, at line 480:\n\n      \"expr\": \"sum(node_filesystem_free{fstype=\\\"btrfs\\\"})\",\n      \nI work on BTRFS, so I need to change `aufs` to `btrfs`.\n\nYou can find the right value for your 
system in Prometheus at `http://\u003chost-ip\u003e:9090` by running this query:\n\n      node_filesystem_free\n\n***Docker Containers Dashboard***\n\n![Containers](https://raw.githubusercontent.com/stefanprodan/dockprom/master/screens/Grafana_Docker_Containers.png)\n\nThe Docker Containers Dashboard shows key metrics for monitoring running containers:\n\n* Total containers CPU load, memory and storage usage\n* Running containers graph, system load graph, IO usage graph\n* Container CPU usage graph\n* Container memory usage graph\n* Container cached memory usage graph\n* Container network inbound usage graph\n* Container network outbound usage graph\n\nNote that this dashboard doesn't show the containers that are part of the monitoring stack.\n\n***Monitor Services Dashboard***\n\n![Monitor Services](https://raw.githubusercontent.com/stefanprodan/dockprom/master/screens/Grafana_Prometheus.png)\n\nThe Monitor Services Dashboard shows key metrics for monitoring the containers that make up the monitoring stack:\n\n* Prometheus container uptime, monitoring stack total memory usage, Prometheus local storage memory chunks and series\n* Container CPU usage graph\n* Container memory usage graph\n* Prometheus chunks to persist and persistence urgency graphs\n* Prometheus chunks ops and checkpoint duration graphs\n* Prometheus samples ingested rate, target scrapes and scrape duration graphs\n* Prometheus HTTP requests graph\n* Prometheus alerts graph\n\nI've set the Prometheus retention period to 200h and the heap size to 1GB; you can change these values in the compose file.\n\n```yaml\n  prometheus:\n    image: prom/prometheus\n    command:\n      - '-storage.local.target-heap-size=1073741824'\n      - '-storage.local.retention=200h'\n```\n\nMake sure you set the heap size to a maximum of 50% of the total physical memory. 
\n\n## Define alerts\n\nI've set up three alert configuration files:\n\n* Monitoring services alerts [targets.rules](https://github.com/stefanprodan/dockprom/blob/master/prometheus/targets.rules)\n* Docker Host alerts [host.rules](https://github.com/stefanprodan/dockprom/blob/master/prometheus/host.rules)\n* Docker Containers alerts [containers.rules](https://github.com/stefanprodan/dockprom/blob/master/prometheus/containers.rules)\n\nYou can modify the alert rules and reload them by making an HTTP POST call to Prometheus:\n\n```\ncurl -X POST http://admin:admin@\u003chost-ip\u003e:9090/-/reload\n```\n\n***Monitoring services alerts***\n\nTrigger an alert if any of the monitoring targets (node-exporter and cAdvisor) are down for more than 30 seconds:\n\n```yaml\nALERT monitor_service_down\n  IF up == 0\n  FOR 30s\n  LABELS { severity = \"critical\" }\n  ANNOTATIONS {\n      summary = \"Monitor service non-operational\",\n      description = \"{{ $labels.instance }} service is down.\",\n  }\n```\n\n***Docker Host alerts***\n\nTrigger an alert if the Docker host CPU is under high load for more than 30 seconds:\n\n```yaml\nALERT high_cpu_load\n  IF node_load1 \u003e 1.5\n  FOR 30s\n  LABELS { severity = \"warning\" }\n  ANNOTATIONS {\n      summary = \"Server under high load\",\n      description = \"Docker host is under high load, the avg load 1m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.\",\n  }\n```\n\nModify the load threshold based on your CPU cores.\n\nTrigger an alert if the Docker host memory is almost full:\n\n```yaml\nALERT high_memory_load\n  IF (sum(node_memory_MemTotal) - sum(node_memory_MemFree + node_memory_Buffers + node_memory_Cached) ) / sum(node_memory_MemTotal) * 100 \u003e 85\n  FOR 30s\n  LABELS { severity = \"warning\" }\n  ANNOTATIONS {\n      summary = \"Server memory is almost full\",\n      description = \"Docker host memory usage is {{ humanize $value}}%. 
Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.\",\n  }\n```\n\nTrigger an alert if the Docker host storage is almost full:\n\n```yaml\nALERT hight_storage_load\n  IF (node_filesystem_size{fstype=\"aufs\"} - node_filesystem_free{fstype=\"aufs\"}) / node_filesystem_size{fstype=\"aufs\"}  * 100 \u003e 85\n  FOR 30s\n  LABELS { severity = \"warning\" }\n  ANNOTATIONS {\n      summary = \"Server storage is almost full\",\n      description = \"Docker host storage usage is {{ humanize $value}}%. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}.\",\n  }\n```\n\n***Docker Containers alerts***\n\nTrigger an alert if a container is down for more than 30 seconds:\n\n```yaml\nALERT jenkins_down\n  IF absent(container_memory_usage_bytes{name=\"jenkins\"})\n  FOR 30s\n  LABELS { severity = \"critical\" }\n  ANNOTATIONS {\n    summary= \"Jenkins down\",\n    description= \"Jenkins container is down for more than 30 seconds.\"\n  }\n```\n\nTrigger an alert if a container is using more than 10% of total CPU cores for more than 30 seconds:\n\n```yaml\n ALERT jenkins_high_cpu\n  IF sum(rate(container_cpu_usage_seconds_total{name=\"jenkins\"}[1m])) / count(node_cpu{mode=\"system\"}) * 100 \u003e 10\n  FOR 30s\n  LABELS { severity = \"warning\" }\n  ANNOTATIONS {\n    summary= \"Jenkins high CPU usage\",\n    description= \"Jenkins CPU usage is {{ humanize $value}}%.\"\n  }\n```\n\nTrigger an alert if a container is using more than 1.2GB of RAM for more than 30 seconds:\n\n```yaml\nALERT jenkins_high_memory\n  IF sum(container_memory_usage_bytes{name=\"jenkins\"}) \u003e 1200000000\n  FOR 30s\n  LABELS { severity = \"warning\" }\n  ANNOTATIONS {\n      summary = \"Jenkins high memory usage\",\n      description = \"Jenkins memory consumption is at {{ humanize $value}}.\",\n  }\n```\n\n## Setup alerting\n\nThe AlertManager service is responsible for handling alerts sent by the Prometheus server. 
\nAlertManager can send notifications via email, Pushover, Slack, HipChat or any other system that exposes a webhook interface. \nA complete list of integrations can be found [here](https://prometheus.io/docs/alerting/configuration).\n\nYou can view and silence notifications by accessing `http://\u003chost-ip\u003e:9093`.\n\nThe notification receivers can be configured in the [alertmanager/config.yml](https://github.com/stefanprodan/dockprom/blob/master/alertmanager/config.yml) file.\n\nTo receive alerts via Slack you need to make a custom integration by choosing ***incoming web hooks*** in your Slack team app page. \nYou can find more details on setting up Slack integration [here](http://www.robustperception.io/using-slack-with-the-alertmanager/).\n\nCopy the Slack Webhook URL into the ***api_url*** field and specify a Slack ***channel***.\n\n```yaml\nroute:\n    receiver: 'slack'\n\nreceivers:\n    - name: 'slack'\n      slack_configs:\n          - send_resolved: true\n            text: \"{{ .CommonAnnotations.description }}\"\n            username: 'Prometheus'\n            channel: '#\u003cchannel\u003e'\n            api_url: 'https://hooks.slack.com/services/\u003cwebhook-id\u003e'\n```\n\n![Slack Notifications](https://raw.githubusercontent.com/stefanprodan/dockprom/master/screens/Slack_Notifications.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmikewolfd%2Fraprom","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmikewolfd%2Fraprom","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmikewolfd%2Fraprom/lists"}