{"id":13785588,"url":"https://github.com/sepulworld/deadman-check","last_synced_at":"2025-08-21T11:33:26.931Z","repository":{"id":62556886,"uuid":"85531572","full_name":"sepulworld/deadman-check","owner":"sepulworld","description":"Monitoring companion for Nomad periodic jobs and Cron","archived":false,"fork":false,"pushed_at":"2022-06-11T18:55:38.000Z","size":66,"stargazers_count":57,"open_issues_count":0,"forks_count":4,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-12-10T08:51:10.191Z","etag":null,"topics":["aws-sns","consul","cron-healthcheck","deadman-switch","docker","monitor","monitoring","nomad","nomad-batch-jobs","nomad-periodic-jobs"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sepulworld.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-03-20T03:28:26.000Z","updated_at":"2024-07-16T16:16:10.000Z","dependencies_parsed_at":"2022-11-03T06:15:21.385Z","dependency_job_id":null,"html_url":"https://github.com/sepulworld/deadman-check","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sepulworld%2Fdeadman-check","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sepulworld%2Fdeadman-check/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sepulworld%2Fdeadman-check/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sepulworld%2Fdeadman-check/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/owners/sepulworld","download_url":"https://codeload.github.com/sepulworld/deadman-check/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229914452,"owners_count":18143894,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-sns","consul","cron-healthcheck","deadman-switch","docker","monitor","monitoring","nomad","nomad-batch-jobs","nomad-periodic-jobs"],"created_at":"2024-08-03T19:01:02.127Z","updated_at":"2024-12-19T23:14:03.377Z","avatar_url":"https://github.com/sepulworld.png","language":"Ruby","funding_links":[],"categories":["docker","Infrastructure setup"],"sub_categories":["Monitoring and Logging"],"readme":"# Deadman Check\n\n![CodeQL](https://github.com/sepulworld/deadman-check/workflows/CodeQL/badge.svg)\n[![Gem Version](https://badge.fury.io/rb/deadman_check.svg)](http://badge.fury.io/rb/deadman_check)\n\nA monitoring sidecar for Nomad periodic [jobs](https://www.nomadproject.io/docs/job-specification/periodic.html) that alerts if the periodic job isn't\nrunning at the expected interval.\n\n\n# Overview\n1. [Requirements](#requirements)\n2. [Monitoring Modes](#monitoringmodes)\n3. [Alert Options](#alertoptions)\n4. [Example Usage](#exampleusage)\n5. [Alerting Setup](#alertingsetup)\n6. [CLI Usage](#cliusage)\n7. [Local System Installation Option](#localsysteminstallationoption)\n8. [Run deadman-check via Docker Option](#rundeadmanviadockeroption)\n9. [CLI Command Help](#clicommandhelp)\n  1. [Usage for key_set command](#usageforkeysetcommand)\n  2. 
[Usage for switch_monitor command](#usageforswitchmonitorcommand)\n10. [Development](#development)\n11. [Contributing](#contributing)\n12. [License](#license)\n\n## Requirements\n\n* [Consul](https://www.consul.io/)\n\n\n## Monitoring Modes\n\n1. Run with the Nomad periodic job as an additional [task](https://www.nomadproject.io/docs/job-specification/task.html). In this mode deadman-check will leverage a Consul key store to evaluate task frequency requirements. It uses [Epoch time](https://en.wikipedia.org/wiki/Unix_time) to verify task is running within time frequency required.\n\n2. Run as a stand alone process that can monitor a large grouping of jobs which are reporting time frequency values into a Consul key.\n\n\n## Alert Options\n\n* [Slack](https://slack.com/)\n\u003cimg width=\"752\" alt=\"screen shot 2017-03-26 at 3 29 28 pm\" src=\"https://cloud.githubusercontent.com/assets/538171/24335811/2e57eee8-1239-11e7-9fff-c8a10d956f2e.png\"\u003e\n\n* [AWS SNS](https://aws.amazon.com/documentation/sns/) - Broadcasting alerts and/or triggering [AWS Lambda functions](https://docs.aws.amazon.com/sns/latest/dg/sns-lambda.html) to run code\n\u003cimg width=\"903\" alt=\"screen shot 2017-08-04 at 11 39 12 am\" src=\"https://user-images.githubusercontent.com/538171/28982223-e576743c-7909-11e7-8e65-ebb0b4a76762.png\"\u003e\n\n## Example Usage\n\nLet's say I have a Nomad periodic job that is set to run every 10 minutes. 
The Nomad configuration looks like this:\n\n```hcl\njob \"SilverBulletPeriodic\" {\n  datacenters = [\"dc1\"]\n  type = \"batch\"\n\n  periodic {\n    cron             = \"*/10 * * * * *\"\n    prohibit_overlap = true\n  }\n\n  group \"utility\" {\n    task \"SilverBulletPeriodicProcess\" {\n      driver = \"docker\"\n      config {\n        image    = \"silverbullet:build_1\"\n        work_dir = \"/utility/silverbullet\"\n        command  = \"blaster\"\n      }\n      resources {\n        cpu = 100\n        memory = 500\n      }\n    }\n  }\n}\n```\n\nTo monitor the SilverBulletPeriodicProcess task, let's add a deadman-check task. The host input is the Consul endpoint required by deadman-check (in this case 10.0.0.10).\n\n```hcl\njob \"SilverBulletPeriodic\" {\n  datacenters = [\"dc1\"]\n  type = \"batch\"\n\n  periodic {\n    cron             = \"*/10 * * * * *\"\n    prohibit_overlap = true\n  }\n\n  group \"silverbullet\" {\n    task \"SilverBulletPeriodicProcess\" {\n      driver = \"docker\"\n      config {\n        image    = \"silverbullet:build_1\"\n        work_dir = \"/utility/silverbullet\"\n        command  = \"blaster\"\n      }\n      resources {\n        cpu = 100\n        memory = 500\n      }\n    }\n    task \"DeadmanSetSilverBulletPeriodicProcess\" {\n      driver = \"docker\"\n      config {\n        image    = \"sepulworld/deadman-check\"\n        command  = \"key_set\"\n        args     = [\n          \"--host\",\n          \"10.0.0.10\",\n          \"--port\",\n          \"8500\",\n          \"--key\",\n          \"deadman/SilverBulletPeriodicProcess\",\n          \"--frequency\",\n          \"700\"]\n      }\n      resources {\n        cpu = 100\n        memory = 256\n      }\n    }\n  }\n}\n```\n\u003cimg width=\"1215\" alt=\"screen shot 2017-04-23 at 11 14 36 pm\" src=\"https://cloud.githubusercontent.com/assets/538171/25324439/b65541d6-287a-11e7-9b6d-4e1c9565eed2.png\"\u003e\n\nThe Consul key, deadman/SilverBulletPeriodicProcess, at 
10.0.0.10 will be updated with\nthe Epoch time for each SilverBulletPeriodic job run. If the job hangs or fails to run\nthe job frequency calculation will be in an alerting state. \n\nNext we need a job that will run to monitor this key.\n\n```hcl\njob \"DeadmanMonitoring\" {\n  datacenters = [\"dc1\"]\n  type = \"service\"\n\n  group \"monitor\" {\n    task \"DeadmanMonitorSilverBulletPeriodicProcess\" {\n      driver = \"docker\"\n      config {\n        image    = \"sepulworld/deadman-check\"\n        command  = \"switch_monitor\"\n        args     = [\n          \"--host\",\n          \"10.0.0.10\",\n          \"--port\",\n          \"8500\",\n          \"--key\",\n          \"deadman/SilverBulletPeriodicProcess\",\n          \"--alert-to-slack\",\n          \"slackroom\",\n          \"--daemon\",\n          \"--daemon-sleep\",\n          \"900\"]\n      }\n      resources {\n        cpu = 100\n        memory = 256\n      }\n      env {\n        SLACK_API_TOKEN = \"YourSlackApiToken\"\n      }\n    }\n  }\n}\n```\n\nMonitor a Consul key that contains an Epoch time entry. Send a Slack message if Epoch age hits given frequency threshold\n\n\u003cimg width=\"752\" alt=\"screen shot 2017-03-26 at 3 29 28 pm\" src=\"https://cloud.githubusercontent.com/assets/538171/24335811/2e57eee8-1239-11e7-9fff-c8a10d956f2e.png\"\u003e\n\nIf you have multiple periodic jobs that need to be monitored then use the ```--key-path``` argument instead of ```--key```. 
Be sure to ```key_set``` all under the same Consul key path.\n\n\u003cimg width=\"658\" alt=\"screen shot 2017-04-23 at 11 17 29 pm\" src=\"https://cloud.githubusercontent.com/assets/538171/25324510/14d6e7f0-287b-11e7-9c0d-733d69e1cc94.png\"\u003e\n\nTo monitor the above you would just use the ```--key-path``` argument instead of ```--key``` and AWS SNS for alerting endpoint\n\n```hcl\njob \"DeadmanMonitoring\" {\n  datacenters = [\"dc1\"]\n  type = \"service\"\n\n  group \"monitor\" {\n    task \"DeadmanMonitorSilverBulletPeriodicProcesses\" {\n      driver = \"docker\"\n      config {\n        image    = \"sepulworld/deadman-check\"\n        command  = \"switch_monitor\"\n        args     = [\n          \"--host\",\n          \"10.0.0.1\",\n          \"--port\",\n          \"8500\",\n          \"--key-path\",\n          \"deadman/\",\n          \"--alert-to-sns\",\n          \"arn:aws:sns:us-east-1:123412345678:deadman-check\",\n          \"--alert-to-sns-region\",\n          \"us-east-1\",\n          \"--daemon\",\n          \"--daemon-sleep\",\n          \"900\"]\n      }\n      resources {\n        cpu = 100\n        memory = 256\n      }\n      env {\n        AWS_ACCESS_KEY_ID = \"YourAWSKEY\"\n        AWS_SECRET_ACCESS_KEY = \"YourAWSSecret\"\n      }\n    }\n  }\n}\n```\n\n## Alerting Setup\n\n* Slack alerting requires a SLACK_API_TOKEN environment variable to be set (use [Slack Bot integration](https://my.slack.com/services/new/bot)) (optional)\n\n* [AWS SNS](https://aws.amazon.com/documentation/sns/) alerting requires appropriate AWS IAM access to the target SNS topic. One of the following can be used for authentication. 
IAM policy access to publish to the topic will be required\n  - ENV['AWS_ACCESS_KEY_ID'] and ENV['AWS_SECRET_ACCESS_KEY']\n  - The shared credentials ini file at ~/.aws/credentials (more information)\n  - From an [instance profile](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html) when running on EC2\n\n\n# CLI Usage:\n\n## Local System Installation Option \n\n```\ngem install deadman_check\n```\n\n## Run deadman-check via Docker Option\n\n```\n$ alias deadman-check='\\\n  docker run \\\n    -it --rm --name=deadman-check \\\n    sepulworld/deadman-check'\n```\n\n\n## CLI Command Help \n\n```bash\n$ deadman-check -h\n  NAME:\n\n    deadman-check\n\n  DESCRIPTION:\n\n    Monitor a Consul key or key-path that contains an EPOCH time entry and frequency. Send Slack message if EPOCH age is greater than given frequency\n\n  COMMANDS:\n\n    help           Display global or [command] help documentation\n    key_set        Update a given Consul key with current EPOCH\n    switch_monitor Target a Consul key to monitor\n\n  GLOBAL OPTIONS:\n\n    -h, --help\n        Display help documentation\n\n    -v, --version\n        Display version information\n\n    -t, --trace\n        Display backtrace when an error occurs\n```\n\n### Usage for key_set command\n\n```bash\n$ deadman-check key_set -h\n\n  NAME:\n\n    key_set\n\n  SYNOPSIS:\n\n    deadman-check key_set [options]\n\n  DESCRIPTION:\n\n    key_set will set a consul key that contains the current epoch and time frequency that job should be running at, example key {\"epoch\":1493010437,\"frequency\":\"300\"}\n\n  EXAMPLES:\n\n    # Update a Consul key deadman/myservice, with current EPOCH time\n    deadman-check key_set --host 127.0.0.1 --port 8500 --key deadman/myservice --frequency 300\n\n  OPTIONS:\n\n    --host HOST\n        IP address or hostname of Consul system\n\n    --port PORT\n        port Consul is listening on\n\n    --key KEY\n        Consul key to report 
EPOCH time and frequency for service\n\n    --frequency FREQUENCY\n        Frequency at which this key should be updated in seconds\n\n    --consul-token TOKEN\n        Consul KV access token (optional)\n```\n\n### Usage for switch_monitor command\n\n```bash\n$ deadman-check switch_monitor -h\n\n  NAME:\n\n    switch_monitor\n\n  SYNOPSIS:\n\n    deadman-check switch_monitor [options]\n\n  DESCRIPTION:\n\n    switch_monitor will monitor either a given key which contains a services last epoch checkin and frequency, or a series of services that set keys\nunder a given key-path in Consul\n\n  EXAMPLES:\n\n    # Target a Consul key deadman/myservice, and this key has an EPOCH value to check looking to alert\n    deadman-check switch_monitor --host 127.0.0.1 --port 8500 --key deadman/myservice --alert-to-slack my-slack-monitor-channel\n\n    # Target a Consul key path deadman/, which contains 2 or more service keys to monitor, i.e. deadman/myservice1, deadman/myservice2,\ndeadmman/myservice3 all fall under the path deadman/\n    deadman-check switch_monitor --host 127.0.0.1 --port 8500 --key-path deadman/ --alert-to-slack my-slack-monitor-channel\n\n    # Target a Consul key path deadman/, alert to Amazon SNS, i.e. 
deadman/myservice1, deadman/myservice2, deadmman/myservice3 all fall under the path\ndeadman/\n    deadman-check switch_monitor --host 127.0.0.1 --port 8500 --key-path deadman/ --alert-to-sns arn:aws:sns:*:123456789012:my_corporate_topic\n\n  OPTIONS:\n\n    --host HOST\n        IP address or hostname of Consul system\n\n    --port PORT\n        port Consul is listening on\n\n    --key-path KEYPATH\n        Consul key path to monitor, performs a recursive key lookup at given path.\n\n    --key KEY\n        Consul key to monitor, provide this or --key-path if you have multiple keys in a given path.\n\n    --alert-to-slack SLACKCHANNEL\n        Slack channel to send alert, don't include the # tag in name\n\n    --alert-to-sns SNSARN\n        Amazon Web Services SNS arn to send alert, example arn arn:aws:sns:*:123456789012:my_corporate_topic\n\n    --alert-to-sns-region AWSREGION\n        Amazon Web Services region the SNS topic is in, defaults to us-west-2\n\n    --daemon\n        Run as a daemon, otherwise will run check just once\n\n    --daemon-sleep SECONDS\n        Set the number of seconds to sleep in between switch checks, default 300\n\n    --consul-token TOKEN\n        Consul KV access token (optional)\n```\n\n### Development\n\nAfter checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.\n\nTo install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).\n\n### Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/sepulworld/deadman-check. 
This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.\n\n\n### License\n\nThe gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsepulworld%2Fdeadman-check","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsepulworld%2Fdeadman-check","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsepulworld%2Fdeadman-check/lists"}