{"id":25239275,"url":"https://github.com/seatgeek/nomad-crashloop-detector","last_synced_at":"2025-10-26T14:30:43.284Z","repository":{"id":64306768,"uuid":"97017620","full_name":"seatgeek/nomad-crashloop-detector","owner":"seatgeek","description":"detect Nomad allocation crash-loops, by consuming the allocation stream from nomad-firehose","archived":false,"fork":false,"pushed_at":"2017-07-12T14:52:56.000Z","size":16,"stargazers_count":5,"open_issues_count":0,"forks_count":2,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-06-20T03:34:53.351Z","etag":null,"topics":["devops","hashicorp","nomad","rabbitmq"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/seatgeek.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-12T14:30:29.000Z","updated_at":"2020-06-15T18:47:23.000Z","dependencies_parsed_at":"2023-01-15T10:45:26.917Z","dependency_job_id":null,"html_url":"https://github.com/seatgeek/nomad-crashloop-detector","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seatgeek%2Fnomad-crashloop-detector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seatgeek%2Fnomad-crashloop-detector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seatgeek%2Fnomad-crashloop-detector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seatgeek%2Fnomad-crashloop-detector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sea
tgeek","download_url":"https://codeload.github.com/seatgeek/nomad-crashloop-detector/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238343319,"owners_count":19456204,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["devops","hashicorp","nomad","rabbitmq"],"created_at":"2025-02-11T18:14:55.789Z","updated_at":"2025-10-26T14:30:42.975Z","avatar_url":"https://github.com/seatgeek.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# nomad-crashloop-detector\n\n`nomad-crashloop-detector` is a tool that detects Nomad allocation crash-loops by consuming the allocation stream that [nomad-firehose](https://github.com/seatgeek/nomad-firehose) publishes to RabbitMQ.\n\n## Running\n\nThe project provides build artifacts for linux, darwin and windows in the [GitHub releases tab](https://github.com/seatgeek/nomad-crashloop-detector/releases).\n\nA Docker image is also provided at [seatgeek/nomad-crashloop-detector](https://hub.docker.com/r/seatgeek/nomad-crashloop-detector/tags/).\n\n## Requirements\n\n- Go 1.8\n\n## Building\n\nTo build a binary, run the following:\n\n```shell\n# get this repo\ngo get github.com/seatgeek/nomad-crashloop-detector\n\n# go to the repo directory\ncd $GOPATH/src/github.com/seatgeek/nomad-crashloop-detector\n\n# build the `nomad-crashloop-detector` binary\nmake build\n```\n\nThis will create a `nomad-crashloop-detector` binary in your `$GOPATH/bin` directory.\n\n## Configuration\n\nAny `NOMAD_*` environment variable that the native `nomad` CLI tool supports is also supported by this tool.\n\n- 
`$AMQP_CONNECTION` uses the same format as `$SINK_AMQP_CONNECTION`, but is the connection used for consuming the stream from `nomad-firehose`\n- `$AMQP_QUEUE` is the RabbitMQ queue to consume the `nomad-firehose` stream from\n- `$RESTART_COUNT` is how many restarts to allow within the `$RESTART_INTERVAL` time frame (example: `5`)\n- `$RESTART_INTERVAL` is the time frame within which `$RESTART_COUNT` allocation restarts must happen to trigger a notification (example: `5m`)\n- `$NOTIFICATION_INTERVAL` is how often a notification should be sent for a crash-looping allocation (example: `5m`)\n\n## Sinks\n\nThe sink type is configured using the `$SINK_TYPE` environment variable. Valid values are: `stdout`, `kinesis` and `amqp`.\n\nThe `amqp` sink is configured using the `$SINK_AMQP_CONNECTION` (`amqp://guest:guest@127.0.0.1:5672/`), `$SINK_AMQP_EXCHANGE` and `$SINK_AMQP_ROUTING_KEY` environment variables.\n\nThe `kinesis` sink is configured using the `$SINK_KINESIS_STREAM_NAME` and `$SINK_KINESIS_PARTITION_KEY` environment variables.\n\nThe `stdout` sink does not have any configuration; it simply writes the JSON to stdout for debugging.\n\n## Example\n\nAssuming the following setup:\n\n- `nomad` exchange (type=topic)\n- `nomad.crash-loop-in` queue which is bound to `nomad` exchange with routing key `allocations`\n- `nomad.crash-loop-out` queue which is bound to `nomad` exchange with routing key `crash-loop`\n\nRunning `nomad-firehose`:\n\n```sh\nSINK_TYPE=amqp \\\nSINK_AMQP_CONNECTION=\"amqp://guest:guest@127.0.0.1:5672/\" \\\nSINK_AMQP_EXCHANGE=nomad \\\nSINK_AMQP_ROUTING_KEY=allocations \\\nnomad-firehose allocations\n```\n\nRunning `nomad-crashloop-detector`:\n\n```sh\nRESTART_COUNT=2 \\\nRESTART_INTERVAL=5m \\\nNOTIFICATION_INTERVAL=5m \\\nSINK_TYPE=amqp \\\nSINK_AMQP_CONNECTION=\"amqp://guest:guest@127.0.0.1:5672/\" \\\nSINK_AMQP_EXCHANGE=nomad \\\nSINK_AMQP_ROUTING_KEY=crash-loop \\\nAMQP_CONNECTION=$SINK_AMQP_CONNECTION \\\nAMQP_QUEUE=nomad.crash-loop-in \\\nnomad-crashloop-detector\n```\n\nThe setup will make `nomad-firehose` 
send all Nomad allocation changes to the `nomad` exchange, which forwards the messages to the `nomad.crash-loop-in` queue.\n`nomad-crashloop-detector` will consume the messages in `nomad.crash-loop-in` and, when the restart threshold is reached, publish an AMQP message to the `nomad` exchange with routing key `crash-loop`, which routes the message to `nomad.crash-loop-out`.\n\n## Example crash-loop payload\n\n```json\n{\n    \"LastEvent\": {\n        \"Name\": \"job.task[0]\",\n        \"AllocationID\": \"fd4deb1f-405b-93a6-3eb4-a84e0670049d\",\n        \"DesiredStatus\": \"run\",\n        \"DesiredDescription\": \"\",\n        \"ClientStatus\": \"running\",\n        \"ClientDescription\": \"\",\n        \"JobID\": \"job\",\n        \"GroupName\": \"group\",\n        \"TaskName\": \"task\",\n        \"EvalID\": \"db0064ab-a44d-e450-4f66-2cabbec536bb\",\n        \"TaskState\": \"pending\",\n        \"TaskFailed\": false,\n        \"TaskStartedAt\": \"2017-07-12T13:56:30.932498912Z\",\n        \"TaskFinishedAt\": \"0001-01-01T00:00:00Z\",\n        \"TaskEvent\": {\n            \"Type\": \"Restarting\",\n            \"Time\": 1499867806677609000,\n            \"FailsTask\": false,\n            \"RestartReason\": \"Restart within policy\",\n            \"SetupError\": \"\",\n            \"DriverError\": \"\",\n            \"DriverMessage\": \"\",\n            \"ExitCode\": 0,\n            \"Signal\": 0,\n            \"Message\": \"\",\n            \"KillReason\": \"\",\n            \"KillTimeout\": 0,\n            \"KillError\": \"\",\n            \"StartDelay\": 17425840945,\n            \"DownloadError\": \"\",\n            \"ValidationError\": \"\",\n            \"DiskLimit\": 0,\n            \"DiskSize\": 0,\n            \"FailedSibling\": \"\",\n            \"VaultError\": \"\",\n            \"TaskSignalReason\": \"\",\n            \"TaskSignal\": \"\"\n        }\n    },\n    \"EventLog\": [\n        \"2017-07-12T15:56:15.401013209+02:00\",\n        \"2017-07-12T15:56:46.677608921+02:00\"\n    
]\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseatgeek%2Fnomad-crashloop-detector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseatgeek%2Fnomad-crashloop-detector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseatgeek%2Fnomad-crashloop-detector/lists"}