https://github.com/jasei/check_docker_oomkiller

nagios/icinga check of OOM killed containers

# check_docker_OOMkiller

## SYNOPSIS

```
$ check_docker_OOMkiller -l /tmp/check_docker_oomkiller
WARNING: Container 1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f (stress) was killed by OOM killer
exit status 1
```

## DESCRIPTION
This is a plugin for Nagios, Icinga, or a compatible monitoring system.

The plugin lists non-running containers and checks which of them were killed by the OOM killer (meaning the container had less memory than it needed).

Because I don't want every run of the plugin to report the same OOM-killed containers, it is possible to save the ID of the last checked container to a file.
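The idea of remembering the last checked container can be sketched in Go as follows. This is only an illustration, not the plugin's actual code: the function names `loadLastID` and `saveLastID`, the file layout (a single line holding the ID), and the file mode are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// loadLastID returns the container ID stored in path, or "" when the file
// does not exist yet (for example on the first run).
func loadLastID(path string) (string, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return "", nil
	}
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(data)), nil
}

// saveLastID overwrites path with the given container ID.
// The one-line file format is an assumption made for this sketch.
func saveLastID(path, id string) error {
	return os.WriteFile(path, []byte(id+"\n"), 0o600)
}

func main() {
	dir, err := os.MkdirTemp("", "oomkiller-state")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "last_id")

	first, _ := loadLastID(path) // empty: nothing has been saved yet
	if err := saveLastID(path, "1da1ac35179e"); err != nil {
		panic(err)
	}
	second, _ := loadLastID(path)
	fmt.Printf("first=%q second=%q\n", first, second)
}
```

With a state file like this, the next run only needs to look at containers created after the stored ID.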

The plugin communicates with the Docker API via `unix:///var/run/docker.sock` and must have the required permissions (root or membership of the `docker` group).

### how does it work?
The plugin runs `inspect` on every non-running container and checks the `State.OOMKilled` flag.

It does the same as this command:
```
docker ps -a -q --filter=status=exited --filter=status=dead --filter=since=$LAST_CHECKED_CONTAINER_ID | xargs docker inspect --format "{{.State.OOMKilled}} {{.ID}} {{.Config.Image}}" | grep -E '^true'
```
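The filtering step of that pipeline can be sketched in Go. This sketch assumes the JSON array printed by `docker inspect` is already available as bytes and decodes only the three fields used above; the type `container`, the function `oomKilled`, and the sample data are made up for illustration and are not taken from the plugin's source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// container mirrors only the fields of `docker inspect` output used here.
type container struct {
	ID    string `json:"Id"`
	State struct {
		OOMKilled bool
	}
	Config struct {
		Image string
	}
}

// oomKilled decodes the JSON array printed by `docker inspect` and returns
// the containers whose State.OOMKilled flag is true.
func oomKilled(inspectJSON []byte) ([]container, error) {
	var all []container
	if err := json.Unmarshal(inspectJSON, &all); err != nil {
		return nil, err
	}
	var killed []container
	for _, c := range all {
		if c.State.OOMKilled {
			killed = append(killed, c)
		}
	}
	return killed, nil
}

func main() {
	// Hand-written sample input, trimmed to the relevant fields.
	sample := []byte(`[
	  {"Id": "1da1ac35179e", "State": {"OOMKilled": true},  "Config": {"Image": "stress"}},
	  {"Id": "0123456789ab", "State": {"OOMKilled": false}, "Config": {"Image": "example"}}
	]`)
	killed, err := oomKilled(sample)
	if err != nil {
		panic(err)
	}
	for _, c := range killed {
		fmt.Printf("Container %s (%s) was killed by OOM killer\n", c.ID, c.Config.Image)
	}
}
```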

## OPTIONS
* `-l` - path of the file used to persist the last checked container ID
* `-w` - report an OOM-killed container as a warning (default)
* `-c` - report an OOM-killed container as critical
* `--format` - output format; uses Go templates like `docker inspect` (default "Container {{.ID}} ({{.Config.Image}}) was killed by OOM killer")
* `slack` - Slack token
* `slackChannel` - Slack (fallback) channel(s) to post the message to
* `slackUser` - custom username of the Slack bot (default is `OOM killer`)
* `debug` - enable debug mode; debug output is printed to STDERR
* `debugFile` - redirect debug output to a file (the `debug` option must be set too)
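How `--format`, `-w` and `-c` could fit together is sketched below. It renders the default template with Go's `text/template` and maps the result to the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The `info` type, the `report` function, the `CRITICAL:` prefix and the OK message are assumptions for this example; the README only shows the `WARNING:` output.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Conventional Nagios plugin exit codes.
const (
	exitOK       = 0
	exitWarning  = 1
	exitCritical = 2
	exitUnknown  = 3
)

// info is a hypothetical stand-in for the inspected container data.
type info struct {
	ID     string
	Config struct{ Image string }
}

// defaultFormat is the default value documented for --format.
const defaultFormat = "Container {{.ID}} ({{.Config.Image}}) was killed by OOM killer"

// report renders one line per killed container and picks the exit code.
func report(format string, critical bool, killed []info) (string, int, error) {
	if len(killed) == 0 {
		// Hypothetical OK message; the plugin's real wording is not documented here.
		return "OK: no OOM killed containers\n", exitOK, nil
	}
	tmpl, err := template.New("line").Parse(format)
	if err != nil {
		return "", exitUnknown, err
	}
	prefix, code := "WARNING: ", exitWarning
	if critical {
		prefix, code = "CRITICAL: ", exitCritical // assumed prefix
	}
	var sb strings.Builder
	for _, c := range killed {
		sb.WriteString(prefix)
		if err := tmpl.Execute(&sb, c); err != nil {
			return "", exitUnknown, err
		}
		sb.WriteString("\n")
	}
	return sb.String(), code, nil
}

func main() {
	var c info
	c.ID = "1da1ac35179e"
	c.Config.Image = "stress"
	out, code, err := report(defaultFormat, false, []info{c})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
	fmt.Println("exit code:", code)
}
```

A real plugin would end with `os.Exit(code)` so that the monitoring system sees the status; the sketch prints the code instead.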

## slack support
Since version 1.1.0, this check supports Slack.

Since version 2.0.0, Slack support has breaking changes. To set it up:

1. Create a new bot via https://YOUR-SLACK.slack.com/apps/manage/custom-integrations

2. Pass your token with the option `--slack YOUR_TOKEN`

3. Use the option `slackChannel` to set a fallback channel, which receives the message when the Docker image/container has no `SLACK_CONTACT` label set. Multiple channels are supported: `--slackChannel A --slackChannel B`

4. To send a DM or to use a container-specific channel, set the Docker [image or container label](https://docs.docker.com/engine/userguide/labels-custom-metadata/#value-guidelines) `SLACK_CONTACT`

* to send a DM, use the `@user` syntax
* to send to a channel, use the `#channel` syntax
* it is possible to send to more than one DM/channel; use a comma (`,`) as the separator
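The routing rules above can be sketched in Go. This is an illustration under assumptions, not the plugin's code: the function name `recipients` and the `#ops` fallback channel are hypothetical, and trimming whitespace and dropping entries without an `@` or `#` prefix are choices made for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// recipients returns the Slack targets for a container: the comma-separated
// entries of its SLACK_CONTACT label, or the fallback channels when the
// label is missing or contains no usable entry.
func recipients(labels map[string]string, fallback []string) []string {
	var out []string
	for _, part := range strings.Split(labels["SLACK_CONTACT"], ",") {
		part = strings.TrimSpace(part)
		// Keep only the two documented forms: @user and #channel.
		if strings.HasPrefix(part, "@") || strings.HasPrefix(part, "#") {
			out = append(out, part)
		}
	}
	if len(out) == 0 {
		return fallback
	}
	return out
}

func main() {
	fallback := []string{"#ops"} // hypothetical --slackChannel value

	labeled := map[string]string{"SLACK_CONTACT": "@user,#channel"}
	fmt.Println(recipients(labeled, fallback))

	unlabeled := map[string]string{}
	fmt.Println(recipients(unlabeled, fallback))
}
```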

### example image label (Dockerfile)

```
FROM ...
LABEL SLACK_CONTACT="@user,@other_user"
...
```

### example container label

```
docker run --label SLACK_CONTACT="@user,@otheruser"
```

## WHY DOES THIS PLUGIN EXIST?

Of course, it is possible to find OOM killer events in /var/log/messages (dmesg):

```
Dec 17 00:34:31 fanatica kernel: stress invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), order=0, oom_score_adj=0
Dec 17 00:34:31 fanatica kernel: stress cpuset=docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope mems_allowed=0
Dec 17 00:34:31 fanatica kernel: CPU: 1 PID: 12906 Comm: stress Tainted: P OE 4.8.11-200.fc24.x86_64 #1
Dec 17 00:34:31 fanatica kernel: Hardware name: Dell Inc. Precision T1600/06NWYK, BIOS A07 10/17/2011
Dec 17 00:34:31 fanatica kernel: 0000000000000286 00000000250c8593 ffffa232b5bb3c50 ffffffffbb3e5f4d
Dec 17 00:34:31 fanatica kernel: ffffa232b5bb3d30 ffffa23554033d00 ffffa232b5bb3cb8 ffffffffbb24c308
Dec 17 00:34:31 fanatica kernel: ffffa2359d219580 ffffa23474e29e80 ffffffffbb1bcd86 0000000000080000
Dec 17 00:34:31 fanatica kernel: Call Trace:
Dec 17 00:34:31 fanatica kernel: [] dump_stack+0x63/0x86
Dec 17 00:34:31 fanatica kernel: [] dump_header+0x5c/0x1d5
Dec 17 00:34:31 fanatica kernel: [] ? find_lock_task_mm+0x36/0x80
Dec 17 00:34:31 fanatica kernel: [] oom_kill_process+0x20c/0x3d0
Dec 17 00:34:31 fanatica kernel: [] ? mem_cgroup_iter+0x105/0x2d0
Dec 17 00:34:31 fanatica kernel: [] mem_cgroup_out_of_memory+0x2ce/0x310
Dec 17 00:34:31 fanatica kernel: [] mem_cgroup_oom_synchronize+0x33b/0x350
Dec 17 00:34:31 fanatica kernel: [] ? get_mem_cgroup_from_mm+0xa0/0xa0
Dec 17 00:34:31 fanatica kernel: [] pagefault_out_of_memory+0x4c/0xc0
Dec 17 00:34:31 fanatica kernel: [] mm_fault_error+0x94/0x190
Dec 17 00:34:31 fanatica kernel: [] __do_page_fault+0x484/0x4d0
Dec 17 00:34:31 fanatica kernel: [] do_page_fault+0x30/0x80
Dec 17 00:34:31 fanatica kernel: [] page_fault+0x28/0x30
Dec 17 00:34:31 fanatica kernel: Task in /system.slice/docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope killed as a result of limit of /system.slice/docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope
Dec 17 00:34:31 fanatica kernel: memory: usage 1048576kB, limit 1048576kB, failcnt 70428
Dec 17 00:34:31 fanatica kernel: memory+swap: usage 2097080kB, limit 2097152kB, failcnt 17
Dec 17 00:34:31 fanatica kernel: kmem: usage 4708kB, limit 9007199254740988kB, failcnt 0
Dec 17 00:34:31 fanatica kernel: Memory cgroup stats for /system.slice/docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope: cache:0KB rss:1043868KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:82044KB swap:1048504KB inactive_anon:521952KB active_anon:521912KB inactive_file:0KB active_file:0KB unevictable:0KB
Dec 17 00:34:31 fanatica kernel: [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 17 00:34:31 fanatica kernel: [12870] 0 12870 1866 0 9 3 23 0 stress
Dec 17 00:34:31 fanatica kernel: [12906] 0 12906 526155 239002 1031 5 284061 0 stress
Dec 17 00:34:31 fanatica kernel: Memory cgroup out of memory: Kill process 12906 (stress) score 999 or sacrifice child
Dec 17 00:34:31 fanatica kernel: Killed process 12906 (stress) total-vm:2104620kB, anon-rss:956008kB, file-rss:0kB, shmem-rss:0kB
```

But there you see only the name of the process running in the container (for a Java application you see `java invoked oom-killer: ...`).

To pair it with a container, you have to find the Docker ID in the log (for example in `cpuset=docker-$ID.scope`) and then run `docker inspect $ID`.

This plugin does this without the need to parse the /var/log/messages file (and without the related parsing problems such as logrotate, format, localization, ...).
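For comparison, the manual pairing described above could look roughly like this in Go. The sketch assumes the `docker-<64 hex characters>.scope` naming seen in the log excerpt; that naming depends on the host's cgroup setup, which is one reason the log-parsing approach is fragile. The names `scopeRe` and `containerID` are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// scopeRe matches the scope name seen in the kernel log excerpt and
// captures the 64-character hexadecimal container ID.
var scopeRe = regexp.MustCompile(`docker-([0-9a-f]{64})\.scope`)

// containerID returns the container ID found in a kernel log line, if any.
func containerID(line string) (string, bool) {
	m := scopeRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	line := "Dec 17 00:34:31 fanatica kernel: stress cpuset=docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope mems_allowed=0"
	if id, ok := containerID(line); ok {
		fmt.Println(id) // this ID can then be passed to `docker inspect`
	}
}
```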