https://github.com/xing/kubernetes-oom-event-generator
Generate a Kubernetes Event when a Pod's container has been OOMKilled
- Host: GitHub
- URL: https://github.com/xing/kubernetes-oom-event-generator
- Owner: xing
- License: apache-2.0
- Created: 2019-01-10T12:56:34.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2023-08-30T14:06:51.000Z (over 1 year ago)
- Last Synced: 2024-08-02T01:23:43.626Z (6 months ago)
- Topics: events, k8s, kubernetes, monitoring, olympus, prometheus, prometheus-metrics
- Language: Go
- Homepage:
- Size: 172 KB
- Stars: 161
- Watchers: 25
- Forks: 22
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-repositories - xing/kubernetes-oom-event-generator - Generate a Kubernetes Event when a Pod's container has been OOMKilled (Go)
- awesome-cloud-native - kubernetes-oom-event-generator - Generate a Kubernetes Event when a Pod's container has been OOMKilled. (OPS)
README
# kubernetes-oom-event-generator
[![Build Status](https://travis-ci.org/xing/kubernetes-oom-event-generator.svg?branch=master)](https://travis-ci.org/xing/kubernetes-oom-event-generator)
Generates a Kubernetes Event when a container starts and its status indicates that
it was previously out-of-memory killed.

## Design
The controller listens to the Kubernetes API for new Events and changes to
Events. Every time it receives a notification about an Event, it checks
whether the Event refers to a "ContainerStarted" event, based on the `Reason`
of the Event and the `Kind` of the involved object. If so, and the Event
constitutes an actual change (rather than a no-op update, which occurs when the
resync runs every two minutes), it checks the underlying Pod resource. Should the
`LastTerminationState` of a container in the Pod refer to an OOM kill, the
controller emits a Kubernetes Event with a level of `Warning` and a reason of
`PreviousContainerWasOOMKilled`.

## Usage
    Usage:
      kubernetes-oom-event-generator [OPTIONS]

    Application Options:
      -v, --verbose= Show verbose debug information [$VERBOSE]
          --version  Print version information

    Help Options:
      -h, --help     Show this help message

Run the pre-built image [`xingse/kubernetes-oom-event-generator`] locally (with
your local permissions):

    echo VERBOSE=2 >> .env
    docker run --env-file=.env -v $HOME/.kube/config:/root/.kube/config xingse/kubernetes-oom-event-generator

## Deployment
Example ClusterRole:

    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: xing:controller:kubernetes-oom-event-generator
    rules:
      - apiGroups:
          - ""
        resources:
          - pods
          - pods/status
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - ""
        resources:
          - events
        verbs:
          - create
          - patch
          - list
          - watch

Run this controller on Kubernetes with the following commands:
    kubectl create serviceaccount kubernetes-oom-event-generator \
      --namespace=kube-system

    kubectl create -f path/to/example-clusterrole.yml
    # alternatively run: `cat | kubectl create -f -` and paste the above example, hit Ctrl+D afterwards.

    kubectl create clusterrolebinding xing:controller:kubernetes-oom-event-generator \
      --clusterrole=xing:controller:kubernetes-oom-event-generator \
      --serviceaccount=kube-system:kubernetes-oom-event-generator

    kubectl run kubernetes-oom-event-generator \
      --image=xingse/kubernetes-oom-event-generator \
      --env=VERBOSE=2 \
      --serviceaccount=kubernetes-oom-event-generator \
      --namespace=kube-system

## Alerting on OOM killed pods
There are many different ways to send alerts when an OOM occurs. We just want to
mention two of them here.

### Forwarding OOM events to Graylog

Graylog is a popular log management solution, and it includes an alerting feature.
See the [Graylog docs] for more details.

At XING we forward all Kubernetes cluster events to Graylog using our
[kubernetes-event-forwarder-gelf]. This allows us to configure alerts whenever a
`PreviousContainerWasOOMKilled` event generated by the `kubernetes-oom-event-generator`
occurs.

### Using kube-state-metrics and Prometheus alerts
When [kube-state-metrics] is deployed in the cluster and a [Prometheus] installation
is scraping the metrics, you can alert on OOM-killed pods using the Prometheus Alertmanager.

Example alert:

    alert: ComponentOutOfMemory
    expr: sum_over_time(kube_pod_container_status_terminated_reason{reason="OOMKilled"}[5m])
      > 0
    for: 10s
    labels:
      severity: warning
    annotations:
      description: Critical Pod {{$labels.namespace}}/{{$labels.pod}} was OOMKilled.

The downside is that `kube_pod_container_status_terminated_reason` always returns to 0 once
a container starts back up. See the introduction of
[`kube_pod_container_status_last_terminated_reason`] for more details.

# Developing
You will need a working Go installation (1.11+) and the `make` program. You will also
need to clone the project to a place outside your normal Go code hierarchy (usually
`~/go`), as it uses the new [Go module system].

All build and install steps are managed in the central `Makefile`. `make test` will fetch
external dependencies, compile the code and run the tests. If all goes well, hack along
and submit a pull request. You might need to modify the `go.mod` to specify desired
constraints on dependencies.

Make sure to run `go mod tidy` before you check in after changing dependencies in any way.
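
For orientation before diving into the code, the check described under Design can be
sketched in Go roughly as follows. This is a minimal sketch, not the project's actual
implementation: the types are simplified stand-ins for the Kubernetes API objects, and
the `"Started"` reason, the `Count` comparison used to detect a resync, and all function
names are assumptions made for illustration.

```go
package main

import "fmt"

// The types below are simplified, hypothetical stand-ins for the Kubernetes
// API objects. They are not the k8s.io/api types.

// ObjectRef identifies the object an Event is about.
type ObjectRef struct {
	Kind      string
	Namespace string
	Name      string
}

// Event carries only the fields the check looks at.
type Event struct {
	Reason         string
	Count          int32
	InvolvedObject ObjectRef
}

// ContainerStatus records why a container last terminated ("" if it never did).
type ContainerStatus struct {
	Name                  string
	LastTerminationReason string
}

// Pod holds the status of each container in the Pod.
type Pod struct {
	Namespace, Name string
	Containers      []ContainerStatus
}

// Assumed values, chosen for illustration.
const (
	reasonStarted   = "Started"
	kindPod         = "Pod"
	reasonOOMKilled = "OOMKilled"
)

// isContainerStarted reports whether the Event describes a container start on a Pod.
func isContainerStarted(e Event) bool {
	return e.Reason == reasonStarted && e.InvolvedObject.Kind == kindPod
}

// isRealChange reports whether an update is more than a periodic resync.
// Assumption: a resync delivers the same Event with an unchanged Count.
func isRealChange(old *Event, cur Event) bool {
	return old == nil || old.Count != cur.Count
}

// oomKilledContainers returns the names of containers whose previous
// termination was an OOM kill.
func oomKilledContainers(p Pod) []string {
	var names []string
	for _, c := range p.Containers {
		if c.LastTerminationReason == reasonOOMKilled {
			names = append(names, c.Name)
		}
	}
	return names
}

// warningsFor ties the steps together and returns one message per container
// that would warrant a PreviousContainerWasOOMKilled Warning Event.
func warningsFor(old *Event, cur Event, p Pod) []string {
	if !isContainerStarted(cur) || !isRealChange(old, cur) {
		return nil
	}
	var out []string
	for _, name := range oomKilledContainers(p) {
		out = append(out, fmt.Sprintf("Warning PreviousContainerWasOOMKilled: %s/%s container %q",
			p.Namespace, p.Name, name))
	}
	return out
}

func main() {
	pod := Pod{Namespace: "default", Name: "web-0", Containers: []ContainerStatus{
		{Name: "app", LastTerminationReason: "OOMKilled"},
		{Name: "sidecar", LastTerminationReason: "Completed"},
	}}
	started := Event{Reason: "Started", Count: 2,
		InvolvedObject: ObjectRef{Kind: "Pod", Namespace: "default", Name: "web-0"}}
	for _, w := range warningsFor(nil, started, pod) {
		fmt.Println(w)
	}
}
```

Keeping each condition in its own small function makes the three steps (event filter,
resync filter, Pod inspection) easy to test in isolation.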
[Go module system]: https://github.com/golang/go/wiki/Modules
[`xingse/kubernetes-oom-event-generator`]: https://hub.docker.com/r/xingse/kubernetes-oom-event-generator
[Graylog docs]: https://docs.graylog.org/
[kubernetes-event-forwarder-gelf]: https://github.com/xing/kubernetes-event-forwarder-gelf
[kube-state-metrics]: https://github.com/kubernetes/kube-state-metrics
[Prometheus]: https://prometheus.io
[`kube_pod_container_status_last_terminated_reason`]: https://github.com/kubernetes/kube-state-metrics/pull/535

## Releases
Releases are a two-step process, beginning with a manual step:

* Create a release commit
  * Increase the version number in [kubernetes-oom-event-generator.go/VERSION](kubernetes-oom-event-generator.go#20)
  * Adjust the [CHANGELOG](CHANGELOG.md)
* Run `make release`, which will create an image, retrieve the version from the
  binary, create a git tag and push both your commit and the tag

The Travis CI run will then realize that the current tag refers to the current master commit and
will tag the built docker image accordingly.