https://github.com/basil/crisis-tools
Docker image with Linux crisis tools for debugging performance issues in Kubernetes production environments
https://github.com/basil/crisis-tools
Last synced: 4 months ago
JSON representation
Docker image with Linux crisis tools for debugging performance issues in Kubernetes production environments
- Host: GitHub
- URL: https://github.com/basil/crisis-tools
- Owner: basil
- License: apache-2.0
- Created: 2025-08-06T17:46:33.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2025-10-20T16:12:12.000Z (7 months ago)
- Last Synced: 2025-10-20T18:13:18.696Z (7 months ago)
- Language: Dockerfile
- Homepage: https://hub.docker.com/r/basilcrow/crisis-tools
- Size: 37.1 KB
- Stars: 1
- Watchers: 4
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
README
# Linux Crisis Tools
This repository provides a `Dockerfile` to create a container image with pre-installed Linux crisis tools, as recommended in Brendan Gregg’s blog post “[Linux Crisis Tools](https://www.brendangregg.com/blog/2024-03-24/linux-crisis-tools.html).” The image is designed for debugging performance issues in Kubernetes production environments, ensuring essential diagnostic tools are readily available without installation delays during outages.
## Purpose
When a performance issue causes an outage in a Kubernetes cluster, installing diagnostic tools on the fly can waste critical time. This Docker image pre-installs a comprehensive set of Linux crisis tools to enable rapid debugging of performance bottlenecks in Kubernetes production environments.
## Tools Included
The `Dockerfile` installs the following packages, as recommended by Brendan Gregg, on an Ubuntu base image:
| Package | Provides | Notes |
| ----------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------- |
| `procps` | `ps(1)`, `vmstat(1)`, `uptime(1)`, `top(1)` | Basic stats |
| `util-linux` | `dmesg(1)`, `lsblk(1)`, `lscpu(1)` | System log, device info |
| `sysstat` | `iostat(1)`, `mpstat(1)`, `pidstat(1)`, `sar(1)` | Device stats |
| `iproute2` | `ip(1)`, `ss(8)`, `nstat(8)`, `tc(8)` | Preferred net tools |
| `numactl` | `numastat(8)` | NUMA stats |
| `tcpdump` | `tcpdump(8)` | Network sniffer |
| `linux-tools-common`, `linux-tools-$(uname -r)` | `perf(1)`, `turbostat(8)` | Profiler and PMU stats |
| `bpfcc-tools` | `opensnoop(1)`, `execsnoop(8)`, `runqlat(8)`, `biotop(8)`, `biosnoop(8)`, `biolatency(8)`, `tcptop(8)`, `tcplife(8)`, `trace(8)`, `argdist(8)`, `funccount(8)`, `profile(8)`, etc. | Canned eBPF tools |
| `bpftrace` | `bpftrace(8)`, basic versions of `opensnoop(8)`, `execsnoop(8)`, `runqlat(8)`, `biosnoop(8)`, etc. | eBPF scripting |
| `trace-cmd` | `trace-cmd(1)` | Ftrace CLI |
| `nicstat` | `nicstat(1)` | Net device stats |
| `ethtool` | `ethtool(8)` | Net device info |
| `tiptop` | `tiptop(1)` | PMU/PMC top |
> [!NOTE]
> Some tools (e.g., `bpfcc-tools`, `bpftrace`) require kernel headers and specific privileges to function fully. Ensure your Kubernetes environment grants necessary permissions (e.g., `SYS_ADMIN` capabilities or privileged mode) for eBPF and tracing tools.
## Prerequisites
- Kubernetes cluster for deploying the container.
- Familiarity with Kubernetes debugging workflows (e.g., `kubectl exec` for accessing containers).
- For eBPF tools (e.g., `bpfcc-tools`, `bpftrace`), ensure the Kubernetes node kernel supports BPF and the container has appropriate capabilities.
## Usage in Kubernetes
To use this image in a Kubernetes cluster for debugging:
1. Create a pod manifest to run the image with appropriate permissions. For example:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: crisis-tools-pod
spec:
hostPID: true
hostUsers: true
securityContext:
seccompProfile:
type: Unconfined
volumes:
- name: sys-kernel-debug
hostPath:
path: /sys/kernel/debug
type: Directory
- name: sys-fs-bpf
hostPath:
path: /sys/fs/bpf
type: Directory
containers:
- name: crisis-tools
image: docker.io/basilcrow/crisis-tools:latest
securityContext:
privileged: true
volumeMounts:
- name: sys-kernel-debug
mountPath: /sys/kernel/debug
- name: sys-fs-bpf
mountPath: /sys/fs/bpf
command: ["sleep", "infinity"]
```
2. Apply the pod manifest:
```bash
$ kubectl apply -f crisis-tools-pod.yaml
```
3. Use `kubectl exec` to access the pod and run diagnostic commands:
```bash
$ kubectl exec -it crisis-tools-pod -- bash
```
4. Inside the pod, you can run diagnostic commands like:
```bash
# uptime
# dmesg | tail
# vmstat 1
# mpstat -P ALL 1
# pidstat 1
# iostat -xz 1
# free -m
# sar -n DEV 1
# sar -n TCP,ETCP 1
# top
# opensnoop.bt
# tcpdump -i eth0
# perf stat -a sleep 10
# bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
```
> [!WARNING]
> Running privileged containers or granting capabilities like `SYS_ADMIN` can pose security risks. Use this image only in controlled debugging scenarios and remove the pod after use.
## Contributing
If you believe additional crisis tools should be included or have improvements to the `Dockerfile`, please open an issue or submit a pull request. Ensure any suggested tools align with the goal of lightweight, essential diagnostics for Kubernetes production environments.
## License
This repository is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for details.