https://github.com/nudgebee/node-agent
Per-node observability agent for Kubernetes and Linux hosts. Gathers container and host metrics, logs, and L7 traffic via eBPF; exports to Prometheus and OpenTelemetry. Includes LLM API observability.
ebpf golang kubernetes llm-observability monitoring node-agent observability opentelemetry prometheus sre
- Host: GitHub
- URL: https://github.com/nudgebee/node-agent
- Owner: nudgebee
- License: apache-2.0
- Created: 2024-01-15T11:38:23.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2026-06-02T04:30:13.000Z (18 days ago)
- Last Synced: 2026-06-02T06:21:46.716Z (18 days ago)
- Topics: ebpf, golang, kubernetes, llm-observability, monitoring, node-agent, observability, opentelemetry, prometheus, sre
- Language: Go
- Size: 129 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
- Notice: NOTICE
# Nudgebee Node Agent
A per-node observability agent for Kubernetes and Linux hosts. The agent
gathers container and host metrics, logs, and L7 traffic using eBPF and
exposes them in Prometheus format.
Minimum Linux kernel: **5.1** (eBPF CO-RE).
> This project is a fork of
> [coroot/coroot-node-agent](https://github.com/coroot/coroot-node-agent)
> with additional features developed at Nudgebee. See [NOTICE](NOTICE)
> for attribution and [CHANGELOG.md](CHANGELOG.md) for the list of
> divergences from upstream.
## Features
### Inherited from upstream
- **TCP connection tracing** — service map and inter-service latency,
derived from eBPF `connect()` / `accept()` / retransmit / RTT events.
- **Log pattern extraction** — clusters container logs into recurring
patterns at the node, drastically reducing log volume for analysis.
Reads from `/var/log/`, journald, dockerd, and containerd CRI logs.
- **Delay accounting** — per-container CPU and disk-wait metrics from
the kernel's [delay accounting](https://www.kernel.org/doc/html/latest/accounting/delay-accounting.html)
subsystem via Netlink.
- **OOM-kill events** — surfaces `container_oom_kills_total`.
- **Cloud instance metadata** — auto-detects AWS, GCP, Azure, Hetzner,
IBM Cloud, and Oracle Cloud and tags metrics with account ID,
instance type, region, AZ, lifecycle (spot/on-demand), and addresses.
- **GPU monitoring** (NVIDIA via NVML), JVM metrics (via JMX/jattach),
and .NET and Node.js process detection.
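To illustrate the idea behind log pattern extraction, here is a deliberately simplified sketch. It is not the agent's actual algorithm: the rule "any token containing a digit is variable" and the names `patternOf` and `countPatterns` are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// patternOf reduces a log line to a coarse pattern by replacing every
// whitespace-separated token that contains a digit with "<*>".
// This masking rule is an assumption for illustration only.
func patternOf(line string) string {
	fields := strings.Fields(line)
	for i, f := range fields {
		if strings.IndexFunc(f, unicode.IsDigit) >= 0 {
			fields[i] = "<*>"
		}
	}
	return strings.Join(fields, " ")
}

// countPatterns groups lines by their pattern and counts each group.
func countPatterns(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, l := range lines {
		counts[patternOf(l)]++
	}
	return counts
}

func main() {
	lines := []string{
		"GET /orders/1841 took 12ms",
		"GET /orders/77 took 340ms",
		"connection reset by peer",
	}
	// Map iteration order is not deterministic in Go.
	for p, n := range countPatterns(lines) {
		fmt.Printf("%d x %s\n", n, p)
	}
}
```

In this sketch the two `GET` lines collapse into one pattern, so only the pattern and its count need to be kept rather than every raw line.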
### Added in this fork
- **[LLM observability](docs/llm-observability.md)** — detects calls
to OpenAI, Anthropic, AWS Bedrock, Google AI / Vertex, Azure OpenAI,
Cohere, and OpenAI-compatible endpoints from eBPF-traced TLS
sessions, and emits per-request metrics including model name, token
counts, latency, time-to-first-token, and estimated cost.
- **[IP-to-FQDN resolver](docs/ip-fqdn-resolver.md)** — enriches
outbound flows with hostnames and K8s workload identity by
combining Service/Pod/Node informers with the DNS cache, so the
service map shows `payments-api` instead of `10.0.5.41`.
- **Enhanced L7 protocol detection** — TLS SNI extraction, lightweight
HTTP/2 parsing with HPACK, improved Go TLS capture, Node.js TLS,
FoundationDB.
- **PSI cgroup metrics** — pressure-stall info for CPU, memory, and IO.
- **Stability fixes** — graceful shutdown, bounded caches, label
cardinality controls, panic guards in L7 parsers, OOM mitigations.
## Installation
### Kubernetes (DaemonSet)
```sh
kubectl apply -f https://raw.githubusercontent.com/nudgebee/node-agent/main/manifests/nudgebee-node-agent.yaml
```
This creates the `nudgebee` namespace and a privileged DaemonSet that
exposes `/metrics` on port 80.
### systemd (bare-metal)
```sh
curl -fsSL https://raw.githubusercontent.com/nudgebee/node-agent/main/install.sh | sudo sh -
```
Pass `-v vX.Y.Z` to the script to pin a specific release. The script
writes a systemd unit to `/etc/systemd/system/nudgebee-node-agent.service`
and starts the service.
### Container image
Multi-arch images (linux/amd64, linux/arm64) are published to GHCR on
every tag:
```
ghcr.io/nudgebee/node-agent:
ghcr.io/nudgebee/node-agent:.
ghcr.io/nudgebee/node-agent:
```
## Metrics
The agent exposes Prometheus metrics on `:80/metrics`. Self-identifying
label: `job="nudgebee-node-agent"`.
- Nudgebee-specific metrics:
  - [LLM observability](docs/llm-observability.md#metrics-emitted)
  - [IP-to-FQDN resolver](docs/ip-fqdn-resolver.md)
- The full inherited metric catalogue is documented upstream at
[docs.coroot.com/metrics/node-agent](https://docs.coroot.com/metrics/node-agent).
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md). Bug reports and pull requests
are welcome.
## Security
Please report vulnerabilities privately per [SECURITY.md](SECURITY.md).
## License
This project is licensed under the [Apache License, Version 2.0](LICENSE).
The eBPF C code in `ebpftracer/ebpf/` is licensed under the GNU General
Public License, Version 2.0; see [LICENSES/GPL-2.0.txt](LICENSES/GPL-2.0.txt).
See [NOTICE](NOTICE) for attribution to the original
coroot/coroot-node-agent project.