https://github.com/silentspike/project-sentinel

Reference testbed for runtime governance of LLM coding agents — per-agent sandboxing (bwrap + Landlock + cgroups + netns), event-sourced audit trails, three control planes, 9/9-passing breakout tests.
Host: GitHub
URL: https://github.com/silentspike/project-sentinel
Owner: silentspike
License: mit
Created: 2026-02-11T00:39:41.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-05-28T19:23:23.000Z (25 days ago)
Last Synced: 2026-05-28T19:26:20.946Z (25 days ago)
Topics: agent-runtime, agents, anthropic, bwrap, ebpf, ecs, event-sourcing, go, governance, landlock, llm, runtime-governance, rust, sandbox, sandbox-isolation
Language: Rust
Size: 3.34 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 9
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: .github/SECURITY.md
- Governance: docs/governance.md

README

# Project Sentinel

[![CI](https://github.com/silentspike/project-sentinel/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/silentspike/project-sentinel/actions/workflows/ci.yml)

[![CodeQL](https://github.com/silentspike/project-sentinel/actions/workflows/codeql.yml/badge.svg?branch=main)](https://github.com/silentspike/project-sentinel/actions/workflows/codeql.yml)

[![OSSF Scorecard](https://github.com/silentspike/project-sentinel/actions/workflows/scorecard.yml/badge.svg?branch=main)](https://github.com/silentspike/project-sentinel/actions/workflows/scorecard.yml)

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

[![Release](https://img.shields.io/github/v/release/silentspike/project-sentinel?include_prereleases&label=release)](https://github.com/silentspike/project-sentinel/releases)

[![Stack: Rust 1.93+ / Go 1.26+](https://img.shields.io/badge/stack-rust%201.93%2B%20%2F%20go%201.26%2B-orange.svg)](#)

A reference testbed for runtime governance of LLM coding agents:
sandbox each agent, audit each action, and verify failure modes before
customers run agents against production code.

When teams put LLM agents into real workflows, three operational questions
come back:

- How are they sandboxed?

- How are their actions audited?

- What happens when something goes wrong?

Project Sentinel makes those questions concrete. It runs a synthetic
office workload (sixty personas across three shifts, with real LLM calls)
on top of the runtime layer an organization would actually operate:
per-agent sandboxing (bwrap + Landlock + cgroups + netns), event-sourced
audit trails, three independent control planes, and a 9/9-passing
breakout test report.

The full stack is documented as a TOGAF v22.1 architecture and runs on a
provisioned VM. The included docker demo is a deliberate behavioral
subset: it shows the workload and dashboard, but not the kernel-bound
parts (eBPF, Landlock, FUSE) that need a real host.

[Architecture Guide (TOGAF v22.1)](docs/architecture/togaf-architecture-guide.html) ·
[Sandbox Test Report (9/9)](docs/security-test-report.md) ·
[Demo](#demo-one-command)

## Why It Exists

Three things are hard to study without a believable, persistent, multi-agent
environment:

1. **Sandbox primitives at scale.** What does bwrap + Landlock + cgroups
   v2 + netns actually cost when 26 agents tick simultaneously? Where do
   the breakouts come from when nobody is looking? The
   [security test report](docs/security-test-report.md) records 9/9
   breakout tests passing.
2. **Control-plane design.** Three independent observe / decide / act /
   verify loops (Agent CP, Platform CP, API CP) coexist. Each owns one
   decision domain, and none reaches across. See
   [docs/governance.md](docs/governance.md).
3. **Boundary detection.** A pattern detector for agent self-recognition
   (15 regexes plus a two-stage LLM judge) measures when a generation
   surfaces awareness markers, and the synthesis engine intercepts ~70% of
   routine perceptions before they reach a real LLM call. See
   [Research Context](#research-context) for the narrative convention that
   underpins the workload.
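
The two-stage shape of the boundary detector in item 3 can be sketched as follows. This is only an illustration under stated assumptions, not the project's detector: the marker strings are hypothetical, plain substring matching stands in for the 15 regexes, and the judge is an arbitrary closure rather than a real LLM call.

```rust
/// Hypothetical awareness markers. The real detector's patterns are not
/// listed in this README; these strings are placeholders.
const MARKERS: [&str; 3] = ["as an ai", "language model", "my prompt"];

#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Stage 1 found nothing, so the judge was never called.
    Clean,
    /// Stage 1 matched `marker`; `confirmed` is the judge's answer.
    Flagged { marker: &'static str, confirmed: bool },
}

/// Stage 1: cheap, case-insensitive substring scan (stand-in for regexes).
pub fn first_marker(text: &str) -> Option<&'static str> {
    let lower = text.to_lowercase();
    MARKERS.iter().copied().find(|m| lower.contains(*m))
}

/// Stage 2 runs only when stage 1 fires, so the expensive judge is
/// skipped for text that carries no marker at all.
pub fn classify<J: Fn(&str) -> bool>(text: &str, judge: J) -> Verdict {
    match first_marker(text) {
        None => Verdict::Clean,
        Some(marker) => Verdict::Flagged {
            marker,
            confirmed: judge(text),
        },
    }
}
```

The point of the split is cost: the cheap filter decides whether the expensive judge is consulted at all.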

## Architecture at a Glance

```mermaid
flowchart TB
  subgraph AGENTS["Agent Layer · 60 LLM personas"]
    A1["51 shift-bound (3 shifts × 17)"]
    A2["9 always-on duty staff"]
  end
  subgraph SANDBOX["Sandbox Stack (per agent)"]
    S1["bwrap (user-namespaces)"]
    S2["Landlock LSM"]
    S3["cgroups v2"]
    S4["netns + nftables"]
    S5["Wasmtime (tool runtime)"]
  end
  subgraph CP["Three Controlplanes: Observe → Decide → Act → Verify"]
    direction LR
    AGCP["Agent CP<br/>(bio · perception)"]
    PLCP["Platform CP<br/>(infra · health)"]
    APCP["API CP<br/>(cost · routing)"]
  end
  STORE["Event Store<br/>Limbo SQLite · append-only<br/>Lamport ordering · hash-chain"]
  subgraph GATEWAY["Cortex Gateway (Go)"]
    G1["7-step proxy + guardrails"]
    G2["10-rule synthesis engine"]
  end
  subgraph BRIDGE["Quality + Memory Plane"]
    J1["Sentinel Judge<br/>(NATS · drift · quality)"]
    J2["NATS Bridge<br/>(Limbo → JetStream)"]
    J3["Hippocampus<br/>(NMDA night-run)"]
  end
  DASH["Dashboard<br/>Bun + Hono + WebSocket"]
  AGENTS -.->|"sandboxed in"| SANDBOX
  AGENTS -->|prompts| GATEWAY
  GATEWAY -->|emit events| STORE
  STORE -->|projections| DASH
  STORE -->|stream| BRIDGE
  CP -.->|govern| AGENTS
  CP -.->|govern| GATEWAY
  CP -.->|govern| STORE
  BRIDGE -->|alerts + metrics| DASH
```

| Layer            | Tech                                      |
|------------------|-------------------------------------------|
| World simulation | Rust workspace (15 crates), `bevy_ecs`    |
| LLM gateway      | Go (`cmd/cortex-gateway`)                 |
| Quality monitor  | Go (`services/sentinel-judge`)            |
| Dashboard        | Bun + Hono + vanilla-JS (`dashboard/`)    |
| Pub/Sub          | Zenoh (Rust SHM <10 µs) + NATS JetStream  |
| Storage          | redb (state) + Limbo SQLite (events)      |

For a terminal-friendly plain-text view of the same data flow see
[Architecture Details](#architecture-details) further down.
For per-cluster implementation status see
[docs/togaf-gap-v22.md](docs/togaf-gap-v22.md).
For deliberate deviations from the spec see
[docs/togaf-deviations-v22.md](docs/togaf-deviations-v22.md).
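
The Event Store node in the diagram is described as append-only, Lamport-ordered, and hash-chained. A minimal in-memory sketch of how those three properties fit together is shown below. All type and function names are hypothetical, and `DefaultHasher` is used only to keep the example dependency-free: it is not a cryptographic hash, and a real audit trail would use something like SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct Event {
    pub lamport: u64,
    pub payload: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Hash one link of the chain. NOTE: DefaultHasher is a non-cryptographic
/// stand-in chosen for this sketch only.
fn link_hash(prev_hash: u64, lamport: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    lamport.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

#[derive(Default)]
pub struct EventLog {
    pub clock: u64,
    pub events: Vec<Event>,
}

impl EventLog {
    /// Record a timestamp seen on an incoming message. Together with the
    /// increment in `append`, this gives the Lamport rule max(local, remote) + 1.
    pub fn observe(&mut self, remote: u64) {
        self.clock = self.clock.max(remote);
    }

    /// Append-only write: tick the clock, then chain to the previous hash.
    pub fn append(&mut self, payload: &str) -> u64 {
        self.clock += 1;
        let prev_hash = self.events.last().map_or(0, |e| e.hash);
        let hash = link_hash(prev_hash, self.clock, payload);
        self.events.push(Event {
            lamport: self.clock,
            payload: payload.to_string(),
            prev_hash,
            hash,
        });
        hash
    }

    /// Recompute every link; any edited or reordered event breaks the chain.
    pub fn verify(&self) -> bool {
        let mut prev = 0u64;
        for e in &self.events {
            if e.prev_hash != prev || e.hash != link_hash(prev, e.lamport, &e.payload) {
                return false;
            }
            prev = e.hash;
        }
        true
    }
}
```

Because each hash covers the previous hash, editing one stored event invalidates every later link, which is what makes the trail tamper-evident rather than merely ordered.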

## Quick Start

### Prerequisites

| Tool                    | Version | Purpose                     |
|-------------------------|---------|-----------------------------|
| Rust                    | 1.93+   | ECS world, all Rust crates  |
| Go                      | 1.23+   | Gateway, judge, nats-bridge |
| Bun                     | 1.x     | Dashboard                   |
| cargo-remote (optional) | latest  | Remote build server         |
| Docker + Compose        | 24+     | Demo stack                  |

### Configure

Sentinel takes deployment-specific values from two local files. Copy
the templates and fill in your own values:

```bash

cp .env.example .env

cp .make.local.example .make.local

```

The `.env` file holds runtime values (NATS URL, dashboard port). The

`.make.local` file holds build values (cargo remote server address, deploy

target). Neither file is committed.

### Generate a Company Config

Gaia can bootstrap a fresh Sentinel company configuration without LLM calls:

```bash

cargo run -p sentinel-gaia -- print-example-spec > /tmp/gaia-spec.toml

cargo run -p sentinel-gaia -- init --spec /tmp/gaia-spec.toml --output-dir /tmp/sentinel-config --yes

cargo run -p sentinel-gaia -- validate --output-dir /tmp/sentinel-config

```

For an operator-style smoke test, build `sentinel-daemon` and add
`--daemon-dry-run --daemon-bin ` to `init`. Gaia
persists its own input as `gaia-spec.toml` and deliberately leaves
`company.toml` to the Gateway/company-context schema.

### Build

```bash

make ci          # full: fmt + clippy + test + cargo-deny + typos

make build       # workspace build

make test        # all tests

```

If you have cargo-remote configured for offload builds, those targets

transparently use it.

### Demo (one command)

![Sentinel demo dashboard](docs/images/sentinel-demo.gif)

*The dashboard surfaces runtime governance signals: control-plane decisions, sandbox enforcer status, audit-event throughput, and agent quality drift.*

```bash

make demo                                 # build binaries + image, then run

# or, step by step:

make demo-binaries                        # build sentinel-daemon + sentinel-nightrun

make demo-image                           # docker build

./scripts/demo.sh                         # run + open dashboard, tear down after 10 min

```

The Rust workspace is heavy. `make demo-binaries` uses `cargo-remote`
against a build server if `.cargo-remote.toml` is present; otherwise it
falls back to a local `cargo build --release` (~8 GB RAM, ~20 min on
a developer laptop). See [CONTRIBUTING.md](CONTRIBUTING.md) for
cargo-remote setup if you want to offload the Rust compile.

The demo runs five agents through a 10-minute morning shift with the
default workload configuration. Dashboard: http://localhost:18000 (host port
18000 is used because 8000 is commonly bound by local nginx/dev servers;
adjust in `docker-compose.demo.yml` if you have 8000 free).

#### What the docker demo shows — and what it does not

The compose stack is deliberately a **behavioral demo**, not a full
production deployment. It is meant to give a recruiter or curious reader
a working dashboard in one command, not to reproduce the full sandbox
story.

| Feature                                 | Demo container | VM deploy |
|-----------------------------------------|----------------|-----------|
| ECS world, Bio-Engine, Physics          | yes            | yes       |
| Event sourcing + projections + dashboard| yes            | yes       |
| Cortex Gateway pipeline + synthesis     | yes            | yes       |
| NATS JetStream + sentinel-judge         | yes            | yes       |
| **bwrap + Landlock per-agent isolation**| no (warned)    | yes       |
| **cgroups v2 per-agent resource caps**  | no (warned)    | yes       |
| **netns + nftables agent network**      | no (warned)    | yes       |
| **eBPF probes (aya-rs)**                | no (warned)    | yes       |
| **sentinel-fs CAS-FUSE**                | no (warned)    | yes       |
| Zenoh SHM transport                     | no (TCP only)  | yes       |

These kernel-bound features need user namespaces, `CAP_BPF`,
`CAP_SYS_ADMIN`, `CAP_NET_ADMIN`, and a writeable bpf-fs / `/dev/fuse`.
A plain unprivileged container has none of those. The
`SandboxEnforcer` (`crates/sentinel-sandbox/src/enforcer.rs`) detects
the absence at boot and degrades gracefully; warnings in the daemon
log are the expected demo signal.
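
A boot-time probe of this kind could look roughly like the sketch below. It is not the code in `enforcer.rs`: the function names are hypothetical, it only checks that `/sys/fs/bpf` and `/dev/fuse` exist (not that they are writable), and it does not test for user namespaces. The capability bit numbers are the ones defined in `linux/capability.h`.

```rust
use std::fs;
use std::path::Path;

// Capability bit positions from linux/capability.h.
const CAP_NET_ADMIN: u32 = 12;
const CAP_SYS_ADMIN: u32 = 21;
const CAP_BPF: u32 = 39;

/// Extract the effective capability mask from the text of /proc/self/status.
pub fn parse_cap_eff(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("CapEff:"))
        .and_then(|hex| u64::from_str_radix(hex.trim(), 16).ok())
}

pub fn has_cap(mask: u64, cap: u32) -> bool {
    mask & (1u64 << cap) != 0
}

/// List what is missing instead of failing, so the caller can log
/// warnings and continue in a degraded mode.
pub fn missing_features(mask: u64, fuse_present: bool, bpffs_present: bool) -> Vec<&'static str> {
    let mut missing = Vec::new();
    if !has_cap(mask, CAP_BPF) {
        missing.push("CAP_BPF not held");
    }
    if !has_cap(mask, CAP_SYS_ADMIN) {
        missing.push("CAP_SYS_ADMIN not held");
    }
    if !has_cap(mask, CAP_NET_ADMIN) {
        missing.push("CAP_NET_ADMIN not held");
    }
    if !fuse_present {
        missing.push("/dev/fuse not present");
    }
    if !bpffs_present {
        missing.push("/sys/fs/bpf not present");
    }
    missing
}

/// Probe the current host. An unreadable status file counts as "no capabilities".
pub fn probe_host() -> Vec<&'static str> {
    let status = fs::read_to_string("/proc/self/status").unwrap_or_default();
    let mask = parse_cap_eff(&status).unwrap_or(0);
    missing_features(
        mask,
        Path::new("/dev/fuse").exists(),
        Path::new("/sys/fs/bpf").exists(),
    )
}
```

Returning a list rather than an error is the design choice that matters here: the daemon can print one warning per missing primitive and keep running the behavioral parts of the stack.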

For the full stack with sandbox enforcement see

`deploy/systemd/*.service`, the deployment notes in

[docs/governance.md](docs/governance.md), and the

[TOGAF v22.1 Architecture Guide](docs/architecture/togaf-architecture-guide.html).

## Customer Workshop Path

For engineering leadership and DevSecOps teams evaluating runtime

governance for AI coding agents, the recommended walkthrough is a

45-minute hands-on session:

1. Architecture overview (10 min): TOGAF v22.1 guide, three control planes, sandbox stack.

2. Hands-on demo (15 min): start the demo stack, observe agent activity, replay events.

3. Sandbox-config inspection (10 min): bwrap + Landlock + cgroups policy walkthrough.

4. 9/9 breakout test report review (5 min): what the tests prove, what they don't.

5. Q&A + production deployment caveats (5 min).

Full agenda: [`docs/workshop-agent-runtime-governance.md`](docs/workshop-agent-runtime-governance.md).

## Demo: What it proves and what it doesn't

The included docker demo (`make demo`) is a deliberate behavioral

subset. It is meant to give a recruiter or curious reader a working

dashboard in one command, not to reproduce the full sandbox story.

### What the demo proves

- ECS world simulation, bio-engine, physics, room sim — 60-persona

  workload runs end-to-end on a 5-agent subset.

- Event sourcing (Limbo SQLite, idempotent, replayable) — full audit

  trail captured per agent.

- Cortex Gateway 7-step pipeline + 10-rule synthesis engine — agent

  reasoning is observable.

- Dashboard (Bun + Hono + WebSocket) — live agent activity, drift,

  quality metrics.
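
The "idempotent, replayable" property of the event-sourcing bullet above can be illustrated with a small in-memory sketch. It assumes each event carries a unique id used as an idempotency key; the types, field names, and the per-agent count projection are all hypothetical and unrelated to the actual Limbo SQLite schema.

```rust
use std::collections::{BTreeMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
pub struct AuditEvent {
    pub id: String, // idempotency key (assumed, not from the project)
    pub agent: String,
    pub action: String,
}

/// Convenience constructor for the examples.
pub fn event(id: &str, agent: &str, action: &str) -> AuditEvent {
    AuditEvent {
        id: id.to_string(),
        agent: agent.to_string(),
        action: action.to_string(),
    }
}

#[derive(Default)]
pub struct AuditStore {
    seen: HashSet<String>,
    log: Vec<AuditEvent>,
}

impl AuditStore {
    /// Idempotent append: a redelivered event with a known id is ignored.
    /// Returns true only when the event was actually written.
    pub fn append(&mut self, ev: AuditEvent) -> bool {
        if !self.seen.insert(ev.id.clone()) {
            return false;
        }
        self.log.push(ev);
        true
    }

    /// Replay: rebuild a read model (actions per agent) from the log alone.
    /// The same log always yields the same projection.
    pub fn replay(&self) -> BTreeMap<String, usize> {
        let mut counts = BTreeMap::new();
        for ev in &self.log {
            *counts.entry(ev.agent.clone()).or_insert(0) += 1;
        }
        counts
    }
}
```

Since the projection is derived purely from the log, it can be discarded and rebuilt at any time, which is what lets a dashboard or auditor trust the read model.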

### What the demo does not exercise

The kernel-bound sandbox primitives (per-agent isolation) require

`CAP_BPF`, `CAP_SYS_ADMIN`, `CAP_NET_ADMIN`, user namespaces, and a

writeable bpf-fs / `/dev/fuse`. A plain unprivileged Docker container

has none of those. The `SandboxEnforcer`

(`crates/sentinel-sandbox/src/enforcer.rs`) detects the absence at boot

and degrades gracefully — warnings in the daemon log are the expected

demo signal.

For the full stack with sandbox enforcement (bwrap + Landlock + cgroups +
netns + nftables + Wasmtime) see `deploy/systemd/*.service` and the
[TOGAF v22.1 architecture guide](docs/architecture/togaf-architecture-guide.html).
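
To give a feel for the outermost layer of that stack, here is a hedged sketch of how a per-agent `bwrap` invocation might be assembled. The mount layout, the `/work` path, and the function names are assumptions made for illustration; the project's actual policy lives in `crates/sentinel-sandbox/` and may differ. Only standard bubblewrap options are used, and nothing is executed unless the caller spawns the returned command.

```rust
use std::process::Command;

/// Build the bwrap argument list for one agent (illustrative policy only).
pub fn bwrap_args(workdir: &str, allow_network: bool) -> Vec<String> {
    let mut args: Vec<String> = [
        "--unshare-user", // fresh user namespace
        "--unshare-pid",  // agent cannot see host processes
        "--unshare-ipc",
        "--die-with-parent",
        "--new-session",
        "--ro-bind", "/usr", "/usr", // read-only system files
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();

    // The only writable host path is the agent's own working directory.
    args.extend(["--bind".to_string(), workdir.to_string(), "/work".to_string()]);
    args.extend(["--chdir".to_string(), "/work".to_string()]);

    if !allow_network {
        args.push("--unshare-net".to_string()); // empty network namespace
    }
    args
}

/// Wrap `program` in bwrap. The caller decides whether to spawn it.
pub fn bwrap_command(workdir: &str, allow_network: bool, program: &str) -> Command {
    let mut cmd = Command::new("bwrap");
    cmd.args(bwrap_args(workdir, allow_network)).arg(program);
    cmd
}
```

Keeping the argument builder separate from the `Command` makes the policy easy to unit-test on hosts where `bwrap` is not installed.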

### Verified by external tests

[Sandbox Test Report](docs/security-test-report.md): 9/9 breakout

tests pass on a privileged host.

## Status — what works in this alpha, what doesn't yet

Kernel-bound features are **not missing**: they are *implemented and tested
but not deployable in the docker demo*. The VM deploy is the production
target; the docker demo is a deliberate behavioral subset.

| Area | Status | Demo-Container | VM-Deploy |
|------|--------|----------------|-----------|
| ECS world (bevy_ecs), bio + physics + room sim | ✅ implemented + exercised | yes | yes |
| Event sourcing (Limbo SQLite, idempotent, replayable) | ✅ implemented + exercised | yes | yes |
| Cortex Gateway 7-step pipeline + 10-rule synthesis engine | ✅ implemented + exercised | yes | yes |
| Dashboard (Bun + Hono + WebSocket) | ✅ implemented + exercised | yes | yes |
| sentinel-judge quality + drift monitoring (NATS streaming) | ✅ implemented + exercised | yes | yes |
| sentinel-projection CQRS read-models | ✅ implemented + exercised | yes | yes |
| sentinel-nightrun batch consolidation, deterministic replay | ✅ implemented, manual trigger | yes | yes |
| **bwrap + Landlock per-agent isolation** | ✅ implemented + 9/9 breakout-tested (`crates/sentinel-sandbox/`) | **no (kernel-caps)** | **yes** |
| **cgroups v2 per-agent caps** | ✅ implemented | **no (kernel-caps)** | **yes** |
| **netns + nftables agent network** | ✅ implemented | **no (kernel-caps)** | **yes** |
| **eBPF probes (aya-rs)** | ✅ implemented | **no (kernel-caps)** | **yes** |
| **sentinel-fs CAS-FUSE** | ✅ implemented | **no (FUSE)** | **yes** |
| TOGAF v22.1 architecture guide + per-cluster gap report | ✅ shipped in `docs/architecture/` | n/a | n/a |
| 60 LLM-persona agents (`config/agents/AGENT-*.toml`) | ✅ defined; demo runs a 5-agent subset | partial (5/60) | yes (full 60) |
| Pre-built demo binaries (linux-x86_64) on every release | ✅ since v0.1.0-alpha | yes | yes |
| CodeQL pipeline | ✅ green on main | n/a | n/a |
| Tag verified-badge on GitHub | ✅ verified=true (Ed25519) | n/a | n/a |
| OpenGraph social-preview image | ⏳ image in repo (`docs/images/opengraph-preview.png`); upload via repo Settings → Social preview pending (#351) | n/a | n/a |
| Demo binaries for arm64 / Apple Silicon | ⏳ planned (currently linux-x86_64 only) | n/a | n/a |
| Multi-tenant company configs (`sentinel-gaia`) | ✅ deterministic generator + CLI init/validate path | n/a | yes (offline config generation + daemon dry-run) |

See [docs/known-limitations.md](docs/known-limitations.md) for the full

caveat list.

## Repository Layout

| Path                             | Contents                                                    |
|----------------------------------|-------------------------------------------------------------|
| `crates/`                        | 17 Rust crates (ECS, bio, physics, sandbox, eBPF, etc.)     |
| `services/sentinel-gaia/`        | Gaia company-config generator + CLI                         |
| `services/sentinel-daemon/`      | Daemon + controlplane                                       |
| `services/sentinel-judge/`       | Quality / drift monitor (Go)                                |
| `services/sentinel-nightrun/`    | Nightly consolidation (Rust)                                |
| `services/sentinel-nats-bridge/` | NATS event bridge (Go)                                      |
| `cmd/cortex-gateway/`            | LLM proxy + synthesis (Go)                                  |
| `dashboard/`                     | Bun + Hono real-time UI                                     |
| `pkg/sentinel-go/`               | Shared Go package (judge heuristics, eventstore, messaging) |
| `config/`                        | Agent TOMLs, room layout, simulation parameters             |
| `docs/`                          | Architecture, governance, gap, deviations, glossary         |
| `deploy/`                        | systemd units, release manifest schema                      |
| `.github/workflows/`             | 16 CI workflows (build, test, security, supply chain)       |

## Documentation

| Doc | Purpose |
|-----|---------|
| [llms.txt](llms.txt) | LLM-friendly project index (read first) |
| [docs/architecture/togaf-architecture-guide.html](docs/architecture/togaf-architecture-guide.html) | Authoritative architecture reference (v22.1) |
| [docs/governance.md](docs/governance.md) | Governance mechanisms ↔ code path mapping |
| [docs/togaf-gap-v22.md](docs/togaf-gap-v22.md) | Per-cluster implementation status |
| [docs/togaf-deviations-v22.md](docs/togaf-deviations-v22.md) | Intentional deviations from the spec |
| [docs/component-readmes.md](docs/component-readmes.md) | Component-level README index for Rust/Go modules |
| [docs/glossary.md](docs/glossary.md) | Agent-persona narrative + agent-layer glossary |
| [docs/security-test-report.md](docs/security-test-report.md) | Sandbox breakout test results |
| [docs/workshop-agent-runtime-governance.md](docs/workshop-agent-runtime-governance.md) | 45-min hands-on workshop: how to evaluate runtime governance for LLM coding agents |
| [docs/research-context.md](docs/research-context.md) | Synthetic-workload personality model + role taxonomy + ethics |
| [examples/](examples/) | Copy-pasteable runtime-governance walkthroughs (sandbox policy, audit replay, control-plane isolation) |
| [CONTRIBUTING.md](CONTRIBUTING.md) | How to contribute |
| [SECURITY.md](SECURITY.md) | Reporting vulnerabilities |
| [CHANGELOG.md](CHANGELOG.md) | Release history |

## Architecture Details

Plain-text alternative to the [Mermaid diagram above](#architecture-at-a-glance),

useful for terminal-only viewers and screen-readers. Same data flow, lower

fidelity:

```
Deterministic (ECS)              Probabilistic (LLM)
┌─────────────────────┐          ┌──────────────────────────────────┐
│ bevy_ecs World      │          │ Cortex Gateway                   │
│ Bio / Physics       │ ───────> │ 7-step pipeline                  │
│ 60 agent slots      │ <─────── │ Synthesis engine                 │
│ Event Store         │          │ Self-recognition pattern detector│
└─────────────────────┘          └──────────────────────────────────┘
          │                                   │
          └─────────── Event Sourcing ────────┘
                 (sentinel-limbo, append-only)
```

For full architectural depth (clusters, controlplane internals, deviation

register) see the

[TOGAF v22.1 architecture guide](docs/architecture/togaf-architecture-guide.html)

and the gap report in [docs/togaf-gap-v22.md](docs/togaf-gap-v22.md).

## Release status

This is the first **public** release boundary. The project was developed

privately prior to `v0.1.0-alpha`; the tag marks the boundary between

private development and public visibility, not the start of the project.

CI on `main` is green: ci, lint, coverage, supply-chain (cargo-deny, npm-audit,
go-vuln, rust-audit), conventional-commits, dependency-freshness.
CodeQL goes green on the first scheduled run after the public flip
(GHAS gating). Security: dependency audit + `gitleaks` + `trufflehog` clean,
9/9 sandbox breakout tests passing on a privileged host.

See [docs/known-limitations.md](docs/known-limitations.md) for full caveats

and the [Status table above](#status--what-works-in-this-alpha-what-doesnt-yet)

for the per-feature picture.

## Research Context

The synthetic office workload is a deliberate stress-test for the runtime

layer. The personality model, role taxonomy, and bio-state mechanism are

documented in [docs/research-context.md](docs/research-context.md). The

platform underneath is the work; the workload is the evaluation.

## Why this proof matters

When customers evaluate AI coding agent deployment, three runtime

questions come back:

- *"How is the agent isolated from production?"* — sandbox stack

  (bwrap + Landlock + cgroups + netns), 9/9 breakout tests passing.

- *"What evidence remains for review?"* — event sourcing on Limbo

  SQLite, deterministic replay, hash-chained audit trail.

- *"Who decides what the agent can do?"* — three independent control

  planes (Agent CP, Platform CP, API CP), each owning a single

  decision domain.
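
The "one loop, one decision domain" idea in the last bullet can be sketched as a small trait. Everything here is hypothetical (the trait, the `ApiCp` budget example, the 90% threshold) and only illustrates the observe / decide / act / verify shape, not the project's control-plane code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Domain {
    Agent,
    Platform,
    Api,
}

/// One control plane: it can only read and mutate its own state,
/// so it has no way to reach into another plane's domain.
pub trait ControlPlane {
    fn domain(&self) -> Domain;
    fn observe(&self) -> f64;
    fn decide(&self, reading: f64) -> Option<String>;
    fn act(&mut self, action: &str);
    fn verify(&self) -> bool;
}

/// Run one observe -> decide -> act -> verify cycle.
/// Returns None when no action was needed, otherwise the verify result.
pub fn tick(cp: &mut dyn ControlPlane) -> Option<bool> {
    let reading = cp.observe();
    let action = cp.decide(reading)?;
    cp.act(&action);
    Some(cp.verify())
}

/// Hypothetical API control plane that throttles when spend nears budget.
pub struct ApiCp {
    pub spend: f64,
    pub budget: f64,
    pub throttled: bool,
}

impl ControlPlane for ApiCp {
    fn domain(&self) -> Domain {
        Domain::Api
    }
    fn observe(&self) -> f64 {
        self.spend / self.budget
    }
    fn decide(&self, reading: f64) -> Option<String> {
        // Assumed policy: throttle once above 90% of budget.
        if reading > 0.9 && !self.throttled {
            Some("throttle".to_string())
        } else {
            None
        }
    }
    fn act(&mut self, action: &str) {
        if action == "throttle" {
            self.throttled = true;
        }
    }
    fn verify(&self) -> bool {
        self.throttled
    }
}
```

The verify step is what separates this from a plain reactive handler: each cycle reports whether the action actually took effect, which is the signal an audit trail would record.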

This repo is not a product. It is a **reference implementation** that

makes those questions concrete. The TOGAF v22.1 architecture is the

contract; the docker demo is a reduced behavioral subset (see Demo

section above).

## License

See [LICENSE](LICENSE).