https://github.com/asklar/agentsafe-demos
A reusable lab for small, recordable, open-source agentic-security demos: one tiny tool-call interception shim, governed-vs-ungoverned side by side.
https://github.com/asklar/agentsafe-demos
agent-governance agentic-security ai-agents ai-safety ai-security guardrails llm llm-security mcp prompt-injection python
Last synced: about 13 hours ago
JSON representation
A reusable lab for small, recordable, open-source agentic-security demos: one tiny tool-call interception shim, governed-vs-ungoverned side by side.
- Host: GitHub
- URL: https://github.com/asklar/agentsafe-demos
- Owner: asklar
- License: apache-2.0
- Created: 2026-07-01T07:37:51.000Z (2 days ago)
- Default Branch: main
- Last Pushed: 2026-07-01T09:03:17.000Z (2 days ago)
- Last Synced: 2026-07-01T11:05:12.069Z (2 days ago)
- Topics: agent-governance, agentic-security, ai-agents, ai-safety, ai-security, guardrails, llm, llm-security, mcp, prompt-injection, python
- Language: Python
- Size: 45.9 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Notice: NOTICE
Awesome Lists containing this project
README
# agentsafe-demos — a lab for agentic-security demos
[](https://github.com/asklar/agentsafe-demos/actions/workflows/ci.yml)
[](https://www.python.org/downloads/)
[](./LICENSE)
**A reusable foundation for building small, recordable, open-source demos about
agent governance.** Every demo is middleware over one tiny tool-call
interception shim, so the second demo costs a fraction of the first and each one
becomes shareable content with a single capture command.
> Autonomous agents are only as safe as the tools they can reach. This lab makes
> that concrete: it puts a governable seam between an agent and its tools, and
> shows — side by side — what happens with and without it.
## See it in 10 seconds
A support agent reads a booby-trapped ticket that tells it to mail your Stripe
key to an attacker. Same agent, same plan, run three ways — a *reasonable*
allow/deny policy still lets the secret out; a taint tripwire stops it at the
tool boundary and still lets the legitimate reply through:
```text
=== POLICY-ONLY agent (allow/deny gate) ===
EXEC read_file — read credentials <-- injected step
EXEC http_post — POST creds off-box <-- injected exfil
-> 6 executed, 0 blocked; !!! SECRET EXFILTRATED
=== TRIPWIRE agent (policy + taint tripwire) ===
EXEC read_file — read credentials <-- injected step
BLOCK http_post — POST creds off-box <-- injected exfil [taint-tripwire]
EXEC http_post — reply on the ticket (legit)
-> 5 executed, 1 blocked; OK secret contained
```
Prompt injection is a *data-flow* problem: track where untrusted data goes, or
lose. Run it yourself below — ~250 lines, zero dependencies, one command.
## Why this exists
Shipping one demo is easy; shipping a demo *every week that survives public
scrutiny* needs scaffolding. `agentlab` provides three things so the marginal
demo is cheap:
1. **A shared shim** ([`shim/`](./shim)) — the tool-call interception substrate
every demo builds on. See [`shim/README.md`](./shim/README.md).
2. **A demo template** ([`demos/_template`](./demos/_template)) — drop-in
scaffold with a runnable `demo.py`, `run.sh`, `SCRIPT.md`, and a smoke test.
3. **A capture pipeline** ([`capture/`](./capture)) — one command turns a demo
into a text log / terminal recording / GIF.
## Quick start
```bash
# Run the reference demo (no dependencies beyond Python 3.10+)
./demos/00-hello-gate/run.sh
# Run the whole test suite (stdlib only)
python3 run_tests.py
# Scaffold a new demo
./new_demo.sh injection-tripwire "Injection Tripwire"
# Capture a demo as shareable content
capture/record.sh 00-hello-gate
```
## The shared shim in one screen
Each demo wires the same `Interceptor` with different middleware. A middleware
can veto a call before it runs (`before`) and/or observe it after (`after`):
```python
from shim import Interceptor, make_toolbox
from shim.middleware import PolicyGate, InjectionTripwire, FlightRecorder
ix = Interceptor(
make_toolbox("/tmp/agent-workspace"),
middleware=[PolicyGate(rules), InjectionTripwire(), FlightRecorder()],
)
ix.call("delete_file", path="production.db") # -> Blocked, tool never runs
```
That's the whole idea: **the three roadmap concepts are three middleware, not
three frameworks.**
| Roadmap concept | Middleware | Status |
| --- | --- | --- |
| A. Guardrail Proxy (policy gate) | `PolicyGate` | flagship demo (SKL-3) |
| B. Injection Tripwire (taint) | `InjectionTripwire` | [demo #2](./demos/01-injection-tripwire) (SKL-6) ✅ |
| C. Agent Flight Recorder (ledger) | `FlightRecorder` + `AuditSink` | demo #3 |
The reference demo [`demos/00-hello-gate`](./demos/00-hello-gate) composes all
three at once to prove the substrate generalizes;
[`demos/01-injection-tripwire`](./demos/01-injection-tripwire) is the focused
Concept B demo — it shows an allow/deny policy is *not enough* to stop a
prompt-injection exfil, and a taint tripwire is.
> **Standalone primitive.** Concept A also has its own focused home — the
> **guardrail** repo — the single-idea policy-gate primitive with its own README
> and quick start, for readers who just want that one control. This lab is where
> the three concepts compose on a shared shim and become repeatable content.
## Repo layout
```
agentlab/
├── shim/ # shared tool-call interception substrate + tests
│ ├── interceptor.py # Interceptor + Middleware chain + Blocked
│ ├── middleware.py # PolicyGate / InjectionTripwire / FlightRecorder
│ ├── audit.py # hash-chained JSONL audit sink
│ └── toolbox.py # shared mock tools
├── demos/
│ ├── _template/ # drop-in scaffold for a new demo
│ ├── 00-hello-gate/ # reference demo that exercises the whole substrate
│ └── 01-injection-tripwire/ # Concept B: taint tracking beats allow/deny policy
├── capture/ # one-command demo-capture pipeline
├── new_demo.sh # scaffold a new demo
├── run_tests.py # dependency-free test runner
├── HOWTO.md # how to add a new demo (the short version)
├── CONTRIBUTING.md
└── LICENSE # Apache-2.0
├── NOTICE # Apache-2.0 NOTICE
```
## Adding a demo
See [`HOWTO.md`](./HOWTO.md). The short version: `./new_demo.sh `, edit the
plan and middleware, run it, capture it.
## License
Apache-2.0 licensed — see [`LICENSE`](./LICENSE) and [`NOTICE`](./NOTICE).