https://github.com/euanmcrosson-dotcom/mcp-guard
Drop-in deterministic policy layer for MCP-using AI agents. Synthesizes tool-call policies from observed indirect-injection gaps and evaluates each tool call at the agent's tool-call boundary.
agent-security ai-security defensive-security llm-security mcp policy prompt-injection
Last synced: 11 days ago
- Host: GitHub
- URL: https://github.com/euanmcrosson-dotcom/mcp-guard
- Owner: euanmcrosson-dotcom
- License: mit
- Created: 2026-05-04T23:10:49.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-26T00:52:21.000Z (about 1 month ago)
- Last Synced: 2026-05-26T02:32:28.595Z (about 1 month ago)
- Topics: agent-security, ai-security, defensive-security, llm-security, mcp, policy, prompt-injection
- Language: Python
- Homepage: https://github.com/euanmcrosson-dotcom/purple-scaffold
- Size: 162 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# mcp-guard
Drop-in deterministic policy layer for MCP-using AI agents.
`mcp-guard` synthesises tool-call policies from observed
indirect-prompt-injection gaps, evaluates each tool call against
those policies at the agent's tool-call boundary, and provides a
backtest harness for measuring false-positive rate against
legitimate traffic before deployment.
> **v0.5.7:** closed a type/shape-confusion fail-open — string deny ops
> (`contains`/`matches`/`starts_with`/`equals`/`in`) now recurse through
> list/dict args, so a deny rule can't be evaded by wrapping the value in a
> container (e.g. `to=["x@evil.com"]`). Corpus grew to **308 cases** with the
> previously-untested type-confusion class.
>
> **v0.5.x:** 9 deterministic rule patterns across 122
> rules, **308-case backtest corpus**, TPR 1.00 / FPR 0.01. Four
> framework adapters: Anthropic MCP SDK, LangChain, LlamaIndex,
> CrewAI. LLM-augmented synthesis fallback (mock + real-API
> validated). **Six reproducible real-world [case studies](case_studies/)**:
> EchoLeak indirect injection, MCP tool-description poisoning,
> AWS IMDS SSRF, Log4Shell-class MCP logging, RAG context poisoning,
> agent self-prompting loops. See [CHANGELOG.md](CHANGELOG.md).
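The container-wrapping evasion closed in v0.5.7 can be illustrated with a small standalone sketch. It is not the library's code: `iter_string_leaves`, `deny_if_any`, and `is_evil` are hypothetical names, and walking dict keys as well as values is a choice made for this sketch only.

```python
from typing import Any, Callable, Iterator


def iter_string_leaves(value: Any) -> Iterator[str]:
    """Yield every string nested inside lists, tuples, sets, and dicts."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        # Sketch assumption: inspect keys as well as values.
        for key, item in value.items():
            yield from iter_string_leaves(key)
            yield from iter_string_leaves(item)
    elif isinstance(value, (list, tuple, set, frozenset)):
        for item in value:
            yield from iter_string_leaves(item)
    # Non-string scalars (int, None, ...) give a string op nothing to match.


def deny_if_any(value: Any, predicate: Callable[[str], bool]) -> bool:
    """Return True (deny) if the predicate matches any nested string."""
    return any(predicate(leaf) for leaf in iter_string_leaves(value))


def is_evil(address: str) -> bool:
    # Hypothetical deny condition, standing in for a string deny op.
    return address.endswith("@evil.com")


print(deny_if_any("x@evil.com", is_evil))            # True: bare string
print(deny_if_any(["x@evil.com"], is_evil))          # True: wrapped in a list
print(deny_if_any({"cc": ["x@evil.com"]}, is_evil))  # True: wrapped in a dict
print(deny_if_any(["bob@corp.example"], is_evil))    # False: legitimate value
```

A check that only tests `isinstance(value, str)` would return "no match" for the second and third calls, which is the fail-open behaviour the release note describes.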
This is the defensive companion to the [`purple-scaffold`](https://github.com/euanmcrosson-dotcom/purple-scaffold)
research probes. Findings from those probes feed into policy
synthesis; the resulting policy is what a product-side defender
would ship in front of the agent's tool-call execution gate.
## Why
Most defenses against indirect prompt injection are
classifier-based: pre-process the model input or post-process
the model output, and use a model to decide whether something
looks suspicious. That's useful but probabilistic, hard to
audit, and adds latency.
`mcp-guard` takes the complementary deterministic-policy approach:
- **Synthesise** a policy from observed gaps (e.g., "agent emitted
  `read_text_file('~/.ssh/id_rsa')` after reading a poisoned
  file" → policy: deny `read_text_file` whose path matches a
  sensitive-credential pattern).
- **Evaluate** each tool call against the policy. Pure function:
  `(tool_name, args, user_context) -> Decision`. No I/O, no LLM,
  no ambiguity.
- **Backtest** the policy against a labelled corpus of legitimate
  and attack tool-call cases before deployment. Measure FPR / TPR.
  Iterate until both look acceptable.
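To make the "pure function" shape concrete, here is a minimal standalone sketch of a first-match deny evaluator. It does not reproduce `mcp-guard`'s internals: `SketchDecision`, `DenyRule`, and `evaluate_sketch` are hypothetical stand-ins, the regex is an illustrative credential-path pattern, and only plain string arguments are inspected.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SketchDecision:
    # Field names mirror the Decision repr shown in the Quickstart.
    allowed: bool
    denying_rule_id: Optional[str] = None
    reason: Optional[str] = None


@dataclass(frozen=True)
class DenyRule:
    rule_id: str
    tool_name: str
    arg_name: str
    pattern: str  # regular expression searched for in the argument value
    reason: str


def evaluate_sketch(
    rules: list[DenyRule],
    tool_name: str,
    args: dict[str, Any],
    user_context: dict[str, Any],
) -> SketchDecision:
    """First matching deny rule wins. No I/O and no model calls."""
    # user_context is accepted to match the signature; these rules ignore it.
    for rule in rules:
        if rule.tool_name != tool_name:
            continue
        value = args.get(rule.arg_name)
        if isinstance(value, str) and re.search(rule.pattern, value):
            return SketchDecision(False, rule.rule_id, rule.reason)
    return SketchDecision(True)


rules = [
    DenyRule(
        rule_id="deny-ssh-read",
        tool_name="read_text_file",
        arg_name="path",
        pattern=r"(^|/)\.ssh/",
        reason="Path matches a sensitive-credential pattern",
    )
]

denied = evaluate_sketch(rules, "read_text_file", {"path": "~/.ssh/id_rsa"}, {})
allowed = evaluate_sketch(rules, "read_text_file", {"path": "notes/todo.txt"}, {})
print(denied.allowed, denied.denying_rule_id)  # False deny-ssh-read
print(allowed.allowed)                         # True
```

Because the function depends only on its arguments, the same inputs always produce the same decision, which is what makes a denial reproducible from audit logs.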
The library is not meant to replace classifier-based defenses —
it complements them. Use both: classifier as an early-warning
signal, deterministic policy as the unconditional gate.
## Install
```bash
pip install mcp-guardrails
```
(Python 3.11+. No runtime dependencies beyond the standard library.)
> **Note on the name.** The PyPI distribution is `mcp-guardrails`
> because an unrelated dormant project squats `mcpguard` on PyPI and
> the similarity check refuses `mcp-guard`. The Python import name
> stays `mcp_guard`, so existing code continues to work. This is the
> same split as Pillow, which installs as `Pillow` but imports as
> `PIL`. The GitHub repo, the in-code references, and the project
> identity stay `mcp-guard`.
Optional extras for the integrations you actually use:
```bash
pip install 'mcp-guardrails[anthropic-mcp]' # for the Anthropic MCP SDK adapter
pip install 'mcp-guardrails[langchain]' # for the LangChain callback handler
pip install 'mcp-guardrails[llamaindex]' # for the LlamaIndex callback handler / wrap_tool
pip install 'mcp-guardrails[crewai]' # for the CrewAI wrap_tool
pip install 'mcp-guardrails[llm]' # for synthesize_with_llm fallback
pip install 'mcp-guardrails[all]' # everything
```
## Quickstart — Python API
The fastest path to a shippable policy is `synthesize_default_policy()`,
which returns the full ruleset across every built-in pattern:
```python
from mcp_guard import synthesize_default_policy, evaluate, default_corpus, run_backtest

# 1. Load the full deterministic policy (9 patterns, 122 rules)
policy = synthesize_default_policy()

# 2. Evaluate any tool call against it
decision = evaluate(
    policy,
    tool_name="send_email",
    args={"to": "attacker@evil.com", "body": "exfil"},
    user_context={"user": {"contacts": ["bob@corp.example"]}},
)
print(decision)
# Decision(allowed=False,
#          denying_rule_id='tool-policy-email-contact-allowlist--send_email--default',
#          reason='External recipient outside the authenticated user...')

# 3. Backtest against the labelled corpus
metrics = run_backtest(policy, default_corpus())
print(f"TPR: {metrics.true_positive_rate:.4f}, "
      f"FPR: {metrics.false_positive_rate:.4f}")
# TPR: 1.0000, FPR: 0.0101
```
For incident-driven synthesis (one observed gap → narrowly-targeted
policy), use `synthesize_from_text()`:
```python
from mcp_guard import synthesize_from_text
# Synthesise from a free-text gap description
policy = synthesize_from_text(
    "agent emitted send_email to attacker@evil.com when user "
    "asked it to read a ticket",
    technique_id="lab-2026-05-04",
)
print(policy.to_yaml())
```
## Quickstart — CLI
```bash
# Synthesise a policy from gap text → YAML on stdout
mcp-guard synthesize "agent emitted send_email to attacker@evil.com" \
> policy.yaml
# Evaluate a single tool call against the policy → JSON Decision on stdout
mcp-guard evaluate policy.yaml send_email '{"to":"attacker@evil.com"}' \
--user-context '{"user":{"contacts":["bob@corp.example"]}}'
# Backtest against the default corpus → metrics JSON
mcp-guard backtest policy.yaml
```
## Wiring into your agent
The evaluator is pure, so you can wire it anywhere — most
naturally at the agent's tool-call boundary:
```python
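from mcp_guard import GeneratedPolicy, evaluate, synthesize_default_policy

policy: GeneratedPolicy = synthesize_default_policy()


def on_tool_call_attempt(tool_name: str, args: dict, user_ctx: dict) -> bool:
    """Return True if the call may proceed, False if the policy denies it."""
    decision = evaluate(policy, tool_name, args, user_ctx)
    if not decision.allowed:
        # log_audit is a placeholder for your application's audit sink;
        # it is not imported from mcp_guard here.
        log_audit(
            event="tool_call_denied",
            rule=decision.denying_rule_id,
            reason=decision.reason,
            tool=tool_name,
            args=args,
        )
        return False
    return True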
```
### Anthropic MCP Python SDK
```python
from mcp.server import Server
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.anthropic_mcp import MCPGuard
server = Server("my-app")
guard = MCPGuard(policy=synthesize_default_policy())
@server.call_tool()
async def call_tool(name: str, arguments: dict):
    # Raises GuardedToolDenied if the policy denies the call.
    guard.check(name, arguments, user_context=current_user_context())
    return await my_business_logic(name, arguments)
```
Or use the decorator form:
```python
@server.call_tool()
@guard.wrap_handler(user_context_fn=current_user_context)
async def call_tool(name: str, arguments: dict):
    return await my_business_logic(name, arguments)
```
### LangChain
```python
from langchain.agents import AgentExecutor
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.langchain import make_callback_handler
handler = make_callback_handler(
    policy=synthesize_default_policy(),
    user_context_fn=lambda: {"user": {"id": current_user.id,
                                      "contacts": current_user.contacts}},
)
executor = AgentExecutor(
    agent=agent, tools=tools,
    callbacks=[handler],  # ← mcp-guard sits in the callback chain
)
```
If the policy denies a tool call, the handler raises `GuardedToolDenied`
inside `on_tool_start`, which LangChain surfaces as a tool failure;
the agent's reasoning chain sees the deny reason and can adapt.
### LlamaIndex
```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.llamaindex import make_callback_handler
Settings.callback_manager = CallbackManager([
    make_callback_handler(
        policy=synthesize_default_policy(),
        user_context_fn=lambda: {"user": {...}},
    ),
])
# … your existing agent / query engine code; tool calls are now guarded.
```
Per-tool variant (no callback manager required):
```python
from mcp_guard.integrations.llamaindex import wrap_tool
guarded = wrap_tool(my_tool, policy=synthesize_default_policy())
```
### CrewAI
```python
from crewai import Agent
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.crewai import wrap_tools
agent = Agent(
    role="researcher",
    goal="answer the question",
    tools=wrap_tools(
        my_tools,
        policy=synthesize_default_policy(),
        user_context_fn=lambda: {"user": {...}},
    ),
)
```
`wrap_tool` is idempotent — re-wrapping is a no-op — so it's safe to
apply at agent-construction time without tracking which tools were
already guarded.
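One common way to get idempotent wrapping is to tag the wrapper with a marker attribute and return already-tagged callables unchanged. The sketch below shows that technique in isolation. It makes no claim about how `mcp-guard` detects re-wrapping: `wrap_tool_sketch`, `ToolDenied`, `_GUARD_MARKER`, and the `is_allowed` callback are all hypothetical.

```python
import functools
from typing import Any, Callable

_GUARD_MARKER = "_guard_wrapped"  # hypothetical attribute name


class ToolDenied(Exception):
    """Raised by this sketch when the check rejects a call."""


def wrap_tool_sketch(
    tool_fn: Callable[..., Any],
    is_allowed: Callable[[str, dict], bool],
) -> Callable[..., Any]:
    if getattr(tool_fn, _GUARD_MARKER, False):
        return tool_fn  # already guarded: re-wrapping is a no-op

    @functools.wraps(tool_fn)
    def guarded(**kwargs: Any) -> Any:
        if not is_allowed(tool_fn.__name__, kwargs):
            raise ToolDenied(f"{tool_fn.__name__} denied")
        return tool_fn(**kwargs)

    setattr(guarded, _GUARD_MARKER, True)
    return guarded


def send_email(**kwargs: Any) -> str:
    return "sent"


def only_known_contacts(tool_name: str, kwargs: dict) -> bool:
    return kwargs.get("to") == "bob@corp.example"


once = wrap_tool_sketch(send_email, only_known_contacts)
twice = wrap_tool_sketch(once, only_known_contacts)
print(twice is once)                 # True: the second wrap returned the same object
print(once(to="bob@corp.example"))   # sent
```

Without the marker check, wrapping twice would evaluate the policy twice per call and nest the wrappers.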
### LLM-augmented synthesis for novel gaps
The deterministic synthesiser covers 9 attack-class patterns. For
gap shapes none of them recognise, `synthesize_with_llm` adds an
LLM fallback path that calls Anthropic Claude with a schema-pinned
prompt and validates the response against the full PolicyRule
schema before emitting the rule:
```python
from mcp_guard import synthesize_with_llm
# Deterministic patterns handle this → no LLM call.
p1 = synthesize_with_llm("send_email to attacker@evil.com")
# Novel gap → falls back to Claude (requires [llm] extra)
p2 = synthesize_with_llm(
    "agent invoked custom_tool_xyz with arg target_id pointing to a "
    "privileged service account ID outside the user's tenant",
    fallback=True,
)
```
The validator rejects any response that doesn't match the
PolicyRule schema (invalid operator, missing fields, etc.) and
returns an empty policy on failure — better to miss a rule than
ship a malformed one.
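The "validate or return empty" behaviour can be sketched without any model call. This is an assumption-laden illustration, not the library's validator. The field names in `REQUIRED_FIELDS` are hypothetical and the real PolicyRule schema may differ. The operator set is taken from the string deny ops listed in the v0.5.7 note.

```python
import json

# Operators named in the v0.5.7 release note; the real schema may allow more.
ALLOWED_OPS = {"contains", "matches", "starts_with", "equals", "in"}

# Hypothetical rule fields for this sketch only.
REQUIRED_FIELDS = {
    "rule_id": str,
    "tool_name": str,
    "arg_name": str,
    "op": str,
    "value": str,
    "reason": str,
}


def parse_rules_or_empty(raw: str) -> list[dict]:
    """Return the parsed rules, or [] if anything about the response is off."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, list):
        return []
    for rule in payload:
        if not isinstance(rule, dict) or set(rule) != set(REQUIRED_FIELDS):
            return []  # missing or unexpected fields
        if any(not isinstance(rule[k], t) for k, t in REQUIRED_FIELDS.items()):
            return []  # wrong field type
        if rule["op"] not in ALLOWED_OPS:
            return []  # invalid operator
    return payload


good = json.dumps([{
    "rule_id": "r1", "tool_name": "custom_tool_xyz", "arg_name": "target_id",
    "op": "starts_with", "value": "svc-", "reason": "privileged account",
}])
print(len(parse_rules_or_empty(good)))                                       # 1
print(parse_rules_or_empty(good.replace("starts_with", "regex_eval")))       # []
print(parse_rules_or_empty("Sorry, I can't help with that."))                # []
```

Rejecting the whole response when one rule is malformed is the conservative choice in this sketch. Dropping only the bad rule would also be defensible, at the cost of shipping a partial policy.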
## What kinds of gaps does the synthesiser cover?
The deterministic synthesiser is intentionally pattern-based and
auditable. As of v0.5.x, 9 attack classes map onto 122 rules in the
default policy:
| # | Attack class | What it denies | Tool families covered |
|---|---|---|---|
| 1 | Email contact exfil | `send_email` whose `to` arg is outside the user's `context.user.contacts` | 5 email tool names |
| 2 | Sensitive file read | `read_file` whose `path` matches `~/.ssh/`, `~/.aws/`, `/etc/shadow`, `kubeconfig`, etc. | 6 read tool names |
| 3 | Sensitive file write | `write_file` whose `path` matches `~/.bashrc`, `~/.ssh/authorized_keys`, `/etc/`, `/usr/bin/`, cron, `.git/config`, `.env`, etc. | 5 write tool names |
| 4 | Path traversal | Any path arg containing `../`, `..\`, URL-encoded variants (`%2e%2e`, `%2F`/`%5C`), double-encoded, Unicode division-slash | 17 file-path tool names |
| 5 | SSRF (private host) | `fetch_url` / `http_get` whose `url` targets RFC1918, loopback, link-local, AWS/GCP metadata, IPv6 unique-local | 6 HTTP tool names |
| 6 | Shell command danger | `shell_exec` / `bash` / `run_command` containing chaining (`;`, `&&`), pipe-to-shell, command substitution (`$()`, backticks), `rm -rf /`, `curl\|sh`, fork bombs | 8 shell tool names × 5 arg names |
| 7 | SQL danger | `db_query` / `execute_sql` containing `DROP TABLE`, `TRUNCATE`, unbounded `DELETE`/`UPDATE`, `UNION SELECT`, `information_schema` probes, stacked queries, `xp_/sp_` exec, `LOAD_FILE`, `INTO OUTFILE` | 6 SQL tool names × 3 arg names |
| 8 | Network egress private | `tcp_connect` / `socket_connect` whose `host` is private/internal | 5 network tool names |
| 9 | Email body PII / secret exfil | `send_email` whose `body`/`subject` contains AWS keys, OpenAI/Anthropic keys, GitHub PATs, Slack tokens, private-key headers, SSN, JWT, credit-card numbers | 5 email tool names × 4 arg names |
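As an illustration of row 5 in the table above, a private-host check can be built on the standard library's `ipaddress` module. This sketch is not the rule set `mcp-guard` ships: `targets_private_host` and `INTERNAL_NAMES` are hypothetical, and the sketch deliberately does not resolve DNS names, so a public hostname that resolves to an internal address would pass.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical list of well-known internal names checked by exact match.
INTERNAL_NAMES = {"localhost", "metadata.google.internal"}


def targets_private_host(url: str) -> bool:
    """Return True if the URL's host is a literal internal address or name."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return True  # unparseable URL: fail closed
    if host is None:
        return True  # no host component: fail closed
    if host in INTERNAL_NAMES:
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; this sketch does not resolve it
    # is_private covers RFC 1918 and IPv6 unique-local (fc00::/7).
    return ip.is_private or ip.is_loopback or ip.is_link_local


print(targets_private_host("http://169.254.169.254/latest/meta-data/"))  # True
print(targets_private_host("http://10.0.0.5/admin"))                     # True
print(targets_private_host("http://[fd00::1]/"))                         # True
print(targets_private_host("https://example.com/"))                      # False
```

The first call is the AWS instance-metadata address, which falls in the link-local range.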
For gap shapes not yet covered, the synthesiser returns an empty
policy (deliberate — we surface "no rule generated" rather than
fabricate a wrong rule). Adding a new gap shape is one
constructor + one test.
LLM-augmented synthesis (`synthesize_with_llm`) layers on top for
novel cases the patterns don't cover; the deterministic path stays
as the backstop because it's auditable from logs alone (no model
required at synthesis time).
## Backtest corpus
`default_corpus()` returns a **308-case** fixture corpus of (tool_name,
args, user_context, expected_verdict) tuples covering every built-in
pattern. v0.4.0 expanded coverage to: post-RCE env recon (env dump,
`printenv`, secret-keyword grep, secret-extension find), Windows
sensitive paths (Credential Manager, DPAPI keys, hosts file,
scheduled tasks, registry Run keys), Postgres `COPY`/`pg_read_file`
RCE, MySQL `INTO DUMPFILE`, MSSQL `xp_cmdshell`, `jar://`, `ftp://`,
and `dict://` SSRF schemes, RSA/OpenSSH PEM headers, GitHub PATs,
and Slack tokens.
**Default-policy metrics (current):**
```
Corpus size: 308
TP (caught): 110 / 110 attacks → TPR 1.0000
FP (over-blocks): 2 / 198 legit → FPR 0.0101
```
The 2 remaining FPs are architecturally inherent to contact-allowlist
policies: legitimate first-time recipients are blocked by definition.
They are kept in the corpus on purpose so the FPR is a real number
rather than a vanity zero. The FPR falls as the legit denominator
grows, but the 2 FPs themselves remain. Tune by adding allow-list
conditions to `user_context` per recipient class (e.g. distinguish
"vendor onboarding" or "interview candidate" tiers from generic
external).
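The rate arithmetic behind these numbers is short enough to check by hand. The sketch below recomputes it from the counts reported above. `Rates` and `rates` are hypothetical helpers, not part of the backtest API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    true_positive_rate: float
    false_positive_rate: float


def rates(caught: int, attacks: int, over_blocked: int, legit: int) -> Rates:
    """TPR = caught attacks / all attacks; FPR = blocked legit / all legit."""
    return Rates(caught / attacks, over_blocked / legit)


current = rates(caught=110, attacks=110, over_blocked=2, legit=198)
print(f"TPR {current.true_positive_rate:.4f}  FPR {current.false_positive_rate:.4f}")
# TPR 1.0000  FPR 0.0101

# Same 2 false positives over a 26-case legit set (the legit total of the
# per-category table below): the FP count is unchanged, only the denominator differs.
smaller = rates(caught=35, attacks=35, over_blocked=2, legit=26)
print(f"FPR {smaller.false_positive_rate:.4f}")
# FPR 0.0769
```

The comparison shows why a falling FPR is not, by itself, evidence that the policy over-blocks less: the two contact-allowlist false positives are present in both cases.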
| Category | Legit cases | Attack cases |
|---|---|---|
| Email contact allowlist | 6 (4 in-contacts + 2 FP-risk) | 3 |
| Sensitive file read | 1 | 3 |
| Sensitive file write | 2 | 4 |
| Path traversal | 2 | 3 |
| SSRF | 3 | 4 |
| Shell danger | 3 | 5 |
| SQL danger | 3 | 5 |
| Network egress private | 2 | 3 |
| Email PII exfil | 2 | 5 |
| Misc legit (read_ticket / search_users) | 2 | — |
Real production deployments should replace `default_corpus()` with
a load from a labelled traffic store. The rest of the backtest
pipeline stays the same.
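A labelled traffic store could be as simple as a JSON Lines export. The sketch below parses one into the four-field tuple shape described above. The file layout, the `Case` type, the `load_cases` helper, and the `"allow"`/`"deny"` verdict strings are all assumptions for illustration. Converting the result into whatever case type `run_backtest` expects is left to the caller.

```python
import io
import json
from typing import Iterable, NamedTuple


class Case(NamedTuple):
    tool_name: str
    args: dict
    user_context: dict
    expected_verdict: str  # "allow" or "deny" in this sketch


def load_cases(lines: Iterable[str]) -> list[Case]:
    """Parse one JSON object per line, skipping blank lines."""
    cases: list[Case] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        verdict = record["expected_verdict"]
        if verdict not in ("allow", "deny"):
            raise ValueError(f"line {lineno}: unknown verdict {verdict!r}")
        cases.append(Case(
            record["tool_name"],
            record["args"],
            record.get("user_context", {}),
            verdict,
        ))
    return cases


# In production this would be an open file handle; StringIO keeps the sketch
# self-contained.
sample = io.StringIO(
    '{"tool_name": "send_email", "args": {"to": "bob@corp.example"}, '
    '"user_context": {"user": {"contacts": ["bob@corp.example"]}}, '
    '"expected_verdict": "allow"}\n'
    "\n"
    '{"tool_name": "read_file", "args": {"path": "~/.ssh/id_rsa"}, '
    '"expected_verdict": "deny"}\n'
)
cases = load_cases(sample)
print(len(cases), cases[1].expected_verdict)  # 2 deny
```

Failing loudly on an unknown verdict keeps mislabelled rows from silently skewing the FPR / TPR figures.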
## Relationship to `purple-scaffold`
[`purple-scaffold`](https://github.com/euanmcrosson-dotcom/purple-scaffold)
is the offensive / measurement side: probes that test how
indirect-prompt-injection compliance varies across MCP server
vectors, models, and product wrappers. `mcp-guard` is the
defensive side: deterministic policies that catch the
attack patterns the probes find.
Both repos share the same evaluator core; `mcp-guard` is the
graduation of the policy modules from `purple-scaffold/purple/`
into a standalone package.
## License
MIT. See LICENSE.