An open API service indexing awesome lists of open source software.

https://github.com/garagon/aguara

Security scanner for AI agent skills and MCP servers. Static analysis, incident response, no LLM. One binary. Detection engine behind oktsec.
https://github.com/garagon/aguara

ai-agents ai-security claude data-exfiltration devsecops golang mcp mcp-server model-context-protocol prompt-injection sast security security-scanner static-analysis supply-chain-security

Last synced: about 1 month ago
JSON representation

Security scanner for AI agent skills and MCP servers. Static analysis, incident response, no LLM. One binary. Detection engine behind oktsec.

Awesome Lists containing this project

README

          


Aguara



Security scanner for AI agent skills and MCP servers.


Detect prompt injection, data exfiltration, and supply-chain attacks before they reach production.


CI
Go Report Card
Go Reference
GitHub Release
License
GitHub Stars
Docker
Homebrew


Installation
Quick Start
How It Works
Usage
Rules
Supply-Chain Check
Aguara MCP
Aguara Watch
Contributing

https://github.com/user-attachments/assets/851333be-048f-48fa-aaf3-f8cc1d4aa594

## Why Aguara?

AI agents and MCP servers run code on your behalf. A single malicious skill file can exfiltrate credentials, inject prompts, or install backdoors. Aguara catches these threats **before deployment** with static analysis that requires no API keys, no cloud, and no LLM.

- **193 detection rules across 13 categories** — prompt injection, data exfiltration, credential leaks, supply-chain attacks, MCP-specific threats, command execution, SSRF, unicode attacks, and more.
- **7 scan analyzers** — pattern matching, GitHub Actions trust-chain detection (ci-trust), npm package metadata (pkgmeta), JavaScript payload risk (jsrisk), NLP analysis, taint tracking, and rug-pull detection work together to catch threats that any single technique would miss. Separate `aguara check` / `aguara audit` commands flag installed npm and Python environments AND lockfiles for Go, Rust, PHP/Composer, Ruby/Bundler, Java/Maven/Gradle, and .NET/NuGet against the embedded threat-intel snapshot. Built from [OSV.dev](https://osv.dev) malicious-package records plus OpenSSF Malicious Packages and hand-curated emergency advisories. Offline by default.
- **8 decoders for encoded evasion** — base64, hex, URL encoding, Unicode escapes, HTML entities, hex escapes, base32, and C-style octal escapes. Obfuscated payloads are decoded and re-scanned automatically.
- **NLP on markdown, JSON, and YAML** — goldmark AST analysis for markdown files, plus string extraction and classification for JSON/YAML tool descriptions. Catches MCP tool poisoning in structured configs.
- **Cross-file toxic flow analysis** — detects dangerous capability combinations split across files in the same MCP server directory (e.g., one tool reads credentials, another sends to a webhook).
- **Aggregate risk score** — 0-100 score with diminishing returns across findings. Available in JSON, SARIF, and terminal output.
- **Context-aware scanning** — pass the tool name (`--tool-name Edit`) and the scanner automatically skips rules that are always false positives for that tool. Built-in exemptions for Edit, Write, WebFetch, Bash, and more.
- **Scan profiles** — `strict` (default), `content-aware`, or `minimal` enforcement. Findings are always preserved for audit; only the verdict (clean/flag/block) changes.
- **Evasion prevention** — NFKC normalization catches fullwidth character evasion. 8 decoders catch encoded payloads. Crypto address filtering prevents hex decoder false positives.
- **Dynamic confidence scoring** — every finding carries a confidence level (0.50-0.95) that reflects signal quality: pattern hit ratio, classifier score, and code-block awareness.
- **Remediation guidance** — all 193 rules include actionable fix suggestions, shown in every output format.
- **Deterministic** — same input, same output. Every scan is reproducible.
- **CI-ready** — JSON, SARIF, and Markdown output. GitHub Action. `--fail-on` threshold. `--changed` for incremental scans.
- **17 MCP clients supported** — auto-discover and scan configs from Claude Desktop, Cursor, VS Code, Windsurf, and 13 more.
- **Library API for embedding** — `WithDeduplicateMode()` preserves all cross-rule findings for verdict pipelines. `WithStateDir()` enables rug-pull detection for persistent consumers.
- **Extensible** — write custom rules in YAML. No code required.

## Installation

```bash
curl -fsSL https://raw.githubusercontent.com/garagon/aguara/main/install.sh | sh
```

Installs the latest binary to `~/.local/bin`. Customize with environment variables:

```bash
curl -fsSL https://raw.githubusercontent.com/garagon/aguara/main/install.sh | VERSION=v0.17.0 sh
curl -fsSL https://raw.githubusercontent.com/garagon/aguara/main/install.sh | INSTALL_DIR=/usr/local/bin sh
```

To update an existing install, rerun the installer. It downloads the selected release archive, verifies `checksums.txt`, and replaces the binary:

```bash
# Update to latest
curl -fsSL https://raw.githubusercontent.com/garagon/aguara/main/install.sh | sh

# Update/pin to a specific release
curl -fsSL https://raw.githubusercontent.com/garagon/aguara/main/install.sh | VERSION=v0.17.0 sh
```

### Alternative methods

**Homebrew** (macOS/Linux):

```bash
brew install garagon/tap/aguara
```

**Docker** (no install required):

```bash
# Scan current directory
docker run --rm -v "$(pwd)":/scan ghcr.io/garagon/aguara scan /scan

# Scan with options
docker run --rm -v "$(pwd)":/scan ghcr.io/garagon/aguara scan /scan --severity high --format json

# Use a specific version
docker run --rm -v "$(pwd)":/scan ghcr.io/garagon/aguara:0.17.0 scan /scan
```

**From source** (requires Go 1.25+):

```bash
go install github.com/garagon/aguara/cmd/aguara@latest
```

Pre-built binaries for Linux, macOS, and Windows are also available on the [Releases page](https://github.com/garagon/aguara/releases).

### Verifying signed releases

Starting with the next release after this section landed, every release is signed with [Cosign](https://github.com/sigstore/cosign) keyless, ships an SPDX SBOM per archive, and is built with `-trimpath` for reproducibility. The container image is signed at the digest and carries SBOM + SLSA provenance attestations.

**Verify the release archive**:

```bash
VERSION=vX.Y.Z
ARCHIVE=aguara_${VERSION#v}_linux_amd64.tar.gz

# Download archive, checksums, and the cosign bundle
curl -fsSLO https://github.com/garagon/aguara/releases/download/${VERSION}/${ARCHIVE}
curl -fsSLO https://github.com/garagon/aguara/releases/download/${VERSION}/checksums.txt
curl -fsSLO https://github.com/garagon/aguara/releases/download/${VERSION}/checksums.txt.bundle

# Verify checksums.txt was signed by the GitHub Actions release workflow
cosign verify-blob \
--bundle checksums.txt.bundle \
--certificate-identity "https://github.com/garagon/aguara/.github/workflows/release.yml@refs/tags/${VERSION}" \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
checksums.txt

# Now verify the archive matches the signed checksums
sha256sum --check --ignore-missing checksums.txt
```

**Verify the container image**:

```bash
cosign verify ghcr.io/garagon/aguara:${VERSION#v} \
--certificate-identity "https://github.com/garagon/aguara/.github/workflows/docker.yml@refs/tags/${VERSION}" \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com"
```

**Inspect the SBOM and provenance**:

```bash
# Release archive SBOM (SPDX 2.3)
curl -fsSL https://github.com/garagon/aguara/releases/download/${VERSION}/${ARCHIVE}.sbom.json | jq .

# Container image SBOM and SLSA build provenance.
# docker/build-push-action publishes these as BuildKit attestation manifests
# (in-toto / SLSA spec) attached to the OCI image index, not as cosign
# attestations. Use `docker buildx imagetools inspect` to read them:
docker buildx imagetools inspect ghcr.io/garagon/aguara:${VERSION#v} \
--format '{{ json .SBOM }}' | jq .
docker buildx imagetools inspect ghcr.io/garagon/aguara:${VERSION#v} \
--format '{{ json .Provenance }}' | jq .
```

`install.sh` performs SHA256 verification automatically and aborts if `sha256sum`/`shasum` is unavailable, so the curl-pipe install path is never silently downgraded.

## Quick Start

```bash
# Am I exposed to a compromised package?
aguara check

# Same, but refresh threat intel from OSV first
aguara check --fresh

# CI gate: fail if a compromised package is installed
aguara check --ci

# Audit: scan code + check packages, single verdict
aguara audit

# Is Aguara's threat intel fresh?
aguara status

# Pull the latest threat intel for future offline checks
aguara update
```

```bash
# Discover which MCP clients are configured on this machine
aguara discover

# Scan a skills directory (or any path) for content threats
aguara scan .claude/skills/

# CI mode for content scan: --fail-on high, no color
aguara scan .claude/skills/ --ci

# Auto-discover and scan all MCP configs on this machine
aguara scan --auto
```

Both `aguara check` and `aguara audit` run offline by default using the
threat intel baked into the binary. The network is used only when you
opt in with `--fresh` or `aguara update`.

## How It Works

Aguara runs 6 scan analyzers sequentially on every file by default; a 7th (Rug-Pull) joins when `--monitor` is enabled and a state store is configured. Each catches a different class of attack:

| Analyzer | Engine | What it catches |
|----------|--------|-----------------|
| **Pattern Matcher** | Regex + Aho-Corasick matching | Known attack signatures, credential patterns, dangerous commands. Aho-Corasick automaton for O(n+m) multi-pattern search. 8 decoders (base64, hex, URL encoding, Unicode escapes, HTML entities, hex escapes, base32, octal escapes) decode obfuscated payloads and re-scan. Code-block severity downgrade. Dynamic confidence based on pattern hit ratio. |
| **CI Trust** | GitHub Actions YAML parser | `pull_request_target` chains, cache poisoning across fork boundaries, OIDC token surface paired with install/build/test, persisted-credentials checkouts on PR head refs. |
| **PkgMeta** | `package.json` JSON parser | npm install-time lifecycle scripts plus git-sourced dependencies, optional-git deps with suspicious names, publish surfaces paired with trusted-publishing references. |
| **JSRisk** | JavaScript single-pass scanner | Obfuscator-shape payloads, install-time daemonization via `child_process`, CI secret harvesting through real `process.env` reads plus network/registry sinks, runner-process memory pivots to extract OIDC tokens, Claude Code / VS Code workspace persistence. |
| **NLP Analyzer** | Goldmark AST + JSON/YAML extraction | Prompt injection in markdown structure, plus tool poisoning in JSON/YAML description fields. Keyword classification with proximity weighting; clustered keywords score higher, sparse keywords in long text get penalized. |
| **Taint Tracker** | Source-to-sink flow analysis | Dangerous capability combinations within a single file and across files in the same directory. Detects credential reads paired with webhook sends, env vars flowing to shell execution, destructive plus exec combos across MCP server tools. |
| **Rug-Pull Detector** | SHA256 hash tracking | Tool descriptions that change between scans. CLI: `--monitor` flag. Library: `WithStateDir()` for persistent consumers. |

Separate `aguara check` and `aguara audit` commands inspect installed
package trees (Python `site-packages`, npm `node_modules` including the
pnpm `.pnpm` store) AND walk the repo recursively for Go, Rust, PHP,
Ruby, Java, and .NET lockfiles, matching every declared package against
the embedded threat-intel snapshot. See
[Supply-Chain Check](#supply-chain-check) for the full surface.

All content is NFKC-normalized before scanning to prevent Unicode evasion attacks. All layers report findings with severity, dynamic confidence score (0.50-0.95), matched text, file location with context lines, and remediation guidance. An aggregate risk score (0-100) summarizes overall threat level.

## Usage

```
aguara scan [path] [flags]

Flags:
--auto Auto-discover and scan all MCP client configs
--severity string Minimum severity to report: critical, high, medium, low, info (default "info")
--format string Output format: terminal, json, sarif, markdown (default "terminal")
-o, --output string Output file path (default: stdout)
--workers int Number of worker goroutines (default: NumCPU)
--rules string Additional rules directory
--disable-rule strings Rule IDs to disable (comma-separated, repeatable)
--max-file-size string Maximum file size to scan (e.g. 50MB, 100MB; default 50MB, range 1MB-500MB)
--tool-name string Tool context for false-positive reduction (e.g. Bash, Edit, WebFetch)
--profile string Scan profile: strict (default), content-aware, minimal
--no-color Disable colored output
--no-update-check Disable automatic update check (also: AGUARA_NO_UPDATE_CHECK=1)
--fail-on string Exit code 1 if findings at or above this severity
--ci CI mode: --fail-on high --no-color
--changed Only scan git-changed files
--monitor Enable rug-pull detection: track file hashes across runs
-v, --verbose Show rule descriptions, confidence scores, and remediation
-h, --help Help
```

### Output Formats

| Format | Flag | Use case |
|--------|------|----------|
| **Terminal** | `--format terminal` (default) | Human-readable with color, severity dashboard, top-files chart |
| **JSON** | `--format json` | Machine processing, API integration, custom tooling |
| **SARIF** | `--format sarif` | GitHub Code Scanning, IDE integrations, SAST dashboards |
| **Markdown** | `--format markdown` | GitHub Actions job summaries, PR comments |

### MCP Client Discovery

Aguara auto-detects MCP configurations across **17 clients**: Claude Desktop, Cursor, VS Code, Cline, Windsurf, OpenClaw, OpenCode, Zed, Amp, Gemini CLI, Copilot CLI, Amazon Q, Claude Code, Roo Code, Kilo Code, BoltAI, and JetBrains.

```bash
# List all detected MCP configs
aguara discover

# JSON output (sensitive env values are automatically redacted)
aguara discover --format json

# Markdown output
aguara discover --format markdown

# Discover + scan in one command
aguara scan --auto
```

### CI Integration

#### GitHub Action

```yaml
- uses: garagon/aguara@v0.17.0
with:
path: .
fail-on: high
version: v0.17.0
```

Both pins (the action ref AND the `version:` input) are required. The
action ref alone pins only the composite action and its install
script; `version:` pins the Aguara binary the action installs. Setting
both makes the workflow reproducible and dependabot-friendly: when
v0.17.1 lands, the bot updates both together.

Scans your repository, uploads findings to GitHub Code Scanning, and
optionally fails the build:

```yaml
- uses: garagon/aguara@v0.17.0
with:
path: ./mcp-server/
severity: medium
fail-on: high
version: v0.17.0
```

All inputs are optional. See [`action.yml`](action.yml) for the full list.

| Input | Default | Description |
|-------|---------|-------------|
| `path` | `./` | Path to scan |
| `severity` | `info` | Minimum severity to report |
| `fail-on` | _(none)_ | Fail if findings at or above this severity |
| `format` | `sarif` | Output format: sarif, json, terminal, markdown |
| `upload-sarif` | `true` | Upload SARIF to GitHub Code Scanning |
| `version` | _(latest)_ | Pin a specific Aguara version |

> **Note**: SARIF upload requires the `security-events: write` permission and is free for public repositories.

#### Docker in CI

```yaml
# GitHub Actions with Docker (no install step)
- name: Scan for security issues
run: docker run --rm -v "${{ github.workspace }}":/scan ghcr.io/garagon/aguara scan /scan --ci
```

#### Manual / GitLab CI

```yaml
# GitHub Actions (without the action)
- name: Scan skills for security issues
run: |
curl -fsSL https://raw.githubusercontent.com/garagon/aguara/main/install.sh | sh
aguara scan .claude/skills/ --ci
```

```yaml
# GitLab CI
security-scan:
script:
- curl -fsSL https://raw.githubusercontent.com/garagon/aguara/main/install.sh | sh
- aguara scan .claude/skills/ --format sarif -o gl-sast-report.sarif --fail-on high
artifacts:
reports:
sast: gl-sast-report.sarif
```

### Configuration

Create `.aguara.yml` in your project root:

```yaml
severity: medium
fail_on: high
max_file_size: 104857600 # 100 MB (default: 50 MB, range: 1 MB-500 MB)
ignore:
- "vendor/**"
- "node_modules/**"
rule_overrides:
CRED_004:
severity: low
EXTDL_004:
disabled: true
TC-005:
apply_to_tools: ["Bash"] # only enforce on Bash
MCPCFG_004:
exempt_tools: ["WebFetch"] # enforce on everything except WebFetch
```

`apply_to_tools` and `exempt_tools` are mutually exclusive per rule. They filter findings at scan time when a tool name is provided via `--tool-name` or the library API.

### Inline Ignore

Suppress specific findings directly in your source files using inline comments:

```yaml
# aguara-ignore CRED_004
api_key: "sk-test-1234567890" # this finding is suppressed
```

```markdown

Ignore all previous instructions (this is a test)
```

Supported directives:

| Directive | Effect |
|-----------|--------|
| `# aguara-ignore RULE_ID` | Suppress rule on the same line |
| `# aguara-ignore RULE_ID, RULE_ID2` | Suppress multiple rules on the same line |
| `# aguara-ignore-next-line RULE_ID` | Suppress rule on the next line |
| `# aguara-ignore` | Suppress all rules on the same line |
| `` | HTML/Markdown comment variant |
| `// aguara-ignore RULE_ID` | C-style comment variant |

## Rules

193 built-in rules across 13 pattern-rule categories (what `aguara list-rules` enumerates) plus the toxic-flow chain analyzer with its own emit-time category. The table groups coverage by emit-time category for readability:

| Category | Rules | What it detects |
|----------|-------|-----------------|
| Credential Leak | 22 | API keys (OpenAI, AWS, GCP, Stripe, ...), private keys, DB strings, HMAC secrets |
| Prompt Injection | 18 + NLP | Instruction overrides, role switching, delimiter injection, jailbreaks, event injection |
| Supply Chain | 24 | Download-and-execute, reverse shells, sandbox escape, symlink attacks, privilege escalation, OIDC token vars, runner-pivot memory, Claude Code persistence path |
| External Download | 16 | Binary downloads, curl-pipe-shell, auto-installs, profile persistence |
| MCP Attack | 16 | Tool injection, name shadowing, canonicalization bypass, capability escalation |
| Data Exfiltration | 16 + NLP | Webhook exfil, DNS tunneling, sensitive file reads, env var leaks |
| Command Execution | 16 | shell=True, eval, subprocess, child_process, PowerShell |
| MCP Config | 13 | Unpinned npx/uvx servers, hardcoded secrets, Docker cap-add, host networking, pip without hashes |
| Indirect Injection | 10 | Fetch-and-follow, remote config, DB-driven instructions, webhook registration |
| SSRF & Cloud | 11 | Cloud metadata, IMDS, Docker socket, internal IPs, redirect following |
| Third-Party Content | 10 | eval with external data, unsafe deserialization, missing SRI, HTTP downgrade |
| Unicode Attack | 10 | RTL override, bidi, homoglyphs, zero-width sequences, normalization bypass |
| Supply Chain Exfil | 11 | Credential file reads, .pth executable code, bulk env collection, K8s secrets access, systemd persistence, archive+POST exfil, Session-Network endpoints |
| Toxic Flow | 3 + cross-file | Single-file taint tracking plus cross-file correlation across MCP server directories |

See [RULES.md](RULES.md) for the complete rule catalog with IDs and severity levels.

### Remediation Guidance

All 193 rules include remediation text. It appears in every output format:

- **Terminal**: always shown for CRITICAL findings, shown for all severities with `--verbose`
- **JSON**: included in every finding object
- **SARIF**: mapped to the `help` field on each rule
- **Markdown**: shown for HIGH and CRITICAL findings
- **Explain**: `aguara explain RULE_ID` shows the full remediation text

```bash
# See remediation for a specific rule
aguara explain CRED_002

# Terminal output with remediation for all findings
aguara scan . --verbose
```

```json
{
"rule_id": "PROMPT_INJECTION_001",
"severity": 4,
"matched_text": "Ignore all previous instructions",
"remediation": "Remove instruction override text. If this is documentation, wrap it in a code block to indicate it is an example.",
"confidence": 0.95
}
```

### Custom Rules

```yaml
id: CUSTOM_001
name: "Internal API endpoint"
description: "Detects references to internal APIs"
severity: HIGH
category: custom
targets: ["*.md", "*.txt"]
match_mode: any
remediation: "Replace internal API URLs with the public endpoint or environment variable."
patterns:
- type: regex
value: "https?://internal\\.mycompany\\.com"
- type: contains
value: "api.internal"
exclude_patterns: # optional: suppress match in these contexts
- type: contains
value: "## documentation"
examples:
true_positive:
- "Fetch data from https://internal.mycompany.com/api/users"
false_positive:
- "Our public API is at https://api.mycompany.com"
```

`exclude_patterns` suppress a match when the matched line (or up to 3 lines before it) matches any exclude pattern. Useful for reducing false positives in documentation headings, installation guides, etc.

Custom rules are validated at load time: unknown YAML fields are rejected, and all rules require `id`, `name`, `category`, and at least one pattern.

```bash
aguara scan .claude/skills/ --rules ./my-rules/
```

## Supply-Chain Check

In v0.17, `aguara check .` walks the dependency surface of a modern repo
instead of stopping at npm/Python.

Aguara ships with native threat intel built from [OSV.dev](https://osv.dev)'s
public vulnerability database and OpenSSF Malicious Packages records, plus a
hand-curated list of high-priority emergency advisories (event-stream,
node-ipc 2022 and 2026, litellm). Both run **offline by default** — the
binary carries the snapshot. Runtime updates are opt-in.

The check commands are organised by user intent.

### `aguara check` — am I exposed?

```bash
# Run from a repo root. Aguara finds installed npm / Python environments
# AND lockfiles for Go, Rust, PHP, Ruby, Java, and .NET recursively under
# the path, then matches every declared package against the snapshot.
aguara check .

# Refresh threat intel from OSV first, then check. The only check mode
# that uses the network; the rest stay offline. --fresh refreshes only
# the ecosystems the plan actually touches.
aguara check --fresh

# CI gate: --fail-on critical, no color, exit 1 on compromised packages
aguara check --ci

# Constrain to specific ecosystems (repeatable or comma-separated)
aguara check --ecosystem go,ruby
aguara check --ecosystem maven --ecosystem nuget

# Machine-readable
aguara check --format json
```

What it checks:
- **Known compromised package versions** across all supported ecosystems
(manual advisories + OSV malicious-package records + OpenSSF Malicious
Packages origins).
- **`.pth` files with executable code** (import, subprocess, exec, eval).
Python-only.
- **pip/uv/npx caches** so a compromised package in the cache surfaces
even without a virtualenv. Python / npm only.
- **Persistence backdoors** (systemd user services, sysmon artifacts).
Python-only.
- **Credential files at risk** (SSH, AWS, K8s, git, npm, PyPI, databases).
Python-only.

Coverage by ecosystem:

| Ecosystem | Evidence read | Coverage today |
|---|---|---|
| npm | `node_modules` (incl. pnpm `.pnpm` store) | Strong malicious-package coverage |
| PyPI | `site-packages`, `.pth`, pip/uv/npx caches | Strong malicious-package + persistence coverage |
| RubyGems | `Gemfile.lock` | Strong malicious-package coverage |
| NuGet | `packages.lock.json`, `*.csproj`/`*.fsproj`/`*.vbproj` | Strong exact-version malicious-package coverage |
| Go | `go.sum`, `go.mod` | Parser ready; limited exact-version embedded matches today |
| crates.io | `Cargo.lock` (public registry only) | Parser ready; range-aware OSV matching deferred |
| Packagist | `composer.lock` | Parser ready; range-aware OSV matching deferred |
| Maven | `pom.xml`, `gradle.lockfile`, `gradle/dependency-locks/*` | Parser ready; range-aware OSV matching deferred |

Aguara is not claiming to be a full SCA vulnerability scanner yet. v0.17
adds offline malicious-package checks across ecosystems. General
CVE/range matching is the next layer.

Auto-detection rules (in order):
- If the path is or contains `node_modules`, run the npm check.
- Walk the path recursively for lockfiles (`go.sum`, `Cargo.lock`,
`composer.lock`, `Gemfile.lock`, `pom.xml`, `gradle.lockfile`,
`gradle/dependency-locks/*.lockfile`, `packages.lock.json`,
`*.csproj`/`*.fsproj`/`*.vbproj`). Skip `.git/`, `vendor/`,
`node_modules/`, `.aguara/`, `target/`, `bin/`, `obj/`, `.gradle/`.
- If the explicit `--path` looks like a Python install
(`site-packages` / `dist-packages` basename, or contains `*.dist-info`),
run the Python check.
- If `aguara check` runs with no flags and no signals are found, fall
back to global Python `site-packages` autodiscovery (legacy
behaviour).

An explicit `--path` with no signals returns a clean result with
`"ecosystems": []` and never silently falls back to the host's global
Python.

### `aguara audit` — code AND packages, one verdict

```bash
aguara audit # check + scan on the current directory
aguara audit --ci # CI gate: --fail-on critical, no color
aguara audit --fresh # refresh intel, then audit
```

`aguara audit` composes the supply-chain check and the content scan into a
single verdict. JSON output carries both sub-results (`.check` and `.scan`)
plus per-section counts so a dashboard can drill into either side.

### `aguara status` — is my threat intel fresh?

```bash
aguara status
```

Prints the Aguara version, the embedded snapshot's generated-at date and
record count, and whether a local cached snapshot exists from a prior
`aguara update` run. Does no network I/O.

### `aguara update` — refresh intel for future offline checks

```bash
aguara update # fetch every supported ecosystem, cache locally
aguara update --ecosystem npm # just npm
aguara update --ecosystem go,ruby # scope to Go + RubyGems
```

`aguara update` and `--fresh` are the only commands that use the network.
The default refreshes every ecosystem the registry supports (npm, PyPI,
Go, crates.io, Packagist, RubyGems, Maven, NuGet); scope with `--ecosystem`
(repeatable or comma-separated). The refreshed cache lives at
`~/.aguara/intel/snapshot.json`; subsequent `aguara check` runs layer it
over the embedded snapshot automatically and stay offline.

`aguara check --fresh` refreshes only the ecosystems the plan actually
touches, so `aguara check --fresh --ecosystem maven` does not pull npm,
PyPI, or the other six.

If a refresh returns zero records (upstream outage, schema shift), the
update is refused so cached intel cannot be silently wiped. Pass
`--allow-empty` to override during initial bootstrap.

### `aguara clean` — quarantine compromised packages

```bash
aguara clean # interactive confirmation
aguara clean --yes --purge-caches # non-interactive, also purge pip/uv caches
aguara clean --dry-run # preview
```

Files are quarantined to `/tmp/aguara-quarantine/`, not deleted. After
cleaning, Aguara prints a credential rotation checklist for every
credential file present on the system.

### Advanced: explicit ecosystem and path

Use these when auto-detection cannot find the environment you want to check:

```bash
aguara check --ecosystem python --path /opt/venv/lib/python3.12/site-packages/
aguara check --ecosystem npm --path ./node_modules
```

### Threat-intel sources

The embedded snapshot is built from two sources:

- **Manual** — a short hand-curated list of high-priority emergency
advisories. Takes display precedence when an advisory ID also appears
in OSV.
- **OSV.dev** — high-confidence records only: OpenSSF Malicious Packages
IDs (the `MAL-` namespace), records with
`database_specific.malicious-packages-origins`, plus keyword-qualified
records that carry exact affected versions. Generic CVE / DoS records
are filtered out at import time so Aguara stays focused on malicious
packages, not general SCA.

Built originally in response to the
[litellm supply chain attack](https://github.com/garagon/aguara/releases/tag/v0.11.0)
(March 2026), where malicious `.pth` files exfiltrated credentials and
installed K8s backdoors. The toolset grew from that incident into the
broader check + audit + update + status surface above.

## Aguara MCP

[Aguara MCP](https://github.com/garagon/aguara-mcp) is an MCP server that gives AI agents the ability to scan skills and configurations for security threats — before installing or running them. It imports Aguara as a Go library — one `go install`, no external binary needed.

```bash
# Install and register with Claude Code
go install github.com/garagon/aguara-mcp@latest
claude mcp add aguara -- aguara-mcp
```

Your agent gets 4 tools: `scan_content`, `check_mcp_config`, `list_rules`, and `explain_rule`. No network, no LLM, millisecond scans — the agent checks first, then decides.

## Aguara Watch

Aguara Watch is being reworked. The previous public observatory is stale, so we are not using it as a product signal for this release. The scanner, CLI, Docker image, and signed release artifacts remain the supported surfaces for v0.17.0.

## Go Library

Aguara exposes a public Go API for embedding the scanner in other tools. [Aguara MCP](https://github.com/garagon/aguara-mcp) uses this API.

```go
import "github.com/garagon/aguara"

// Scan a directory
result, err := aguara.Scan(ctx, "./skills/")

// Scan inline content (no disk I/O, NFKC-normalized)
result, err := aguara.ScanContent(ctx, content, "skill.md")

// Scan with tool context for false-positive reduction
result, err := aguara.ScanContentAs(ctx, content, "skill.md", "Edit")
// result.Verdict: aguara.VerdictClean, VerdictFlag, or VerdictBlock
// result.ToolName: "Edit"
// result.Findings: always preserved (even when verdict is clean)

// Scan with a profile
result, err := aguara.ScanContent(ctx, content, "skill.md",
aguara.WithToolName("Edit"),
aguara.WithScanProfile(aguara.ProfileContentAware),
)
// result.RiskScore: 0-100 aggregate risk score

// Preserve cross-rule findings (for verdict pipelines)
result, err := aguara.ScanContent(ctx, content, "skill.md",
aguara.WithDeduplicateMode(aguara.DeduplicateSameRuleOnly),
)

// Enable rug-pull detection with persistent state
result, err := aguara.ScanContent(ctx, content, "tool.md",
aguara.WithStateDir("/var/lib/myapp/aguara-state"),
)

// Discover all MCP client configs on the machine
discovered, err := aguara.Discover()
for _, client := range discovered.Clients {
fmt.Printf("%s: %d servers\n", client.Client, len(client.Servers))
}

// List rules, optionally filtered
rules := aguara.ListRules(aguara.WithCategory("prompt-injection"))

// Get rule details with remediation
detail, err := aguara.ExplainRule("PROMPT_INJECTION_001")
fmt.Println(detail.Remediation)
```

Options: `WithMinSeverity()`, `WithDisabledRules()`, `WithCustomRules()`, `WithRuleOverrides()`, `WithWorkers()`, `WithIgnorePatterns()`, `WithMaxFileSize()`, `WithCategory()`, `WithToolName()`, `WithScanProfile()`, `WithDeduplicateMode()`, `WithStateDir()`.

## Architecture

```
aguara.go Public API: Scan, ScanContent, ScanContentAs, Discover, ListRules, ExplainRule
options.go Functional options (WithToolName, WithStateDir, WithDeduplicateMode, ...)
discover/ MCP client discovery: 17 clients, config parsers, auto-detection
cmd/aguara/ CLI entry point (Cobra)
cmd/wasm/ WASM build for browser-based scanning
internal/
engine/
pattern/ Pattern matcher: Aho-Corasick + regex, 8 decoders (base64, hex, URL, Unicode, HTML, hex-escape, base32, octal-escape)
ci/ CI Trust: .github/workflows/ YAML parser, pwn-request / cache / OIDC / persisted-credentials chains
pkgmeta/ PkgMeta: package.json parser, npm lifecycle / git source / publish-surface chains
jsrisk/ JSRisk: .js / .mjs / .cjs scanner, obfuscation / daemonization / CI-secret-harvest / runner-pivot / agent-persistence
nlp/ NLP: markdown AST + JSON/YAML string extraction, proximity-weighted classifier
toxicflow/ Taint: single-file taint tracking + cross-file correlation across directories
rugpull/ Rug-pull: SHA256 change detection (CLI --monitor, library WithStateDir)
rules/ Rule engine: YAML loader, compiler, self-tester
builtin/ 193 embedded rules across 13 YAML files (go:embed)
scanner/ Orchestrator: file discovery, parallel analysis, inline ignore, result aggregation
exemptions.go Tool exemptions, scan profiles, verdict computation
meta/ Post-processing: configurable dedup, scoring, risk score, correlation, confidence
output/ Formatters: terminal (ANSI), JSON, SARIF, Markdown
config/ .aguara.yml loader (supports tool-scoped rules)
incident/ Incident response: compromised package detection, cleanup, quarantine
state/ Persistence for rug-pull detection (CLI and library mode)
types/ Shared types (Finding, Severity, ScanResult, Verdict, DeduplicateMode)
```

## Comparison

Aguara is purpose-built for AI agent content. General-purpose SAST tools target application source code, not the skill files, tool descriptions, and MCP configs that agents consume.

| Feature | Aguara | Semgrep | Snyk Code | CodeQL |
|---------|--------|---------|-----------|--------|
| AI agent skill scanning | Yes | No | No | No |
| MCP config analysis | Yes | No | No | No |
| Prompt injection detection | Yes (18 rules + NLP) | No | No | No |
| Rug-pull detection | Yes | No | No | No |
| Supply chain exfil detection | Yes (10 rules) | No | No | No |
| Incident response (check/clean) | Yes | No | No | No |
| Taint tracking for skills | Yes | Yes | Yes | Yes |
| Offline / no account | Yes | Partial | No | Partial |
| Custom YAML rules | Yes | Yes | No | No |
| SARIF output | Yes | Yes | Yes | Yes |
| Free & open source | Yes (Apache 2.0) | Partial | No | Partial |

Aguara complements traditional SAST - use Semgrep for your app code, Aguara for your agent skills and MCP servers.

## Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, adding rules, and the PR process.

For security vulnerabilities, see [SECURITY.md](SECURITY.md).

## License

[Apache License 2.0](LICENSE)