https://github.com/thepixelabs/sanitai

Find secrets in your LLM chat history before someone else does.
https://github.com/thepixelabs/sanitai

chatgpt claude cli llm privacy rust secret-scanning security

Last synced: 2 months ago
JSON representation

Find secrets in your LLM chat history before someone else does.

Host: GitHub
URL: https://github.com/thepixelabs/sanitai
Owner: thepixelabs
License: mit
Created: 2026-04-10T14:24:18.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-04-12T03:15:49.000Z (3 months ago)
Last Synced: 2026-04-12T04:21:09.772Z (3 months ago)
Topics: chatgpt, claude, cli, llm, privacy, rust, secret-scanning, security
Language: Rust
Homepage: https://thepixelabs.github.io/sanitai/
Size: 15.4 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md

Awesome Lists containing this project

README

          # [SanitAI](https://sanitai.pixelabs.net)



**Find secrets in your LLM chat history before someone else does.**

SanitAI scans your Claude and ChatGPT conversation exports for leaked API keys, credentials, and personal data — entirely on your machine. No network calls. No cloud. No copies of your data leave the device.

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

[![Platform: Linux macOS](https://img.shields.io/badge/platform-Linux%20%7C%20macOS-lightgrey.svg)](#install)

---

## The problem

You paste a `.env` file into Claude to debug a config issue. You ask ChatGPT to review a script that contains a database URL. Six months later, you export your conversation history and realise those secrets have been sitting in plain text inside a JSON file on your laptop — and in every backup, sync, and export you've made since.

SanitAI finds them. In seconds.

---

## Install

**Homebrew (macOS and Linux)**

```sh

brew install thepixelabs/tap/sanitai

```

**curl installer (Linux and macOS)**

```sh

curl -fsSL https://releases.sanitai.dev/install.sh | sh

```

Verify the binary signature before running anything (see [Trust model](#trust-model)).

**Cargo (build from source)**

```sh

cargo install sanitai

```

Requires Rust 1.78 or later.

---

## Your first scan

```sh

sanitai scan ~/Downloads/claude_export.zip

```

```

SanitAI v0.1.0 — local scan, no network

Parsing: claude_export.zip (Claude format)

Scanning 1,842 messages across 94 conversations...

FINDINGS (4)

────────────────────────────────────────────────────────────────

 HIGH  AWS Access Key        conversation_47.json:line 312

       AKIA[REDACTED]        matched: aws_access_key_id pattern

 HIGH  Generic API Key       conversation_12.json:line 88

       sk-[REDACTED]         matched: high-entropy bearer token

 MED   Email address         conversation_91.json:line 201

       j****@example.com     matched: RFC 5322 address heuristic

 LOW   Internal hostname     conversation_03.json:line 44

       db.internal.corp      matched: internal TLD heuristic

────────────────────────────────────────────────────────────────

4 findings in 1,842 messages (0.8 s)

Run `sanitai redact` to remove findings from a copy of the export.

```

No finding left the machine. The original file was not modified.

---

## Supported sources

| Source | Export format | Parser |

|---|---|---|

| Claude (claude.ai) | `conversations.json` ZIP export | Built-in |

| ChatGPT (chat.openai.com) | `conversations.json` ZIP export | Built-in |

Additional parsers are planned for future releases. The parser is separate from the detector — adding a new source does not change detection logic.

---

## Detection capabilities

| Category | Examples | Method |

|---|---|---|

| Cloud provider keys | AWS, GCP, Azure, Stripe, Twilio | Regex against known prefixes |

| Generic secrets | High-entropy strings in key=value context | Shannon entropy + heuristic |

| Bearer tokens | `Authorization:` headers, `sk-` prefixes | Regex + context heuristic |

| Database URLs | `postgres://`, `mysql://`, `mongodb+srv://` | Regex |

| Private keys | PEM blocks (`BEGIN RSA PRIVATE KEY`, etc.) | Regex |

| Email addresses | RFC 5322 addresses | Heuristic |

| Phone numbers | E.164 and regional formats | Regex + heuristic |

| Custom patterns | User-defined YAML rules | Regex + optional entropy |

Detectors run locally in process. No pattern or matched text is sent anywhere.

---

## Redaction modes

```sh

# Write a redacted copy alongside the original (default)

sanitai redact ~/Downloads/claude_export.zip

# Redact in place (replaces the original — use with caution)

sanitai redact --in-place ~/Downloads/claude_export.zip

# Replace matched text with a fixed mask

sanitai redact --mask "***REDACTED***" ~/Downloads/claude_export.zip

# Replace matched text with the finding type label

sanitai redact --mask-with-type ~/Downloads/claude_export.zip

# e.g. "[AWS_ACCESS_KEY]", "[EMAIL_ADDRESS]"

```

---

## Configuration

SanitAI reads `~/.config/sanitai/config.toml` (XDG-compliant). All fields are optional.

```toml

# ~/.config/sanitai/config.toml

[scan]

min_severity = "low"   # "low" | "medium" | "high"

disable_detectors = []

[redact]

mask = "[REDACTED]"

mask_with_type = false

[rules]

extra_rules_dirs = ["~/.config/sanitai/rules"]

```

Validate your config:

```sh

sanitai config validate

```

---

## Custom rules

```yaml

# ~/.config/sanitai/rules/acme-api-key.yaml

id: acme_api_key

name: "ACME Corp API Key"

severity: high

pattern: "acme_[a-zA-Z0-9]{32}"

min_entropy: 3.5

context_keywords: ["api_key", "apikey", "token"]

```

See [docs/custom-rules.md](docs/custom-rules.md) for the complete schema and examples.

---

## How it works

```

1. PARSE                  2. DETECT                  3. REPORT

─────────────             ─────────────              ─────────────

Read export ZIP     →     For each message:    →     Print findings

Identify format           run regex detectors         by severity

Extract message           run entropy scorer

text per turn             run heuristics

                          apply custom rules

```

Everything happens in a single local process. No state is persisted between runs.

---

## Trust model

**Offline by design.** SanitAI makes zero network connections at runtime. Verify:

```sh

# Linux

strace -e trace=network sanitai scan /path/to/export.zip 2>&1 | grep -v "^strace"

# macOS

sudo fs_usage -w -f network sanitai

```

A clean scan produces no output from either command.

**No telemetry.** No analytics, no crash reporting, no usage pinging.

**Signed binaries.** Release binaries are signed with [cosign](https://docs.sigstore.dev/cosign/overview/).

```sh

cosign verify-blob \

  --certificate sanitai-linux-amd64.cert \

  --signature sanitai-linux-amd64.sig \

  --certificate-identity \

    "https://github.com/sanitai/sanitai/.github/workflows/release.yml@refs/heads/main" \

  --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \

  sanitai-linux-amd64

```

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Security vulnerabilities: see [SECURITY.md](SECURITY.md).

---

## License

MIT — see [LICENSE](LICENSE).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/thepixelabs/sanitai

Awesome Lists containing this project

README