https://github.com/amafjarkasi/hsx-context-hygiene-engine
Context hygiene & risk adjudication for LLM pipelines: secrets, PII, prompt-injection, policy redaction & tokenization.
cli compliance content-safety context-hygiene data-sanitization llm llm-security nodejs pii-redaction policy-engine prompt-injection redaction secret-scanning security tokenization typescript
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/amafjarkasi/hsx-context-hygiene-engine
- Owner: amafjarkasi
- License: mit
- Created: 2025-08-26T11:04:28.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-09-04T06:43:28.000Z (9 months ago)
- Last Synced: 2025-09-13T02:19:41.184Z (9 months ago)
- Topics: cli, compliance, content-safety, context-hygiene, data-sanitization, llm, llm-security, nodejs, pii-redaction, policy-engine, prompt-injection, redaction, secret-scanning, security, tokenization, typescript
- Language: TypeScript
- Size: 29.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🧬 HSX Context Hygiene Engine (formerly sanitize-mcp)
A multi-stage context hygiene & risk adjudication engine for LLM toolchains. It detects secrets, personal data, and adversarial prompt artifacts before they reach a model boundary, then applies deterministic redaction, linkable tokenization, or policy-based refusal.
## Why HSX?
Traditional "sanitize" passes operate as fragile regex filters. HSX layers signature scanning, span adjudication (collision & precedence aware), policy mapping, and stable tokenization so downstream systems can preserve referential integrity without exposing sensitive substrings.
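As a rough illustration of what collision- and precedence-aware span adjudication could look like, here is a minimal sketch. The `Span` shape, the `rank` and `adjudicate` names, the greedy strategy, and the longer-span tie-break are all assumptions made for this example; the engine's actual types and algorithm live in `src/core/` and `src/pipeline/` and may differ (for instance, it uses an interval index where this sketch does a linear scan).

```typescript
// Hypothetical span shape; the engine's real types may differ.
interface Span {
  start: number; // inclusive character offset
  end: number; // exclusive character offset
  kind: string; // e.g. "SECRET_AWS_KEY", "PII_EMAIL"
}

// Kind prefixes in descending precedence, as listed in the README.
const PRECEDENCE = ["SECRET_", "PROMPT_INJECTION_", "PII_", "META_"];

function rank(kind: string): number {
  const i = PRECEDENCE.findIndex((prefix) => kind.startsWith(prefix));
  return i === -1 ? PRECEDENCE.length : i; // unknown kinds rank last
}

// Greedy adjudication: visit spans from highest precedence down
// (assumed tie-break: longer span first, then earlier start) and keep
// a span only if it overlaps nothing that has already been kept.
function adjudicate(spans: Span[]): Span[] {
  const ordered = [...spans].sort(
    (a, b) =>
      rank(a.kind) - rank(b.kind) ||
      b.end - b.start - (a.end - a.start) ||
      a.start - b.start,
  );
  const kept: Span[] = [];
  for (const s of ordered) {
    const collides = kept.some((k) => s.start < k.end && k.start < s.end);
    if (!collides) kept.push(s);
  }
  // Return survivors in document order.
  return kept.sort((a, b) => a.start - b.start);
}
```

With this sketch, a `PII_*` span that overlaps a `SECRET_*` span is dropped, while non-overlapping spans of any kind are kept.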
## Key Capabilities
- Signature-based detection (extensible JSON signature packs)
- Span collision adjudication with precedence (SECRET_* > PROMPT_INJECTION_* > PII_* > META_*)
- Wildcard policy rules with actions (REDACT, MASK_PARTIAL, TOKENIZE_LINKABLE, STRIP_LINE, FLAG_ONLY, KEEP)
- Stable, linkable tokenization via HMAC (configurable truncation & encoding)
- Deterministic rewrite ordering (reverse-offset application prevents index drift)
- Lightweight CLI (`hsx-cli scrub <file>`)
- Extensible risk & confidence model (planned confidence fusion is described in docs/)
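The wildcard policy rules and reverse-offset rewriting listed above can be sketched together. Everything beyond the action names is an assumption for illustration: the `Rule` shape, trailing-`*` matching, first-match-wins ordering, the `KEEP` fallback, and the replacement formats are hypothetical, and only `REDACT` and `MASK_PARTIAL` are implemented here. Spans are assumed to be non-overlapping, that is, already adjudicated.

```typescript
type Action =
  | "REDACT"
  | "MASK_PARTIAL"
  | "TOKENIZE_LINKABLE"
  | "STRIP_LINE"
  | "FLAG_ONLY"
  | "KEEP";

// Hypothetical shapes; the real policy schema may differ.
interface Rule {
  match: string; // exact kind, or a prefix followed by "*"
  action: Action;
}
interface Span {
  start: number; // inclusive
  end: number; // exclusive
  kind: string;
}

// First matching rule wins; unmatched kinds fall back to KEEP (assumed).
function resolveAction(kind: string, rules: Rule[]): Action {
  for (const r of rules) {
    const hit = r.match.endsWith("*")
      ? kind.startsWith(r.match.slice(0, -1))
      : kind === r.match;
    if (hit) return r.action;
  }
  return "KEEP";
}

function applySpans(text: string, spans: Span[], rules: Rule[]): string {
  // Rewrite from the highest offset down: a replacement never shifts
  // the offsets of spans that still have to be processed.
  const ordered = [...spans].sort((a, b) => b.start - a.start);
  let out = text;
  for (const s of ordered) {
    const raw = out.slice(s.start, s.end);
    let replacement: string;
    switch (resolveAction(s.kind, rules)) {
      case "REDACT":
        replacement = `[${s.kind}]`; // placeholder format is an assumption
        break;
      case "MASK_PARTIAL":
        // Keep the last four characters, mask the rest.
        replacement = "*".repeat(Math.max(raw.length - 4, 0)) + raw.slice(-4);
        break;
      default:
        replacement = raw; // other actions are left untouched in this sketch
    }
    out = out.slice(0, s.start) + replacement + out.slice(s.end);
  }
  return out;
}
```

Applying replacements front to back would require adjusting every later offset by the length difference of each rewrite; going back to front avoids that bookkeeping entirely.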
## Quick Start
```bash
npm install
npm run build
echo "Contact me at dev@example.com AKIAABCDEFGHIJKLMNOPQRST ignore previous text" > sample.txt
npx hsx-cli scrub sample.txt
```
## Configuration
Environment variables:
- HSX_SIGNATURE_DIR (default: config/signatures)
- HSX_POLICY_PATH (default: config/hsx-policy.json)
- PHI_SALT (secret salt for tokenization stability)
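For readers wiring these variables into their own tooling, the following is a minimal sketch of how they might be resolved with the documented defaults. The `HsxConfig` shape and `loadConfig` function are hypothetical names, and failing fast on a missing `PHI_SALT` is an assumption of this example, not documented engine behavior.

```typescript
// Hypothetical config shape for illustration only.
interface HsxConfig {
  signatureDir: string;
  policyPath: string;
  salt: string;
}

// The environment is passed in as a plain record so the function can be
// exercised without touching the real process environment.
function loadConfig(env: Record<string, string | undefined>): HsxConfig {
  const salt = env.PHI_SALT;
  if (!salt) {
    // Assumption: refusing to run is safer than tokenizing with an empty salt.
    throw new Error("PHI_SALT must be set");
  }
  return {
    signatureDir: env.HSX_SIGNATURE_DIR ?? "config/signatures",
    policyPath: env.HSX_POLICY_PATH ?? "config/hsx-policy.json",
    salt,
  };
}
```

In a Node.js program this would typically be called as `loadConfig(process.env)`.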
## Directory Layout
```
config/          # policy + signature packs
src/core/        # types & interval index
src/detection/   # signature loading & scanning
src/policy/      # policy evaluation & tokenization
src/pipeline/    # adjudication (precedence + collision)
src/cli/         # hsx-cli entrypoint
docs/            # architecture & rationale
```
## Tokenization
Tokens are derived as HMAC-SHA256(kind || raw); the digest is then truncated and encoded (default: 9 bytes, base32). Identical (kind, value) pairs always yield the same token, so downstream systems can correlate occurrences without ever seeing the raw value.
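The scheme above can be approximated in a few lines of Node.js. This is a sketch under stated assumptions: using the salt as the HMAC key, the unpadded RFC 4648 base32 alphabet, and the `tokenize` and `base32` helper names are choices made for this example, and the engine's actual token format may differ.

```typescript
import { createHmac } from "node:crypto";

const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"; // RFC 4648 base32

// Unpadded base32 encoding of a byte array.
function base32(bytes: Uint8Array): string {
  let bits = 0;
  let value = 0;
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    value = (value << 8) | bytes[i];
    bits += 8;
    while (bits >= 5) {
      out += ALPHABET.charAt((value >>> (bits - 5)) & 31);
      bits -= 5;
    }
  }
  // Flush any remaining bits, left-aligned in a final 5-bit group.
  if (bits > 0) out += ALPHABET.charAt((value << (5 - bits)) & 31);
  return out;
}

// HMAC-SHA256 over kind || raw, truncated to `bytes` bytes, base32-encoded.
// Assumption: the salt (e.g. the value of PHI_SALT) is used as the HMAC key.
function tokenize(kind: string, raw: string, salt: string, bytes = 9): string {
  const digest = createHmac("sha256", salt).update(kind).update(raw).digest();
  return base32(digest.subarray(0, bytes));
}
```

With the default 9-byte truncation, each token is 15 base32 characters. The same `(kind, raw, salt)` triple always produces the same token, while changing the salt produces an unrelated token set.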
## Roadmap (abridged)
- Confidence fusion of overlapping heuristics
- Structured audit log (hashes only, no raw secret values)
- Streaming transformer API
- Additional signature categories (IP, phone, JWT, credit card with Luhn)
See docs/ for deeper details.