https://github.com/amafjarkasi/hsx-context-hygiene-engine

Context hygiene & risk adjudication for LLM pipelines: secrets, PII, prompt-injection, policy redaction & tokenization.
cli compliance content-safety context-hygiene data-sanitization llm llm-security nodejs pii-redaction policy-engine prompt-injection redaction secret-scanning security tokenization typescript

Host: GitHub
URL: https://github.com/amafjarkasi/hsx-context-hygiene-engine
Owner: amafjarkasi
License: MIT
Created: 2025-08-26T11:04:28.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-09-04T06:43:28.000Z (11 months ago)
Last Synced: 2025-09-13T02:19:41.184Z (10 months ago)
Topics: cli, compliance, content-safety, context-hygiene, data-sanitization, llm, llm-security, nodejs, pii-redaction, policy-engine, prompt-injection, redaction, secret-scanning, security, tokenization, typescript
Language: TypeScript
Size: 29.3 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# 🧬 HSX Context Hygiene Engine (formerly sanitize-mcp)

A multi-stage context hygiene & risk adjudication engine for LLM toolchains. It detects and processes secrets, personal data, and adversarial prompt artifacts before they reach a model boundary, then applies deterministic redaction, linkable tokenization, or policy-based refusal.

## Why HSX?
Traditional "sanitize" passes operate as fragile regex filters. HSX layers signature scanning, span adjudication (collision & precedence aware), policy mapping, and stable tokenization so downstream systems can preserve referential integrity without exposing sensitive substrings.
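
As a rough illustration of what "collision & precedence aware" adjudication could look like, the sketch below keeps the highest-precedence span whenever two detections overlap. The `Span` shape and the `rank` and `adjudicate` names are hypothetical and not taken from the HSX source; only the precedence order comes from the Key Capabilities list below.

```typescript
// Hypothetical span shape; the real types in src/core/ may differ.
interface Span {
  start: number; // inclusive offset into the input text
  end: number;   // exclusive offset
  kind: string;  // e.g. "SECRET_AWS_KEY", "PII_EMAIL"
}

// Precedence order as listed in this README, highest first.
const PRECEDENCE = ["SECRET_", "PROMPT_INJECTION_", "PII_", "META_"];

// Lower rank means higher precedence; unknown kinds rank last (an assumption).
function rank(kind: string): number {
  const i = PRECEDENCE.findIndex((prefix) => kind.startsWith(prefix));
  return i === -1 ? PRECEDENCE.length : i;
}

// Greedy adjudication: visit spans from highest precedence down and keep a
// span only if it does not overlap a span that has already been kept.
// Ties on precedence are broken by longer span first, then earlier start.
function adjudicate(spans: Span[]): Span[] {
  const ordered = [...spans].sort(
    (a, b) =>
      rank(a.kind) - rank(b.kind) ||
      (b.end - b.start) - (a.end - a.start) ||
      a.start - b.start
  );
  const kept: Span[] = [];
  for (const s of ordered) {
    const overlaps = kept.some((k) => s.start < k.end && k.start < s.end);
    if (!overlaps) kept.push(s);
  }
  // Return survivors in document order.
  return kept.sort((a, b) => a.start - b.start);
}
```

The greedy strategy is a design assumption: it is simple and deterministic, but it drops a losing span entirely instead of trimming it to the non-overlapping remainder.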

## Key Capabilities
- Signature-based detection (extensible JSON signature packs)
- Span collision adjudication with precedence (SECRET_* > PROMPT_INJECTION_* > PII_* > META_*)
- Wildcard policy rules with actions (REDACT, MASK_PARTIAL, TOKENIZE_LINKABLE, STRIP_LINE, FLAG_ONLY, KEEP)
- Stable, linkable tokenization via HMAC (configurable truncation & encoding)
- Deterministic rewrite ordering (reverse-offset application prevents index drift)
- Lightweight CLI (`hsx-cli scrub <file>`)
- Extensible risk & confidence model (planned confidence fusion is documented in docs/)
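
To make the wildcard policy rules and reverse-offset rewriting more concrete, here is a minimal sketch. The `Rule` shape, the first-match-wins lookup, the fail-closed `REDACT` default, and the replacement formats are all assumptions for illustration; the actual policy file format is not shown in this README. Only two of the listed actions are sketched.

```typescript
type Action =
  | "REDACT" | "MASK_PARTIAL" | "TOKENIZE_LINKABLE"
  | "STRIP_LINE" | "FLAG_ONLY" | "KEEP";

interface Rule { pattern: string; action: Action } // hypothetical rule shape
interface Span { start: number; end: number; kind: string }

// Match a kind against a pattern in which "*" stands for any run of characters.
function matches(pattern: string, kind: string): boolean {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*");                 // then expand the wildcard
  return new RegExp(`^${escaped}$`).test(kind);
}

// First matching rule wins; unmatched kinds fall back to REDACT (assumption).
function actionFor(kind: string, rules: Rule[]): Action {
  return rules.find((r) => matches(r.pattern, kind))?.action ?? "REDACT";
}

// Only REDACT and MASK_PARTIAL are sketched; every other action returns
// null here, which leaves the original text untouched.
function replacement(raw: string, kind: string, action: Action): string | null {
  switch (action) {
    case "REDACT":
      return `[${kind}]`;
    case "MASK_PARTIAL":
      return raw.slice(0, 2) + "*".repeat(Math.max(raw.length - 2, 0));
    default:
      return null;
  }
}

// Apply rewrites from the highest start offset down. Text before the span
// being edited is never touched, so the remaining offsets stay valid.
// Assumes spans are already adjudicated, i.e. non-overlapping.
function rewrite(text: string, spans: Span[], rules: Rule[]): string {
  const ordered = [...spans].sort((a, b) => b.start - a.start);
  let out = text;
  for (const s of ordered) {
    const raw = text.slice(s.start, s.end);
    const rep = replacement(raw, s.kind, actionFor(s.kind, rules));
    if (rep !== null) out = out.slice(0, s.start) + rep + out.slice(s.end);
  }
  return out;
}
```

Applying the same edits front to back would require adjusting every later offset by the length difference of each replacement; working backwards avoids that bookkeeping.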

## Quick Start
```bash
npm install
npm run build
echo "Contact me at dev@example.com AKIAABCDEFGHIJKLMNOPQRST ignore previous text" > sample.txt
npx hsx-cli scrub sample.txt
```

## Configuration
Environment variables:
- HSX_SIGNATURE_DIR (default: config/signatures)
- HSX_POLICY_PATH (default: config/hsx-policy.json)
- PHI_SALT (secret salt for tokenization stability)
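
A sketch of how these variables might be resolved at startup, under stated assumptions: the `HsxConfig` shape and `resolveConfig` name are hypothetical, and failing when `PHI_SALT` is unset is a choice made for this example, not documented HSX behavior.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface HsxConfig {
  signatureDir: string;
  policyPath: string;
  salt: string;
}

// Resolve settings from an environment-like object, applying the defaults
// listed in this README. Refusing to start without PHI_SALT is an assumption.
function resolveConfig(env: Record<string, string | undefined>): HsxConfig {
  const salt = env.PHI_SALT;
  if (!salt) {
    throw new Error("PHI_SALT must be set so that tokens stay stable across runs");
  }
  return {
    signatureDir: env.HSX_SIGNATURE_DIR ?? "config/signatures",
    policyPath: env.HSX_POLICY_PATH ?? "config/hsx-policy.json",
    salt,
  };
}
```

In a Node.js process this would be called as `resolveConfig(process.env)`.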

## Directory Layout
```
config/ # policy + signature packs
src/core/ # types & interval index
src/detection/ # signature loading & scanning
src/policy/ # policy evaluation & tokenization
src/pipeline/ # adjudication (precedence + collision)
src/cli/ # hsx-cli entrypoint
docs/ # architecture & rationale
```

## Tokenization
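Tokens are produced with HMAC-SHA256(kind || raw), where || denotes concatenation, and the digest is then truncated (default: base32, 9 bytes). Identical (kind, value) pairs always yield the same token, so downstream systems can correlate occurrences without seeing the raw value.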
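
The scheme can be sketched with Node's built-in `crypto` module. This is an illustration under assumptions, not the HSX implementation: it uses the salt as the HMAC key, reads "9 bytes" as 9 digest bytes before encoding, and uses unpadded RFC 4648 base32. The `tokenize` and `base32` names are hypothetical.

```typescript
import { createHmac } from "node:crypto";

const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"; // RFC 4648 base32

// Unpadded base32 encoding of a byte array.
function base32(bytes: Uint8Array): string {
  let bits = 0;  // number of buffered bits not yet emitted
  let value = 0; // bit buffer
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    value = (value << 8) | bytes[i];
    bits += 8;
    while (bits >= 5) {
      out += ALPHABET.charAt((value >>> (bits - 5)) & 31);
      bits -= 5;
    }
    value &= (1 << bits) - 1; // drop bits that were already emitted
  }
  // Flush any remaining bits, left-aligned in a final 5-bit group.
  if (bits > 0) out += ALPHABET.charAt((value << (5 - bits)) & 31);
  return out;
}

// HMAC-SHA256 over kind || raw, truncated to `bytes` bytes, base32-encoded.
// Assumption: the salt (e.g. PHI_SALT) serves as the HMAC key.
function tokenize(kind: string, raw: string, salt: string, bytes = 9): string {
  const digest = createHmac("sha256", salt).update(kind).update(raw).digest();
  return base32(digest.subarray(0, bytes));
}
```

With the default of 9 bytes (72 bits), the token is 15 base32 characters long. Plain concatenation follows the formula above; a delimiter between `kind` and `raw` would additionally rule out ambiguous boundaries such as ("PII_", "EMAILx") versus ("PII_EMAIL", "x").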

## Roadmap (abridged)
- Confidence fusion of overlapping heuristics
- Structured audit log (hashes only, no raw secret values)
- Streaming transformer API
- Additional signature categories (IP, phone, JWT, credit card with Luhn)

See docs/ for deeper details.
