{"id":50454137,"url":"https://github.com/mizcausevic-dev/rag-sentinel","last_synced_at":"2026-06-01T01:05:41.178Z","repository":{"id":357461753,"uuid":"1232385604","full_name":"mizcausevic-dev/rag-sentinel","owner":"mizcausevic-dev","description":"Governance and observability layer for enterprise RAG systems. Chunk quality scoring, source freshness audits, retrieval drift detection, hallucination signals, and PII leakage scanning across every collection.","archived":false,"fork":false,"pushed_at":"2026-05-12T21:46:00.000Z","size":525,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-12T23:14:10.587Z","etag":null,"topics":["ai-governance","ai-platform","backend","embedding-drift","express","hallucination-detection","observability","platform-engineering","rag","retrieval-augmented-generation","typescript","vector-database"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mizcausevic-dev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-07T22:03:11.000Z","updated_at":"2026-05-12T21:46:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mizcausevic-dev/rag-sentinel","commit_stats":null,"previous_names":["mizcausevic-dev/rag-sentinel"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/mizcausevic-dev/rag-sen
tinel","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Frag-sentinel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Frag-sentinel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Frag-sentinel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Frag-sentinel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mizcausevic-dev","download_url":"https://codeload.github.com/mizcausevic-dev/rag-sentinel/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Frag-sentinel/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33755379,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-governance","ai-platform","backend","embedding-drift","express","hallucination-detection","observability","platform-engineering","rag","retrieval-augmented-generation","typescript","vector-database"],"created_at":"2026-06-01T01:05:41.115Z","updated_at":"2026-06-01T01:05:41.169Z","avatar_url":"https://github.com/mizcausevic-dev.png","language":"TypeScript","funding_links":[],"categories":[],"sub
_categories":[],"readme":"# RAG Sentinel\n\n[![CI](https://github.com/mizcausevic-dev/rag-sentinel/actions/workflows/ci.yml/badge.svg)](https://github.com/mizcausevic-dev/rag-sentinel/actions/workflows/ci.yml)\n[![Node](https://img.shields.io/badge/node-20%2B-339933?logo=node.js\u0026logoColor=white)](https://nodejs.org)\n[![TypeScript](https://img.shields.io/badge/typescript-5.6-3178C6?logo=typescript\u0026logoColor=white)](https://www.typescriptlang.org)\n[![License: MIT](https://img.shields.io/badge/license-MIT-66FCF1)](LICENSE)\n\nGovernance and observability layer for **enterprise RAG systems**: chunk quality scoring, source freshness audits, retrieval drift detection, hallucination signals, and PII leakage scanning across every indexed collection.\n\n\u003e **What this repo proves**\n\u003e\n\u003e RAG reliability is not just a model-quality problem. It is an evidence, retrieval, and governance problem, and this repo treats it that way.\n\n## Why This Exists\n\nMost enterprise AI failures aren't model failures; they're **retrieval failures**. Stale documentation that contradicts current product behavior. Chunks that start mid-sentence and degrade relevance. API keys accidentally indexed into the vector store. Top-K results silently shifting after an embedding model upgrade. None of this is visible until a customer sees something wrong in production.\n\nRAG Sentinel is the layer that watches all of it. It scores chunks at index time, audits sources for staleness, detects retrieval drift across snapshots, evaluates answer grounding, and scans indexed content for PII. 
Output is operator-friendly: per-collection posture scores, blocked-content lists, top issues by frequency, and a Monday-morning dashboard that fits on one screen.\n\n## Where This Sits in the Portfolio\n\n| Repo | Surface | Question it answers |\n|---|---|---|\n| [`mcp-sentinel`](https://github.com/mizcausevic-dev/mcp-sentinel) | Tool calls | *What MCP tools are exposed and how risky are they?* |\n| **`rag-sentinel`** | **Retrieval** | ***What's in the vector store and how trustworthy is it?*** |\n| [`agent-codex`](https://github.com/mizcausevic-dev/agent-codex) | Decisions | *Under what policies are decisions allowed?* |\n| [`agentobserve`](https://github.com/mizcausevic-dev/agentobserve) | Runtime | *What did agents actually do: cost, latency, outcomes?* |\n| [`kinetic-flightdeck`](https://github.com/mizcausevic-dev/kinetic-flightdeck) | Operator | *Are we OK right now? Who do I call?* |\n\n## Project Overview\n\n| Attribute | Detail |\n|---|---|\n| Runtime | Node.js + TypeScript |\n| Framework | Express 5 |\n| Domain | Enterprise RAG governance and observability |\n| Validation Areas | Chunk quality · Source freshness · Retrieval drift · Hallucination signals · PII/sensitive content |\n| Operational Outputs | Per-chunk scores · Per-collection posture · Drift comparisons · Blocked-content lists · Open-incident view |\n| Docs | OpenAPI spec embedded; routes self-documented |\n\n## Five Governance Pillars\n\n### 1. Chunk Quality Scoring\n\nBad chunks lead to bad retrievals. Scored at index time:\n- Token count posture (too small loses context, too large exceeds embedding context)\n- Sentence boundary respect (chunks should not start/end mid-sentence)\n- Metadata completeness (source, title, last_updated minimum)\n- Boilerplate detection (low lexical diversity flagged)\n- Empty/whitespace content protection\n\n### 2. Source Freshness Audit\n\nStale RAG content is the silent killer. 
Bucket distribution + weighted score:\n- `fresh` (≤30 days, weight 100)\n- `aging` (31–90 days, weight 75)\n- `stale` (91–365 days, weight 30)\n- `ancient` (\u003e365 days, weight 0)\n\n### 3. Retrieval Drift Detection\n\nSame query returning different results over time. Compares two retrieval snapshots:\n- Top-K overlap ratio\n- Spearman-style rank correlation\n- New / dropped chunk identification\n- Embedding-model-change detection (drift expected, validation required)\n\nDrift levels: `minimal` · `moderate` · `significant` · `severe`\n\n### 4. Hallucination Signals\n\nHeuristic grounding analysis on RAG answers:\n- Citation coverage (% of substantive claims with attribution)\n- Source-claim alignment (do quoted snippets actually appear in retrieved sources?)\n- Ungrounded number/date generation\n- Refusal recognition (refusing with no relevant sources is a positive signal)\n- Empty-retrieval-with-non-empty-answer guard\n\n### 5. PII / Sensitive Content Scanning\n\nCatches leakage before it ends up in retrieval results:\n- Private key blocks (PEM)\n- API/secret key prefixes (sk-, pk-, sk-proj-, etc.)\n- AWS access keys (AKIA pattern)\n- JWT tokens\n- SSN, credit card, IBAN\n- Emails and US phone (low-severity awareness)\n\nSeverity-weighted blocking decision: `critical` and `high` hits trigger automatic block.\n\n### 5b. Tokenize-before-index (Skyyflow integration)\n\nThe blocking model above answers *\"is this content safe to index?\"* The Skyyflow vault integration answers a different question: *\"can this content be **made** safe to index by replacing the PII with tokens?\"*\n\nWhen a buyer publishes an [AI Procurement Decision Card v0.2](https://github.com/mizcausevic-dev/ai-procurement-decision-spec) that lists `data_vault_targets[]` with `vendor: \"skyyflow\"`, rag-sentinel:\n\n1. **At index time** — replaces the matched PII values (email, phone, SSN, credit card, IBAN — never credentials or auth secrets) with opaque vault tokens. 
The chunk text is rewritten with tokens; the embedding stays semantically valid; the vector store never sees raw PII.\n2. **At query time** — calls `detokenize()` with the caller's roles. If any of the caller's roles is in the Decision Card's `reveal_roles[]`, tokens become raw values; otherwise the response carries tokens through and an audit event is emitted.\n\nTwo vault implementations ship in the box:\n\n| Implementation | When it's selected | Notes |\n|---|---|---|\n| `MockSkyyflowVault` | Default — when env is not configured | In-memory + deterministic. Same input → same token across calls. For tests, screenshots, demos. |\n| `RealSkyyflowVault` | When `SKYYFLOW_VAULT_URL` + `SKYYFLOW_ACCESS_TOKEN` + `SKYYFLOW_VAULT_ID` are set | HTTP adapter to a hosted vault. Caller is responsible for token refresh. |\n\nCredentials and auth secrets (private keys, AWS access keys, JWT tokens, API keys) are **never tokenized** — they continue to block the chunk regardless of Decision Card target. 
That distinction is intentional: tokenizing a credential just makes it slightly harder to find while shipping it into a vector store anyway.\n\nThree HTTP endpoints expose the integration:\n\n```\nGET  /api/vault/status                      mock vs real vault, vault id, env-toggle hint\nPOST /api/vault/preview                     decisionCard + chunks → vaulted text + substitution audit\nPOST /api/vault/detokenize-preview          decisionCard + tokens + callerRoles → reveal disposition\n```\n\nExample — `POST /api/vault/preview`:\n\n```jsonc\n// Request\n{\n  \"decisionCard\": { /* a Decision Card v0.2 document */ },\n  \"chunks\": [\n    { \"chunkId\": \"c1\", \"text\": \"Contact jane@example.com; SSN 123-45-6789 on file.\" }\n  ]\n}\n\n// Response (mock vault, truncated tokens for readability)\n{\n  \"decisionId\": \"DEMO-1\",\n  \"decisionCardVersion\": \"0.2\",\n  \"vaultVendor\": \"skyyflow\",\n  \"vaultMode\": \"mock\",\n  \"vaultId\": \"v_demo\",\n  \"fieldsAuthorized\": [\"email\", \"ssn\"],\n  \"revealRoles\": [\"principal\"],\n  \"chunks\": [\n    {\n      \"chunkId\": \"c1\",\n      \"vaultedText\": \"Contact skyy_c845…; SSN skyy_277a… on file.\",\n      \"substitutions\": [\n        { \"patternName\": \"email\",  \"token\": \"skyy_c845…\", \"field\": \"email\" },\n        { \"patternName\": \"ssn-us\", \"token\": \"skyy_277a…\", \"field\": \"ssn\"   }\n      ],\n      \"unauthorizedHits\": [],\n      \"shouldBlock\": false\n    }\n  ]\n}\n```\n\n## Composite Posture Methodology\n\n| Pillar | Weight | Rationale |\n|---|---|---|\n| Sensitive content | 0.25 | Leakage is binary; one critical hit blocks |\n| Hallucination | 0.25 | Grounding is the user-facing trust contract |\n| Freshness | 0.20 | Stale content silently degrades retrieval |\n| Chunk quality | 0.15 | Index-time investment |\n| Retrieval drift | 0.15 | Detected stability of the surface |\n\nOverride logic: a single critical signal (PII crisis, freshness crisis, hallucination crisis) **forces 
blocked status** regardless of composite (the same \"platform thinking\" doctrine used in `mcp-sentinel` and `kinetic-flightdeck`).\n\n## API Endpoints\n\n### Read\n\n| Method | Endpoint | Purpose |\n|---|---|---|\n| GET | `/health` | Service status and uptime |\n| GET | `/api/collections` | List registered RAG collections |\n| GET | `/api/collections/:id` | Single collection metadata + metrics |\n| GET | `/api/collections/:id/posture` | Composite posture score for collection |\n| GET | `/api/incidents` | Filtered incident feed (collectionId, severity, status, category) |\n| GET | `/api/vault/status` | Mock vs real Skyyflow vault, vault id, env-toggle hint |\n| POST | `/api/vault/preview` | Decision Card v0.2 + chunks → vaulted text + substitution audit |\n| POST | `/api/vault/detokenize-preview` | Decision Card v0.2 + tokens + callerRoles → reveal disposition |\n| GET | `/api/dashboard/summary` | Operator headline view |\n\n### Validate\n\n| Method | Endpoint | Purpose |\n|---|---|---|\n| POST | `/api/validate/chunks` | Score a batch of chunks at index time |\n| POST | `/api/validate/freshness` | Audit a collection's source freshness |\n| POST | `/api/validate/drift` | Compare two retrieval snapshots |\n| POST | `/api/validate/answer` | Evaluate an answer for hallucination signals |\n| POST | `/api/validate/pii-scan` | Scan a chunk batch for sensitive content |\n\n## Sample: Hallucination Evaluation\n\n```json\nPOST /api/validate/answer\n{\n  \"answerText\": \"The system uses Diffie-Hellman key exchange. 
This was confirmed in the 2024 audit.\",\n  \"citationsClaimed\": [\n    { \"sourceId\": \"s1\", \"quote\": \"Diffie-Hellman key exchange used since 2019\" }\n  ],\n  \"retrievedSources\": [\n    { \"sourceId\": \"s1\", \"text\": \"The cryptographic stack uses RSA-2048 for transport.\" }\n  ]\n}\n```\n\n```json\n{\n  \"groundingScore\": 25,\n  \"citationCoverage\": 50,\n  \"signals\": [\n    \"1 citation(s) reference content not in retrieved sources.\",\n    \"1 numeric/date claim(s) not present in retrieved sources: 2024.\"\n  ],\n  \"unsupportedCitations\": [\n    { \"sourceId\": \"s1\", \"quote\": \"Diffie-Hellman key exchange used since 2019\" }\n  ],\n  \"recommendedNextAction\": \"Block answer from production output; investigate retrieval quality and prompt grounding instructions.\"\n}\n```\n\n## Sample: PII Scan\n\n```json\nPOST /api/validate/pii-scan\n{\n  \"chunks\": [\n    { \"chunkId\": \"c_429\", \"text\": \"Use sk-proj-AbCdEf1234567890XyZpQrStUvWxYz123456 to authenticate.\" }\n  ]\n}\n```\n\n```json\n{\n  \"totalChunks\": 1,\n  \"flaggedChunks\": 1,\n  \"blockedChunks\": 1,\n  \"hitsBySeverity\": { \"critical\": 1, \"high\": 0, \"medium\": 0, \"low\": 0 },\n  \"hitsByPattern\": { \"api-key-prefix\": 1 },\n  \"perChunk\": [\n    {\n      \"chunkId\": \"c_429\",\n      \"hits\": [\n        {\n          \"patternName\": \"api-key-prefix\",\n          \"severity\": \"critical\",\n          \"description\": \"API/secret key with conventional prefix detected.\",\n          \"matchedSnippet\": \"sk-p****56\"\n        }\n      ],\n      \"highestSeverity\": \"critical\",\n      \"shouldBlock\": true\n    }\n  ]\n}\n```\n\n## Operator Console Preview\n\n![RAG Sentinel operator console: KPIs, collection posture, retrieval drift, freshness, and incident timeline](docs/hero.png)\n\n## Getting Started\n\n### Prerequisites\n\n- Node.js 20+\n- npm\n\n### Setup\n\n```bash\ngit clone https://github.com/mizcausevic-dev/rag-sentinel.git\ncd rag-sentinel\nnpm 
install\nnpm run dev\n```\n\nVisit:\n\n- `http://localhost:3000/health`\n- `http://localhost:3000/api/dashboard/summary`\n- `http://localhost:3000/api/collections`\n\n### Run Tests\n\n```bash\nnpm test\n```\n\n35 unit tests covering chunk quality, freshness buckets, retrieval drift edge cases, hallucination heuristics, PII pattern coverage, and composite posture override logic.\n\n## What This Demonstrates\n\n- RAG governance translated into enforceable, testable backend rules\n- Heuristic-but-defensible analysis of grounding without requiring LLM calls in the loop\n- Composite scoring that respects platform-engineering doctrine (sensitive content + hallucination dominate)\n- Override logic: a single critical signal blocks regardless of good composites\n- Pluggable validation endpoints designed to wire into indexing pipelines and answer pipelines\n- Strict-mode TypeScript with full test coverage; CI matrix on Node 20 + 22\n\n## Future Enhancements\n\n- Real-time polling agent for vector stores (Pinecone, Qdrant, Weaviate, pgvector)\n- LLM-based grounding cross-check (heuristics + judge model)\n- Streaming chunk validation for ingestion pipelines\n- Per-collection scoring history with PostgreSQL + Grafana\n- Alert routing to PagerDuty, Slack, and SIEMs\n- Multi-tenant control plane for managed-service deployment\n\n## Tech Stack\n\n- Node.js, TypeScript, Express, Zod\n- Helmet, CORS, Morgan\n- Node test runner\n\n## Portfolio Links\n\n- [LinkedIn](https://www.linkedin.com/in/mizcausevic/)\n- [Skills Page](https://mizcausevic.com/skills)\n- [Medium](https://medium.com/@mizcausevic)\n- [GitHub](https://github.com/mizcausevic-dev)\n\nPart of [mizcausevic-dev's GitHub portfolio](https://github.com/mizcausevic-dev), the AI Platform Engineering quintet.\n\n---\n\n**Connect:** [LinkedIn](https://www.linkedin.com/in/mirzacausevic/) · [Kinetic Gain](https://kineticgain.com) · [Medium](https://medium.com/@mizcausevic/) · 
[Skills](https://mizcausevic.com/skills/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmizcausevic-dev%2Frag-sentinel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmizcausevic-dev%2Frag-sentinel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmizcausevic-dev%2Frag-sentinel/lists"}