{"id":50454182,"url":"https://github.com/mizcausevic-dev/shadow-ai-detector","last_synced_at":"2026-06-01T01:05:43.892Z","repository":{"id":357280952,"uuid":"1232423200","full_name":"mizcausevic-dev/shadow-ai-detector","owner":"mizcausevic-dev","description":"Detect unauthorized LLM usage across enterprise networks. Endpoint catalog, traffic pattern analysis, payload sensitivity scanning, department-level shadow-AI exposure rollups, and CISO-ready incident reporting.","archived":false,"fork":false,"pushed_at":"2026-05-12T04:26:45.000Z","size":509,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-12T06:27:34.289Z","etag":null,"topics":["ai-governance","ai-security","ciso-tooling","cybersecurity","data-loss-prevention","express","llm-security","platform-engineering","shadow-ai","typescript"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mizcausevic-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-07T23:20:03.000Z","updated_at":"2026-05-12T04:26:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mizcausevic-dev/shadow-ai-detector","commit_stats":null,"previous_names":["mizcausevic-dev/shadow-ai-detector"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/mizcausevic-dev/shadow-ai-detector","repository_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fshadow-ai-detector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fshadow-ai-detector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fshadow-ai-detector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fshadow-ai-detector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mizcausevic-dev","download_url":"https://codeload.github.com/mizcausevic-dev/shadow-ai-detector/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fshadow-ai-detector/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33755379,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-governance","ai-security","ciso-tooling","cybersecurity","data-loss-prevention","express","llm-security","platform-engineering","shadow-ai","typescript"],"created_at":"2026-06-01T01:05:43.827Z","updated_at":"2026-06-01T01:05:43.886Z","avatar_url":"https://github.com/mizcausevic-dev.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Shadow 
AI Detector\n\n[![CI](https://github.com/mizcausevic-dev/shadow-ai-detector/actions/workflows/ci.yml/badge.svg)](https://github.com/mizcausevic-dev/shadow-ai-detector/actions/workflows/ci.yml)\n[![Node](https://img.shields.io/badge/node-20%2B-339933?logo=node.js\u0026logoColor=white)](https://nodejs.org)\n[![TypeScript](https://img.shields.io/badge/typescript-5.6-3178C6?logo=typescript\u0026logoColor=white)](https://www.typescriptlang.org)\n[![License: MIT](https://img.shields.io/badge/license-MIT-66FCF1)](LICENSE)\n\nDetect unauthorized LLM usage across enterprise networks. Endpoint catalog, traffic pattern analysis, payload sensitivity scanning, department-level shadow-AI exposure rollups, and CISO-ready incident reporting.\n\n\u003e Recruiter takeaway:\n\u003e\n\u003e *\"This person built the thing CISOs are asking the platform team to build right now — and it ships with a 30-endpoint catalog, payload PII/credential scanning, country-of-origin risk weighting, and department-level exposure scoring. CyberArk pedigree shows in every module.\"*\n\n## Why This Exists\n\nBy 2026, every enterprise has a shadow-AI problem. Sales teams paste customer PII into chatgpt.com. Engineers paste production credentials into Together AI. Executives type M\u0026A codenames into claude.ai. Marketing tries Yandex GPT for \"free translation.\" Most companies have no detection layer at all — they just hope their DLP catches the symptoms.\n\nThis repo is the detection layer. It ingests proxy/firewall traffic, classifies each request against a catalog of 30+ known LLM endpoints (commercial APIs, inference hosts, consumer web interfaces, self-hosted patterns), scans payloads for credentials/PII/classified markers, applies country-of-origin and volume signals, and rolls everything up into department-level exposure scores that a CISO can take to a board meeting.\n\nIt's a CISO tool. 
Built by someone who's already worked on enterprise security platforms.\n\n## Where This Sits in the Portfolio\n\n| Repo | Surface | Question it answers |\n|---|---|---|\n| [`mcp-sentinel`](https://github.com/mizcausevic-dev/mcp-sentinel) | Tool calls | What MCP tools are exposed and how risky? |\n| [`rag-sentinel`](https://github.com/mizcausevic-dev/rag-sentinel) | Retrieval | What's in the vector store and how trustworthy? |\n| [`agent-codex`](https://github.com/mizcausevic-dev/agent-codex) | Decisions | Under what policies are decisions allowed? |\n| [`agent-eval-arena`](https://github.com/mizcausevic-dev/agent-eval-arena) | Pre-prod | Should this model promotion ship? |\n| [`agentobserve`](https://github.com/mizcausevic-dev/agentobserve) | Runtime | What did agents actually do? |\n| [`kinetic-flightdeck`](https://github.com/mizcausevic-dev/kinetic-flightdeck) | Operator | Are we OK right now? |\n| **`shadow-ai-detector`** | **Egress** | ***Who's leaking what to whom — and from what dept?*** |\n\n## What It Detects\n\n### Endpoint Catalog (30+ providers)\n\n| Tier | Examples |\n|---|---|\n| Frontier APIs | Anthropic, OpenAI (chat + DALL·E), Google GenAI/Vertex, Azure OpenAI, AWS Bedrock |\n| Mainstream APIs | Cohere, Mistral, Voyage |\n| Inference hosts | Together AI, Replicate, Fireworks, Groq, Hugging Face Inference |\n| Image / voice | Stability AI, ElevenLabs |\n| Consumer web (shadow-AI red flags) | chatgpt.com, claude.ai, gemini.google.com, perplexity.ai, character.ai |\n| High-risk regions | DeepSeek (CN), Alibaba Qwen (CN), Moonshot Kimi (CN), Yandex (RU) |\n| Self-hosted | Ollama, vLLM (private IP detection) |\n\nEach endpoint carries default risk band, source country, capability classification, and notes.\n\n### Payload Scanner — 21 patterns across 6 categories\n\n| Category | Patterns |\n|---|---|\n| `credential` | Private key blocks, AWS access keys, API key prefixes, JWTs, GitHub PATs, Slack tokens, inline passwords |\n| `pii` | 
US SSN, IBAN, phone, email, DOB markers |\n| `pci` | Credit card patterns, CVV markers |\n| `health` | MRN markers, ICD codes |\n| `internal-marker` | CONFIDENTIAL/SECRET/INTERNAL ONLY/RESTRICTED, M\u0026A codename patterns |\n| `source-code` | AWS SDK creds, database connection strings with embedded passwords |\n\nMatches return redacted snippets — never raw secrets in the output.\n\n### Risk Scorer — composite per event\n\n```\nscore = endpoint_default_risk_band\n      + sanction_penalty (sanctioned: 0, unsanctioned: +35, unknown: +25)\n      + country_residency_penalty (CN/RU: +15)\n      + payload_severity_sum (critical: +35 each, high: +20, medium: +10, low: +3)\n      + volume_anomaly (\u003e256KB upload: +10)\n```\n\n| Score | Tier | Recommended action |\n|---|---|---|\n| 0-24 | minimal | No action; normal sanctioned traffic |\n| 25-49 | elevated | Log for weekly review |\n| 50-74 | high | Quarantine session; require justification |\n| 75-100 | critical | Block egress; alert CISO + manager; preserve for forensics |\n\n### Department Rollup\n\nFor each department: total events, LLM events, unique users/providers, tier distribution, top provider, unsanctioned rate, and a composite `exposureScore` that weights criticals heavily and unsanctioned-rate proportionally. 
Recommended actions scale: weekly monitoring → dept-level review → all-hands briefing within 48h for hotspots.\n\n## API Endpoints\n\n| Method | Endpoint | Purpose |\n|---|---|---|\n| GET | `/health` | Service status |\n| GET | `/api/endpoints` | Full LLM catalog + sanctioned list + provider metadata |\n| POST | `/api/endpoints/classify` | Classify a single URL/host |\n| POST | `/api/analyze/payload` | Scan a payload for sensitive content |\n| POST | `/api/analyze/event` | Assess a single traffic event end-to-end |\n| POST | `/api/analyze/traffic` | Bulk-assess events; return summary + per-dept rollup + per-event verdicts |\n| GET | `/api/incidents` | List incidents (filter by `?status=` and `?severity=`) |\n| GET | `/api/incidents/:id` | Single incident |\n| GET | `/api/dashboard/summary` | CISO summary against the demo dataset |\n| GET | `/api/dashboard/exposure` | Department exposure rankings |\n\n## Sample: Single Event Assessment\n\n```json\nPOST /api/analyze/event\n{\n  \"event\": {\n    \"eventId\": \"evt_001\",\n    \"timestamp\": \"2026-05-07T11:02:30Z\",\n    \"url\": \"https://api.deepseek.com/chat/completions\",\n    \"method\": \"POST\",\n    \"payloadSnippet\": \"Translate technical spec. 
INTERNAL ONLY material attached.\",\n    \"user\": \"grace.intern@corp.com\",\n    \"department\": \"product\",\n    \"sourceHost\": \"10.7.40.56\",\n    \"bytesUp\": 32768,\n    \"bytesDown\": 8192\n  }\n}\n```\n\n```json\n{\n  \"eventId\": \"evt_001\",\n  \"matched\": true,\n  \"endpointId\": \"deepseek-api\",\n  \"provider\": \"DeepSeek\",\n  \"sanctionStatus\": \"unsanctioned\",\n  \"riskScore\": 100,\n  \"riskTier\": \"critical\",\n  \"signals\": [\n    \"Endpoint deepseek-api classified as high default risk.\",\n    \"Endpoint not on org sanctioned list.\",\n    \"Data residency / export-control concern.\",\n    \"Provider hosted in CN; data residency / export-control concern.\",\n    \"internal-marker pattern detected: classified-marker (critical).\",\n    \"Payload contains content recommended for block.\"\n  ],\n  \"recommendedAction\": \"Block egress; alert CISO + user manager; preserve traffic for forensics.\"\n}\n```\n\n## Operator Console Preview\n\n![Shadow AI Detector dashboard — endpoint catalog, risk distribution, department exposure, open incidents](docs/hero.png)\n\n## Getting Started\n\n### Prerequisites\n\n- Node.js 20+\n- npm\n\n### Setup\n\n```bash\ngit clone https://github.com/mizcausevic-dev/shadow-ai-detector.git\ncd shadow-ai-detector\nnpm install\nnpm run dev\n```\n\nVisit:\n\n- `http://localhost:3000/health`\n- `http://localhost:3000/api/dashboard/summary`\n- `http://localhost:3000/api/endpoints`\n\n### Run Tests\n\n```bash\nnpm test\n```\n\n36 unit tests across endpoint classification, payload scanning, risk scoring, and department rollup.\n\n## What This Demonstrates\n\n- Security-first thinking applied to AI infrastructure (CyberArk pedigree, applied)\n- Pattern catalog work — non-trivial regex hardening, redacted snippet output, severity-aware aggregation\n- Country-of-origin / sanctions awareness as first-class signal\n- Department-level rollup that produces board-ready exposure metrics\n- Strict-mode TypeScript with full test 
coverage; CI matrix on Node 20 + 22\n- Heuristic-first detection — no LLM-as-judge in the hot path; deterministic, testable, cheap\n\n## Future Enhancements\n\n- Wire to actual proxy/firewall log streams (Zscaler, Netskope, syslog tail)\n- ML-based anomaly detection layered on top of pattern catalog\n- User-level baseline + drift detection (per-user normal usage profile)\n- Integration with SIEM (Splunk, Datadog, Elastic) for incident enrichment\n- Auto-block plumbing via firewall API\n- DLP rule generator — emit Zscaler/Netskope policy from sanctioned list\n- Quarterly board-ready PDF exposure report\n\n## Tech Stack\n\n- Node.js, TypeScript, Express, Zod\n- Helmet, CORS, Morgan\n- Node test runner\n\n## Portfolio Links\n\n- [LinkedIn](https://www.linkedin.com/in/mizcausevic/)\n- [Skills Page](https://mizcausevic.com/skills)\n- [Medium](https://medium.com/@mizcausevic)\n- [GitHub](https://github.com/mizcausevic-dev)\n\nPart of [mizcausevic-dev's GitHub portfolio](https://github.com/mizcausevic-dev) — AI Platform Engineering doctrine.\n\n---\n\n**Connect:** [LinkedIn](https://www.linkedin.com/in/mirzacausevic/) · [Kinetic Gain](https://kineticgain.com) · [Medium](https://medium.com/@mizcausevic/) · [Skills](https://mizcausevic.com/skills/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmizcausevic-dev%2Fshadow-ai-detector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmizcausevic-dev%2Fshadow-ai-detector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmizcausevic-dev%2Fshadow-ai-detector/lists"}