{"id":19511477,"url":"https://github.com/datafog/datafog","last_synced_at":"2026-06-15T11:31:54.633Z","repository":{"id":340501900,"uuid":"888309707","full_name":"DataFog/datafog","owner":"DataFog","description":"Policy-first API and policy gate for AI/CLI workflows: detect sensitive data, decide allow/transform/deny, and emit auditable receipts.","archived":false,"fork":false,"pushed_at":"2026-02-25T06:28:45.000Z","size":285,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-25T10:46:52.487Z","etag":null,"topics":["ai-security","cli","data-governance","go","llm","pii-detection","policy-enforcement","policy-engine","privacy"],"latest_commit_sha":null,"homepage":"https://datafog.vercel.app","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DataFog.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2024-11-14T07:14:31.000Z","updated_at":"2026-02-25T06:28:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/DataFog/datafog","commit_stats":null,"previous_names":["datafog/datafog"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/DataFog/datafog","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataFog%2Fdatafog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataFog%2Fdatafog/
tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataFog%2Fdatafog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataFog%2Fdatafog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DataFog","download_url":"https://codeload.github.com/DataFog/datafog/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataFog%2Fdatafog/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34358717,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-15T02:00:07.085Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-security","cli","data-governance","go","llm","pii-detection","policy-enforcement","policy-engine","privacy"],"created_at":"2024-11-10T23:21:09.236Z","updated_at":"2026-06-15T11:31:54.620Z","avatar_url":"https://github.com/DataFog.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DataFog\n\n**The data firewall for agents and developer tools.**\n\nDataFog is a runtime **data governance layer** for AI agents and developer tooling.\n\nIt runs a single in-process policy loop: **detect → decide → enforce**.\nFor each payload crossing a process boundary (command execution, file read/write, or API action),\nit detects sensitive 
entities, evaluates policy, and enforces the result before the action proceeds.\n\nThis repo has two runtime pieces:\n\n- `datafog-api`: HTTP API for scan/decide/transform/receipts.\n- `datafog-shim`: optional runtime policy gate wrapper for CLI-style execution.\n\nThe wrapper process is still named `datafog-shim` for compatibility, but we describe its role as a *policy gate*.\n\n## What DataFog does (technical)\n\n1. **Detect** sensitive entities in text and payload context (`/v1/scan`).\n2. **Decide** using adapter-aware policy rules (`/v1/decide`) from `policy.json`.\n3. **Enforce** the decision before execution (`allow`, `transform`, `allow_with_redaction`, or `deny`) in consuming runtimes.\n4. **Transform or tokenize** matched data deterministically when a policy asks for it (`/v1/transform`, `/v1/anonymize`).\n5. **Emit an auditable receipt** for every enforcement decision (`/v1/receipts/{id}`).\n6. **Optionally emit decision events** (`/v1/events`) when `DATAFOG_EVENTS_PATH` is set.\n\n## What it does not do\n\n- It does not secure every layer of your platform for you.\n- It does not continuously discover vulnerabilities.\n- It does not provide a policy-editing UI or support dynamic policy updates through the API.\n- It does not guarantee zero false positives or false negatives (detectors are deterministic and regex/heuristic-based).\n\n## Use cases\n\n- Stop sensitive data from crossing a process boundary before it leaves the machine (for example, a shell command that exposes credentials or a script that writes secret-bearing files).\n- Enforce policy-specific transformations such as masking, tokenization, or redaction at runtime.\n- Add pre-execution guardrails to AI agents and CLI workflows.\n- Keep auditable receipts/events for every policy decision.\n\n## Positioning\n\n- **Developers and agent builders:** DataFog is a **data-aware policy enforcement layer** for CLI tools and AI agents. 
It sits in your PATH or runtime, inspects data flowing through commands, and enforces policy before sensitive actions execute.\n- **Security/compliance buyers:** DataFog maps closely to runtime DLP for developer workstations, but without the legacy footprint: policy is programmable (OPA-style), decision-aware, and process-bound.\n- **Broader view:** DataFog is the **data plane for agent governance**: detect, decide, enforce, and audit.\n\n## Repository layout\n\n- `cmd/datafog-api`: API server.\n- `cmd/datafog-shim`: policy-gate wrapper CLI.\n- `internal/policy`: policy parsing and matching.\n- `internal/scan`: entity detectors.\n- `internal/transform`: deterministic redaction/masking/tokenization/anonymization.\n- `internal/receipts`: receipt persistence.\n- `internal/server`: HTTP handlers and middleware.\n- `internal/shim`: decision + execution adapters.\n- `config/policy.json`: starter policy used by default.\n- `docs/`: API contract and operational docs.\n\n## Prerequisites\n\n- Go **1.22+**\n- Optional: Docker (for the container workflow)\n- Optional: `jq` for pretty-printing JSON\n\n## Quick start (API only)\n\n```sh\ngo mod download\ngo run ./cmd/datafog-api\n```\n\nThe API listens on `:8080` by default. It will not start without a valid policy file, which it reads from `config/policy.json` unless `DATAFOG_POLICY_PATH` points elsewhere.\n\nVerify that the service is up:\n\n```sh\ncurl -i http://localhost:8080/health\n```\n\nIf you set `DATAFOG_API_TOKEN`, send it on every request using either:\n\n- the `Authorization: Bearer \u003ctoken\u003e` header, or\n- the `X-API-Key: \u003ctoken\u003e` header.\n\n## Configuration\n\n| Variable | Default | Description |\n|---|---:|---|\n| `DATAFOG_POLICY_PATH` | `config/policy.json` | Policy snapshot loaded at startup |\n| `DATAFOG_RECEIPT_PATH` | `datafog_receipts.jsonl` | Append-only receipts file |\n| `DATAFOG_EVENTS_PATH` | *(unset)* | NDJSON event log for decision events |\n| `DATAFOG_ADDR` | `:8080` | HTTP listen address |\n| `DATAFOG_API_TOKEN` | *(unset)* | Optional API auth token |\n| `DATAFOG_RATE_LIMIT_RPS` | 
`0` | Global request cap in requests per second (`0` disables the limit) |\n| `DATAFOG_READ_TIMEOUT` | `5s` | HTTP read timeout |\n| `DATAFOG_WRITE_TIMEOUT` | `10s` | HTTP write timeout |\n| `DATAFOG_READ_HEADER_TIMEOUT` | `2s` | Request-header parse timeout |\n| `DATAFOG_IDLE_TIMEOUT` | `30s` | Idle keep-alive timeout |\n| `DATAFOG_SHUTDOWN_TIMEOUT` | `10s` | Graceful shutdown timeout |\n| `GOMAXPROCS` | *(runtime default)* | Auto-tuned at startup to the detected CPU limit; set explicitly to override |\n| `DATAFOG_PPROF_ADDR` | *(unset)* | If set, starts the optional profiling server on this address (for example, `localhost:6060`) |\n| `DATAFOG_FGPROF` | `false` | Adds the `/debug/fgprof` endpoint to the profiling server |\n| `DATAFOG_ENABLE_DEMO` | *(unset)* | Enables the `/demo*` endpoints |\n| `DATAFOG_DEMO_HTML` | `docs/demo.html` | Path to the demo HTML page |\n\nDuration values use Go duration syntax, for example `1s`, `500ms`, `2m`.\n\n## API surface\n\nBase URL defaults to `http://localhost:8080`.\n\n| Method | Path | What it does |\n|---|---|---|\n| `GET` | `/health` | Health status, policy identity, and start time |\n| `GET` | `/v1/policy/version` | Current policy id/version |\n| `POST` | `/v1/scan` | Run the detector set on text |\n| `POST` | `/v1/decide` | Evaluate an action and its findings, and return a decision |\n| `POST` | `/v1/transform` | Apply the requested transform mode(s) |\n| `POST` | `/v1/anonymize` | Apply irreversible anonymization |\n| `GET` | `/v1/receipts/{id}` | Read a decision receipt |\n| `GET` | `/v1/events` | List recent decision events |\n| `GET` | `/metrics` | In-process metrics counters |\n\nOptional demo routes (only when demo mode is enabled):\n\n- `GET /demo`\n- `POST /demo/exec`\n- `POST /demo/write-file`\n- `POST /demo/read-file`\n- `POST /demo/seed`\n- `GET /demo/sandbox`\n\n## Optional profiling endpoints\n\nFor production debugging, set `DATAFOG_PPROF_ADDR` to run an auxiliary profiling server:\n\n- `/debug/pprof/` (standard net/http/pprof handlers: profiles, goroutines, heap, trace)\n- 
`/debug/fgprof` when `DATAFOG_FGPROF=true` (fgprof sampling profiler; captures on-CPU and off-CPU time together and can be viewed as a flame graph)\n\nRecommended value:\n\n- `DATAFOG_PPROF_ADDR=:6060` (this listens on all interfaces; use `localhost:6060` to bind to loopback only)\n\nThe profiling server is disabled by default and should be exposed only on trusted networks.\n\n## Decisions and idempotency\n\nEndpoints that accept `idempotency_key`:\n\n- `/v1/scan`\n- `/v1/decide`\n- `/v1/transform`\n- `/v1/anonymize`\n\nRepeating a request with the same key and an identical payload should return the same body and status.\nReusing a key with a different payload returns `409` with `idempotency_conflict`.\n\n## Basic examples\n\n### Scan for entities\n\n```sh\ncurl -X POST http://localhost:8080/v1/scan \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"text\":\"alice@example.com - API key: SK8x... and 555-123-4567\"}'\n```\n\n### Decide on an action\n\n```sh\ncurl -X POST http://localhost:8080/v1/decide \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"action\": {\n      \"type\": \"file.write\",\n      \"resource\": \"notes.txt\"\n    },\n    \"text\": \"customer email is alice@example.com\"\n  }'\n```\n\n### Transform detected sensitive data in text\n\n```sh\ncurl -X POST http://localhost:8080/v1/transform \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"text\": \"customer email is alice@example.com\",\n    \"findings\": [{\"entity_type\":\"email\",\"value\":\"alice@example.com\",\"start\":18,\"end\":34,\"confidence\":0.99}],\n    \"mode\":\"mask\"\n  }'\n```\n\n### Fetch a receipt\n\n```sh\ncurl -s http://localhost:8080/v1/receipts/\u003creceipt-id\u003e | jq .\n```\n\n### Query events (optional)\n\n```sh\ncurl 'http://localhost:8080/v1/events?limit=20\u0026decision=deny'\n```\n\n## Enforcement policy gate (`datafog-shim`)\n\n`datafog-shim` is an optional runtime layer for CLI-style workflows.\nIt sends action details to DataFog (`/v1/decide`) before executing shell/file actions.\n\nBuild it:\n\n```sh\ngo build -o datafog-shim ./cmd/datafog-shim\n```\n\nUse direct shell 
mode:\n\n```sh\n./datafog-shim --policy-url http://localhost:8080 shell rm -rf /tmp/test\n```\n\nInstall a managed wrapper:\n\n```sh\ndatafog-shim hooks install --target /usr/bin/git git\n```\n\nRoute wrappers through `PATH`:\n\n```sh\nexport PATH=\"$HOME/.datafog/shims:$PATH\"\n```\n\nCommon env vars for the policy gate:\n\n- `DATAFOG_SHIM_POLICY_URL` (required)\n- `DATAFOG_SHIM_API_TOKEN` (required if the API has `DATAFOG_API_TOKEN` set)\n- `DATAFOG_SHIM_MODE` (`enforced` or `observe`)\n- `DATAFOG_SHIM_EVENT_SINK` (optional NDJSON sink)\n- `DATAFOG_SHIM_ENFORCE_POLICY_ERRORS` (`true` to block on policy service errors even in observe mode)\n\nIn `enforced` mode, a blocked action exits with a non-zero status.\nIn `observe` mode, the gate logs decisions but lets execution continue.\n\nPolicy gate receipts are logged to `stderr` in a compact format:\n\n```text\nreceipt=\u003cid\u003e decision=\u003callow|transform|allow_with_redaction|deny\u003e\n```\n\n## Policy file behavior and limits\n\n- Policies live in JSON at `DATAFOG_POLICY_PATH`.\n- The policy is loaded at startup only; in this version, a restart is the only way to apply policy file edits.\n- An invalid or malformed policy file prevents startup.\n\n`config/policy.json` in this repo is a runnable example with basic allow/deny/redact behavior.\n\n## Limitations and operational notes\n\n- Detection defaults are fast and deterministic, with bounded coverage.\n  - Good for common formats (e.g., email, phone, SSN, API keys, credit cards) and lightweight heuristic NER.\n  - Not a full privacy ML detector.\n- The receipt and event logs are file-based, and their paths must be writable.\n- Large volumes of receipts/events need an external retention/rotation strategy.\n- `/v1/receipts/{id}` and `/v1/events` are read APIs; there is no endpoint for mutating policy.\n\n## Container quick start\n\n```sh\ndocker build -t datafog-api:latest .\n\ndocker run --rm -p 8080:8080 \\\n  -e DATAFOG_API_TOKEN=changeme \\\n  -e DATAFOG_RATE_LIMIT_RPS=50 \\\n  
-e DATAFOG_RECEIPT_PATH=/var/lib/datafog/datafog_receipts.jsonl \\\n  -v \"$(pwd)/config:/app/config:ro\" \\\n  -v datafog-receipts:/var/lib/datafog \\\n  datafog-api:latest\n```\n\n## Verify setup end-to-end\n\n```sh\n# health check\ncurl -i http://localhost:8080/health\n\n# decision + receipt loop\nRECEIPT_ID=$(curl -s -X POST http://localhost:8080/v1/decide \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"action\":{\"type\":\"shell.exec\",\"command\":\"git\"},\"text\":\"no pii here\"}' \\\n| jq -r '.receipt_id')\n\ncurl -s http://localhost:8080/v1/receipts/$RECEIPT_ID | jq .\n```\n\nExpected outcome: the `/v1/decide` request returns a decision and a receipt id, and the follow-up `/v1/receipts` call returns the saved receipt. If `DATAFOG_API_TOKEN` is set (as in the container example above), add the `Authorization: Bearer \u003ctoken\u003e` header to each request.\n\n## Kubernetes deployment example\n\n```yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: datafog-api\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: datafog-api\n  template:\n    metadata:\n      labels:\n        app: datafog-api\n    spec:\n      securityContext:\n        runAsNonRoot: true\n        runAsUser: 65532\n        runAsGroup: 65532\n        fsGroup: 65532\n      containers:\n      - name: datafog-api\n        image: ghcr.io/datafog/datafog-api:v2\n        ports:\n        - containerPort: 8080\n        env:\n        - name: DATAFOG_ADDR\n          value: \":8080\"\n        - name: DATAFOG_POLICY_PATH\n          value: \"/app/config/policy.json\"\n        - name: DATAFOG_RECEIPT_PATH\n          value: \"/var/lib/datafog/datafog_receipts.jsonl\"\n        - name: DATAFOG_EVENTS_PATH\n          value: \"/var/lib/datafog/datafog_events.ndjson\"\n        - name: DATAFOG_RATE_LIMIT_RPS\n          value: \"100\"\n        - name: DATAFOG_SHUTDOWN_TIMEOUT\n          value: \"10s\"\n        volumeMounts:\n        - name: policy\n          mountPath: /app/config\n          readOnly: true\n        - name: receipts\n          mountPath: /var/lib/datafog\n        securityContext:\n          allowPrivilegeEscalation: false\n          
readOnlyRootFilesystem: true\n          capabilities:\n            drop: [\"ALL\"]\n      volumes:\n      - name: policy\n        configMap:\n          name: datafog-policy\n      - name: receipts\n        persistentVolumeClaim:\n          claimName: datafog-receipts\n```\n\n## Documentation map\n\n- API contract: `docs/contracts/datafog-api-contract.md`\n- Architecture/module map: `docs/ARCHITECTURE.md`\n- Security and operations:\n  - `docs/SECURITY.md`\n  - `docs/RELIABILITY.md`\n  - `docs/OBSERVABILITY.md`\n  - `docs/DOMAIN_DOCS.md`\n- Design/product context:\n  - `docs/DESIGN.md`\n  - `docs/PRODUCT_SENSE.md`\n\n## If something fails, check these first\n\n1. `go test ./...` (build/runtime validation before changing policy)\n2. `go test -race ./...` (check race conditions on concurrency-sensitive paths)\n3. `/health` response for policy id/version mismatch\n4. Environment variables are set and files are writable\n5. API token/header if `DATAFOG_API_TOKEN` is configured\n6. Policy JSON is valid and rules match expected action fields\n7. Optional benchmark sweep: `scripts/run-benchmarks.sh` (writes `/tmp/bench/benchmark-current.txt`; if `scripts/benchmark-baseline.txt` exists, also writes `/tmp/bench/benchmark-trend.txt` with benchstat deltas)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatafog%2Fdatafog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatafog%2Fdatafog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatafog%2Fdatafog/lists"}