{"id":51176148,"url":"https://github.com/raintree-technology/policystrata","last_synced_at":"2026-06-27T04:00:42.240Z","repository":{"id":367420143,"uuid":"1279645599","full_name":"raintree-technology/policystrata","owner":"raintree-technology","description":"Cross-layer policy regression testing for LLM data-agent stacks","archived":false,"fork":false,"pushed_at":"2026-06-25T23:30:38.000Z","size":165,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-26T00:13:09.450Z","etag":null,"topics":["access-control","benchmark","data-agents","data-governance","evals","llm","llm-agents","policy-regression","policy-testing","postgresql","pytest","python","research-artifact","rls","row-level-security","sql","testing","text-to-sql"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raintree-technology.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-06-24T22:15:06.000Z","updated_at":"2026-06-25T22:36:23.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/raintree-technology/policystrata","commit_stats":null,"previous_names":["raintree-technology/policystrata"],"tags_count":2,"template":true,"template_full_name":null,"purl":"pkg:github/raintree-technology/policystrata","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raintree-technology%2Fpolicystrata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raintree-technology%2Fpolicystrata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raintree-technology%2Fpolicystrata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raintree-technology%2Fpolicystrata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/raintree-technology","download_url":"https://codeload.github.com/raintree-technology/policystrata/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raintree-technology%2Fpolicystrata/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34840899,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-27T02:00:06.362Z","response_time":126,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["access-control","benchmark","data-agents","data-governance","evals","llm","llm-agents","policy-regression","policy-testing","postgresql","pytest","python","research-artifact","rls","row-level-security","sql","testing","text-to-sql"],"created_at":"2026-06-27T04:00:41.743Z","updated_at":"2026-06-27T04:00:42.234Z","avatar_url":"https://github.com/raintree-technology.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PolicyStrata\n\nPolicyStrata is a deterministic regression-testing framework for cross-layer policy drift in LLM\ndata-agent stacks.\n\nIt generates principals, requests, semantic plans, database states, lowered queries, and release\ndecisions; compares each layer against a canonical reference policy; and minimizes failures into\nsmall reproducible witnesses.\n\nUse it when you are building text-to-SQL agents, BI copilots, internal analytics agents, warehouse\nchat systems, or governed enterprise LLM tools and need to know whether prompts, manifests,\nsemantic plans, validators, SQL compilers, database controls, and output filters still agree about\npolicy.\n\nPolicyStrata is not an authorization boundary, and it is not another generic text-to-SQL benchmark.\nIt is a reproducible research artifact and regression gate for finding reachable disagreements\nbetween layers.\n\n## Paper And Artifact\n\n- Paper PDF: \u003chttps://raintree.technology/papers/PolicyStrata.pdf\u003e\n- Release note: \u003chttps://raintree.technology/blog/policystrata-release\u003e\n- GitHub release: \u003chttps://github.com/raintree-technology/policystrata/releases/tag/policystrata-paper-2026-06-26\u003e\n- Artifact zip: \u003chttps://github.com/raintree-technology/policystrata/releases/download/policystrata-paper-2026-06-26/policystrata-submission-kit-2026-06-26.zip\u003e\n- Website mirror: \u003chttps://raintree.technology/artifacts/policystrata/policystrata-submission-kit-2026-06-26.zip\u003e\n\nSHA256:\n\n```text\n9a4da81d78c37fd81e9ab6b36f094756e61e6b88cfcd74dcab51cfdd8e5bbcd9  PolicyStrata.pdf\n961778c1e8affc04f76e27ce7958572d904e69d7701494e0047e4f7bffbc466d  policystrata-submission-kit-2026-06-26.zip\n```\n\nReproduce the paper-facing artifact run:\n\n```bash\nPOLICYSTRATA_RUN_ROOT=/tmp/policystrata-final ./scripts/reproduce-final.sh\n```\n\nThe paper reports deterministic artifact-suite coverage: 1720/1720 non-clean injected cases and\n0 false positives on 80 clean controls. This is not a production-recall claim and not an\nauthorization boundary.\n\n## Quick Start\n\nFrom PyPI:\n\n```bash\nuvx policystrata demo\npipx run policystrata demo\n```\n\nFrom a source checkout:\n\n```bash\nuv sync --extra dev\nuv run policystrata demo\n```\n\nThe demo runs the built-in `support_saas` fixture, writes traces and minimized witnesses to\n`runs/demo`, and prints the drift classes it found. Use `--out` to choose another output directory:\n\n```bash\nuv run policystrata demo --out runs/demo\n```\n\nNo LLM API key is required for deterministic tests, benchmark runs, or the built-in demo.\n\n## Install\n\nPolicyStrata is a CLI-first Python package. The public package provides the `policystrata` console\nscript and importable Python modules.\n\n```bash\npython -m pip install policystrata\npolicystrata demo\n```\n\nFor one-off CLI use without managing an environment:\n\n```bash\nuvx policystrata demo\npipx run policystrata demo\n```\n\nRepository examples under `examples/`, Docker Compose fixtures, and evidence scripts are available\nfrom a GitHub checkout or source distribution. The wheel installs the runtime package, built-in\ndomain fixtures, and packaged scanner examples reachable through `policystrata init-scan`.\n\n## Use As A Template\n\nClick **Use this template** on GitHub, then start with the deterministic fixtures:\n\n```bash\nuv sync --extra dev\nuv run policystrata run --domain support_saas --suite seeded --out runs/example\nuv run policystrata summarize runs/example\n```\n\nTo copy a built-in domain fixture into your tree:\n\n```bash\nuv run policystrata init-domain support_saas --out examples/my-policystrata-domain\n```\n\nKeep custom integrations as adapters. The policy oracle should stay independent from SQL compiler\nbehavior, external eval frameworks, and model-provider behavior.\n\n## What It Tests\n\nThe core failure class is cross-layer policy drift:\n\n```text\nCanonical policy:\n  Analysts may view tenant-scoped aggregate ticket counts, but not customer-level PII.\n\nModel-visible manifest or grammar:\n  Accidentally exposes customer_email as a dimension.\n\nSQL compiler:\n  Accidentally drops the tenant predicate while lowering an authorized aggregate.\n\nOutput layer:\n  Releases the result because the final answer looks like a summary.\n\nPolicyStrata result:\n  A minimized witness localizes the violated layer and failed obligation.\n```\n\nPolicyStrata does not assume every layer should behave identically. Each surface has a declared\nresponsibility:\n\n- `manifest`: expose model-visible capabilities without stale or forbidden options.\n- `grammar`: parse the declared intent space and preserve untrusted intent for validation.\n- `validator`: authorize semantic queries and bind principal, tenant, time, and budget obligations.\n- `compiler`: lower authorized semantic IR into SQL while preserving metric, tenant, time, and row\n  obligations.\n- `database`: contain row access with RLS and other database-side controls.\n- `release`: withhold contained or unauthorized results.\n\nSee [docs/failure-taxonomy.md](docs/failure-taxonomy.md) for how witness classes map to concrete\npolicy-drift failures.\n\n## Run Benchmarks\n\nPolicyStrata ships with deterministic `support_saas`, `finance_saas`, and\n`analytics_clickhouse` benchmarks, generated mutation suites, held-out suite support, clean\ncontrols, minimized witnesses, JSONL traces, baseline comparisons, and evidence tables.\n\n```bash\nuv run policystrata run --domain support_saas --suite seeded --out runs/example\nuv run policystrata run \\\n  --domain support_saas \\\n  --suite generated \\\n  --count 500 \\\n  --seed 1729 \\\n  --out runs/generated\nuv run policystrata run --domain finance_saas --suite seeded --out runs/finance\nuv run policystrata freeze-benchmark --domain support_saas --suite heldout_v1 --count 500 --seed 260626 --out runs/freeze/support-heldout-v1.json\nuv run policystrata run --domain support_saas --suite heldout_v1 --count 500 --seed 260626 --freeze-manifest runs/freeze/support-heldout-v1.json --out runs/support-heldout-v1\nuv run policystrata baselines runs/example runs/support-heldout-v1\nuv run policystrata ablations runs/example runs/support-heldout-v1\n```\n\nThe default `run` command writes:\n\n```text\nruns/\u003cid\u003e/traces.jsonl\nruns/\u003cid\u003e/summary.json\nruns/\u003cid\u003e/metadata.json\nruns/\u003cid\u003e/benchmark_manifest.json  # for frozen runs\nruns/\u003cid\u003e/witnesses/*.json\n```\n\n`metadata.json` records the mutation operator set, suite provenance, evidence level, and\ndetector-freeze status. Frozen runs verify the manifest before writing traces. Static suite YAML can\ndeclare `suite_metadata` so externally authored, detector-frozen, or incident-reconstruction cases\nstay separate from public/generated benchmark scores.\n\nRegenerate paper-style evidence tables with:\n\n```bash\nscripts/reproduce-evidence.sh\nscripts/reproduce-final.sh\n```\n\nGenerate reviewer-facing artifact metrics for a run:\n\n```bash\nuv run policystrata artifact-report runs/repro/seeded\n```\n\nCurrent benchmark details are in [docs/evidence.md](docs/evidence.md), with methodology and claim\nboundaries in [docs/methodology.md](docs/methodology.md) and [EVAL_CARD.md](EVAL_CARD.md).\n\n## Run The Scanner\n\n`policystrata scan` is the production-oriented path. It treats PolicyStrata as a scanner and\nrelease gate, not as the authorization boundary.\n\nCreate a scanner scaffold for an application:\n\n```bash\nuv run policystrata init-scan --out policystrata\nuv run policystrata scan --config policystrata/policystrata.yaml --out runs/policystrata-smoke\n```\n\nThe scaffold writes `policystrata.yaml`, `domain/policy.yaml`, `domain/surfaces.yaml`, and\n`traces.example.jsonl`. Replace the example trace with exported SQL/tool-call traces from your app.\nUse `--source-domain finance_saas` to scaffold the finance policy and a matching finance trace\ninstead of the default support SaaS example.\n\nCopy a packaged Postgres/dbt scanner example from an installed wheel:\n\n```bash\nuvx policystrata init-scan postgres_dbt --out policystrata-example\nuvx policystrata scan --config policystrata-example/policystrata_clean.yaml --out runs/scan-clean\n```\n\n`policystrata doctor` audits only the config you pass. In the copied `postgres_dbt` example,\n`policystrata_clean.yaml` is a minimal clean smoke config, so doctor reports database and dbt\nwiring as missing. Use `policystrata_real_db_clean.yaml` for the DB/RLS readiness audit, and\ncombine the dbt and database sections in your own app config when one strict readiness gate should\ncover both.\n\nClean smoke test:\n\n```bash\nuv run policystrata scan --config examples/postgres_dbt/policystrata_clean.yaml --out runs/scan-clean\n```\n\nIntentional gate-failure fixture:\n\n```bash\nuv run policystrata scan --config examples/postgres_dbt/policystrata.yaml --out runs/scan\n```\n\nThat fixture should exit `1` because it contains imported traces with known authorization,\nunsafe-release, and tenant-scope findings.\n\nScanner outputs include:\n\n```text\nruns/scan-clean/scan.json\nruns/scan-clean/findings.jsonl\nruns/scan-clean/summary.json\nruns/scan-clean/report.md\nruns/scan-clean/witnesses/*.json\nruns/scan-clean/scan.sarif  # when sarif: true\n```\n\nFor a scanner run that also executes imported SQL beside canonical compiler SQL against the\nDocker/PostgreSQL fixture:\n\n```bash\ndocker compose up -d postgres\nuv run policystrata scan --config examples/postgres_dbt/policystrata_real_db_clean.yaml --out runs/scan-real-db-clean\n```\n\nPostgres access goes through Python/`psycopg`; host `psql` is not required. See\n[docs/scanner.md](docs/scanner.md) for scanner configuration, gate behavior, tenancy config,\nremediation fields, state assertions, and real-database fixture details. See\n[docs/trace-contract.md](docs/trace-contract.md), [docs/trace-adapters.md](docs/trace-adapters.md),\nand [docs/testing-ai-data-assistant.md](docs/testing-ai-data-assistant.md) for imported-trace\ncontracts and framework recipes.\n\n## Audit Stack Wiring\n\n`policystrata doctor` without a config still checks local reproducibility dependencies. With a\nscanner config, it audits what a deployment has wired and what is missing:\n\n```bash\nuv run policystrata doctor --format markdown\nuv run policystrata doctor --config examples/postgres_dbt/policystrata_real_db_clean.yaml \\\n  --format markdown --out runs/doctor.md\n```\n\nThe audit inventories policy/domain YAML, surface contracts, SQL traces, dbt semantic models,\nPostgreSQL schema/RLS/grant/view/index metadata, privacy policies, terms of service, DPA/security/\nretention/internal-policy documents, prompt/tool manifests, source maps, release coverage, and CI\ngates. It classifies policy-document obligation signals, compares JSON/YAML prompt manifests\nagainst the canonical policy for stale exposed metrics or dimensions, and emits remediation todos\nwith owners, files, expected tests, and gate commands. Use `--strict` when missing, partial, or\ninvalid wiring should fail CI.\n\n## GitHub Action\n\nUse the first-party action to run `policystrata scan` as a pull-request or release policy-drift\ngate. The example pins the current release tag:\n\n```yaml\nname: PolicyStrata\n\non:\n  pull_request:\n  push:\n    branches: [main]\n\njobs:\n  scan:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n\n      - uses: raintree-technology/policystrata@v1.0.0\n        with:\n          config: policystrata.yaml\n          out: runs/policystrata\n\n      - name: Implementation readiness gate\n        if: always()\n        run: policystrata doctor --config policystrata.yaml --strict\n```\n\nIn CI, keep both gates: `policystrata scan` catches high-confidence policy drift, while\n`policystrata doctor --strict` catches missing, partial, or invalid stack wiring before release.\n\nSee [docs/github-action.md](docs/github-action.md) for inputs, artifact upload, and database\nfixture guidance.\n\n## Integrations And Exports\n\nPolicyStrata keeps core execution independent from external eval frameworks. Adapter exports are\navailable for downstream systems:\n\n```bash\nuv run policystrata export runs/example --format inspect --out runs/example/inspect.jsonl\nuv run policystrata export runs/example --format benchflow --out runs/example/benchflow.json\n```\n\nThe repo also includes a small dbt Semantic Layer adapter and fixture:\n\n```bash\nuv run policystrata check-integration dbt-semantic \\\n  --domain finance_saas \\\n  --path examples/integrations/dbt_semantic/finance_saas/semantic_models.yml\n```\n\nSee [docs/trace-interop.md](docs/trace-interop.md) for adapter field mappings.\n\n## TypeScript / Node SDK\n\nThe repository includes a first-party TypeScript recorder under `packages/node` for Next.js,\nDrizzle, and other Node agent stacks:\n\n```ts\nimport { createPolicyStrataRecorder } from \"policystrata/node\";\n\nconst recorder = createPolicyStrataRecorder({\n  service: \"betteroff-ask-ai\",\n  out: \".policystrata/traces.jsonl\",\n  tenancy: {\n    tenantColumns: [\"transactions.household_id\", \"accounts.household_id\"],\n  },\n});\n```\n\n`wrapTool()` records sanitized tool executions, `captureQuery()` captures Drizzle `.toSQL()` output\nwhen available, and read-tool SQL records can be scanned with `policystrata scan`.\n\n## Reference Docs\n\n- [docs/benchmark-reference.md](docs/benchmark-reference.md): domains, generated mutants,\n  baselines, and witness shape.\n- [docs/scanner.md](docs/scanner.md): scanner inputs, gates, state assertions, and PostgreSQL\n  fixture use.\n- [docs/github-action.md](docs/github-action.md): CI wrapper for `policystrata scan`.\n- [docs/distribution-roadmap.md](docs/distribution-roadmap.md): CLI, GitHub Action, SDK, MCP, and\n  GitHub CLI extension sequence.\n- [docs/evidence.md](docs/evidence.md): current evidence snapshot and reproduction commands.\n- [docs/methodology.md](docs/methodology.md): claims, limitations, mutant definitions, and witness\n  minimization.\n- [EVAL_CARD.md](EVAL_CARD.md): benchmark provenance, evidence levels, and eval boundaries.\n- [docs/open-source-commercial-strategy.md](docs/open-source-commercial-strategy.md): packaging and\n  product boundary.\n\n## Development\n\n```bash\nuv run pytest\nuv run ruff check .\nuv run mypy src\n```\n\nThe built-in `support_saas` domain is deterministic and seed-driven. Preserve JSON/YAML trace\nstability when extending artifacts; add fields compatibly.\n\n## Status\n\nPolicyStrata is an early research artifact. It is useful for reproducing the paper's core failure\nmodel and for building regression gates around real stacks. It does not prove recall on unknown\nproduction incidents, and it should not be represented as a production security scanner by itself.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraintree-technology%2Fpolicystrata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraintree-technology%2Fpolicystrata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraintree-technology%2Fpolicystrata/lists"}