{"id":47900512,"url":"https://github.com/scttfrdmn/quick-suite-claws","last_synced_at":"2026-04-04T04:01:55.671Z","repository":{"id":348439436,"uuid":"1198086493","full_name":"scttfrdmn/quick-suite-claws","owner":"scttfrdmn","description":"Policy-gated data excavation for AI agents on Amazon Bedrock AgentCore. Cedar + Bedrock Guardrails enforce cost limits, PII protection, and read-only access across Athena, OpenSearch, S3, and MCP. Apache 2.0 CDK reference architecture.","archived":false,"fork":false,"pushed_at":"2026-04-02T04:58:49.000Z","size":337,"stargazers_count":0,"open_issues_count":16,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-02T16:50:56.006Z","etag":null,"topics":["ai-agents","amazon-athena","amazon-bedrock","aws-cdk","aws-lambda","bedrock-agentcore","bedrock-guardrails","cedar-policy","data-governance","generative-ai","llm-safety","mcp","opensearch","pii-protection","python","reference-architecture","serverless","tool-use"],"latest_commit_sha":null,"homepage":"https://github.com/scttfrdmn/claws/blob/main/docs/architecture.md","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scttfrdmn.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-01T05:35:28.000Z","updated_at":"2026-04-02T04:58:53.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/scttfrdmn/quick-suite-claws","commit_stats":null,"previous_names":["scttfrdmn/claws","scttfrdmn/quick-suite-claws"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/scttfrdmn/quick-suite-claws","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scttfrdmn%2Fquick-suite-claws","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scttfrdmn%2Fquick-suite-claws/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scttfrdmn%2Fquick-suite-claws/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scttfrdmn%2Fquick-suite-claws/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scttfrdmn","download_url":"https://codeload.github.com/scttfrdmn/quick-suite-claws/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scttfrdmn%2Fquick-suite-claws/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31387024,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T01:22:39.193Z","status":"online","status_checked_at":"2026-04-04T02:00:07.569Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","amazon-athena","amazon-bedrock","aws-cdk","aws-lambda","bedrock-agentcore","bedrock-guardrails","cedar-policy","data-governance","generative-ai","llm-safety","mcp","opensearch","pii-protection","python","reference-architecture","serverless","tool-use"],"created_at":"2026-04-04T04:01:51.962Z","updated_at":"2026-04-04T04:01:55.665Z","avatar_url":"https://github.com/scttfrdmn.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# clAWS — Controlled Excavation Tools for Agents\n\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)\n[![Python](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/)\n[![Platform](https://img.shields.io/badge/platform-Amazon%20Bedrock%20AgentCore-FF9900.svg?logo=amazon-aws\u0026logoColor=white)](https://aws.amazon.com/bedrock/)\n[![Policy](https://img.shields.io/badge/policy-Cedar-green.svg)](https://www.cedarpolicy.com/)\n[![CDK](https://img.shields.io/badge/infra-CDK%20v2-FF9900.svg?logo=amazon-aws\u0026logoColor=white)](https://docs.aws.amazon.com/cdk/v2/guide/home.html)\n\n**Safe, auditable, policy-gated data queries for AI agents — without opening your databases to arbitrary SQL.**\n\nWhen an AI agent can generate and execute arbitrary SQL against a production database,\na few things go wrong very quickly. Query costs are unbounded — a careless `SELECT *`\nagainst a 2-billion-row Athena table runs up hundreds of dollars in seconds. PII leaks\nare unchecked — a query that joins student records with grades and addresses might surface\na combination of fields that no individual query was supposed to expose. And there's no\naudit trail — if a compliance officer asks \"what did the AI query, and why was it\nallowed?\", the answer is a Lambda execution log, not a governed record.\n\nclAWS is a deployable tool plane that goes between any agent and your data stores —\nAthena tables, OpenSearch indices, S3 files — and enforces structured access policies,\ncost limits, and content safety on every query. The agent proposes what it wants in plain\nlanguage. clAWS translates that into a reviewable, cost-estimated query plan. Cedar\npolicies gate whether that plan is permitted. Bedrock Guardrails scan what comes back.\nOnly then does anything reach the agent.\n\nUniversities use this when the question involves restricted data: financial aid records,\nstudent PII, research databases with IRB constraints, or sponsored program financials.\nThe answer the analyst or agent gets is correct and governed — and there's a complete\nrecord of every step for compliance review.\n\n## What clAWS Is and Isn't\n\n| clAWS is... | clAWS is not... |\n|-------------|-----------------|\n| A deployable CDK application on real AWS | A managed service or hosted product |\n| A policy-gated tool plane an agent calls | An agent framework — reasoning happens outside |\n| Safe access to Athena, OpenSearch, and S3 | A query builder, BI tool, or SQL IDE |\n| A reference architecture for governed data access | A product with a support contract |\n| Open source under Apache 2.0 | Specific to Quick Suite — any AgentCore agent can use it |\n\n## The Core Safety Principle\n\n**LLM reasoning never happens inside a tool.** The `plan` tool is the only tool that\naccepts free-text input. It returns a concrete, reviewable execution plan: the actual\nSQL, a cost estimate, and the output schema. The `excavate` tool takes that plan verbatim\nand runs it. Cedar policies gate the concrete query, not the intent.\n\nThis means the query that runs is always the query that was approved. A separate SQL\nstring cannot be substituted after Cedar validates the plan — `excavate` verifies the\nsubmitted query matches the stored plan byte for byte before executing anything.\n\n```\nAgent (reasons here)                  clAWS tool plane (executes here)\n┌──────────────────────────┐          ┌────────────────────────────────┐\n│ \"Find BRCA1 pathogenic    │          │                                │\n│  variants by cohort\"      │──plan──▶ │ Returns SQL + cost + schema    │\n│                           │          │ Cedar validates: permitted?    │\n│  Reviews plan ────────────┼──────────│▶                               │\n│                           │──exec──▶ │ Runs the exact approved query  │\n│  Receives results ◀───────│──────────│ Guardrails scan the output     │\n└──────────────────────────┘          └────────────────────────────────┘\n        │                                          │\n        └──────── Cedar + Bedrock Guardrails ───────┘\n                  enforced at both boundaries\n```\n\n## The Tool Pipeline\n\n**Tool Lambdas (AgentCore targets):**\n\n| Tool | What it does |\n|------|-------------|\n| `discover` | Find data sources in approved domains (Glue catalog, OpenSearch, S3, source registry) |\n| `probe` | Inspect schema, sample rows, and cost estimates; ApplyGuardrail scans samples for PHI |\n| `plan` | Translate a free-text objective into a concrete query — the only tool with free-text input |\n| `excavate` | Execute the exact query from the plan; results scanned by ApplyGuardrail |\n| `refine` | Deduplicate, rank, and summarize results with a grounding guardrail |\n| `export` | Write results to S3, EventBridge, or Quick Sight with a provenance chain |\n| `team_plans` | List all plans for a team_id (read-only summaries) |\n| `share_plan` | Grant or revoke another principal's access to a plan |\n| `watch` | Create, update, or delete a scheduled watch on a locked plan |\n| `watches` | List active watches and their last-run status |\n\n**Internal Lambdas (not AgentCore targets):**\n\n| Lambda | What it does |\n|--------|-------------|\n| `approve_plan` | IRB reviewer approves a `pending_approval` plan; validates approver allowlist; blocks self-approval |\n| `audit_export` | Exports CloudWatch audit records to NDJSON in S3 with SHA-256-hashed I/O fields |\n| `claws-watch-runner` | Scheduled Lambda invoked by EventBridge Scheduler; executes locked plans without LLM involvement |\n\nA typical session through the pipeline:\n\n*\"Which financial aid records for the 2024 cohort have missing FAFSA completion dates,\nbroken down by demographic category?\"*\n\n1. `discover` — finds the financial aid Athena table in the `institutional` domain\n2. `probe` — previews the schema; GuardRail scans for SSN exposure in samples\n3. `plan` — translates the question into SQL; Cedar confirms the financial aid team can run this query\n4. `excavate` — runs the query; Guardrails scan the row-level output\n5. `refine` — produces a clean, deduplicated summary by category\n6. `export` — writes to S3 with a `.provenance.json` sidecar recording the full chain\n\nThe compliance office gets the export plus a documented record of what was queried, by\nwhom, under which Cedar policy, and when — automatically, without anyone building a\nseparate audit workflow.\n\n## Two Independent Safety Layers\n\n**Cedar (structural, deterministic)** — evaluated at the AgentCore Gateway boundary\nbefore any Lambda runs. Cedar policies express rules like:\n- \"The IR team can query enrollment tables but not the SSN column\"\n- \"The financial aid office can aggregate, but not export row-level student records\"\n- \"Any query must have `read_only: true` and `max_cost_dollars` ≤ 10\"\n\nCedar either allows or denies — no probabilistic judgment. If a policy denies, the\npipeline stops before a single Athena byte is scanned.\n\n**Bedrock Guardrails (semantic, ML-based)** — applied at LLM I/O (the `plan` tool) and\nvia the `ApplyGuardrail` API directly on data (probe samples, excavation results, export\npayloads). Catches things Cedar can't: a query result that technically passes the column\nallowlist but whose combination of fields reconstructs PII, or a result summary that\ncontains injection-style content from the data itself.\n\nAn attacker who wanted to bypass both layers would need to simultaneously fool a\ndeterministic policy engine and a content safety model. In practice, the two layers\ncatch entirely different threat classes.\n\n## Compliance Features\n\nclAWS is designed for regulated research and institutional data environments.\n\n**IRB Workflow** — When a `plan` call includes `requires_irb: true`, the plan status is set\nto `pending_approval` instead of `ready`. The `excavate` tool blocks execution until an\nauthorized IRB reviewer approves the plan via the `approve_plan` internal Lambda. Reviewers\nare configured via the `CLAWS_IRB_APPROVERS` environment variable. Self-approval is blocked.\n\n**FERPA Guardrail Preset** — Deploy `guardrails/ferpa/ferpa_guardrail.json` by setting\n`enable_ferpa_guardrail: true` in CDK context. The preset blocks five denied topic categories\n(student PII export, FERPA evasion attempts, grade disclosure, directory waiver bypass, and\nbulk education record extraction) and blocks on SSN and student ID regex patterns.\n\n**Cedar Policy Templates** — Four pre-built templates in `policies/templates/`:\n- `read-only.cedar` — metadata-only access; no excavate or export\n- `no-pii-export.cedar` — allows excavation but forbids export when data is classified as PII\n- `approved-domains-only.cedar` — locks principals to a pre-approved domain list\n- `phi-approved.cedar` — PHI data access with clearance level ≥ 3, IRB approval, and HITL token\n\n**Compliance Audit Export** — The `audit_export` internal Lambda scans CloudWatch Logs and\nwrites NDJSON audit records to `s3://claws-runs-{account}/audit-exports/`. All input and\noutput fields are SHA-256-hashed — no raw data or PII appears in the export. Fields include\n`principal`, `tool`, `inputs_hash`, `outputs_hash`, `cost_usd`, `guardrail_trace`, `timestamp`.\n\nSee [docs/compliance.md](docs/compliance.md) for the full compliance deployment guide.\n\n## Requirements\n\n- Python 3.12+\n- [uv](https://docs.astral.sh/uv/) (package manager)\n- Node.js 18+ and AWS CDK v2: `npm install -g aws-cdk`\n- AWS account with Bedrock model access enabled for your region\n- AWS credentials configured (`aws configure` or an IAM role)\n\n## Quick Start\n\n```bash\ngit clone https://github.com/scttfrdmn/claws.git\ncd claws\nuv sync --extra dev --extra cdk\n\ncd infra/cdk\ncdk deploy --all\n```\n\nCDK deploys five stacks in dependency order:\n\n| Stack | What it creates |\n|-------|----------------|\n| `ClawsStorageStack` | S3 buckets, DynamoDB tables (plans, schemas, lookup) |\n| `ClawsGuardrailsStack` | Bedrock Guardrail with content filters, PII detection, injection blocking |\n| `ClawsToolsStack` | Six Lambda functions, shared IAM role, Athena workgroup |\n| `ClawsGatewayStack` | AgentCore Gateway with one Lambda target per tool |\n| `ClawsPolicyStack` | Cedar policy deployment and gateway association |\n\nDeployment takes 5–10 minutes. Save the Gateway ID from the outputs — you'll need it\nif deploying other Quick Suite extensions that share this gateway.\n\n## Capstone Deployment (Shared Gateway)\n\nWhen deploying alongside the other Quick Suite extensions, clAWS can attach to an\nexisting shared AgentCore Gateway rather than creating its own:\n\n```bash\ncdk deploy --all -c CLAWS_GATEWAY_ID=agr-abc123\n```\n\nGet the Gateway ID from the Router stack's CloudFormation outputs:\n\n```bash\naws cloudformation describe-stacks \\\n  --stack-name QuickSuiteRouterStack \\\n  --query 'Stacks[0].Outputs[?OutputKey==`GatewayId`].OutputValue' \\\n  --output text\n```\n\n## Adding Your Data Sources\n\nAfter deploying, tag your Glue tables so `discover` can find them:\n\n```bash\naws glue tag-resource \\\n  --resource-arn arn:aws:glue:us-east-1:123456789012:table/mydb/mytable \\\n  --tags-to-add \"claws:space=your-space-name\"\n```\n\nThen add the space to the principal's `approved_spaces` list in your Cedar policy:\n\n```cedar\npermit(\n  principal in Group::\"institutional-research\",\n  action == Action::\"excavate\",\n  resource\n) when {\n  context.source.space in [\"enrollment\", \"your-space-name\"]\n};\n```\n\nSee [docs/user-guide.md](docs/user-guide.md) for the full Cedar policy authoring guide.\n\n## Development and Testing\n\n```bash\n# Run the full test suite (247 tests, no AWS credentials required)\nuv run pytest tools/ -v\n\n# Lint and format\nuv run ruff check tools/\nuv run ruff format tools/\n\n# Type check\nuv run mypy tools/\n```\n\nFor live integration tests against real AWS resources (manual, pre-release):\n\n```bash\nexport CLAWS_TEST_ATHENA_DB=your_db\nexport CLAWS_TEST_ATHENA_TABLE=your_table\nexport CLAWS_TEST_ATHENA_OUTPUT=s3://your-bucket/results/\nexport CLAWS_TEST_RUNS_BUCKET=your-claws-runs-bucket\nuv run pytest tools/tests/live/ -v -m live\n```\n\n## Cost\n\nclAWS itself has minimal infrastructure cost — under $3/month at idle. The meaningful\ncost is Athena query charges ($5 per TB scanned), which Cedar cost limits and partition\npruning keep in check. A well-written Cedar policy with `max_cost_dollars: 1.00` ensures\nno single query can spend more than a dollar regardless of how large the table is.\n\n## Documentation\n\n| Doc | What it covers |\n|-----|---------------|\n| [docs/getting-started.md](docs/getting-started.md) | Deploy and run your first excavation, step by step |\n| [docs/user-guide.md](docs/user-guide.md) | Tool reference, Cedar policy authoring, guardrail customization, team and IRB workflows |\n| [docs/compliance.md](docs/compliance.md) | IRB workflow, FERPA preset, Cedar policy templates, audit export |\n| [docs/architecture.md](docs/architecture.md) | CDK stacks, storage layout, executor details |\n| [docs/safety-model.md](docs/safety-model.md) | Cedar vs Guardrails — the threat model and attachment points |\n| [docs/mcp-integration.md](docs/mcp-integration.md) | MCP server registry, transport options, pipeline walkthrough |\n| [docs/capstone-deployment.md](docs/capstone-deployment.md) | Standalone vs shared-Gateway (Capstone) deployment |\n| [docs/quick-suite-integration.md](docs/quick-suite-integration.md) | Quick Suite operator surface: Flows, Automate, dashboards |\n\n## Examples\n\n| Example | Data source | Scenario |\n|---------|-------------|---------|\n| [genomics-excavation](examples/genomics-excavation/) | Athena | BRCA1 pathogenic variants by cohort |\n| [log-analysis](examples/log-analysis/) | OpenSearch | Top error patterns across microservices |\n| [document-mining](examples/document-mining/) | S3 Select / Parquet | Indemnification clauses with uncapped liability |\n\n## Contributing\n\nContributions welcome — see [CONTRIBUTING.md](CONTRIBUTING.md) for branch conventions,\ncommit style, and the PR process.\n\n## License\n\nApache-2.0 — Copyright 2026 Scott Friedman\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscttfrdmn%2Fquick-suite-claws","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscttfrdmn%2Fquick-suite-claws","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscttfrdmn%2Fquick-suite-claws/lists"}