{"id":47705087,"url":"https://github.com/flyersworder/agentic-data-contracts","last_synced_at":"2026-04-24T21:00:33.100Z","repository":{"id":347403334,"uuid":"1193744522","full_name":"flyersworder/agentic-data-contracts","owner":"flyersworder","description":"YAML-first, domain-driven data governance for AI agents — teach agents your business domains, metrics, and rules before they write SQL","archived":false,"fork":false,"pushed_at":"2026-04-17T18:04:24.000Z","size":759,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-17T19:36:10.621Z","etag":null,"topics":["agent-sdk","ai-agents","analytics","claude","data-contracts","data-engineering","data-governance","dbt","domain-driven","llm","pydantic","python","semantic-layer","sql-validation","sqlglot","yaml"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/agentic-data-contracts/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/flyersworder.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-27T14:36:59.000Z","updated_at":"2026-04-17T18:04:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/flyersworder/agentic-data-contracts","commit_stats":null,"previous_names":["flyersworder/agentic-data-contracts"],"tags_count":20,"template":false,"template_full_name":null,"purl":"pkg:github/flyersworder/agentic-data-contracts","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyersworder%2Fagentic-data-contracts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyersworder%2Fagentic-data-contracts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyersworder%2Fagentic-data-contracts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyersworder%2Fagentic-data-contracts/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/flyersworder","download_url":"https://codeload.github.com/flyersworder/agentic-data-contracts/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyersworder%2Fagentic-data-contracts/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32240613,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T13:21:15.438Z","status":"ssl_error","status_checked_at":"2026-04-24T13:21:15.005Z","response_time":64,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-sdk","ai-agents","analytics","claude","data-contracts","data-engineering","data-governance","dbt","domain-driven","llm","pydantic","python","semantic-layer","sql-validation","sqlglot","yaml"],"created_at":"2026-04-02T17:53:11.080Z","updated_at":"2026-04-24T21:00:33.093Z","avatar_url":"https://github.com/flyersworder.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# agentic-data-contracts\n\n[![PyPI version](https://img.shields.io/pypi/v/agentic-data-contracts.svg)](https://pypi.org/project/agentic-data-contracts/)\n[![CI](https://github.com/flyersworder/agentic-data-contracts/actions/workflows/ci.yml/badge.svg)](https://github.com/flyersworder/agentic-data-contracts/actions/workflows/ci.yml)\n[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n**YAML-first, domain-driven data governance for AI agents.**\n\n`agentic-data-contracts` takes a domain-driven approach to AI agent governance: instead of letting agents figure out your data landscape by trial and error, you teach them your business domains, metrics, and rules upfront — in YAML. The agent starts by understanding *what* a business domain means, then discovers *which* metrics to use, then builds queries that comply with your governance rules. All enforced automatically at query time via SQL validation powered by [sqlglot](https://github.com/tobymao/sqlglot).\n\n**Why domain-driven?** AI agents querying databases face three problems: **resource runaway** (unbounded compute, endless retries, cost overruns), **semantic inconsistency** (wrong tables, missing filters, ad-hoc metric definitions), and **lack of business context** (the agent doesn't know what \"revenue\" means in *your* company). This library addresses all three with a single YAML contract that combines governance rules with business domain knowledge.\n\n**Works with:** [Claude Agent SDK](https://github.com/anthropics/claude-agent-sdk-python) (primary target), or any Python agent framework. Optionally integrates with [ai-agent-contracts](https://pypi.org/project/ai-agent-contracts/) for formal resource governance.\n\n## How It Works\n\nThe agent follows a domain-driven workflow — understanding business context before writing SQL:\n\n```\n1. Agent receives: \"How is revenue trending?\"\n2. lookup_domain(\"revenue\")     → \"Revenue is recognized at fulfillment, not booking\"\n3. lookup_metric(\"total_revenue\") → SUM(amount) FILTER (WHERE status = 'completed')\n4. Agent writes SQL using the metric definition\n5. inspect_query(sql)           → {\"valid\": true, \"estimated_cost_usd\": 0.0, ...}\n6. run_query(sql)               → results returned\n```\n\nGovernance rules are enforced automatically at query time:\n\n```\nAgent: \"SELECT * FROM analytics.orders\"\n  -\u003e BLOCKED (no SELECT * — specify explicit columns)\n\nAgent: \"SELECT order_id, amount FROM analytics.orders\"\n  -\u003e BLOCKED (missing required filter: tenant_id)\n\nAgent: \"SELECT order_id, amount FROM analytics.orders WHERE tenant_id = 'acme'\"\n  -\u003e PASSED + WARN (consider using semantic revenue definition)\n```\n\nThe contract defines the domains, metrics, and rules. The library enforces them — before the query ever reaches the database.\n\n## Installation\n\n```bash\nuv add agentic-data-contracts\n# or\npip install agentic-data-contracts\n```\n\nWith optional database adapters:\n\n```bash\nuv add \"agentic-data-contracts[duckdb]\"      # DuckDB\nuv add \"agentic-data-contracts[bigquery]\"    # BigQuery\nuv add \"agentic-data-contracts[snowflake]\"   # Snowflake\nuv add \"agentic-data-contracts[postgres]\"    # PostgreSQL\nuv add \"agentic-data-contracts[agent-sdk]\"   # Claude Agent SDK integration\n```\n\n## Quick Start\n\n### 1. Write a YAML contract\n\n```yaml\n# contract.yml\nversion: \"1.0\"\nname: revenue-analysis\n\nsemantic:\n  source:\n    type: yaml\n    path: \"./semantic.yml\"\n  allowed_tables:\n    - schema: analytics\n      description: \"Curated analytics tables — prefer for reporting\"\n      preferred: true\n      tables: [\"*\"]          # all tables in schema (discovered from database)\n    - schema: marketing\n      tables: [campaigns]    # or list specific tables\n  forbidden_operations: [DELETE, DROP, TRUNCATE, UPDATE, INSERT]\n  domains:\n    - name: revenue\n      summary: \"Financial metrics from completed orders\"\n      description: \u003e\n        Revenue is recognized at fulfillment, not at booking.\n        Excludes refunds and chargebacks unless stated.\n      metrics: [total_revenue]\n  rules:\n    - name: tenant_isolation\n      description: \"All queries must filter by tenant_id\"\n      enforcement: block\n      query_check:\n        required_filter: tenant_id\n    - name: no_select_star\n      description: \"Must specify explicit columns\"\n      enforcement: block\n      query_check:\n        no_select_star: true\n\nresources:\n  cost_limit_usd: 5.00\n  max_retries: 3\n  token_budget: 50000\n\ntemporal:\n  max_duration_seconds: 300\n```\n\n### 2. Load the contract and create tools\n\n```python\nfrom agentic_data_contracts import DataContract, create_tools\nfrom agentic_data_contracts.adapters.duckdb import DuckDBAdapter\n\ndc = DataContract.from_yaml(\"contract.yml\")\nadapter = DuckDBAdapter(\"analytics.duckdb\")\n\n# Semantic source is auto-loaded from contract config (source.type + source.path)\ntools = create_tools(dc, adapter=adapter)\n```\n\n### 3. Use with the Claude Agent SDK (requires `claude-agent-sdk\u003e=0.1.52`)\n\n```python\nimport asyncio\nfrom agentic_data_contracts import create_sdk_mcp_server\nfrom claude_agent_sdk import (\n    ClaudeAgentOptions,\n    AssistantMessage,\n    TextBlock,\n    query,\n)\n\n# One-liner: wraps all 9 tools and bundles into an SDK MCP server\nserver = create_sdk_mcp_server(dc, adapter=adapter)\n\noptions = ClaudeAgentOptions(\n    model=\"claude-sonnet-4-6\",\n    system_prompt=f\"You are a revenue analytics assistant.\\n\\n{dc.to_system_prompt()}\",\n    mcp_servers={\"dc\": server},\n    **dc.to_sdk_config(),  # token_budget → task_budget, max_retries → max_turns\n)\n\nasync def run(prompt: str) -\u003e None:\n    async for message in query(prompt=prompt, options=options):\n        if isinstance(message, AssistantMessage):\n            for block in message.content:\n                if isinstance(block, TextBlock):\n                    print(block.text)\n\nasyncio.run(run(\"What was total revenue by region in Q1 2025?\"))\n```\n\n### 4. Or use the tools directly (no SDK required)\n\n```python\nimport asyncio\n\nasync def demo() -\u003e None:\n    # Inspect a query without executing. Response is structured JSON.\n    inspect = next(t for t in tools if t.name == \"inspect_query\")\n    result = await inspect.callable(\n        {\"sql\": \"SELECT id, amount FROM analytics.orders WHERE tenant_id = 'acme'\"}\n    )\n    print(result[\"content\"][0][\"text\"])\n    # {\"valid\": true, \"violations\": [], \"warnings\": [], \"log_messages\": [],\n    #  \"schema_valid\": true, \"explain_errors\": [], \"pending_result_checks\": [...]}\n\n    # Blocked query\n    result = await inspect.callable({\"sql\": \"SELECT * FROM analytics.orders\"})\n    print(result[\"content\"][0][\"text\"])\n    # {\"valid\": false,\n    #  \"violations\": [\"SELECT * is not allowed — specify explicit columns\", ...],\n    #  \"warnings\": [], ...}\n\nasyncio.run(demo())\n```\n\n## The 9 Tools\n\n| Tool | Description |\n|------|-------------|\n| `describe_table` | Get full column details for an allowed table |\n| `preview_table` | Preview sample rows from an allowed table |\n| `list_metrics` | List metric definitions, optionally filtered by domain, tier, or indicator_kind |\n| `lookup_metric` | Get a metric definition (SQL, tier, indicator_kind, impacts, impacted_by); fuzzy search fallback when no exact match |\n| `lookup_domain` | Get full domain context (description, metrics, tables); fuzzy search fallback |\n| `lookup_relationships` | Look up join paths for a table; finds multi-hop paths when given a target table |\n| `trace_metric_impacts` | Walk the metric-impact graph upstream (drivers) or downstream (affected metrics) from a starting metric |\n| `inspect_query` | Validate a SQL query and estimate its cost via EXPLAIN without executing |\n| `run_query` | Validate and execute a SQL query, returning results |\n\n## Domain-Driven Agent Workflow\n\nThe core design principle: **agents should understand the business domain before writing SQL.** Instead of dumping table schemas and hoping for the best, the contract teaches the agent your business vocabulary through progressive disclosure:\n\n```\n1. Domain context     →  \"What does 'revenue' mean here?\"\n2. Metric definitions →  \"How is 'total_revenue' calculated?\"\n3. Query execution    →  \"Run the validated SQL\"\n```\n\n### Defining domains\n\nEach domain carries a description that teaches the agent your business rules — things the SQL alone can't express:\n\n```yaml\nsemantic:\n  domains:\n    - name: acquisition\n      summary: \"Customer acquisition costs and conversion metrics\"\n      description: \u003e\n        Acquisition metrics track the cost and efficiency of\n        acquiring new customers across all channels.\n        CAC is calculated using fully-loaded cost, not just ad spend.\n      metrics: [CAC, CPA, CPL, click_through_rate]\n    - name: retention\n      summary: \"Customer retention, churn, and lifetime value\"\n      description: \u003e\n        Retention metrics measure how well we keep customers.\n        Churn is measured on a 30-day rolling window.\n        A customer is \"active\" if they had at least one qualifying\n        action in the window.\n      metrics: [churn_rate, LTV, retention_30d]\n```\n\n### How the agent uses domains\n\nThe system prompt gives the agent a compact domain index. When a user asks a domain-specific question, the agent explores progressively:\n\n```\nlookup_domain(\"acquisition\")        → business context + metric descriptions\nlookup_metric(\"CAC\")                → SQL expression, source table, filters\nlookup_metric(\"acquisition cost\")   → fuzzy match, returns [CAC, CPA] as candidates\nlist_metrics(domain=\"retention\")    → all metrics in the retention domain\n```\n\nThis means the agent knows that \"revenue is recognized at fulfillment, not at booking\" *before* it writes a single line of SQL — reducing hallucinated metrics and incorrect calculations.\n\n### Why progressive disclosure works\n\nThis pattern — compact index in the prompt, detailed context on demand — is the same philosophy validated by agent skill systems, MCP tool servers, and RAG architectures. Instead of overloading the agent's context window with everything upfront, you give it just enough to know *where to look*, then let it pull details when needed. The result is better token efficiency, more focused reasoning, and fewer hallucinations from context overload.\n\n## Contract Rules\n\nRules are enforced at three levels:\n\n- **`block`** — query is rejected and an error is returned to the agent\n- **`warn`** — query proceeds and a `WARNINGS:` preamble is prepended to the `run_query` response (also in `inspect_query` under `warnings`)\n- **`log`** — query proceeds and a `LOG:` preamble is prepended to the `run_query` response (also in `inspect_query` under `log_messages`); rules at this level are omitted from the system prompt so the agent can't adapt behavior to avoid triggering them\n\nEach rule carries a `query_check` (pre-execution) or `result_check` (post-execution) block. Rules with neither are advisory — they appear in the system prompt but don't enforce anything. Every rule can be scoped to a specific table or applied globally.\n\n**Built-in query checks** (pre-execution, validated against SQL AST):\n\n| Check | Description |\n|-------|-------------|\n| `required_filter` | Require a column in WHERE clause (e.g., `tenant_id`) |\n| `no_select_star` | Forbid `SELECT *` — require explicit columns |\n| `blocked_columns` | Forbid specific columns in SELECT (e.g., PII) |\n| `require_limit` | Require a LIMIT clause |\n| `max_joins` | Cap the number of JOINs |\n\n**Built-in result checks** (post-execution, validated against query output):\n\n| Check | Description |\n|-------|-------------|\n| `min_value` / `max_value` | Numeric bounds on a column's values |\n| `not_null` | Column must not contain nulls |\n| `min_rows` / `max_rows` | Row count bounds on the result set |\n\nExample with table scoping and both check types:\n\n```yaml\nrules:\n  - name: tenant_isolation\n    description: \"Orders must filter by tenant_id\"\n    enforcement: block\n    table: \"analytics.orders\"      # only applies to this table\n    query_check:\n      required_filter: tenant_id\n\n  - name: hide_pii\n    description: \"Do not select PII columns from customers\"\n    enforcement: block\n    table: \"analytics.customers\"\n    query_check:\n      blocked_columns: [ssn, email, phone]\n\n  - name: wau_sanity\n    description: \"WAU should not exceed world population\"\n    enforcement: warn\n    table: \"analytics.user_metrics\"\n    result_check:\n      column: wau\n      max_value: 8_000_000_000\n\n  - name: no_negative_revenue\n    description: \"Revenue must not be negative\"\n    enforcement: block\n    result_check:\n      column: revenue\n      min_value: 0\n```\n\n## Semantic Sources\n\nA semantic source provides metric, table schema, and relationship metadata to the agent. Paths are resolved relative to the contract file's directory (not the process CWD).\n\n**YAML** (built-in):\n```yaml\n# semantic.yml\nmetrics:\n  - name: total_revenue\n    description: \"Total revenue from completed orders\"\n    sql_expression: \"SUM(amount) FILTER (WHERE status = 'completed')\"\n    source_model: analytics.orders\n    domains: [revenue]                 # optional — see \"Metric Impacts\" below\n    tier: [north_star, department_kpi] # optional — north_star / department_kpi / team_kpi\n    indicator_kind: lagging            # optional — leading | lagging\n\ntables:\n  - schema: analytics\n    table: orders\n    columns:\n      - name: id\n        type: INTEGER\n      - name: amount\n        type: DECIMAL\n      - name: tenant_id\n        type: VARCHAR\n```\n\n`tier`, `indicator_kind`, and `domains` are all optional. For dbt and Cube sources, these fields live under the metric's `meta:` block and are read through the same field names.\n\n**dbt** — point to a `manifest.json`:\n```yaml\nsemantic:\n  source:\n    type: dbt\n    path: \"./dbt/manifest.json\"\n```\n\n**Cube** — point to a Cube schema file:\n```yaml\nsemantic:\n  source:\n    type: cube\n    path: \"./cube/schema.yml\"\n```\n\n## Table Relationships\n\nDefine join paths so the agent knows how to combine tables correctly:\n\n```yaml\n# semantic.yml\nrelationships:\n  - from: analytics.orders.customer_id\n    to: analytics.customers.id\n    type: many_to_one\n    description: \u003e\n      Join orders to customers for region-level breakdowns.\n      Every order has exactly one customer.\n\n  - from: analytics.bdg_attribution.contact_id\n    to: analytics.contacts.contact_id\n    type: many_to_one\n    description: \"Bridge table — filter to avoid fan-out from multiple attribution records.\"\n    required_filter: \"attribution_model = 'last_touch_attribution'\"\n```\n\n| Field | Required | Description |\n|-------|----------|-------------|\n| `from` / `to` | Yes | Fully qualified column references (`schema.table.column`) |\n| `type` | No | Cardinality: `many_to_one` (default), `one_to_one`, `many_to_many` |\n| `description` | No | Free-text context for the agent (join guidance, caveats, data quality notes) |\n| `required_filter` | No | SQL condition that **must** be applied when using this join (e.g., bridge table disambiguation) |\n\nThe agent sees these in its system prompt and uses them to write correct JOINs instead of guessing from column names.\n\n### Relationship Validation\n\nWhen a `SemanticSource` is passed to the `Validator`, declared relationships are actively validated against the agent's SQL:\n\n| Check | Trigger | Warning |\n|-------|---------|---------|\n| **Join-key correctness** | Agent joins on wrong columns for a declared relationship | \"uses `email` but declared relationship specifies `customer_id → id`\" |\n| **Required-filter missing** | Join has `required_filter` but WHERE clause doesn't include it | \"has required filter `status != 'cancelled'` but query does not filter on: status\" |\n| **Fan-out risk** | Aggregation (SUM, COUNT, etc.) across a `one_to_many` join | \"Results may be inflated by row multiplication\" |\n\nAll relationship checks are **advisory only** (warnings, never blocks). Undeclared joins are silently ignored — the checker only validates relationships you've explicitly defined.\n\n## Metric Impacts\n\nTable relationships tell the agent *how to join*. Metric impacts tell the agent *what drives what* — the causal / economic graph between KPIs. When an agent is asked \"why did revenue drop?\", an impact graph lets it walk upstream to the drivers (conversion rate, active customers, traffic) rather than blindly querying revenue again. When it's asked to recommend an action, it can cite verified evidence rather than hand-waving.\n\nDeclare impacts at the top level of the semantic YAML, alongside `metrics:` and `relationships:`:\n\n```yaml\n# semantic.yml\nmetric_impacts:\n  - from: active_customers\n    to: total_revenue\n    direction: positive           # positive | negative\n    confidence: verified          # verified | correlated | hypothesized\n    evidence: \"A/B test exp-042 (Q3 2025), +3.2% revenue lift, p\u003c0.01\"\n    description: \"Retained customers drive repeat purchases.\"\n```\n\n| Field | Required | Description |\n|-------|----------|-------------|\n| `from` / `to` | Yes | Metric names (must match a metric declared in the same contract) |\n| `direction` | No | `positive` (default) or `negative` |\n| `confidence` | No | `hypothesized` (default), `correlated`, or `verified` — lets the agent prioritize backed-up drivers over hunches |\n| `evidence` | No | Free text — study reference, A/B test ID, anything the agent should quote when making a recommendation |\n| `description` | No | Optional elaboration |\n\nEdges are directional. There's no `domains` field on the edge itself: an impact surfaces whenever either endpoint is in the agent's active domain, so cross-domain drivers (Checkout → Revenue) get discovered for free.\n\n### How the agent uses impacts\n\n`lookup_metric` surfaces an enriched response: each metric carries `impacts` (outgoing edges) and `impacted_by` (incoming edges), each rendered as a one-line citation string:\n\n```\n\"positive impact on total_revenue (verified): A/B test exp-042 (Q3 2025), +3.2% revenue lift, p\u003c0.01\"\n```\n\nThe agent can quote this verbatim in its answer — structured enough to reason over, readable enough to paste.\n\n`trace_metric_impacts` walks the graph via BFS:\n\n```python\nawait trace.callable({\n    \"metric_name\": \"total_revenue\",\n    \"direction\": \"upstream\",     # upstream = drivers, downstream = affected\n    \"max_depth\": 2,\n})\n# Returns: {\"edges\": [{\"depth\": 1, \"from\": \"active_customers\", \"to\": \"total_revenue\",\n#                       \"direction\": \"positive\", \"confidence\": \"verified\",\n#                       \"evidence\": \"A/B test exp-042...\"}]}\n```\n\nImpacts declared in contract YAML reference metric names regardless of where the metric itself is defined, so this works even for dbt and Cube-sourced metrics — neither semantic layer has a native causal-graph concept. Unknown metric references in `metric_impacts` emit a warning at tool-creation time (same pattern as domain validation).\n\n## Custom Prompt Rendering\n\nThe system prompt is generated by a `PromptRenderer`. The default `ClaudePromptRenderer` produces XML-structured output optimized for Claude models:\n\n```python\ndc = DataContract.from_yaml(\"contract.yml\")\nprint(dc.to_system_prompt())  # XML output, optimized for Claude\n```\n\nFor other models (GPT-4, Gemini, Llama), implement the `PromptRenderer` protocol:\n\n```python\nfrom agentic_data_contracts import PromptRenderer, DataContract\n\nclass MarkdownRenderer:\n    def render(self, contract, semantic_source=None):\n        tables = \"\\n\".join(f\"- {t}\" for t in contract.allowed_table_names())\n        return f\"## {contract.name}\\n\\nAllowed tables:\\n{tables}\"\n\ndc = DataContract.from_yaml(\"contract.yml\")\nprint(dc.to_system_prompt(renderer=MarkdownRenderer()))\n```\n\n## Scaling to Large Organizations\n\nTested for 200+ tables, 300+ metrics, 50+ relationships across multiple schemas.\n\n| Concern | How it scales |\n|---|---|\n| **System prompt size** | With domains: compact index (name + summary + count). Without domains: \u003e20 metrics auto-switches to count. \u003e30 relationships: per-table join counts with `lookup_relationships` hint |\n| **Relationship lookup** | `lookup_relationships(table=...)` returns joins for a table on demand. With `target_table`, finds shortest multi-hop join path via BFS (up to 3 hops) |\n| **Wildcard schemas** | `tables: [\"*\"]` discovers tables from the database. Resolution is cached — no repeated queries |\n| **Metric lookup** | Fuzzy search via `thefuzz` (C++ backed) — sub-millisecond even with 1000+ metrics |\n| **SQL validation** | Set-based allowlist check — O(1) per table reference regardless of allowlist size |\n\n## Resource Limits\n\n```yaml\nresources:\n  cost_limit_usd: 5.00          # max estimated query cost\n  max_retries: 3                 # max blocked queries per session\n  token_budget: 50000            # max tokens consumed\n  max_query_time_seconds: 30     # max wall-clock query time\n  max_rows_scanned: 1000000      # max rows an EXPLAIN may estimate\n```\n\n## Optional Dependencies\n\n| Extra | Package | Purpose |\n|-------|---------|---------|\n| `duckdb` | `duckdb` | DuckDB adapter |\n| `bigquery` | `google-cloud-bigquery` | BigQuery adapter |\n| `snowflake` | `snowflake-connector-python` | Snowflake adapter |\n| `postgres` | `psycopg2-binary` | PostgreSQL adapter |\n| `agent-sdk` | `claude-agent-sdk` | Claude Agent SDK integration |\n| `agent-contracts` | `ai-agent-contracts\u003e=0.2.0` | ai-agent-contracts bridge |\n\n## Optional: Formal Governance with ai-agent-contracts\n\nThe library works standalone with lightweight enforcement. Install [`ai-agent-contracts`](https://pypi.org/project/ai-agent-contracts/) to upgrade to the formal governance framework:\n\n```bash\npip install \"agentic-data-contracts[agent-contracts]\"\n```\n\n```python\nfrom agentic_data_contracts.bridge.compiler import compile_to_contract\n\ncontract = compile_to_contract(dc)  # YAML → formal 7-tuple Contract\n```\n\n**What you get with the bridge:**\n\n| Concern | Standalone | With ai-agent-contracts |\n|---|---|---|\n| Resource tracking | Manual counters | Formal `ResourceConstraints` with auto-enforcement |\n| Rule violations | Exception + retry | `TerminationCondition` with contract state machine |\n| Success evaluation | Log-based | Weighted `SuccessCriterion` scoring, LLM judge support |\n| Contract lifecycle | None | `DRAFTED → ACTIVE → FULFILLED / VIOLATED / TERMINATED` |\n| Framework support | Claude Agent SDK | + LiteLLM, LangChain, LangGraph, Google ADK |\n| Multi-agent | Single agent | Coordination patterns (sequential, parallel, hierarchical) |\n\n**When to use it:** formal audit trails, success scoring, multi-agent coordination, or integration with non-Claude agent frameworks.\n\n## Example\n\nSee [`examples/revenue_agent/`](examples/revenue_agent/) for a complete working example with a DuckDB database, YAML semantic source, and Claude Agent SDK integration.\n\n```bash\nuv run python examples/revenue_agent/setup_db.py\nuv run python examples/revenue_agent/agent.py \"What was Q1 revenue by region?\"\n```\n\n## Architecture\n\nSee [`docs/architecture.md`](docs/architecture.md) for the full design spec covering the layered architecture, YAML schema, validation pipeline, tool design, semantic sources, database adapters, and the optional `ai-agent-contracts` bridge.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflyersworder%2Fagentic-data-contracts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflyersworder%2Fagentic-data-contracts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflyersworder%2Fagentic-data-contracts/lists"}