An open API service indexing awesome lists of open source software.

https://github.com/marquesantero/cfa

Governed execution kernel for AI-native systems with typed intent resolution, policy evaluation, state projection, and auditability.
https://github.com/marquesantero/cfa

agentic-systems ai architecture data-engineering governance llm orchestration python

Last synced: 4 days ago
JSON representation

Governed execution kernel for AI-native systems with typed intent resolution, policy evaluation, state projection, and auditability.

Awesome Lists containing this project

README

          

# CFA — Contextual Flux Architecture

[![CI](https://github.com/marquesantero/cfa/actions/workflows/ci.yml/badge.svg)](https://github.com/marquesantero/cfa/actions/workflows/ci.yml)
[![codecov](https://codecov.io/github/marquesantero/cfa/graph/badge.svg?token=P5NFQBZGYT)](https://codecov.io/github/marquesantero/cfa)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Tests](https://img.shields.io/badge/tests-536%20passed-brightgreen)](https://github.com/marquesantero/cfa/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/cfa-kernel)](https://pypi.org/project/cfa-kernel/)
[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE)
[![Docs](https://img.shields.io/badge/docs-docusaurus-blue.svg)](https://marquesantero.github.io/cfa/)

**A typed, pre-execution governance gate for AI agents and data pipelines.**

You declare what you intend to do as a `StateSignature`. CFA answers
`approve`, `replan(remediations)`, or `block(reason)` — deterministically —
in **under 3 ms p99** on a warm kernel, and writes the decision into a
SHA-256 hash chain you can verify offline with `cfa audit verify`. No
network. No server. No keys.

## Why CFA exists

Six things CFA does today that no adjacent tool gives you together:

1. **Structured remediation, not just yes/no.** When a fixable rule fails,
CFA returns the fix as data. The caller — an LLM agent, a CI step, a
human — applies it and retries. The recovery loop is part of the
contract, bounded at three attempts, and audited.
```json
{ "action": "replan",
"faults": [{"code": "GOVERNANCE_RAW_PII_IN_PROTECTED_LAYER", ...}],
"interventions": [
"Set constraints.no_pii_raw=True",
"Apply sha256() on PII columns before the join"
]
}
```

2. **Offline-verifiable audit chain.** Every decision is a
content-hashed event linked into a SHA-256 chain. `cfa audit verify`
replays the chain on any host that has the JSONL file. No vendor, no
server, no API key.
```bash
$ cfa audit verify --file audit.jsonl
OK · 1 274 events verified · last_hash=a4f3…6c01
```

3. **Dataset-aware policy primitives baked in.** PII columns,
partitioning, classification, merge keys, target layer — these are
first-class primitives, not metadata you re-encode in Rego. A typical
rule fits in six YAML lines:
```yaml
- name: forbid_raw_pii
condition: pii_in_protected_layer
action: block
fault_code: GOVERNANCE_RAW_PII
severity: critical
remediation: ["Apply sha256() on PII columns before the write"]
```

4. **One signature, three production backends.** The same approved
`StateSignature` compiles to PySpark + Delta Lake, ANSI SQL with
`MERGE INTO`, or dbt models with `schema.yml`. Each backend declares
its own forbidden tokens for static validation. New backends register
through `BackendRegistry` without touching the kernel.

5. **MCP server, working today.** Any MCP-compatible agent (Claude
Desktop, Cursor, Continue, custom LangGraph nodes) calls CFA before
it touches production data. Five tools: `cfa_evaluate_signature`,
`cfa_describe_rules`, `cfa_explain_fault`, `cfa_audit_check`,
`cfa_list_backends`.

6. **Deterministic by default; LLM is opt-in.** The decision path is a
pure function of `(signature, policy_bundle, catalog)`. Same inputs
produce the same decision and the same hash, every time, with no
network call. LLMs participate only on the front edge (intent →
signature) and only if you ask for them via the `[llm]` extra.

Each of these is a recorded
[Architecture Decision Record](docs/adr/). The reasoning, the
alternatives we rejected, and the boundaries are written down.

## Quick start

```bash
pip install cfa-kernel
cfa init
cfa evaluate "Join NFe with Clientes and persist to Silver" \
--catalog .cfa/catalog.json
```

For a real CI gate, the four-line decorator form:

```python
from cfa.adapters import cfa_guard

@cfa_guard("Join NFe with Clientes anonymize CPF persist Silver",
policy_bundle="policies/prod-v1.yaml", catalog=CATALOG)
def my_pipeline(): ...
```

The decorator caches a single `KernelOrchestrator` per guard and adds
~2.4 ms p99 to your call. Production-friendly.

## Where CFA pairs (instead of replacing)

CFA is **not** an LLM observability tool, a generic policy engine, a
data catalog, or a data-quality-at-rest tool. Pair with LangSmith /
Phoenix / Patronus, OPA, Unity Catalog / Atlan / DataHub, and Great
Expectations / Soda respectively. The [Compare](https://marquesantero.github.io/cfa/docs/compare)
page has the side-by-side breakdowns.

## What CFA does

| Step | What happens |
|------|-------------|
| **Formalize** | Natural language or JSON → typed `StateSignature` contract |
| **Govern** | Policy Engine evaluates PII, cost, schema, partition constraints |
| **Generate** | Execution planner + deterministic code generation (PySpark, SQL, dbt) |
| **Execute** | Pluggable sandbox with metrics collection + runtime validation |
| **Validate** | State projection, SHA-256 audit trail, lifecycle indices |

## Surfaces

All interfaces are backend-agnostic. CFA evaluates a `StateSignature` contract — however it was produced.

| Surface | For | Example |
|---------|-----|---------|
| `cfa` CLI | Everyone | `cfa policy check --signature sig.json` |
| `cfa catalog` CLI | Data platform teams | `cfa catalog validate catalog.json` |
| `cfa policy` CLI | Security/compliance | `cfa policy validate policies/prod.yaml` |
| `cfa storage` CLI | Operations | `cfa storage stats --db cfa.db` |
| `cfa lifecycle` CLI | Platform teams | `cfa lifecycle evaluate --db cfa.db` |
| `cfa signature` CLI | External systems | `cfa signature validate request.json` |
| `cfa.testing` | CI/CD | `evaluate("intent", catalog=catalog)` with pytest |
| `cfa.runtime` | Production | `RuntimeGate` as decorator/context-manager |
| `cfa.mcp` | AI agents | MCP server for any MCP-compatible client |
| `cfa.adapters` | Any framework | Universal `cfa_guard` decorator (LangGraph, CrewAI, AutoGen, DSPy, OpenAI Agents SDK) |

## Architecture

```text
CLI / MCP / Adapter / API


┌─ Formalize ──┐ NL / JSON / Tool call → typed StateSignature contract
├─ Govern ──────┤ Policy check + REPLAN cycle (approve / replan / block)
├─ Generate ────┤ Plan + code (PySpark / SQL / dbt) + static validation
├─ Execute ─────┤ Pluggable sandbox + runtime validation
└─ Validate ────┘ State projection + SHA-256 audit + lifecycle indices


Decision JSON / Audit Trail / OTel / Prometheus
```

## Capabilities

| Capability | What it gives you |
|------------|-------------------|
| SHA-256 audit trail | Tamper-evident chain of decisions, verifiable offline (`cfa audit verify`) |
| State projection | Each execution carries the typed state of the prior one — no implicit globals |
| Lifecycle indices (IFo/IFs/IFg/IDI) | Quantifies how often an intent recurs, stabilizes, and qualifies for promotion to a reusable skill |
| REPLAN cycle | Failed policy checks emit a structured remediation, not a hard stop |
| Backend-agnostic codegen | Same signature compiles to PySpark, ANSI SQL, or dbt — pluggable via `BackendRegistry` |
| Artifact hashing | Catalog, policy bundle, and signature are content-hashed and bound to every decision |
| MCP protocol | Any MCP-compatible agent can call CFA as a governance tool |
| SQLite + JSONL storage | First-class persistence with stats, retention cleanup, and vacuum |
| Config auto-discovery | `cfa.yaml` walked up the tree; all CLI commands respect it |
| Zero core dependencies | Optional extras for `yaml`, `otel`, `mcp`, `llm` — none required for the kernel |

## CLI

```bash
# Governance & evaluation
cfa evaluate "intent" --catalog catalog.json --strict
cfa policy check --signature signature.json --policy-bundle policies/prod.yaml
cfa policy check --signature sig.json --catalog cat.json --strict --audit-log audit.jsonl

# Validation (CI-ready with JSON output and exit codes)
cfa catalog validate catalog.json --require-datasets --format json
cfa signature validate signature.json --format json
cfa policy validate policies/prod.yaml --format json

# Audit & verification
cfa audit show --id INTENT_ID --file audit.jsonl --format json
cfa audit verify --file audit.jsonl

# Policy rules
cfa rules list
cfa rules explain FAULT_CODE

# Storage management
cfa storage stats --db cfa.db --format json
cfa storage cleanup --db cfa.db --retention 90
cfa storage vacuum --db cfa.db

# Lifecycle management
cfa lifecycle evaluate --db cfa.db --window 30
cfa lifecycle list --db cfa.db

# Project health
cfa status --format json

# Bootstrap
cfa init

# Backends
cfa backend list
```

## From Python

```python
from cfa.testing import evaluate, assert_passed

result = evaluate(
"Join NFe with Clientes and persist to Silver",
catalog=MY_CATALOG,
policy_rules=my_rules,
backend="pyspark",
)
assert_passed(result)
```

### Policy check with audit

```python
from cfa.policy.engine import PolicyEngine
from cfa.types import StateSignature

signature = StateSignature.from_dict(signature_dict)
engine = PolicyEngine(policy_bundle_version="prod-v1.0")
result = engine.evaluate(signature)
# result.action → approve / replan / block
```

### Runtime gate

```python
from cfa.runtime import RuntimeGate, GateConfig

gate = RuntimeGate(
config=GateConfig(policy_bundle="prod_v1.0", sandbox="mock"),
catalog=PROD_CATALOG,
)

@gate.guard("aggregate sales with PII protected")
def my_pipeline():
...
```

### SQLite storage

```python
from cfa.storage import SqliteStorage

store = SqliteStorage("cfa.db")
store.ensure_schema()

# Audit
store.audit_append(event)

# Execution records (lifecycle)
store.execution_append(record_dict)

# Lifecycle skills
store.skill_upsert("hash_a", skill_data)
```

## Policy Bundles

Declarative YAML policy rules — separate governance from code:

```yaml
# policies/prod-v1.yaml
policy_bundle:
version: "prod-v1.0"
rules:
- name: forbid_raw_pii
condition: pii_in_protected_layer
action: block
fault_code: GOVERNANCE_RAW_PII
severity: critical
message: "PII in protected layer without anonymization."
remediation:
- "Apply sha256 on PII columns before the operation"
```

Validated at load time — unknown conditions, duplicate fault codes, and invalid enums are caught immediately.

## Config File

```yaml
# cfa.yaml (auto-discovered by all commands)
version: "1.0"
storage:
backend: sqlite
path: cfa.db
retention_days: 90
defaults:
catalog: .cfa/catalog.json
policy_bundle: .cfa/policies/prod-v1.yaml
backend: pyspark
```

## Backends

Three governed code generation backends, all pluggable via `BackendRegistry`:

| Backend | Language | Features |
|---------|----------|----------|
| `pyspark` | PySpark + Delta Lake | Merge, partition overwrite, PII anonymization |
| `sql` | ANSI SQL | MERGE INTO, INSERT OVERWRITE, partition clauses |
| `dbt` | dbt models + schema.yml | Config blocks, refs, not_null/unique tests, PII annotations |

Each backend declares its own forbidden tokens for static validation.

## MCP Server

Expose CFA governance to any AI agent via Model Context Protocol:

```json
{
"mcpServers": {
"cfa": {
"command": "python",
"args": ["-m", "cfa.mcp"]
}
}
}
```

5 tools: `cfa_evaluate_signature`, `cfa_describe_rules`, `cfa_explain_fault`, `cfa_audit_check`, `cfa_list_backends`.

## Repository

```text
src/cfa/
├── core/ Kernel, Planner, CodeGen, Conditions, Phases
├── policy/ PolicyEngine, PolicyBundle, catalog validation, standalone-governance surface
├── resolve/ Intent → StateSignature (rule-based + LLM backends, confirmation orchestrator)
├── validate/ Static, runtime, and signature validation
├── obs/ Metrics, OTel, Notify, Indices, Promotion
├── behavior/ BehaviorSpec + Systematizer (human intent → policy rules)
├── audit/ AuditTrail, Context, Hashing
├── lifecycle/ IFo/IFs/IFg/IDI indices + Promotion/Demotion engine
├── execution/ Partial execution, State projection
├── adapters/ Universal cfa_guard decorator for any framework
├── backends/ PySpark, SQL, dbt (pluggable)
├── sandbox/ Pluggable sandbox backend + registry + executor
├── cli/ CLI commands by family (core/, governance/, reporting/, project/, infrastructure/)
├── storage/ SQLite + JSONL backends (stats, cleanup, vacuum)
├── mcp/ MCP server (JSON-RPC over stdio)
├── reporting/ HTML reports
├── runtime/ Production governance gate
├── testing/ pytest-native evaluate() + fixtures
├── config.py CFA config (discovery, defaults)
├── types.py StateSignature, Fault, KernelResult
└── _lazy.py Reusable lazy loader for package __init__
```

> The 1.1.0 cycle consolidated five packages from the 1.0.0 layout: `governance` → `policy`, `validation` → `validate`, `observability` → `obs`, `normalizer` + `resolution` → `resolve`. `adapters/` lost the per-framework shim files (langgraph/crewai/autogen/dspy/openai_agents) in favor of a single universal decorator.

## Docs

All documentation at **[marquesantero.github.io/cfa](https://marquesantero.github.io/cfa/)**:

- [Getting Started](https://marquesantero.github.io/cfa/docs/getting-started)
- [CLI Reference](https://marquesantero.github.io/cfa/docs/cli)
- [Policy Bundles](https://marquesantero.github.io/cfa/docs/policy-bundles)
- [Backends](https://marquesantero.github.io/cfa/docs/backends)
- [MCP Server](https://marquesantero.github.io/cfa/docs/mcp-server)
- [Reporting](https://marquesantero.github.io/cfa/docs/reporting)
- [Architecture Notes](https://marquesantero.github.io/cfa/docs/architecture-notes)
- [FAQ](https://marquesantero.github.io/cfa/docs/faq)

## Demos

Two complete notebooks, tested on Databricks with CFA 1.0.0, 0 errors:

| File | Format | Description |
|------|--------|-------------|
| `demos/cfa_demo_complete` | `.dbc` / `.py` | Rule-based governance — APPROVE, REPLAN, BLOCK, codegen, audit, storage |
| `demos/cfa_llm_demo_complete` | `.dbc` / `.py` | LLM-powered — semantic normalizer, systematizer, strict mode, compare |

Import the `.dbc` into Databricks or run the `.py` files anywhere.

## Extending CFA

CFA is built so that adding a **vertical** (a new domain to govern —
infrastructure, agent tool calls, financial transactions, ML deploys)
or an **integration** (a new way to feed signatures in and emit decisions
out) is a pip-installable package. You do not edit the kernel.

```toml
# pyproject.toml of your plugin
[project.entry-points."cfa.verticals"]
myapp = "cfa_vertical_myapp.vertical:MyappVertical"

[project.entry-points."cfa.integrations"]
mytool = "cfa_int_mytool.integration:MyToolIntegration"

[project.entry-points."cfa.decision_sinks"]
slack = "cfa_sink_slack.sink:SlackWebhookSink"
```

Reference contracts:
[ADR-0007](docs/adr/0007-layered-architecture.md),
[ADR-0009](docs/adr/0009-vertical-protocol.md),
[ADR-0010](docs/adr/0010-integration-protocol.md).
Full guide: [Extending CFA](https://marquesantero.github.io/cfa/docs/extending).

## Roadmap

CFA is a typed layer between **intent** and **execution**. Data writes
were the first vertical because the maintainer is a data engineer with
primitives that were easy to test. The kernel itself is domain-agnostic —
every additional vertical (infrastructure, agent tool calls, financial
transactions, schema migrations, ML model deploys) plugs in as an
external package via the
[Vertical](docs/adr/0009-vertical-protocol.md) contract.

The strategy from 1.2.0 forward — formalised in
[ADR-0013](docs/adr/0013-protocol-over-product.md) — is **dual-track**:
every release ships one substrate deliverable (something that survives
multiple hype cycles) and one adoption deliverable (something useful
this quarter). Cadence: 6–8 weeks per minor.

Full plan in [`drafts/ROADMAP.md`](drafts/ROADMAP.md). Headline picks:

| Release | Substrate | Adoption |
|---------|-----------|----------|
| **1.1.0** (current) | Plug contracts (ADR-0007 → 0012), vertical-aware `StateSignature`, 599 tests, perf baselines | Honest positioning, MCP server, `compare.md`, ADRs |
| **1.2.0** (next) | `cfa-protocol v0.1` in a separate repo — JSON Schema for signature, audit chain, decision, policy bundle; conformance suite | `cfa dbt check` — reads `manifest.json`, runs the policy bundle in CI; GitHub Action template; demo project |
| **1.3.0** | Standalone Go binary `cfa-verify` — validates audit chains and signatures without Python | `cfa.verticals.agent` + reference LangGraph + Claude demo: agent tries to delete prod, CFA blocks with remediation |
| **1.4.0** | TypeScript signature builder (`@cfa/protocol`); `cfa-hub` catalog of verticals/bundles/sinks | Airflow `CFAGateOperator`; Slack / OTel / GitHub PR DecisionSinks; `cfa.verticals.infra` + `cfa terraform check` |
| **1.5.0** | Conformance badge + `cfa-protocol v0.5`; spec stability milestones | Live dashboard, lifecycle CLI, 2-3 case studies |
| **2.0.0** | `cfa-protocol 1.0` stable; third-party security audit; governance process | Multi-vertical in real production; 5+ implementations (Python + Go + TS + 2 others) |

The protocol becomes the product. The Python kernel is *one*
implementation. Verticals, integrations, decision sinks are shipped or
maintained externally. The substrate survives whatever framework wins
in 2027.

## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md) for development setup, test conventions, and the PR checklist. By participating, you agree to the [Code of Conduct](./CODE_OF_CONDUCT.md). Security issues: see [SECURITY.md](./SECURITY.md).

## License

[MIT](./LICENSE) · [Antero Marques](https://github.com/marquesantero)