https://github.com/mizcausevic-dev/data-contract-registry
Schema registry for data contracts. Semver versioning, compatibility checks (backward/forward/full), ownership, freshness SLAs. Bridges to procurement-decision-api.
data-contract data-governance data-quality fastapi kinetic-gain pydantic python schema-registry
- Host: GitHub
- URL: https://github.com/mizcausevic-dev/data-contract-registry
- Owner: mizcausevic-dev
- License: mit
- Created: 2026-05-15T00:09:25.000Z (20 days ago)
- Default Branch: main
- Last Pushed: 2026-05-15T17:14:56.000Z (19 days ago)
- Last Synced: 2026-05-15T18:19:40.618Z (19 days ago)
- Topics: data-contract, data-governance, data-quality, fastapi, kinetic-gain, pydantic, python, schema-registry
- Language: Python
- Homepage: https://kineticgain.com/
- Size: 27.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# data-contract-registry
**Schema registry for data contracts.** Semver versioning, compatibility checks (backward / forward / full), declared owners, freshness SLAs. The "you can't promote a new dataset version without an approved contract" pattern, lifted from API governance and aimed at data pipelines.
The headline endpoint is `POST /contracts` — register a new version, get back a deterministic compatibility report or a 422 with every breaking change called out by field name and kind.
---
## Why
The thing that gets data teams paged at 2am isn't a missing test. It's a producer who quietly removed `ltv` because "we never use it anymore" while three downstream dashboards still join on it. Schema registries (Confluent, Buf, etc.) solved this for streaming and gRPC; data pipelines need the same rigor, in a shape that fits the things data teams actually argue about:
- **owners** — who do I page when this dataset goes stale
- **freshness SLA** — when does "stale" become "broken"
- **primary key** — changing it is a `MAJOR`, not a `MINOR`
- **enum drift** — adding a value is fine; removing one is a backward-compatibility break
- **deprecation policy** — flag a version with the URI of the migration plan; don't delete it
This package is the smallest thing that does all of those.
---
## Install
```bash
pip install data-contract-registry
# with the FastAPI surface:
pip install "data-contract-registry[api]"
```
Python 3.11+. Runtime deps: `pydantic` + `PyYAML`.
---
## Library quickstart
```python
from data_contract_registry import (
    ContractRegistry,
    DataContract,
    DataField,
    Owner,
)

registry = ContractRegistry()

v1 = DataContract(
    dataset_id="users.daily_active",
    version="1.0.0",
    primary_key=["user_id", "active_date"],
    owners=[Owner(team="growth-platform", contact="#growth-platform")],
    fields=[
        DataField(name="user_id", type="string"),
        DataField(name="active_date", type="timestamp"),
        DataField(name="plan", type="string", enum=["free", "pro", "enterprise"]),
        DataField(name="ltv", type="number", required=False),
    ],
    status="active",
)
registry.register(v1)

# Compatible promotion (added an optional field).
v1_1 = v1.model_copy(update={
    "version": "1.1.0",
    "fields": [*v1.fields, DataField(name="signup_source", type="string", required=False)],
})
report = registry.register(v1_1)
print(report.compatible)  # True

# Incompatible promotion: removing a field breaks backward compatibility.
v2 = v1.model_copy(update={
    "version": "2.0.0",
    "fields": [f for f in v1.fields if f.name != "ltv"],
})
report = registry.register(v2)
print(report.compatible)         # False
print(report.errors[0].kind)     # "field_removed"
print(report.errors[0].message)  # "field 'ltv' was removed; old data will fail validation"
```
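The quickstart bumps `1.0.0` to `1.1.0` for an additive change and to `2.0.0` for a breaking one, and the registry rejects versions that do not increase. As a rough illustration of what such a version check involves, here is a minimal sketch. It assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build tags; `parse_semver` and `is_increasing` are hypothetical helpers, not part of the package's API.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release support)."""
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def is_increasing(previous: str, candidate: str) -> bool:
    """True when candidate is strictly newer than previous."""
    return parse_semver(candidate) > parse_semver(previous)


print(is_increasing("1.0.0", "1.1.0"))   # True
print(is_increasing("1.9.0", "1.10.0"))  # True: compared as ints, not strings
print(is_increasing("2.0.0", "1.1.0"))   # False
```

Comparing integer tuples rather than raw strings is what keeps `1.10.0` ordered after `1.9.0`.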
---
## Compatibility modes
| Mode | Meaning |
| ---------- | --- |
| `backward` | New schema can read data produced by the previous schema. **Default.** Consumers upgrade first. |
| `forward` | Previous schema can read data produced by the new schema. Producers upgrade first. |
| `full` | Both. |
| `none` | Anything goes. First-time onboarding only. |
The checks the engine knows how to flag (each carries a structured `kind` so you can build CI gates around specific failures):
| Kind | Severity | Mode |
| -------------------------- | -------- | --- |
| `field_removed` | error | backward |
| `field_type_changed` | error | backward |
| `field_required_added` | error | backward (optional→required) **or** forward (new required field) |
| `field_enum_shrunk` | error | backward |
| `primary_key_changed` | error | always |
| `version_not_increasing` | error | always |
| `owner_missing` | error | always |
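
To make the field-level rows of the table concrete, the following is a minimal, self-contained sketch of how such checks could be written. It is not the package's engine: `Field`, `Issue`, `backward_issues`, `forward_issues`, and `check` are hypothetical names, and only the four field-level kinds are covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:  # hypothetical stand-in for a contract field
    name: str
    type: str
    required: bool = True
    enum: Optional[tuple[str, ...]] = None


@dataclass(frozen=True)
class Issue:  # hypothetical structured finding
    kind: str
    field: str


def backward_issues(old: list[Field], new: list[Field]) -> list[Issue]:
    """Flag changes that stop the new schema from reading old data."""
    new_by_name = {f.name: f for f in new}
    issues: list[Issue] = []
    for prev in old:
        cur = new_by_name.get(prev.name)
        if cur is None:
            issues.append(Issue("field_removed", prev.name))
            continue
        if cur.type != prev.type:
            issues.append(Issue("field_type_changed", prev.name))
        if cur.required and not prev.required:
            issues.append(Issue("field_required_added", prev.name))
        if prev.enum is not None and cur.enum is not None:
            if not set(prev.enum) <= set(cur.enum):
                issues.append(Issue("field_enum_shrunk", prev.name))
    return issues


def forward_issues(old: list[Field], new: list[Field]) -> list[Issue]:
    """Flag brand-new required fields, per the forward row of the table."""
    old_names = {f.name for f in old}
    return [
        Issue("field_required_added", f.name)
        for f in new
        if f.required and f.name not in old_names
    ]


def check(old: list[Field], new: list[Field], mode: str = "backward") -> list[Issue]:
    """Run the checks selected by mode; 'full' runs both directions."""
    issues: list[Issue] = []
    if mode in ("backward", "full"):
        issues += backward_issues(old, new)
    if mode in ("forward", "full"):
        issues += forward_issues(old, new)
    return issues  # mode "none" falls through with no issues


v1 = [
    Field("user_id", "string"),
    Field("plan", "string", enum=("free", "pro", "enterprise")),
    Field("ltv", "number", required=False),
]
v2 = [
    Field("user_id", "string"),
    Field("plan", "string", enum=("free", "pro")),
]
for issue in check(v1, v2):
    print(issue.kind, issue.field)
# field_enum_shrunk plan
# field_removed ltv
```

Because every finding carries a `kind`, a CI gate can fail on `field_removed` while only warning on other kinds, which is the use the table describes.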
---
## FastAPI surface
```bash
pip install "data-contract-registry[api]"
uvicorn data_contract_registry.app:app --port 8090
```
| Method | Path | What it does |
| --- | --- | --- |
| GET | `/` | Service info. |
| GET | `/healthz` | Liveness probe. |
| GET | `/datasets` | List registered dataset IDs. |
| POST | `/contracts` | Register / promote a contract. 422 with a structured issue list when incompatible. |
| POST | `/contracts/check` | Dry-run compatibility check — does **not** register. |
| GET | `/contracts/{ds}/latest` | Latest **active** contract for a dataset. |
| GET | `/contracts/{ds}/versions` | Full version history. |
| GET | `/contracts/{ds}/versions/{v}` | One specific version. |
| POST | `/contracts/{ds}/versions/{v}/deprecate` | Mark deprecated with a migration URI. |
| POST | `/contracts/{ds}/versions/{v}/archive` | Archive a version (history preserved). |
| POST | `/contracts/owners/from-decision-card` | **Cross-ecosystem hook** — pull owners out of a Decision Card. |
Bundles are held in-memory by default. For restart-safe storage, swap `_BundleStore`'s implementation; the protocol is small.
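
A dry-run call from a CI job might look like the sketch below. It uses only the standard library. The request body is assumed to be the contract serialized as JSON, since the payload shape is not spelled out here, and `build_check_request` is a hypothetical helper rather than something the package ships.

```python
import json
import urllib.request


def build_check_request(base_url: str, contract: dict) -> urllib.request.Request:
    """Build a POST /contracts/check request. Assumes the body is the contract as JSON."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/contracts/check",
        data=json.dumps(contract).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


contract = {
    "dataset_id": "users.daily_active",
    "version": "1.1.0",
    "owners": [{"team": "growth-platform", "contact": "#growth-platform"}],
    "fields": [{"name": "user_id", "type": "string"}],
}
req = build_check_request("http://localhost:8090", contract)
print(req.get_method(), req.full_url)  # POST http://localhost:8090/contracts/check

# Sending it requires a running server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx responses, so a 422 from `POST /contracts` would surface as an exception whose body holds the issue list.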
---
## The cross-ecosystem hook
The third hook in the portfolio (after `procurement-decision-api` → `policy-as-code-engine` and the Suite → Decision Intelligence bridge). When a buyer approves a vendor whose data product the team will consume, the Decision Card's `buyer.name` + `decision_maker` are **the right answer** to "who owns the contract on our side":
```bash
curl -X POST http://localhost:8090/contracts/owners/from-decision-card \
-H 'Content-Type: application/json' \
-d @decision-card.json
# -> [
# {"team": "Springfield USD", "contact": "#data-platform"},
# {"team": "Director of Data (Alex Chen)", "contact": null}
# ]
```
Drop that list straight into `DataContract.owners` and the registration carries paging info the team didn't have to re-type.
---
## YAML authoring
```yaml
# contracts/users-daily-active.yaml
dataset_id: users.daily_active
version: "1.0.0"
owners:
  - team: growth-platform
    contact: "#growth-platform"
freshness_sla:
  max_lag_seconds: 86400
fields:
  - {name: user_id, type: string}
  - {name: active_date, type: timestamp}
  - {name: plan, type: string, enum: [free, pro, enterprise]}
```
Hand-author in YAML, validate in CI, register from Python:
```python
import yaml
from pathlib import Path
from data_contract_registry import ContractRegistry, DataContract
raw = yaml.safe_load(Path("contracts/users-daily-active.yaml").read_text())
ContractRegistry().register(DataContract.model_validate(raw))
```
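
The `freshness_sla.max_lag_seconds` value in the YAML above is the threshold at which "stale" becomes "broken". How the package evaluates it is not described here, so the following is only a sketch of the comparison involved; `is_stale` is a hypothetical helper, and the `last_updated` timestamp is assumed to come from your pipeline's own metadata.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(
    last_updated: datetime,
    max_lag_seconds: int,
    now: Optional[datetime] = None,
) -> bool:
    """True when the last update is older than the SLA allows."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > timedelta(seconds=max_lag_seconds)


# Fixed reference time so the example is reproducible.
now = datetime(2026, 5, 15, 12, 0, tzinfo=timezone.utc)
print(is_stale(now - timedelta(hours=23), 86400, now=now))  # False
print(is_stale(now - timedelta(hours=25), 86400, now=now))  # True
```

Both timestamps are timezone-aware; mixing naive and aware datetimes in the subtraction raises a `TypeError`.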
---
## Tests
```bash
pip install -e ".[dev]"
ruff check src tests && ruff format --check src tests
mypy src
pytest -v
```
CI matrix runs Python 3.11 / 3.12 / 3.13.
---
## Related in this ecosystem
- **[procurement-decision-api](https://github.com/mizcausevic-dev/procurement-decision-api)** — drafts the Decision Cards this registry pulls owners from.
- **[policy-as-code-engine](https://github.com/mizcausevic-dev/policy-as-code-engine)** — pair with this registry to enforce contracts at request time.
- **[slo-budget-tracker](https://github.com/mizcausevic-dev/slo-budget-tracker)** — wire your freshness SLA into the same monitoring story.
- More at [kineticgain.com](https://kineticgain.com/).
---
## License
MIT. See [LICENSE](LICENSE).