https://github.com/opendqv/opendqv
Open-source, contract-driven data quality validation. Shift-left enforcement at the point of write — before data enters your pipeline.
https://github.com/opendqv/opendqv
data-contracts data-governance data-quality data-validation fastapi mcp open-source python shift-left
Last synced: 27 days ago
JSON representation
Open-source, contract-driven data quality validation. Shift-left enforcement at the point of write — before data enters your pipeline.
- Host: GitHub
- URL: https://github.com/opendqv/opendqv
- Owner: OpenDQV
- License: mit
- Created: 2026-03-13T00:39:41.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-11T09:17:12.000Z (about 1 month ago)
- Last Synced: 2026-04-11T10:27:17.019Z (about 1 month ago)
- Topics: data-contracts, data-governance, data-quality, data-validation, fastapi, mcp, open-source, python, shift-left
- Language: Python
- Homepage: https://github.com/OpenDQV/OpenDQV
- Size: 47.8 MB
- Stars: 10
- Watchers: 0
- Forks: 2
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
- Support: SUPPORT.md
- Governance: GOVERNANCE.md
- Roadmap: docs/roadmap.md
- Cla: CLA.md
Awesome Lists containing this project
README
[](https://github.com/OpenDQV/OpenDQV/actions/workflows/ci.yml)
[](https://github.com/OpenDQV/OpenDQV/blob/main/LICENSE)
[](https://pypi.org/project/opendqv/)
[](https://pypi.org/project/opendqv/)
[](https://github.com/orgs/OpenDQV/packages/container/package/opendqv%2Fopendqv)
[](#)
[](https://securityscorecards.dev/#/projects/github.com/OpenDQV/OpenDQV)
[](https://codecov.io/gh/OpenDQV/OpenDQV)
[](https://github.com/astral-sh/ruff)
[](https://www.bestpractices.dev/projects/12229)
| [Quickstart](docs/quickstart.md) | [Rules](docs/rules/) | [Contracts](docs/compliance-contracts.md) | [MCP](docs/mcp.md) | [API](docs/index.md) | [Security](SECURITY.md) | [FAQ](docs/faq.md) |
|---|---|---|---|---|---|---|
> **"Trust is easier to build than to repair."**
> That is why OpenDQV exists. A `422` at the point of write is cheaper than a data incident three weeks later.
> **Beta (v2.x).** Public API surface (REST, contract YAML, MCP tools, Python SDK) is stable. Breaking changes follow a one-release deprecation cycle. Security fixes backported to the latest 2.x line. See [API Stability](#api-stability) for commitments.
**OpenDQV is a write-time data validation service.** Source systems call it before writing data. Bad records return a `422` with per-field errors. Good records pass through. No payload is stored.

```mermaid
flowchart LR
subgraph Callers
direction TB
SF[Salesforce]
SAP[SAP]
DYN[Dynamics]
ORA[Oracle]
WEB[Web forms]
ETL1[ETL pipelines]
DJ[Django clean]
PY[Python scripts]
PD[Pandas / ETL]
CD[Claude Desktop]
CUR[Cursor]
LLM[LLM agents]
end
subgraph OpenDQV
direction TB
API[Validation API\nREST / batch]
SDK[LocalValidator\nin-process SDK]
MCP[MCP Server\nAI-native]
API & SDK & MCP --> CON[Contracts · YAML\nGovernance · RBAC\nAudit trail]
API & SDK & MCP --> GEN[Code Generator\nApex · JS · SQL]
end
subgraph Results
direction TB
R1[valid: true / false]
R2[per-field errors]
R3[severity levels]
R4[webhooks on events]
end
SF & SAP & DYN & ORA & WEB & ETL1 --> API
DJ & PY & PD --> SDK
CD & CUR & LLM --> MCP
API & SDK & MCP --> R1
subgraph Importers
IMP[dbt schema · GX suites\nSoda checks · ODCS · CSV]
end
IMP --> CON
style API fill:#0d3b5e,stroke:#092a44,color:#fff
style SDK fill:#0d3b5e,stroke:#092a44,color:#fff
style MCP fill:#0d3b5e,stroke:#092a44,color:#fff
style CON fill:#1a8aad,stroke:#14708d,color:#fff
style GEN fill:#1a8aad,stroke:#14708d,color:#fff
style R1 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
style R2 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
style R3 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
style R4 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
style IMP fill:#1a8aad,stroke:#14708d,color:#fff
```
A `422` at the point of write closes the feedback loop — producers see failures immediately and fix them at source. Rejection rates drop over time because the tool changes the incentive, not just the outcome.
For post-landing monitoring use [Great Expectations](https://greatexpectations.io), [Soda](https://www.soda.io), or [dbt tests](https://docs.getdbt.com/docs/build/tests) — they're complementary, not competing. OpenDQV owns layer one (write-time enforcement); those tools own layer three (post-ingestion observability).
---
## AI Agents — first-class via MCP
OpenDQV ships a built-in [Model Context Protocol](https://modelcontextprotocol.io) server, so [Claude Desktop](https://claude.ai/download), [Cursor](https://www.cursor.com), and any other MCP-compatible agent can discover contracts, validate records, and explain failures through tool calls the agent **explicitly declares** — no hallucinated compliance, no invented rules.
[](https://github.com/user-attachments/assets/4d414ff1-b08c-4ff1-91e4-e421f0d5391d)
*4-minute demo: Claude Desktop uses two MCP servers — OpenDQV for validation, Marmot for catalog lineage — to check a menu item against `ppds_menu_item` for Natasha's Law allergen compliance, stating which tool calls it makes and why. ([Backup: download the MP4 from the repo](https://github.com/OpenDQV/OpenDQV/raw/main/docs/demo_mcp.mp4))*
For tool reference, write guardrails, remote/enterprise mode, and the Marmot composition pattern, see **[docs/mcp.md](docs/mcp.md)**.
---
## Install
| I have... | Command |
|-----------|---------|
| Python 3.11+ | `git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && bash install.sh` |
| Docker | `git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && cp .env.example .env && docker compose up -d` |
| Just the SDK/CLI | `pip install opendqv` then `opendqv init` to bootstrap contracts |
| None of the above | [Beginner setup guide →](docs/beginner-quickstart.md) |
`install.sh` creates a virtual environment, installs dependencies, and launches the onboarding wizard. Docker pulls `ghcr.io/opendqv/opendqv:latest` — no build step required.
> ⚠️ `AUTH_MODE=open` (the default) has **no authentication**. Set `AUTH_MODE=token` and a strong `SECRET_KEY` in `.env` before any non-local deployment. See [SECURITY.md](SECURITY.md).
---
## Your First Validation
**1. Write a contract** — drop a YAML file in your contracts directory (run `opendqv init --all` to copy the 43 bundled contracts, or `opendqv init` for a single starter):
```yaml
contract:
name: order
version: "1.0"
owner: "Data Governance"
status: active
rules:
- name: valid_email
type: regex
field: email
pattern: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
severity: error
error_message: "Invalid email format"
- name: amount_positive
type: min
field: amount
min: 0.01
severity: error
error_message: "Order amount must be positive"
- name: status_valid
type: allowed_values
field: status
allowed_values: [pending, confirmed, shipped, cancelled]
severity: error
error_message: "Invalid order status"
```
**2. Reload contracts:**
```bash
curl -X POST http://localhost:8000/api/v1/contracts/reload
```
**3. Send a bad record — OpenDQV rejects it:**
```bash
curl -s -X POST http://localhost:8000/api/v1/validate \
-H "Content-Type: application/json" \
-d '{"contract": "order", "record": {"email": "not-an-email", "amount": -5, "status": "unknown"}}'
```
```json
{
"valid": false,
"errors": [
{"field": "email", "rule": "valid_email", "message": "Invalid email format", "severity": "error"},
{"field": "amount", "rule": "amount_positive", "message": "Order amount must be positive", "severity": "error"},
{"field": "status", "rule": "status_valid", "message": "Invalid order status", "severity": "error"}
],
"contract": "order",
"version": "1.0"
}
```
**4. Fix the record — it passes:**
```bash
curl -s -X POST http://localhost:8000/api/v1/validate \
-H "Content-Type: application/json" \
-d '{"contract": "order", "record": {"email": "alice@example.com", "amount": 49.99, "status": "pending"}}'
```
```json
{"valid": true, "errors": [], "warnings": [], "contract": "order", "version": "1.0"}
```
The `customer` contract ships pre-seeded if you want to skip step 1. The [quickstart guide](docs/quickstart.md) walks through authoring, lifecycle, and batch validation.
---
## Rules
| Type | What it checks |
|------|----------------|
| `not_empty` | Field is present and non-empty |
| `regex` | Field matches (or does not match) a pattern. Built-ins: `builtin:email`, `builtin:uuid`, `builtin:ipv4`, `builtin:url` |
| `min` / `max` / `range` | Numeric bounds |
| `min_length` / `max_length` | String length |
| `date_format` | Parseable date/datetime. Falls back through common formats if no explicit format is set |
| `allowed_values` | Value must be in a fixed list |
| `lookup` | Value must appear in a local file or HTTP endpoint (with TTL cache) |
| `compare` | Cross-field: `field` op `compare_to` — supports `gt`, `lt`, `gte`, `lte`, `eq`, `neq`, and `today`/`now` sentinels |
| `required_if` / `forbidden_if` | Conditional: required or forbidden when another field equals a value |
| `checksum` | Check-digit integrity: IBAN, GTIN/GS1, NHS, ISIN, LEI, VIN, CPF, ISRC |
| `unique` | No duplicates within a batch (batch mode only) |
| `cross_field_range` | Value must be between two other fields in the same record |
| `field_sum` | Sum of named fields must equal a target (within optional tolerance) |
| `geospatial_bounds` | Lat/lon pair within a bounding box |
| `date_diff` | Difference between two date fields within a range |
| `age_match` | Declared age consistent with date-of-birth field |
Rules have `severity: error` (blocks the record) or `severity: warning` (flags but allows).
Any rule can include a `condition` block to apply it only when another field equals a given value.
Full reference: [docs/rules/](docs/rules/)
---
## How it compares
A mature data governance programme operates across three layers, each with a distinct job:
| Layer | Purpose | Tools |
|---|---|---|
| **1. Write-time enforcement** | Prevent bad data from entering any system | **OpenDQV** |
| **2. Catalog / governance / stewardship** | Ownership, glossary, lineage, policy, stewardship workflows | Alation, Atlan, Collibra, Purview, DataHub, Marmot |
| **3. Pipeline testing / observability** | Detect drift, freshness issues, residual quality after ingestion | Great Expectations, Soda Core, dbt tests, Monte Carlo |
OpenDQV Core owns layer one. Your catalog handles layer two, your pipeline tools handle layer three.
| | Great Expectations / Soda / dbt | OpenDQV |
|---|---|---|
| **When** | After data lands (in warehouse/lake) | Before data is written (at the door) |
| **Where** | Data pipelines, batch jobs | Source system integration points |
| **Model** | Scan data at rest | Validate data in flight |
| **Latency** | Minutes to hours (batch) | Milliseconds (API call) |
| **Who calls it** | Data engineers | Data engineers, developers, CRM admins |
**They're complementary.** Use Great Expectations to monitor your warehouse. Use OpenDQV to stop bad data from getting there in the first place.
---
## Contracts
43 production-ready contracts ship inside the `opendqv` package covering GDPR, HIPAA, SOX, MiFID II,
UK Building Safety Act, Martyn's Law, Natasha's Law, Ofcom Online Safety Act, EU DORA,
and 20+ other regulatory frameworks across UK, EU, and US. `pip install opendqv` gives you all of them
— `opendqv list` works with zero configuration.
See [docs/compliance-contracts.md](docs/compliance-contracts.md) for the full list with
regulatory context, or browse [opendqv/contracts/](opendqv/contracts/) directly.
17 minimal starter templates are in [examples/starter_contracts/](examples/starter_contracts/).
---
## Performance
EC2 c6i.large, 2 workers, 12-rule contract, mixed 50/50 workload:
**~482 req/s, p99 ~182 ms.** Sizing rule: `WEB_CONCURRENCY = number of vCPUs`.
See [docs/benchmark_throughput.md](docs/benchmark_throughput.md) for full platform comparison,
methodology, and monthly volume extrapolation.
---
## Documentation
| | |
|---|---|
| [Quickstart](docs/quickstart.md) | Build your first contract in 15 minutes |
| [Rules Reference](docs/rules/) | All rule types with parameters and examples |
| [Compliance Contracts](docs/compliance-contracts.md) | 44 contracts with regulatory context |
| [API Reference](docs/index.md) | REST endpoints, SDK, GraphQL, webhooks |
| [Security](SECURITY.md) | Deployment checklist, threat model, RBAC |
| [Production Deployment](docs/production_deployment.md) | Token auth, TLS, Docker Compose, hardening |
| [Integrations](docs/index.md) | Salesforce, Kafka, Snowflake, dbt, Databricks, MCP, and more |
| [All docs →](docs/) | 76 documentation files |
---
## API Stability
OpenDQV is in Beta as of 2.0.0. The following stability commitments apply to the v2.x series:
- **REST API endpoints** — paths, request bodies, and response shapes are stable within `v2.x`. Backwards-incompatible changes require a major version bump and follow a deprecation cycle (one minor release of warnings before removal).
- **YAML contract format** — the contract schema (rules, fields, types) is stable within `v2.x`. New rule types may be added; existing rules will not change semantics without a deprecation cycle.
- **Python SDK** — `OpenDQVClient`, `AsyncOpenDQVClient`, and `LocalValidator` public method signatures are stable within `v2.x`. Internal helpers (prefixed `_`) are not covered.
- **MCP tools** — tool names and parameters are stable within `v2.x`.
- **Security fixes** — backported to the latest 2.x line on a best-effort basis.
### Known limitations in v2.2.x
- **Rule null handling is inconsistent.** Most format rules fail when the target
field is missing; a few (`max_length`, `allowed_values`) pass silently;
`field_sum` and `ratio_check` coerce missing operands to `0`. Single-record
and batch paths disagree in a few cases. See
[`docs/rules/core_rules.md`](docs/rules/core_rules.md#null-handling-current-v22x-behaviour)
for the full matrix and the safe pattern to use today. v2.3.0 will make this
consistent (loud-by-default with an `optional: true` opt-out).
- **Unknown rule types pass silently at runtime.** A typo in `type:` (e.g.
`min_lenght`) is caught by `opendqv lint` but not by the engine — a typo'd
rule is a disabled rule. Always lint before deploy. v2.3.0 will reject
unknown types at contract load.
---
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions, coding guidelines, and how to submit changes.
## License
MIT — see [LICENSE](LICENSE).
## Acknowledgements
**Led by [Sunny Sharma](https://uk.linkedin.com/in/sunny-sharma-3927632), [BGMS Consultants Ltd](https://www.bgmsconsultants.com).** The vision, the architecture, every contract, and every design decision in this repository are directed by a human who believes data quality is a write-time responsibility.
OpenDQV is built with a hybrid team. Sunny leads — carbon and silicon. Three AI collaborators execute: Claude Sonnet 4.6 (primary developer), Claude Opus 4.6 (strategic auditor), and Grok (market intelligence). All answer to the same ethos: *trust is easier to build than to repair.*