https://github.com/sidequery/sqlmesh-openlineage
- Host: GitHub
- URL: https://github.com/sidequery/sqlmesh-openlineage
- Owner: sidequery
- License: mit
- Created: 2026-01-11T15:59:21.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-02-16T15:03:14.000Z (2 months ago)
- Last Synced: 2026-02-16T22:30:58.201Z (about 2 months ago)
- Language: Python
- Size: 181 KB
- Stars: 7
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# sqlmesh-openlineage
OpenLineage integration for SQLMesh. Automatically emits lineage events to Marquez or any OpenLineage-compatible backend.
## Features
- **Table-level lineage**: Track which models depend on which upstream models
- **Column-level lineage**: Track which columns flow from source to destination
- **Schema capture**: Column names and types for each model
- **Execution stats**: Duration, rows processed, bytes processed
- **Per-model events**: START/COMPLETE/FAIL events for each model evaluation
## Installation
```bash
pip install sqlmesh-openlineage
```
Or with uv:
```bash
uv add sqlmesh-openlineage
```
## Quick Start (CLI Users)
**Note:** This package requires Python-based SQLMesh configuration (`config.py`), not YAML configuration.
Add this to your `config.py`:
```python
import sqlmesh_openlineage

sqlmesh_openlineage.install(
    url="http://localhost:5000",
    namespace="my_project",
    # api_key="...",  # optional
)

from sqlmesh.core.config import Config

config = Config(
    # ... your existing config
)
```
Then run `sqlmesh run` as normal. OpenLineage events will be emitted for each model evaluation.
## Environment Variables
You can also configure via environment variables:
```bash
export OPENLINEAGE_URL=http://localhost:5000
export OPENLINEAGE_NAMESPACE=my_project
export OPENLINEAGE_API_KEY=... # optional
```
Then in `config.py`:
```python
import sqlmesh_openlineage
sqlmesh_openlineage.install() # reads from env vars
```
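To illustrate how environment-variable configuration of this kind typically behaves, here is a minimal sketch. It is not the package's implementation: `resolve_settings` is a hypothetical helper, and both the precedence (explicit arguments win over environment variables) and the error on a missing URL are assumptions.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_settings(
    url: Optional[str] = None,
    namespace: Optional[str] = None,
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: merge explicit arguments with OPENLINEAGE_* variables."""
    env = os.environ if env is None else env
    resolved = {
        # Assumption: an explicit argument takes precedence over the environment.
        "url": url or env.get("OPENLINEAGE_URL"),
        "namespace": namespace or env.get("OPENLINEAGE_NAMESPACE"),
        "api_key": api_key or env.get("OPENLINEAGE_API_KEY"),  # optional
    }
    if not resolved["url"]:
        # Assumption: a missing URL is treated as a configuration error.
        raise ValueError("no OpenLineage URL given and OPENLINEAGE_URL is unset")
    return resolved


# A plain dict stands in for os.environ so the example is deterministic.
fake_env = {
    "OPENLINEAGE_URL": "http://localhost:5000",
    "OPENLINEAGE_NAMESPACE": "my_project",
}
from_env = resolve_settings(env=fake_env)
overridden = resolve_settings(namespace="other_project", env=fake_env)
print(from_env)
print(overridden)
```

Passing the environment in as a mapping keeps the helper easy to test without mutating the real process environment.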
## How It Works
This package uses SQLMesh's `set_console()` API to inject a custom Console wrapper. The wrapper intercepts per-snapshot lifecycle events and emits corresponding OpenLineage events:
- `START` event when a model evaluation begins
- `COMPLETE` event when evaluation succeeds (includes execution stats)
- `FAIL` event when evaluation fails or audits fail
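The wrapper pattern described above can be sketched without SQLMesh installed. This is an illustration only: the hook names (`on_snapshot_start`, `on_snapshot_success`, `on_snapshot_failure`), `LineageConsoleWrapper`, and `FakeConsole` are hypothetical, and SQLMesh's real Console interface may differ.

```python
from typing import Any, Callable, Dict, List, Tuple


class LineageConsoleWrapper:
    """Delegates every call to the wrapped console, emitting an event on selected hooks."""

    # Hypothetical hook names mapped to OpenLineage event types.
    HOOKS: Dict[str, str] = {
        "on_snapshot_start": "START",
        "on_snapshot_success": "COMPLETE",
        "on_snapshot_failure": "FAIL",
    }

    def __init__(self, wrapped: Any, emit: Callable[[str, str], None]) -> None:
        self._wrapped = wrapped
        self._emit = emit

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes not defined on the wrapper itself.
        target = getattr(self._wrapped, name)
        event_type = self.HOOKS.get(name)
        if event_type is None or not callable(target):
            return target  # pass through untouched

        def intercepted(model_name: str, *args: Any, **kwargs: Any) -> Any:
            self._emit(event_type, model_name)
            return target(model_name, *args, **kwargs)

        return intercepted


class FakeConsole:
    """Stand-in for the real console so the sketch is runnable."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def on_snapshot_start(self, model_name: str) -> None:
        self.calls.append(f"start:{model_name}")

    def on_snapshot_success(self, model_name: str) -> None:
        self.calls.append(f"success:{model_name}")

    def log_status(self, message: str) -> None:
        self.calls.append(f"log:{message}")


emitted: List[Tuple[str, str]] = []
fake = FakeConsole()
console = LineageConsoleWrapper(fake, lambda kind, model: emitted.append((kind, model)))
console.on_snapshot_start("customers")
console.log_status("working")  # not a lifecycle hook, so no event is emitted
console.on_snapshot_success("customers")
print(emitted)
```

Because the wrapper forwards everything it does not recognize, the original console output is unchanged and only the lifecycle hooks gain a side effect.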
## Events Emitted
| SQLMesh Event | OpenLineage Event | Data Included |
|---------------|-------------------|---------------|
| Model evaluation start | RunEvent(START) | Input datasets, output dataset with schema, column lineage |
| Model evaluation success | RunEvent(COMPLETE) | Execution stats (rows, bytes, duration) |
| Model evaluation failure | RunEvent(FAIL) | Error message |
| Audit failure | RunEvent(FAIL) | Audit failure details |
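For orientation, the sketch below builds simplified OpenLineage `RunEvent` payloads as plain dictionaries. The top-level field names follow the OpenLineage spec, but `build_run_event` is a hypothetical helper, the `producer` URL is a placeholder, and the required `schemaURL` field and all facets are omitted for brevity. It does not show the exact payloads this package sends.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Iterable


def build_run_event(
    event_type: str,
    run_id: str,
    namespace: str,
    job_name: str,
    inputs: Iterable[str] = (),
    outputs: Iterable[str] = (),
) -> Dict[str, Any]:
    """Hypothetical helper returning a simplified RunEvent-shaped dict."""
    return {
        "eventType": event_type,  # START, COMPLETE, or FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
        "producer": "https://example.com/hypothetical-producer",  # placeholder
    }


# START and COMPLETE for one evaluation share a runId so a backend can pair them.
run_id = str(uuid.uuid4())
start = build_run_event(
    "START", run_id, "my_project", "customer_summary",
    inputs=["customers", "orders"], outputs=["customer_summary"],
)
complete = build_run_event("COMPLETE", run_id, "my_project", "customer_summary")
print(json.dumps(start, indent=2))
```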
## Column-Level Lineage
The integration automatically extracts column-level lineage using SQLMesh's built-in lineage analysis. For example, if you have:
```sql
-- customers.sql
SELECT customer_id, name, email FROM raw_customers

-- customer_summary.sql
SELECT
    c.customer_id,
    c.name as customer_name,
    COUNT(o.order_id) as total_orders
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
```
The lineage will show that `customer_summary.customer_name` traces back to `customers.name`.
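As a rough picture of how such a mapping is represented, the sketch below arranges it in the shape of OpenLineage's `columnLineage` dataset facet (`fields` keyed by output column, each with a list of `inputFields`). The mapping is hand-written from the SQL above rather than produced by the package, `column_lineage_facet` is a hypothetical helper, and the facet's `_producer` and `_schemaURL` entries are omitted.

```python
from typing import Any, Dict, List, Tuple

# Output column -> list of (upstream dataset, upstream column),
# transcribed by hand from the customer_summary query.
CUSTOMER_SUMMARY_LINEAGE: Dict[str, List[Tuple[str, str]]] = {
    "customer_id": [("customers", "customer_id")],
    "customer_name": [("customers", "name")],
    "total_orders": [("orders", "order_id")],
}


def column_lineage_facet(
    mapping: Dict[str, List[Tuple[str, str]]], namespace: str
) -> Dict[str, Any]:
    """Hypothetical helper: arrange a column mapping in columnLineage facet shape."""
    return {
        "fields": {
            output_column: {
                "inputFields": [
                    {"namespace": namespace, "name": dataset, "field": column}
                    for dataset, column in sources
                ]
            }
            for output_column, sources in mapping.items()
        }
    }


facet = column_lineage_facet(CUSTOMER_SUMMARY_LINEAGE, "my_project")
print(facet["fields"]["customer_name"])
```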
## Testing with Marquez
```bash
# Start Marquez (requires Docker)
docker compose up -d
# Configure and run SQLMesh
export OPENLINEAGE_URL=http://localhost:5001
sqlmesh run
# View lineage at http://localhost:3000
```
## Development
```bash
# Install dependencies
uv sync --dev
# Run tests (unit + integration)
uv run pytest tests/ -v
# Run Marquez integration test (requires Docker)
docker compose up -d
uv run pytest tests/test_marquez_integration.py -v -s
docker compose down
```
## License
MIT