https://github.com/paulushcgcj/sqlfy
Schema Graph Engine — Parse Flyway migrations into an AST, reconstruct your database schema state, and export LLM-ready vector context.
https://github.com/paulushcgcj/sqlfy
llm migration sql
Last synced: 1 day ago
JSON representation
Schema Graph Engine — Parse Flyway migrations into an AST, reconstruct your database schema state, and export LLM-ready vector context.
- Host: GitHub
- URL: https://github.com/paulushcgcj/sqlfy
- Owner: paulushcgcj
- License: gpl-3.0
- Created: 2026-05-25T17:05:28.000Z (24 days ago)
- Default Branch: main
- Last Pushed: 2026-06-14T05:51:02.000Z (4 days ago)
- Last Synced: 2026-06-14T06:26:10.494Z (4 days ago)
- Topics: llm, migration, sql
- Language: Python
- Homepage: https://paulushcgcj/github.io/sqlfy
- Size: 1.37 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 61
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# SQLfy
**Schema Graph Engine** — Parse Flyway migrations into an AST, reconstruct your database schema state, and export LLM-ready vector context.
```
Flyway SQL files → sqlglot AST → Reconstructor → Schema Graph / SchemaState → LLM Chunks
```
---
## Overview
SQLfy reads a set of Flyway migration files in version order, parses each DDL statement into an abstract syntax tree, and reconstructs the **final state** of your database schema. From that state it produces:
- An interactive **ERD** showing tables and foreign-key relationships
- A structured **table explorer** with columns, types, constraints, indexes, and comments
- Pre-formatted **LLM context chunks** ready to be embedded into a RAG pipeline or pasted into a prompt
Primary target dialect is **OracleDB**. **PostgreSQL**, **MySQL**, and **SQLite** are also supported via the `--dialect` flag.
### Multi-Dialect Support
SQLfy supports multiple SQL dialects with automatic type normalization:
| Dialect | Invoke with | Type Normalization Examples |
|---|---|---|
| **Oracle** _(default)_ | `--dialect oracle` | `VARCHAR2` → `VARCHAR`, `NUMBER` → `NUMERIC` |
| **PostgreSQL** | `--dialect postgres` | `SERIAL` → `INTEGER`, `TEXT` → `VARCHAR` |
| **MySQL** | `--dialect mysql` | `TINYINT` → `SMALLINT`, `DATETIME` → `TIMESTAMP` |
| **SQLite** | `--dialect sqlite` | `TEXT` → `VARCHAR`, `REAL` → `FLOAT` |
**Usage:**
```bash
sqlfy dump ./postgres-migrations --dialect postgres
sqlfy graph ./mysql-migrations --dialect mysql --format mermaid
sqlfy insights ./sqlite-migrations --dialect sqlite
```
**How it works:**
- The `--dialect` flag is passed to [sqlglot](https://github.com/tobymao/sqlglot) for parsing
- Types are normalized to canonical forms (e.g., `SERIAL` → `INTEGER`, `VARCHAR2` → `VARCHAR`)
- Auto-increment columns are detected per-dialect (`SERIAL`, `AUTO_INCREMENT`, `IDENTITY`)
- Output formats work consistently across all dialects
---
## Repository Structure
```
sqlfy/
├── app/ React + Vite + Tauri desktop UI
├── cli/ Python CLI (pip-installable)
│ ├── src/sqlfy/
│ │ ├── core.py Schema graph engine (data types, chunk builder, layout)
│ │ ├── reconstructor.py Stateful migration processor (incremental, point-in-time)
│ │ ├── schema_state.py SchemaState dictionary — serialisable, LLM-ready snapshot
│ │ └── main.py argparse CLI entry point
│ ├── tests/ pytest suite (140+ tests)
│ └── pyproject.toml
└── samples/ Shared Flyway .sql fixtures (Oracle DDL — used by app and test suite)
```
---
## Quick Start
### Desktop app
```bash
cd app
npm install
npm run dev # Vite dev server (browser)
npx tauri dev # Tauri desktop window
```
The app is pre-loaded with the sample Oracle schema from `samples/`. Replace the SQL with your own Flyway files, or add files with **+ Add Migration File**, then click **▶ Parse →**.
### CLI
```bash
cd cli
pip install . # install
sqlfy ./samples # human-readable schema summary
```
---
## Distribution
### Automated releases (Recommended)
Every time you create a new tag, GitHub Actions automatically builds binaries for all platforms:
```bash
# Create and push a tag
git tag v0.20.0
git push origin v0.20.0
```
This triggers the build workflow which creates:
- `sqlfy-macos-arm64.zip` (macOS Apple Silicon)
- `sqlfy-linux-amd64.zip` (Linux x86_64)
- `sqlfy-windows-amd64.zip` (Windows x86_64)
Each zip contains the binary + README.md. The workflow automatically creates a GitHub Release with all files attached.
**Users download from:** `https://github.com/paulushcgcj/sqlfy/releases`
### Building a standalone binary locally
To build manually for your current platform:
```bash
cd cli
bash build-binary.sh
```
This creates `cli/dist/sqlfy-binary/sqlfy` (~35 MB) — a self-contained executable with zero dependencies.
**To share:**
1. Zip the binary: `tar -czf sqlfy-macos.tar.gz -C dist/sqlfy-binary sqlfy`
2. Send `sqlfy-macos.tar.gz` to your user
3. They extract and run: `tar -xzf sqlfy-macos.tar.gz && chmod +x sqlfy && ./sqlfy --help`
**Cross-platform:** Build on macOS → works on macOS. Build on Linux → works on Linux.
**Alternative (requires Python 3.11+):**
```bash
cd cli
python -m build # creates wheel
pip install dist/sqlfy-*.whl # install from wheel
```
---
## CLI Reference
SQLfy has 31 CLI subcommands covering schema reconstruction, graph visualization, impact analysis, linting, drift detection, domain analysis, RAG Q&A, and more.
See the [full command reference on the wiki](https://github.com/paulushcgcj/sqlfy.wiki/wiki/commands/) for documentation on every command, including usage, flags, and examples.
### Quick reference
| Subcommand | Description |
|---|---|
| `dump` | Output the Schema State Dictionary |
| `manifest` | Output graph manifest/metadata with high-level summary |
| `chunks` | Output LLM vector chunks |
| `diff` | Compare two Schema State Dictionaries or migration directories |
| `diff-versions` | Compare two version snapshots from the same migration set |
| `graph` | Graph representation (DOT, Mermaid, Excalidraw, Draw.io, JSON, HTML, report) |
| `graph-migrations` | Visualize migration timeline and dependency graph |
| `build-graph` | Build complete graphify-out/ directory (unified all-in-one) |
| `rollback-analysis` | Analyze migration rollback feasibility and generate rollback scripts |
| `lint` | Lint migration SQL for quality and style using sqlfluff |
| `insights` | Analyse schema and report findings (orphan tables, missing PKs, etc.) |
| `health` | Generate migration folder health report with quality score |
| `simulate` | Simulate schema evolution with hypothetical migrations |
| `integrity` | Check migration file integrity using SHA256 hashes |
| `provenance` | Collect git provenance for migration files |
| `cache` | Manage file-based caching system |
| `ask` | Ask a natural language question about the schema (RAG) |
| `chat` | Interactive multi-turn schema chat session |
| `export` | Export schema as self-contained HTML documentation |
| `query` | Deterministic graph queries (no LLM) |
| `impact` | Analyze impact of schema object changes using graph traversal |
| `lineage` | Column-level lineage and data flow analysis |
| `domains` | Detect semantic business domains using community detection |
| `stability` | Calculate schema stability metrics and churn rates |
| `validate` | Validate migration ordering and detect issues |
| `deps` | Analyze migration dependencies and detect circular dependencies |
| `drift` | Detect schema drift between migration folders and generate repair SQL |
| `classify` | Classify migrations by semantic category (table creation, data migration, cleanup, etc.) |
| `naming` | Enforce migration filename naming conventions (Flyway pattern, description format) |
| `cost` | Estimate migration execution cost (score, category, estimated_seconds) |
| `safety` | Score migrations by safety level (SAFE / MEDIUM_RISK / HIGH_RISK / DANGEROUS) |
**Common flags available on most commands:**
- `--dialect oracle|postgres|mysql|sqlite` — SQL dialect (default: `oracle`)
- `--at VERSION` — Point-in-time snapshot at a specific Flyway version
- `--out FILE` — Write output to file instead of stdout
- `--format` — Output format (varies by command)
> Use `sqlfy --help` for detailed usage.
---
## Development
### App
```bash
cd app
npm install
npm run dev # Vite dev server (browser, no Tauri)
npm run build # production Vite build
npm run lint # ESLint
npx tauri dev # Tauri desktop window (requires Rust + cargo)
npx tauri build # Tauri production bundle (.app / .exe / .deb)
```
### CLI
```bash
cd cli
pip install -e ".[dev]" # editable install + pytest
python -m pytest -v # run all tests
python -m sqlfy ./samples # run directly without installing
```
Tests read real `.sql` files from `samples/` and validate the parser, Reconstructor, and SchemaState builder end-to-end.
### PyInstaller binary (for bundling with Tauri)
```bash
cd cli
pip install pyinstaller
pyinstaller --onefile src/sqlfy/main.py --name sqlfy
# Output: dist/sqlfy — copy to app/src-tauri/binaries/sqlfy-
```
---
## How the App Uses the CLI
The desktop app (Tauri) and the browser dev mode use the CLI differently:
```
Browser dev mode:
App (TypeScript) ──▶ app/src/core/core.ts (in-process parser, no CLI)
Tauri desktop:
App (TypeScript) ──▶ app/src/bridge/cli.ts
│
├─ writes migrations to a temp JSON file: [{ filename, sql }]
├─ spawns CLI sidecar: sqlfy --json-input --all
└─ parses response JSON: { graph: {...}, chunks: [...] }
```
**Detection** — `app/src/bridge/cli.ts` checks `'__TAURI_INTERNALS__' in window` to decide which path to use.
**CLI sidecar** — configured in `app/src-tauri/tauri.conf.json` under `externalBin`. The binary must be placed at `app/src-tauri/binaries/sqlfy-` before `npx tauri build`.
**Output contract** — the CLI's `--all` flag produces:
```json
{
"graph": { "tables": {}, "sequences": {}, "edges": [], "migration_history": [] },
"chunks": [{ "id": "", "type": "", "title": "", "content": "", "metadata": {}, "hint": "" }]
}
```
The TypeScript deserialiser in `cli.ts` maps `snake_case` keys to `camelCase` for the React component layer.
---
## Features
### ① Migrations tab
- Add, edit, or remove SQL migration files directly in the browser
- Files are parsed in Flyway version order (`V1__`, `V2__`, …)
- Supports multi-file sequences with incremental schema changes
### ② Schema Graph tab
- **ERD canvas** — topology-aware layout showing table nodes and FK edges
- **Table detail panel** — per-table view of:
- Columns with data type, precision/scale, nullability, default value, and inline comment
- Constraint badges: `PK`, `NOT NULL`, `UNIQUE`, `FK`
- Outgoing and incoming FK relationships with `ON DELETE` action
- Indexes (including unique indexes) with version provenance
- Check constraints
- Migration action history per table (CREATE, ADD_COLUMN, MODIFY_COLUMN, …)
- **Sequence list** — `START WITH` / `INCREMENT BY` metadata per sequence
### ③ LLM Chunks tab
- **Schema Summary** chunk — table count, column count, FK edge count, migration history, table role classification (root / junction / leaf / standalone)
- **Per-table** chunks — full column inventory + constraint + relationship text in a structured, embedding-friendly format
- **Relationship Graph** chunk — adjacency list of all FK edges for JOIN-path planning
- One-click copy per chunk
### ④ Ask tab
- Natural-language Q&A against your schema using RAG (Retrieval-Augmented Generation)
- Choose retrieval strategy: local BM25 (no keys) or dense embeddings (requires API key)
- Shows source chunks and provenance for transparency and reproducibility
- Useful for quick schema discovery: "Which tables lack a PRIMARY KEY?", "How do orders join to customers?"
### ⑤ Schema tab
- Table explorer and compact schema panel with per-table details:
- Columns with data type, nullability, defaults, and inline comments
- Constraint, index, and FK badges with provenance
- Migration history for the selected table (CREATE / ALTER operations)
- Includes a lightweight "Run insights" action to analyse the current schema from this panel
### ⑥ Insights tab
- Dedicated schema quality analysis panel powered by the `sqlfy insights` engine:
- Health score (0–100) and grade (A–D)
- Severity filter (Error / Warning / Info), category dropdown, and keyword search
- Expandable finding cards with full detail and suggested fix or SQL
- CLI-required: runs the Python CLI (`sqlfy insights --format json`) via Tauri or the dev-server proxy
- Browser-only mode shows a clear "CLI required" message and documentation on how to enable the CLI
### ⑦ Graph Export tab
- Export the schema graph to multiple formats: Mermaid, DOT, Excalidraw, Draw.io, JSON, HTML, or a human-readable summary
- Advanced options: diagram title, layout resolution, `--no-split` subgraph behavior, and point-in-time `--at` version
- Uses the CLI sidecar in Tauri or the dev-server proxy; browser fallback produces a limited in-process Mermaid/DOT rendering
### ⑧ Simulate tab
- Test hypothetical DDL changes against the current schema without modifying any files
- Enter any SQL statement (DDL), optionally specify a base migration version (`--at`), and run a sandboxed simulation
- Results show: safety badge (✓ Safe / ✕ Unsafe), breaking-change flag, health score, schema diff stats (tables/columns/sequences/relationships added or removed), and collapsible warnings list
- Requires CLI (Tauri or Vite dev server) — not available in pure-browser mode
---
---
## Supported DDL
| Statement | Support |
|---|---|
| `CREATE TABLE` | ✅ columns, PK, FK, UNIQUE, CHECK |
| `ALTER TABLE … ADD COLUMN` | ✅ |
| `ALTER TABLE … ADD CONSTRAINT` | ✅ |
| `ALTER TABLE … DROP COLUMN` | ✅ |
| `ALTER TABLE … DROP CONSTRAINT` | ✅ |
| `ALTER TABLE … MODIFY` | ✅ type, precision/scale, default, nullability |
| `ALTER TABLE … RENAME COLUMN` | ✅ |
| `CREATE [UNIQUE] INDEX` | ✅ |
| `DROP TABLE` | ✅ |
| `DROP INDEX` | ✅ |
| `CREATE SEQUENCE` | ✅ |
| `DROP SEQUENCE` | ✅ |
| `COMMENT ON TABLE / COLUMN` | ✅ |
---
## LLM Usage
> [!IMPORTANT]
> **Vector embeddings require an API key.**
> The `ask` and `chat` subcommands support a `--embed` flag that switches from
> BM25 retrieval to dense vector search using [Voyage AI](https://voyageai.com)
> (model `voyage-3`, accessed via the Anthropic API).
> Set `ANTHROPIC_API_KEY` in your environment before using `--embed`.
> Without the flag, all retrieval is local BM25 — no key needed.
>
> TODO: evaluate whether to replace with a local embedding model (e.g. `nomic-embed-text`
> via Ollama) to remove the external dependency entirely.
Each chunk is self-contained and human-readable. Example table chunk:
```
TABLE: APP.ORDERS
Schema: APP | Created: V2
COLUMNS:
ORDER_ID: NUMBER(10) [PK, NOT NULL]
USER_ID: NUMBER(10) [NOT NULL, FK]
TOTAL_AMOUNT: NUMBER(12,2) [NOT NULL]
STATUS: VARCHAR2(20) [NOT NULL, DEFAULT PENDING]
CREATED_AT: TIMESTAMP [NOT NULL, DEFAULT SYSTIMESTAMP]
REFERENCES (outgoing FK):
USER_ID) → APP.USERS(USER_ID) ON DELETE CASCADE [FK_ORDERS_USER]
REFERENCED BY:
APP.ORDER_ITEMS.ORDER_ID → ORDER_ID
INDEXES:
IDX_ORDERS_USER: (USER_ID) [V2]
IDX_ORDERS_STATUS: (STATUS, CREATED_AT) [V2]
MIGRATION ACTIONS:
V2: CREATE TABLE APP.ORDERS
```
Paste the **Schema Summary** chunk as system context and individual **table chunks** as retrieval results for precise, grounded SQL generation.
---
## Tech Stack
| Layer | Technology |
|---|---|
| Desktop UI | React 19 + Vite + Tauri 2 |
| CLI | Python 3.11+ with sqlglot ≥25 (Oracle AST) |
| Distribution | PyInstaller binary + Tauri desktop bundle |
| Tests | pytest 9 |
---
## Roadmap
- [x] Split into `app/` (React/Vite/Tauri) and `cli/` (Python)
- [x] Shared `samples/` fixtures used by both the app and the test suite
- [x] Migrate parser to **sqlglot** for full Oracle AST fidelity
- [x] `DROP TABLE`, `DROP COLUMN`, `DROP CONSTRAINT`, `MODIFY COLUMN`, `RENAME COLUMN` support
- [x] `SchemaState` dictionary — versioned, serialisable, fingerprinted snapshot
- [x] YAML export of SchemaState (`sqlfy dump --format yaml`)
- [x] Point-in-time reconstruction via `--at`
- [x] Schema diff command (`sqlfy diff`)
- [x] Graph output command (`sqlfy graph` — DOT, Mermaid, Excalidraw, Draw.io, JSON, HTML, report)
- [x] Schema insights (`sqlfy insights` — orphan tables, missing PKs, FK candidates, circular refs, islands)
- [x] Health report (`sqlfy health` — migration quality score)
- [x] Schema simulator (`sqlfy simulate` — test what-if migrations)
- [x] Migration integrity checks (`sqlfy integrity` — SHA256 hashing)
- [x] File-based caching (`sqlfy cache`)
- [x] Natural language queries (`sqlfy ask` — single-shot RAG)
- [x] Interactive chat (`sqlfy chat` — multi-turn conversations)
- [x] HTML documentation export (`sqlfy export`)
- [x] Deterministic graph queries (`sqlfy query` — tables, columns, fk-path, refs, orphans, islands, cycles, missing-pk, indexes)
- [x] Impact analysis (`sqlfy impact` — graph traversal for change impact)
- [x] Manifest generation (`sqlfy manifest` — metadata summary)
- [x] Community detection in graph exports (NetworkX + Louvain algorithm)
- [ ] PostgreSQL dialect parity
- [ ] Vector embeddings: evaluate replacing Voyage AI (`ANTHROPIC_API_KEY`) with a local model (e.g. Ollama `nomic-embed-text`) — see LLM Usage note above
---
## License
MIT