https://github.com/daihuynh/lark-dbml
DBML parser using LARK provides structured output, Pydantic models, and supports multiple converters, e.g., SQL with SQLGlot
- Host: GitHub
- URL: https://github.com/daihuynh/lark-dbml
- Owner: daihuynh
- License: mit
- Created: 2025-06-23T07:24:22.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2026-01-01T23:09:43.000Z (about 1 month ago)
- Last Synced: 2026-01-02T09:36:34.824Z (about 1 month ago)
- Topics: dbml, lark, lark-parser, pydantic-v2, sqlglot
- Language: Python
- Homepage:
- Size: 308 KB
- Stars: 6
- Watchers: 0
- Forks: 2
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
[PyPI](https://pypi.org/project/lark-dbml/)
[GitHub](https://github.com/daihuynh/lark-dbml)
[Codecov](https://codecov.io/gh/daihuynh/lark-dbml)
[Ruff](https://github.com/astral-sh/ruff)
[Pepy](https://pepy.tech/projects/lark-dbml)
# Lark-DBML
* [Features](#features)
* [Milestones](#milestones)
* [Installation](#installation)
* [Usage](#usage)
* [Output Structure](#output-structure)
* [Parser](#parser)
* [Load](#load-dbml)
* [Dump](#dump-dbml)
* [Converters](#converters)
* [SQL](#sql)
* [Data Contract](#data-contract)
* [Development](#development)
* [License](#license)
A Python parser for [Database Markup Language (DBML)](https://dbml.dbdiagram.io) built with the [Lark](https://github.com/lark-parser/lark) parsing toolkit, supporting both the Earley and LALR(1) algorithms for robust and flexible parsing.
## Features
* **High-Performance:** `lark-dbml` supports both the Earley and LALR(1) algorithms. According to [lark](https://github.com/lark-parser/lark)'s own benchmarks, its LALR(1) implementation outperforms Parsimonious, PyParsing, and ANTLR.
* **Standalone Mode:** the package does not require the `lark` package by default. Parsing is packed into a single Python file generated from the EBNF grammar file.
* **Up-to-Date Spec:** fully supports the [latest DBML specification - April 2025](https://docs.dbdiagram.io/release-notes/).
* **Pydantic Validation:** Ensures the parsed DBML data conforms to a well-defined structure using Pydantic 2.11, providing reliable data integrity.
* **Structured Output:** Generates Python objects representing your DBML diagram, making it easy to programmatically access and manipulate your database schema.
* **Future-Proof:** the parser accepts any properties or settings that are not defined in the DBML spec.
* **Powerful Conversion & Tooling**:
* **DBML Round-Trip**: The package supports full round-trip conversion, allowing you to parse DBML, programmatically manipulate the Pydantic models, and then generate the DBML back out.
* **SQL**: Convert between DBML and SQL (DDL) using [sqlglot](https://github.com/tobymao/sqlglot). Supports both `to_sql` and `from_sql`.
* **Mermaid**: Convert DBML to Mermaid.js ERD diagrams.
* **Data Contract**: Transform your DBML models into [data contract specification](https://datacontract.com).
## Milestones
- [x] DBML Parser - Earley
- [x] SQL Converter (Round-trip)
- [x] DBML Converter
- [x] Data Contract Converter
- [x] Mermaid Converter
- [x] Optimised DBML Parser - LALR(1)
- [ ] CLI - TBD
- [ ] Generate DBML from a database connection string - TBD
## Installation
You can install lark-dbml using pip:
```bash
pip install lark-dbml
```
To use `lark` mode (when `standalone_mode=False` is passed to the `load` function), install the `lark` extra:
```bash
pip install "lark-dbml[lark]"
```
To use the SQL converter:
```bash
pip install "lark-dbml[sql]"
```
## Usage
### Output Structure
`Diagram`, a Pydantic model, defines the expected structure of the parsed DBML content, ensuring consistency and type safety.
```python
class Diagram(BaseModel):
    project: Project
    enums: list[Enum] | None = []
    table_groups: list[TableGroup] | None = []
    sticky_notes: list[Note] | None = []
    references: list[Reference] | None = []
    tables: list[Table] | None = []
    table_partials: list[TablePartial] | None = []
```
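To illustrate the kind of guarantee this buys, here is a minimal Pydantic sketch (using stand-in classes, not `lark-dbml`'s actual schema): well-formed input is coerced into typed objects, while malformed input raises a `ValidationError` instead of slipping through silently.

```python
from pydantic import BaseModel, ValidationError

# Stand-in models (not lark-dbml's real classes) showing Pydantic validation.
class Column(BaseModel):
    name: str
    sql_type: str

class Table(BaseModel):
    name: str
    columns: list[Column] = []

# Well-formed input validates and coerces dicts into typed Column objects.
table = Table(name="users", columns=[{"name": "id", "sql_type": "int"}])
assert table.columns[0].sql_type == "int"

# Malformed input (missing sql_type) is rejected with a ValidationError.
try:
    Table(name="users", columns=[{"name": "id"}])
except ValidationError as exc:
    print(f"rejected with {exc.error_count()} error(s)")
```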
### Parser
`lark-dbml` uses the same API as other parser packages in Python. The default configuration is `standalone` mode with the LALR(1) algorithm. Besides their own parameters, `load` and `loads` accept any option supported by the Lark parser; see the [Lark source](https://github.com/lark-parser/lark/blob/d1a456dd365603bbcb4b5b4ec2c29e6096b82f59/lark/lark.py#L47) for the full list.
#### Load DBML
```python
from lark_dbml import load, loads
# 1. Read from a string
dbml = """
Project "My Database" {
database_type: 'PostgreSQL'
Note: "This is a sample database"
}
Table "users" {
id int [pk, increment]
username varchar [unique, not null]
email varchar [unique]
created_at timestamp [default: `now()`]
}
Table "posts" {
id int [pk, increment]
title varchar
content text
user_id int
}
Ref: posts.user_id > users.id
"""
# Default option
diagram = loads(dbml)
# Change to Lark mode
diagram = loads(dbml, standalone_mode=False)
# Switch to Earley algorithm
diagram = loads(dbml, parser="earley")
# 2. Read from a file
diagram = load('example.dbml')
```
The parser accepts any settings or properties on DBML objects, even ones the spec doesn't define.
```python
diagram = loads("""
Table myTable [newkey: 'random_value'] {
    id int [pk]
}
""")
```
```
>>> diagram.tables[0].settings
TableSettings(note=None, header_color=None, newkey='random_value')
```
#### Dump DBML
```python
from lark_dbml import dump, dumps
from lark_dbml.converter.dbml import DBMLConverterSettings
from lark_dbml.schema import (
Column,
ColumnSettings,
DataType,
Diagram,
Table,
TableSettings
)
diagram = Diagram(
tables=[
Table(
name="body",
alias="full_table",
note="Incorporated with header and footer",
settings=TableSettings(
headercolor="#3498DB",
note="header note",
partitioned_by="id"
),
columns=[
Column(
name="id",
data_type=DataType(sql_type="int"),
settings=ColumnSettings(
is_primary_key=True, note="why is id behind name?"
),
),
Column(
name="audit_date",
data_type=DataType(sql_type="timestamp"),
settings=ColumnSettings(default="`getdate()`"),
),
],
)
]
)
# This converts the diagram to DBML,
# but partitioned_by will not be included
dumps(diagram)
# This includes partitioned_by in the output
dumps(diagram,
settings=DBMLConverterSettings(
allow_extra=True
)
)
# Write the DBML to file
dump(diagram, 'diagram.dbml')
```
### Converters
#### SQL
SQL conversion is backed by the **sqlglot** package. The converter turns the parsed Pydantic model into **sqlglot** AST expressions, which makes transpiling to any SQL dialect straightforward.
**Note:** the output SQL is not guaranteed to be perfect or fully functional due to differences between dialects. If you find a problem, please open an issue on GitHub.
```python
from lark_dbml import load
from lark_dbml.converter import to_sql
from sqlglot import Dialects
# Load DBML diagram
diagram = load("diagram.dbml")
# Convert to SQL for PostgreSQL
sql = to_sql(diagram, Dialects.POSTGRES)
```
#### Data Contract
Convert a DBML diagram to a [Data Contract](https://datacontract.com) spec file. Basic usage simply converts tables and columns into the "model" and "definition" sections, but `lark-dbml` also supports settings that extract more information from the DBML; see Advanced Usage below.

Basic example:
```python
from lark_dbml import load
from lark_dbml.converter import to_data_contract
# Load DBML diagram
diagram = load("diagram.dbml")
# Convert to a Data Contract specification
data_contract = to_data_contract(diagram)
```
Advanced usage:

You can use Sticky Notes in DBML to store "terms" and "servers" information in JSON or YAML format. Setting `note_as_fields` in the converter settings then parses those notes and includes their content in the generated contract. Here is an example:
```python
import json
from lark_dbml import load
from lark_dbml.converter import to_data_contract
from lark_dbml.converter.datacontract import DataContractConverterSettings
# complex_datacontract.dbml lives in the examples folder of this repo
diagram = load('examples/complex_datacontract.dbml')
# project_as_info: properties in Project are put into "info"
# note_as_description: the note in Table settings becomes the model description
# note_as_fields: an inline note in a Table is parsed and extends the
#   corresponding model's properties
# deserialization_func is required when note_as_fields is set; it parses the
#   content of an inline note (JSON in this example)
data_contract = to_data_contract(diagram=diagram,
settings=DataContractConverterSettings(
project_as_info=True,
note_as_description=True,
note_as_fields=True,
deserialization_func=json.loads
))
```
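To make the role of `deserialization_func` concrete, here is a self-contained sketch (the note body below is made up for illustration, not taken from the repo's examples): the converter hands the raw note text to the function, and the resulting dictionary is merged into the contract.

```python
import json

# A sticky note's body holding structured metadata as a JSON document.
note_body = """
{
  "terms": {"usage": "internal analytics only"},
  "servers": {"prod": {"type": "postgres", "host": "db.example.com"}}
}
"""

# This is the step deserialization_func performs: raw note text -> dict.
fields = json.loads(note_body)
print(sorted(fields))  # → ['servers', 'terms']
```

Swapping `json.loads` for a YAML loader would let notes be written in YAML instead, since any `str -> dict` callable fits the same slot.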
## Development
Contributions are welcome! Please feel free to open issues or submit pull requests.
## License
This project is licensed under the MIT License - see the LICENSE file for details.