https://github.com/harehare/mq-db

Markdown-specialized embedded database with interval-indexed block storage and hierarchical query support.
https://github.com/harehare/mq-db
database markdown mq
Last synced: 13 days ago
JSON representation
Markdown-specialized embedded database with interval-indexed block storage and hierarchical query support.
Host: GitHub
URL: https://github.com/harehare/mq-db
Owner: harehare
License: mit
Created: 2026-05-30T13:05:40.000Z (about 1 month ago)
Default Branch: main
Last Pushed: 2026-06-11T14:17:23.000Z (20 days ago)
Last Synced: 2026-06-11T16:12:48.913Z (20 days ago)
Topics: database, markdown, mq
Language: Rust
Homepage: https://mqlang.org
Size: 1.36 MB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

          


  

mq-db


**Markdown-specialized embedded database with interval-indexed block storage and hierarchical query support.**

[![ci](https://img.shields.io/github/actions/workflow/status/harehare/mq-db/ci.yml?logo=github-actions&label=ci)](https://github.com/harehare/mq-db/actions/workflows/ci.yml)

[![audit](https://img.shields.io/github/actions/workflow/status/harehare/mq-db/audit.yml?logo=shield&label=audit)](https://github.com/harehare/mq-db/actions/workflows/audit.yml)

[![license](https://img.shields.io/badge/license-MIT-blue)](LICENSE)

![demo](./assets/demo.gif)



`mq-db` treats Markdown documents as **structured, hierarchical databases** rather than plain text. It parses Markdown into a flat block list with an **interval index** (Nested Set / Pre-Post Order), enabling O(1) section hierarchy queries. Documents can be queried with **SQL** or **[mq](https://github.com/harehare/mq)** and persisted to a compact custom page-file format.

```mermaid

flowchart TD

    A["Markdown File(s)"] -->|"CST Parser (mq-markdown)"| B["Block Tree\n(heading · paragraph · code · list …)"]

    B -->|"Interval Index + Secondary Indexes"| C["Flat Block Vector\n(pre/post integers)"]

    C --> D["BitmapIndex\n(block_type)"]

    C --> E["BTreeIndex\n(pre / post)"]

    C --> F["HashIndex\n(content / lang / depth)"]

    C --> G["Zone Maps\n(per-document stats)"]

    C --> H["SQL Engine\n(sqlparser — custom native evaluator)"]

    C --> I["mq Engine\n(mq-lang evaluator)"]

```

> [!IMPORTANT]

> This project is under active development and the API may change.

## Features

- **Flat block storage** — every Markdown element becomes a typed `Block` with row-polymorphic properties

- **O(1) hierarchy queries** — interval index (`pre`/`post`) makes ancestor/descendant checks a single integer comparison

- **Three-layer secondary indexes** — `BitmapIndex` (block type), `BTreeIndex` (pre/post), `HashIndex` (content/lang/depth) for fast SQL predicate pushdown

- **Zone Maps** — per-document statistics skip irrelevant files before scanning any blocks

- **Dual query engines** — SQL via a custom `sqlparser`-based evaluator, and `mq` via `mq-lang`

- **DDL support** — `CREATE TABLE`, `INSERT INTO`, `DROP TABLE` for in-memory custom tables

- **`mq()` scalar function** — run an mq program against Markdown content inline in SQL

- **Custom page-file persistence** — 8 KB fixed pages, checksums, atomic writes

- **CLI + interactive REPL + TUI** — full terminal experience

## Installation

### Using the Installation Script (Recommended)

```bash

curl -fsSL https://raw.githubusercontent.com/harehare/mq-db/main/bin/install.sh | bash

```

The installer will:

- Download the latest release for your platform

- Verify the binary with SHA256 checksum

- Install to `~/.local/bin/`

- Update your shell profile (bash, zsh, or fish)

After installation, restart your terminal or run:

```bash

source ~/.bashrc  # or ~/.zshrc, or ~/.config/fish/config.fish

```

### Using Cargo

```bash

cargo install mq-db

```

### From Source

```bash

# Latest Development Version

cargo install --git https://github.com/harehare/mq-db.git

```

### Supported Platforms

- **Linux**: x86_64, aarch64

- **macOS**: x86_64 (Intel), aarch64 (Apple Silicon)

- **Windows**: x86_64

## CLI Usage

### Index Markdown files

```bash

mq-db index docs/ --recursive --output store.mq-db

mq-db index README.md DESIGN.md

mq-db index docs/ --no-spans   # omit source spans (~21 bytes/block saved)

```

```

  ✓ docs/DESIGN.md

  ✓ docs/API.md

Indexed 2 files → store.mq-db

```

### List indexed documents

```bash

mq-db list --db store.mq-db

mq-db list --db store.mq-db --format json   # also: csv, tsv, markdown, html

```

```

┌──────┬────────────────────────────────────────────────────┬────────┬──────────┐

│   ID │ Path / Title                                       │ Blocks │ Tags     │

├──────┼────────────────────────────────────────────────────┼────────┼──────────┤

│    0 │ docs/DESIGN.md                                     │    142 │          │

│    1 │ docs/API.md                                        │     87 │ api, v2  │

└──────┴────────────────────────────────────────────────────┴────────┴──────────┘

2 documents

```

### SQL queries

```bash

mq-db sql "SELECT block_type, count(*) FROM blocks GROUP BY block_type" --db store.mq-db

mq-db sql --file query.sql --db store.mq-db           # read SQL from a file

mq-db sql "SELECT ..." --db store.mq-db --format json  # also: csv, tsv, markdown, html

```

```

┌─────────────┬──────────┐

│ block_type  │ count(*) │

├─────────────┼──────────┤

│ paragraph   │ 48       │

│ heading     │ 21       │

│ code        │ 15       │

└─────────────┴──────────┘

(3 rows)

```

**Hierarchy query with `under()`** — find all content inside a specific section:

```bash

mq-db sql "

  SELECT b.block_type, b.content

  FROM blocks b

  WHERE under(b.pre, b.post,

    (SELECT pre FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'),

    (SELECT post FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'))

  ORDER BY b.pre

" --db store.mq-db

```

**`mq()` scalar function** — run an mq program against Markdown content inline:

```bash

mq-db sql "SELECT mq('.h1 | to_text', content) AS title FROM blocks WHERE block_type = 'code'" --db store.mq-db

```

### DDL — custom in-memory tables

```bash

# Create from a SELECT result

mq-db sql "CREATE TABLE headings AS SELECT content, depth FROM blocks WHERE block_type = 'heading'" --db store.mq-db

# Create with explicit schema, then insert

mq-db sql "CREATE TABLE notes (id TEXT, body TEXT)" --db store.mq-db

mq-db sql "INSERT INTO notes VALUES ('1', 'Hello world')" --db store.mq-db

# Inspect

mq-db sql "SHOW TABLES" --db store.mq-db

mq-db sql "DESC notes"  --db store.mq-db

# Drop

mq-db sql "DROP TABLE notes" --db store.mq-db

```

### mq queries

```bash

mq-db mq ".h1" --db store.mq-db

mq-db mq 'select(.code_lang == "rust")' --db store.mq-db

mq-db mq ".h1" --db store.mq-db --format markdown  # also: json, csv, tsv, html

```

### Interactive REPL

```bash

mq-db repl --db store.mq-db --mode sql

```

```

mq-db  (.help for commands  .quit to exit)

mode: sql  (.mode mq | .mode sql)

sql> SELECT content FROM blocks WHERE block_type = 'heading' LIMIT 3;

┌──────────────────┐

│ content          │

├──────────────────┤

│ Overview         │

│ Architecture     │

│ Query Engine     │

└──────────────────┘

(3 rows)

sql> .mode mq

→ mq mode

mq> .h2

## Architecture

## Query Engine

```

### HTTP server

```bash

mq-db serve --db store.mq-db              # listens on 127.0.0.1:7878

mq-db serve --db store.mq-db --port 8080  # custom port

mq-db serve --db store.mq-db --host 0.0.0.0 --port 8080

```

Three endpoints are available:

| Method | Path      | Body                   | Description                                          |

| ------ | --------- | ---------------------- | ---------------------------------------------------- |

| `GET`  | `/health` | —                      | `{"status":"ok","documents":}`                    |

| `POST` | `/sql`    | `{"query":"SELECT …"}` | Execute a SQL query, returns JSON rows               |

| `POST` | `/mq`     | `{"code":".h1"}`       | Evaluate an mq expression, returns `{"results":[…]}` |

```bash

# Health check

curl http://127.0.0.1:7878/health

# SQL via HTTP

curl -s -X POST http://127.0.0.1:7878/sql \

  -H 'Content-Type: application/json' \

  -d '{"query":"SELECT block_type, count(*) FROM blocks GROUP BY block_type"}'

# mq via HTTP

curl -s -X POST http://127.0.0.1:7878/mq \

  -H 'Content-Type: application/json' \

  -d '{"code":".h1"}'

```

### Structural linting

```bash

mq-db lint --db store.mq-db --depth 2

```

```

✗  1 violation  (H2 immediately followed by list)

  file                                      heading

  ────────────────────────────────────────  ──────────────────────────────

  docs/DESIGN.md                            "Quick Start"

```

### Statistics

```bash

mq-db stats --db store.mq-db

```

```

  Documents  5

  Blocks     632

  Block types

  ────────────────────────────────────────────────────────

   ¶  paragraph    ████████████████████░░░░   241  (38%)

   #  heading      ████████░░░░░░░░░░░░░░░░    89  (14%)

  {}  code         ███████░░░░░░░░░░░░░░░░░    73  (12%)

   •  list         ██████░░░░░░░░░░░░░░░░░░    58   (9%)

  Code languages

  ────────────────────────────────────────────────────────

  {}  rust         ████████████████████████    41  (57%)

  {}  python       ██████████░░░░░░░░░░░░░░    18  (25%)

  {}  bash         ███████░░░░░░░░░░░░░░░░░    14  (19%)

```

### Show document structure

```bash

mq-db show 0 --db store.mq-db

```

```

  docs/DESIGN.md

  title   Design Document

  blocks  142

  pre   post  type               content

  ────  ────  ────────────────   ──────────────────────────────────────────

     0   141  heading H1         Design Document

     2    55  heading H2         Architecture

     4    21  paragraph          The system is built on…

    22    37  heading H3           Query Engine

    24    36  code                   fn main() { … }

```

### TUI

```bash

mq-db tui --db store.mq-db

```

```

 mq-db  SQL  Tab:switch  i:input  j/k:nav  d/u:scroll  q:quit

┌─ Documents ──────────┬─ SQL ────────────────────────────────────────────────┐

│ DESIGN.md            │ SELECT block_type, count(*) FROM blocks GROUP BY b_  │

│   142 blocks         ├─ Results ────────────────────────────────────────────┤

│ API.md               │ ┌─────────────┬──────────┐                           │

│   87 blocks  API     │ │ block_type  │ count(*) │                           │

│ README.md            │ ├─────────────┼──────────┤                           │

│   34 blocks          │ │ paragraph   │ 48       │                           │

└──────────────────────┴──────────────────────────────────────────────────────┘

 5 docs  632 blocks  3 rows

```

**Keys:**

| Key            | Action                   |

| -------------- | ------------------------ |

| `i`            | Focus query input        |

| `Esc`          | Blur input               |

| `Enter`        | Run query                |

| `Tab`          | Toggle mq / SQL mode     |

| `j` / `k`      | Navigate document list   |

| `d` / `u`      | Scroll results down / up |

| `g` / `G`      | Jump to top / bottom     |

| `q` / `Ctrl+C` | Quit                     |

## Library API

```rust

use mq_db::{DocumentStore, SqlEngine, MqEngine, block::BlockType};

// ── Build in memory ──────────────────────────────────────────────────────────

let mut store = DocumentStore::new();

store.add_file("docs/DESIGN.md")?;

store.add_str("# Hello\n\n## Architecture\n\nDetails\n")?;

// Chainable query API — zone-map skip + interval scope + block predicates

let chunks = store.query()

    .documents(|doc| doc.zone_maps.heading_contents.contains("Architecture"))

    .under_heading("Architecture", Some(2))

    .filter(|b| matches!(b.block_type, BlockType::Paragraph | BlockType::Code))

    .blocks();

// SQL engine (custom sqlparser-based evaluator — no SQLite dependency)

let engine = SqlEngine::new(&store)?;

let out = engine.execute(

    "SELECT content FROM blocks WHERE block_type = 'heading' ORDER BY pre"

)?;

print!("{}", out.to_table());

// mq engine

let results = MqEngine::eval_store(".h1", &store)?;

// Structural lint

let violations = store.query().lint_heading_followed_by(2, &[BlockType::List]);

// ── Persist / load ───────────────────────────────────────────────────────────

store.save("store.mq-db")?;

// Full load — all blocks read into memory, indexes built on first SqlEngine use

let store = DocumentStore::load("store.mq-db")?;

// Lazy open — catalog only; call load_all_blocks() + load_all_indexes() before SQL

let mut store = DocumentStore::open("store.mq-db")?;

store.load_all_blocks()?;

store.load_all_indexes()?;

// Catalog-only — for metadata commands (list, stats) that don't need block data

let store = DocumentStore::load_catalog_only("store.mq-db")?;

```

## SQL Reference

### Virtual schema

```sql

SELECT id, path, title, tags FROM documents;

SELECT id, document_id, block_type, content, pre, post,

       depth, lang, properties FROM blocks;

```

### Built-in functions

| Function                              | Description                                |

| ------------------------------------- | ------------------------------------------ |

| `under(pre, post, anc_pre, anc_post)` | O(1) interval ancestor check               |

| `mq(program, content)`                | Run an mq program against Markdown content |

| `json_extract(json, path)`            | Extract a value from a JSON string         |

| `count(*) / min / max / sum / avg`    | Aggregate functions                        |

| `lower / upper / length / coalesce`   | Scalar utilities                           |

### DDL statements

| Statement                         | Description                                       |

| --------------------------------- | ------------------------------------------------- |

| `CREATE TABLE name AS SELECT …`   | Create a custom table from a query result         |

| `CREATE TABLE name (col TYPE, …)` | Create an empty custom table with explicit schema |

| `INSERT INTO name VALUES (…)`     | Insert a row into a custom table                  |

| `DROP TABLE name`                 | Drop a custom table                               |

| `SHOW TABLES`                     | List all custom tables                            |

| `DESC name`                       | Show schema of a custom table                     |

### Example queries

```sql

-- All text/code under a specific section (RAG extraction)

SELECT b.block_type, b.content

FROM blocks b

WHERE under(b.pre, b.post,

  (SELECT pre FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'),

  (SELECT post FROM blocks WHERE block_type = 'heading' AND content = 'Architecture'))

  AND b.block_type IN ('paragraph', 'code')

ORDER BY b.pre;

-- Extract H1 title from code block content via the mq() scalar function

SELECT mq('.h1 | to_text', content) AS title

FROM blocks

WHERE block_type = 'code' AND lang = 'markdown';

-- H2 headings immediately followed by a list (structural lint)

SELECT d.path, h.content AS heading

FROM blocks h

JOIN blocks nxt ON nxt.document_id = h.document_id AND nxt.pre = h.pre + 1

JOIN documents d ON d.id = h.document_id

WHERE h.block_type = 'heading' AND depth = 2 AND nxt.block_type = 'list';

-- Documents containing Python code

SELECT DISTINCT d.path

FROM documents d JOIN blocks b ON b.document_id = d.id

WHERE b.block_type = 'code' AND lang = 'python';

```

## Architecture

### Block model

Every Markdown element becomes a `Block`:

```rust

struct Block {

    id: u32,

    document_id: u32,

    block_type: BlockType,  // Heading, Paragraph, Code, List, …

    content: String,

    span: Option,     // line/column for editor sync

    pre: u32,               // interval index pre-order

    post: u32,              // interval index post-order

    properties: Properties, // row-polymorphic extra attributes

}

```

| Block type      | Properties                                          |

| --------------- | --------------------------------------------------- |

| `Heading`       | `{ "depth": 2, "slug": "architecture" }`            |

| `Code`          | `{ "lang": "rust", "meta": "no_run" }`              |

| `List`          | `{ "ordered": false, "level": 1, "checked": null }` |

| `Yaml` / `Toml` | parsed front-matter keys (`"title"`, `"tags"`, …)   |

### Index layers

mq-db applies three complementary index layers, cheapest-first.

```mermaid

flowchart LR

    Q["SQL Query"] --> ZM["Layer 1\nZone Maps\n(document skip)"]

    ZM -->|"relevant docs"| II["Layer 2\nInterval Index\n(section scope)"]

    II -->|"candidate blocks"| SI["Layer 3\nSecondary Indexes\n(block lookup)"]

    SI -->|"BitmapIndex\nBTreeIndex\nHashIndex"| R["Result Rows"]

    ZM -->|"skip"| X1["✗ irrelevant docs"]

    SI -->|"no hint"| FS["Full Scan"]

```

#### Layer 1 — Zone Maps (document-level skip)

Built once per document and stored in the `.mq-db` file. Checked before any block is read:

| Field               | Skips documents where…               |

| ------------------- | ------------------------------------ |

| `heading_contents`  | The requested heading text is absent |

| `code_languages`    | The requested language tag is absent |

| `max_heading_depth` | The requested depth cannot exist     |

| `tags`              | The tag filter cannot match          |

#### Layer 2 — Interval Index (section hierarchy)

Heading hierarchy encoded as `(pre, post)` pairs via Pre-Post Order (Nested Set) traversal:

```mermaid

graph TD

    doc["# Doc\npre=0 · post=11"]

    secA["## Section A\npre=2 · post=7"]

    para1["Paragraph\npre=3 · post=4"]

    code1["Code\npre=5 · post=6"]

    secB["## Section B\npre=8 · post=11"]

    para2["Paragraph\npre=9 · post=10"]

    doc --> secA

    doc --> secB

    secA --> para1

    secA --> code1

    secB --> para2

```

`A is_under B` ↔ `B.pre < A.pre AND A.post < B.post` — O(1), no tree traversal.

#### Layer 3 — Secondary Indexes (block-level fast lookup)

| Index         | Column(s)                  | Structure              | Complexity                         |

| ------------- | -------------------------- | ---------------------- | ---------------------------------- |

| `BitmapIndex` | `block_type`               | Inverted list per type | O(1) key + O(k) iterate            |

| `BTreeIndex`  | `pre`, `post`              | `BTreeMap`             | O(log n) point, O(log n + k) range |

| `HashIndex`   | `content`, `lang`, `depth` | `HashMap`              | O(1) average                       |

SQL predicate pushdown picks an `IndexHint`:

```mermaid

flowchart TD

    P["SQL WHERE predicate"]

    P -->|"block_type = '...'"| B["BitmapIndex"]

    P -->|"pre = N"| BT1["BTreeIndex (point)"]

    P -->|"pre BETWEEN N AND M"| BT2["BTreeIndex (range)"]

    P -->|"content = '...'"| H1["HashIndex"]

    P -->|"lang = '...'"| H2["HashIndex"]

    P -->|"depth = N"| H3["HashIndex"]

    P -->|"other"| FS["Full Scan"]

```

### Storage format

Custom 8 KB page file:

```mermaid

graph TD

    P0["Page 0 — File Header\nmagic 0x4D514442 · version · page count"]

    P1["Page 1 — Catalog\ndoc_id → first_block_page · num_blocks · ZoneMaps"]

    P2["Page 2+ — Block Data\nlinked page chains · overflow pages"]

    P0 --> P1 --> P2

```

Writes are atomic: data goes to `.tmp` then renamed to `` on success.

## License

[MIT](LICENSE)
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/harehare/mq-db

Awesome Lists containing this project

README

mq-db