https://github.com/mbailey/sqldown

SQLite is the new markdown. Bidirectional markdown-sqlite converter - query your docs with SQL, export results as markdown.
https://github.com/mbailey/sqldown
cli markdown python sqlite
Last synced: 3 months ago
JSON representation
SQLite is the new markdown. Bidirectional markdown-sqlite converter - query your docs with SQL, export results as markdown.
Host: GitHub
URL: https://github.com/mbailey/sqldown
Owner: mbailey
Created: 2025-11-13T11:23:07.000Z (9 months ago)
Default Branch: master
Last Pushed: 2025-11-18T09:25:16.000Z (8 months ago)
Last Synced: 2026-04-15T19:42:41.864Z (4 months ago)
Topics: cli, markdown, python, sqlite
Language: Python
Homepage:
Size: 131 KB
Stars: 3
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

README

          # ☠️ SQLDown - pre-alpha - USE AT OWN RISK

[![PyPI version](https://badge.fury.io/py/sqldown.svg)](https://pypi.org/project/sqldown/)

[![Python versions](https://img.shields.io/pypi/pyversions/sqldown.svg)](https://pypi.org/project/sqldown/)

[![CI/CD](https://github.com/mbailey/sqldown/actions/workflows/ci.yml/badge.svg)](https://github.com/mbailey/sqldown/actions)

**Bidirectional markdown ↔ SQLite conversion** - Load markdown files into SQLite, query with SQL, and export back to markdown.

## Features

☠️ **Pre-Alpha Software (v0.1.0) - USE AT YOUR OWN RISK:**

- Dynamic schema generation from YAML frontmatter and markdown structure

- Column limit protection with intelligent section extraction

- Import markdown collections into queryable SQLite databases

- Export database rows back to markdown files

- Watch mode for auto-refresh on file changes

- Gitignore-aware file filtering

- Smart change detection (skip unchanged files)

## Installation

```bash

# Install from PyPI

pip install sqldown

# Or use uv for faster installation

uv pip install sqldown

```

## Quick Start

```bash

# Load markdown files into SQLite

sqldown load ~/tasks

# Query with sqlite3

sqlite3 sqldown.db "SELECT * FROM docs WHERE status='active'"

# Export back to markdown

sqldown dump -d sqldown.db -o ~/restored

# Get database info

sqldown info

# Show table details

sqldown info -t docs

```

## Load Command

```bash

sqldown load PATH [OPTIONS]

```

**What it does:**

- Scans markdown files recursively

- Parses YAML frontmatter → database columns

- Extracts H2 sections → `section_*` columns

- Creates schema dynamically based on discovered fields

- Upserts into SQLite (idempotent - safe to run multiple times)

- Respects `.gitignore` patterns automatically

**Options:**

- `-d, --db PATH` - Database file (default: `sqldown.db`)

- `-t, --table NAME` - Table name (default: `docs`)

- `-p, --pattern GLOB` - File pattern (default: `**/*.md`)

- `-m, --max-columns N` - Maximum allowed columns (default: 1800, SQLite limit: 2000)

- `-n, --top-sections N` - Extract only top N most common sections (default: 20, 0=all)

- `-w, --watch` - Watch for file changes and auto-update

- `-v, --verbose` - Show detailed progress

## Dump Command

```bash

sqldown dump -d DATABASE -o OUTPUT_DIR [OPTIONS]

```

**What it does:**

- Exports database rows back to markdown files

- Reconstructs original markdown structure with frontmatter

- Preserves file paths from original import

- Skips unchanged files (smart change detection)

- Supports SQL filtering to export subsets

**Options:**

- `-d, --db PATH` - Database file (required)

- `-t, --table NAME` - Table name (default: `docs`)

- `-o, --output PATH` - Output directory (required)

- `-f, --filter WHERE` - SQL WHERE clause to filter rows

- `--force` - Always write files, even if unchanged

- `-n, --dry-run` - Preview what would be exported without writing

- `-v, --verbose` - Show detailed progress

**Examples:**

```bash

# Export all documents

sqldown dump -d cache.db -o ~/restored

# Export only active tasks

sqldown dump -d cache.db -t tasks -o ~/active --filter "status='active'"

# Preview export without writing files

sqldown dump -d cache.db -o ~/export --dry-run

```

## Info Command

```bash

sqldown info [OPTIONS]

```

**What it does:**

- Shows database statistics and table information

- Lists all tables with document counts

- Displays column breakdown (frontmatter vs sections)

- Provides schema details for specific tables

**Options:**

- `-d, --db PATH` - Database file (default: `sqldown.db` if exists)

- `-t, --table NAME` - Show detailed info for specific table

**Examples:**

```bash

# Show database overview (uses sqldown.db if present)

sqldown info

# Show info for specific database

sqldown info -d cache.db

# Show detailed table information

sqldown info -t tasks

```

## Using SQLite3

Once imported, use sqlite3 directly for all queries:

```bash

# List tables

sqlite3 cache.db ".tables"

# Show schema

sqlite3 cache.db ".schema tasks"

# Query

sqlite3 cache.db "SELECT title, status FROM tasks WHERE status='active' LIMIT 10"

# Aggregate

sqlite3 cache.db "SELECT status, COUNT(*) FROM tasks GROUP BY status"

# Complex queries

sqlite3 cache.db "

  SELECT project, COUNT(*) as active_count

  FROM tasks

  WHERE status='active'

  GROUP BY project

  ORDER BY active_count DESC

"

# Export to CSV

sqlite3 -csv cache.db "SELECT * FROM tasks WHERE status='active'" > active.csv

# Interactive mode

sqlite3 cache.db

```

## Dynamic Schema Example

From this markdown:

```markdown

---

status: active

project: agents

priority: high

---

# Add SQLite caching

## Objective

Create a cache layer...

## Implementation Plan

1. Parser

2. Schema

```

Creates these columns automatically:

- Core: `_id`, `_path`, `_sections`, `title`, `body`, `lead`, `file_modified`

- Frontmatter: `status`, `project`, `priority`

- Sections: `section_objective`, `section_implementation_plan`

**Real example:** 87 tasks → 181 columns (no schema design needed!)

## Column Limit Protection

SQLite has a hard limit of 2000 columns per table. With diverse markdown documents, you can easily hit this limit:

**Problem:** 5,225 tasks with diverse sections = 6,694 columns (💥 exceeds limit!)

**Solution:** Use `--top-sections` to extract only the most common sections:

```bash

# Extract only top 20 most common sections (default)

sqldown load ~/tasks -d cache.db --top-sections 20

# Result: 5,225 tasks → 116 columns ✅

# - 7 base columns (_id, _path, title, body, lead, _sections, file_modified)

# - 89 frontmatter columns (status, project, type, priority, etc.)

# - 20 section columns (overview, usage, objective, notes, next_steps, etc.)

```

**What happens to other sections?**

- All content is preserved in the `body` field

- You can still search across all sections using SQLite FTS5

- Only the top N sections become queryable columns

**Column limit validation:**

```bash

# Check if your documents will fit before importing

sqldown load ~/docs -d test.db --verbose

# Output shows breakdown:

# 📊 Column breakdown:

#   - Base columns: 7

#   - Frontmatter columns: 89

#   - Section columns: 20

#   - Total: 116 (limit: 1800)

```

**When approaching limit (>90%):**

- Shows warning but continues import

- Consider: reducing --top-sections or increasing --max-columns

**When exceeding limit:**

- Stops before import to prevent database corruption

- Shows breakdown and suggestions

## Multiple Collections

One database, multiple tables:

```bash

# Import different document types

sqldown load ~/tasks -d cache.db -t tasks

sqldown load ~/notes -d cache.db -t notes

sqldown load ~/.claude/skills -d cache.db -t skills

# Query them

sqlite3 cache.db "SELECT * FROM tasks WHERE status='active'"

sqlite3 cache.db "SELECT * FROM notes WHERE tags LIKE '%sqlite%'"

# Join across tables

sqlite3 cache.db "

  SELECT t.title as task, n.title as note

  FROM tasks t

  JOIN notes n ON n.tags LIKE '%' || t.project || '%'

  WHERE t.status='active'

"

```

## Refresh Strategy

**One-time import:**

Import is idempotent - just run it again:

```bash

# Add this to cron or a git hook

sqldown load ~/tasks -d cache.db -t tasks

```

**Watch mode (auto-refresh):**

Use the `--watch` / `-w` flag to automatically update the cache when files change:

```bash

# Watch mode: import once, then auto-update on file changes

sqldown load ~/tasks -d cache.db -t tasks --watch

# Output:

# ✅ Imported 87 documents into cache.db:tasks

# 📋 Schema has 181 columns

#

# 👀 Watching /Users/admin/tasks for changes... (Ctrl-C to stop)

# [2025-01-15 10:23:45] Updated: AG-22_feat_add-configuration/README.md

# [2025-01-15 10:24:12] Added: AG-31_feat_new-feature/README.md

```

Watch mode is ideal for development workflows where you want the cache to stay in sync with your files.

## Common Queries

```bash

# Active tasks

sqlite3 cache.db "SELECT title FROM tasks WHERE status='active'"

# Recent updates

sqlite3 cache.db "SELECT title, updated FROM tasks ORDER BY updated DESC LIMIT 10"

# By project

sqlite3 cache.db "SELECT project, COUNT(*) FROM tasks GROUP BY project"

# Search content

sqlite3 cache.db "SELECT title FROM tasks WHERE body LIKE '%cache%'"

# High priority incomplete

sqlite3 cache.db "SELECT title FROM tasks WHERE priority='high' AND status != 'completed'"

```

## Philosophy

SQLDown follows the Unix philosophy: do one thing well.

- **Load**: SQLDown handles the complex markdown → SQLite conversion

- **Query**: sqlite3 provides perfect SQL interface (no wrapper needed)

- **Dump**: SQLDown reconstructs markdown from database rows

Why not wrap sqlite3? Because it's already perfect for queries:

- Full SQL power without wrapper limitations

- Standard tool with excellent documentation

- Multiple output formats (CSV, JSON, column, etc.)

- Interactive shell with history and completion

- Zero overhead for read operations

## Requirements

- Python 3.8+ (includes sqlite3 module)

- sqlite3 CLI (built-in on macOS/Linux)

**Python Dependencies** (installed automatically):

- click >= 8.0 - CLI framework

- sqlite-utils >= 3.30 - SQLite schema management

- PyYAML >= 6.0 - YAML frontmatter parsing

- pathspec >= 0.11 - Gitignore pattern support

- watchdog >= 3.0 - File system monitoring

## Integration

SQLDown is designed for both human and AI use:

**For Developers:**

- Simple CLI with sensible defaults

- Watch mode for development workflows

- Smart change detection saves time

- Direct sqlite3 access for queries

**For AI Assistants:**

- Efficient token usage via SQL queries

- Progressive disclosure pattern

- Query metadata first, read files only when needed

- Structured data extraction from markdown

## Contributing

Contributions welcome! Please check out the [issues](https://github.com/mbailey/sqldown/issues) on GitHub.

## License

MIT

## Changelog

### v0.1.0 (2025-01-14)

**Initial PyPI Release** 🎉

- Full bidirectional markdown ↔ SQLite conversion

- Dynamic schema generation from YAML frontmatter

- Intelligent column limit protection (SQLite 2000 column limit)

- Top-N section extraction for diverse document collections

- Watch mode for automatic file sync

- Smart change detection on export

- Comprehensive CLI with `load`, `dump`, and `info` commands

- Python 3.8+ support

### Development History

See [SPECIFICATION.md](SPECIFICATION.md) for the complete design.

See [REVIEW.md](REVIEW.md) for architectural decisions and trade-offs.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mbailey/sqldown

Awesome Lists containing this project

README