https://github.com/mbailey/sqldown
SQLite is the new markdown. Bidirectional markdown-sqlite converter - query your docs with SQL, export results as markdown.
https://github.com/mbailey/sqldown
cli markdown python sqlite
Last synced: about 2 months ago
JSON representation
SQLite is the new markdown. Bidirectional markdown-sqlite converter - query your docs with SQL, export results as markdown.
- Host: GitHub
- URL: https://github.com/mbailey/sqldown
- Owner: mbailey
- Created: 2025-11-13T11:23:07.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2025-11-18T09:25:16.000Z (7 months ago)
- Last Synced: 2026-04-15T19:42:41.864Z (2 months ago)
- Topics: cli, markdown, python, sqlite
- Language: Python
- Homepage:
- Size: 131 KB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# ☠️ SQLDown - pre-alpha - USE AT OWN RISK
[](https://pypi.org/project/sqldown/)
[](https://pypi.org/project/sqldown/)
[](https://github.com/mbailey/sqldown/actions)
**Bidirectional markdown ↔ SQLite conversion** - Load markdown files into SQLite, query with SQL, and export back to markdown.
## Features
☠️ **Pre-Alpha Software (v0.1.0) - USE AT YOUR OWN RISK:**
- Dynamic schema generation from YAML frontmatter and markdown structure
- Column limit protection with intelligent section extraction
- Import markdown collections into queryable SQLite databases
- Export database rows back to markdown files
- Watch mode for auto-refresh on file changes
- Gitignore-aware file filtering
- Smart change detection (skip unchanged files)
## Installation
```bash
# Install from PyPI
pip install sqldown
# Or use uv for faster installation
uv pip install sqldown
```
## Quick Start
```bash
# Load markdown files into SQLite
sqldown load ~/tasks
# Query with sqlite3
sqlite3 sqldown.db "SELECT * FROM docs WHERE status='active'"
# Export back to markdown
sqldown dump -d sqldown.db -o ~/restored
# Get database info
sqldown info
# Show table details
sqldown info -t docs
```
## Load Command
```bash
sqldown load PATH [OPTIONS]
```
**What it does:**
- Scans markdown files recursively
- Parses YAML frontmatter → database columns
- Extracts H2 sections → `section_*` columns
- Creates schema dynamically based on discovered fields
- Upserts into SQLite (idempotent - safe to run multiple times)
- Respects `.gitignore` patterns automatically
**Options:**
- `-d, --db PATH` - Database file (default: `sqldown.db`)
- `-t, --table NAME` - Table name (default: `docs`)
- `-p, --pattern GLOB` - File pattern (default: `**/*.md`)
- `-m, --max-columns N` - Maximum allowed columns (default: 1800, SQLite limit: 2000)
- `-n, --top-sections N` - Extract only top N most common sections (default: 20, 0=all)
- `-w, --watch` - Watch for file changes and auto-update
- `-v, --verbose` - Show detailed progress
## Dump Command
```bash
sqldown dump -d DATABASE -o OUTPUT_DIR [OPTIONS]
```
**What it does:**
- Exports database rows back to markdown files
- Reconstructs original markdown structure with frontmatter
- Preserves file paths from original import
- Skips unchanged files (smart change detection)
- Supports SQL filtering to export subsets
**Options:**
- `-d, --db PATH` - Database file (required)
- `-t, --table NAME` - Table name (default: `docs`)
- `-o, --output PATH` - Output directory (required)
- `-f, --filter WHERE` - SQL WHERE clause to filter rows
- `--force` - Always write files, even if unchanged
- `-n, --dry-run` - Preview what would be exported without writing
- `-v, --verbose` - Show detailed progress
**Examples:**
```bash
# Export all documents
sqldown dump -d cache.db -o ~/restored
# Export only active tasks
sqldown dump -d cache.db -t tasks -o ~/active --filter "status='active'"
# Preview export without writing files
sqldown dump -d cache.db -o ~/export --dry-run
```
## Info Command
```bash
sqldown info [OPTIONS]
```
**What it does:**
- Shows database statistics and table information
- Lists all tables with document counts
- Displays column breakdown (frontmatter vs sections)
- Provides schema details for specific tables
**Options:**
- `-d, --db PATH` - Database file (default: `sqldown.db` if exists)
- `-t, --table NAME` - Show detailed info for specific table
**Examples:**
```bash
# Show database overview (uses sqldown.db if present)
sqldown info
# Show info for specific database
sqldown info -d cache.db
# Show detailed table information
sqldown info -t tasks
```
## Using SQLite3
Once imported, use sqlite3 directly for all queries:
```bash
# List tables
sqlite3 cache.db ".tables"
# Show schema
sqlite3 cache.db ".schema tasks"
# Query
sqlite3 cache.db "SELECT title, status FROM tasks WHERE status='active' LIMIT 10"
# Aggregate
sqlite3 cache.db "SELECT status, COUNT(*) FROM tasks GROUP BY status"
# Complex queries
sqlite3 cache.db "
SELECT project, COUNT(*) as active_count
FROM tasks
WHERE status='active'
GROUP BY project
ORDER BY active_count DESC
"
# Export to CSV
sqlite3 -csv cache.db "SELECT * FROM tasks WHERE status='active'" > active.csv
# Interactive mode
sqlite3 cache.db
```
## Dynamic Schema Example
From this markdown:
```markdown
---
status: active
project: agents
priority: high
---
# Add SQLite caching
## Objective
Create a cache layer...
## Implementation Plan
1. Parser
2. Schema
```
Creates these columns automatically:
- Core: `_id`, `_path`, `_sections`, `title`, `body`, `lead`, `file_modified`
- Frontmatter: `status`, `project`, `priority`
- Sections: `section_objective`, `section_implementation_plan`
**Real example:** 87 tasks → 181 columns (no schema design needed!)
## Column Limit Protection
SQLite has a hard limit of 2000 columns per table. With diverse markdown documents, you can easily hit this limit:
**Problem:** 5,225 tasks with diverse sections = 6,694 columns (💥 exceeds limit!)
**Solution:** Use `--top-sections` to extract only the most common sections:
```bash
# Extract only top 20 most common sections (default)
sqldown load ~/tasks -d cache.db --top-sections 20
# Result: 5,225 tasks → 116 columns ✅
# - 7 base columns (_id, _path, title, body, lead, _sections, file_modified)
# - 89 frontmatter columns (status, project, type, priority, etc.)
# - 20 section columns (overview, usage, objective, notes, next_steps, etc.)
```
**What happens to other sections?**
- All content is preserved in the `body` field
- You can still search across all sections using SQLite FTS5
- Only the top N sections become queryable columns
**Column limit validation:**
```bash
# Check if your documents will fit before importing
sqldown load ~/docs -d test.db --verbose
# Output shows breakdown:
# 📊 Column breakdown:
# - Base columns: 7
# - Frontmatter columns: 89
# - Section columns: 20
# - Total: 116 (limit: 1800)
```
**When approaching limit (>90%):**
- Shows warning but continues import
- Consider: reducing --top-sections or increasing --max-columns
**When exceeding limit:**
- Stops before import to prevent database corruption
- Shows breakdown and suggestions
## Multiple Collections
One database, multiple tables:
```bash
# Import different document types
sqldown load ~/tasks -d cache.db -t tasks
sqldown load ~/notes -d cache.db -t notes
sqldown load ~/.claude/skills -d cache.db -t skills
# Query them
sqlite3 cache.db "SELECT * FROM tasks WHERE status='active'"
sqlite3 cache.db "SELECT * FROM notes WHERE tags LIKE '%sqlite%'"
# Join across tables
sqlite3 cache.db "
SELECT t.title as task, n.title as note
FROM tasks t
JOIN notes n ON n.tags LIKE '%' || t.project || '%'
WHERE t.status='active'
"
```
## Refresh Strategy
**One-time import:**
Import is idempotent - just run it again:
```bash
# Add this to cron or a git hook
sqldown load ~/tasks -d cache.db -t tasks
```
**Watch mode (auto-refresh):**
Use the `--watch` / `-w` flag to automatically update the cache when files change:
```bash
# Watch mode: import once, then auto-update on file changes
sqldown load ~/tasks -d cache.db -t tasks --watch
# Output:
# ✅ Imported 87 documents into cache.db:tasks
# 📋 Schema has 181 columns
#
# 👀 Watching /Users/admin/tasks for changes... (Ctrl-C to stop)
# [2025-01-15 10:23:45] Updated: AG-22_feat_add-configuration/README.md
# [2025-01-15 10:24:12] Added: AG-31_feat_new-feature/README.md
```
Watch mode is ideal for development workflows where you want the cache to stay in sync with your files.
## Common Queries
```bash
# Active tasks
sqlite3 cache.db "SELECT title FROM tasks WHERE status='active'"
# Recent updates
sqlite3 cache.db "SELECT title, updated FROM tasks ORDER BY updated DESC LIMIT 10"
# By project
sqlite3 cache.db "SELECT project, COUNT(*) FROM tasks GROUP BY project"
# Search content
sqlite3 cache.db "SELECT title FROM tasks WHERE body LIKE '%cache%'"
# High priority incomplete
sqlite3 cache.db "SELECT title FROM tasks WHERE priority='high' AND status != 'completed'"
```
## Philosophy
SQLDown follows the Unix philosophy: do one thing well.
- **Load**: SQLDown handles the complex markdown → SQLite conversion
- **Query**: sqlite3 provides perfect SQL interface (no wrapper needed)
- **Dump**: SQLDown reconstructs markdown from database rows
Why not wrap sqlite3? Because it's already perfect for queries:
- Full SQL power without wrapper limitations
- Standard tool with excellent documentation
- Multiple output formats (CSV, JSON, column, etc.)
- Interactive shell with history and completion
- Zero overhead for read operations
## Requirements
- Python 3.8+ (includes sqlite3 module)
- sqlite3 CLI (built-in on macOS/Linux)
**Python Dependencies** (installed automatically):
- click >= 8.0 - CLI framework
- sqlite-utils >= 3.30 - SQLite schema management
- PyYAML >= 6.0 - YAML frontmatter parsing
- pathspec >= 0.11 - Gitignore pattern support
- watchdog >= 3.0 - File system monitoring
## Integration
SQLDown is designed for both human and AI use:
**For Developers:**
- Simple CLI with sensible defaults
- Watch mode for development workflows
- Smart change detection saves time
- Direct sqlite3 access for queries
**For AI Assistants:**
- Efficient token usage via SQL queries
- Progressive disclosure pattern
- Query metadata first, read files only when needed
- Structured data extraction from markdown
## Contributing
Contributions welcome! Please check out the [issues](https://github.com/mbailey/sqldown/issues) on GitHub.
## License
MIT
## Changelog
### v0.1.0 (2025-01-14)
**Initial PyPI Release** 🎉
- Full bidirectional markdown ↔ SQLite conversion
- Dynamic schema generation from YAML frontmatter
- Intelligent column limit protection (SQLite 2000 column limit)
- Top-N section extraction for diverse document collections
- Watch mode for automatic file sync
- Smart change detection on export
- Comprehensive CLI with `load`, `dump`, and `info` commands
- Python 3.8+ support
### Development History
See [SPECIFICATION.md](SPECIFICATION.md) for the complete design.
See [REVIEW.md](REVIEW.md) for architectural decisions and trade-offs.