https://github.com/srobinson/markdown-matters
Structural markdown intelligence for LLMs — search, index, and summarize with 80% fewer tokens
https://github.com/srobinson/markdown-matters
ai-tools cli context-window documentation embeddings llm markdown mcp semantic-search typescript
Last synced: 3 months ago
JSON representation
Structural markdown intelligence for LLMs — search, index, and summarize with 80% fewer tokens
- Host: GitHub
- URL: https://github.com/srobinson/markdown-matters
- Owner: srobinson
- Created: 2026-02-21T13:47:57.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-03-17T12:29:42.000Z (3 months ago)
- Last Synced: 2026-03-17T23:53:04.107Z (3 months ago)
- Topics: ai-tools, cli, context-window, documentation, embeddings, llm, markdown, mcp, semantic-search, typescript
- Language: TypeScript
- Size: 4.25 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
Awesome Lists containing this project
README
# markdown-matters
**Give LLMs exactly the markdown they need. Nothing more.**
```bash
QUICK REFERENCE
mdm init [options] Initialize mdm in a directory
mdm index [path] [options] Index markdown files (add --embed for semantic search)
mdm search [options] Search by meaning or structure
mdm context Get LLM-ready summary
mdm tree [path] Show files or document outline
mdm config Configuration management (init, show, check)
mdm duplicates [path] Find duplicate content
mdm embeddings Manage embedding namespaces
mdm links Outgoing links
mdm backlinks Incoming links
mdm stats [path] Index statistics
```
---
## Why?
Your documentation is 50K tokens of markdown. LLM context windows are limited. Raw markdown dumps waste tokens on structure, headers, and noise.
mdm extracts *structure* instead of dumping *text*. The result: **80%+ fewer tokens** while preserving everything needed to understand your docs.
```bash
npm install -g markdown-matters
mdm index . # Index your docs
mdm search "authentication" # Find by meaning
mdm context README.md # Get LLM-ready summary
```
---
## Installation
```bash
npm install -g markdown-matters
```
Requires Node.js 18+. Semantic search requires an embedding provider (OpenAI, Ollama, LM Studio, OpenRouter, or Voyage). See [docs/CONFIG.md](./docs/CONFIG.md#embedding-providers) for provider setup.
---
## Commands
### init
Initialize mdm in a directory. Supports both local project setup and global shared indexing.
```bash
mdm init # Interactive setup (prompts for local or global)
mdm init --local # Initialize locally (.mdm/ in current directory)
mdm init --global # Initialize globally (~/.mdm/)
mdm init --yes # Accept all defaults without prompting
```
Local setup creates `.mdm/` and `.mdm.toml` in your project. Global setup creates `~/.mdm/` with source registration for multi-project indexing.
Config resolution: Local `.mdm.toml` takes precedence over `~/.mdm/.mdm.toml`, which falls back to built-in defaults.
### index
Index markdown files for fast searching.
```bash
mdm index # Index current directory (prompts for semantic)
mdm index ./docs # Index specific path
mdm index --embed # Build embeddings for semantic search
mdm index --no-embed # Skip the semantic search prompt
mdm index --watch # Watch for changes and re-index automatically
mdm index --force # Bypass cache, re-process all files
mdm index --all # Index all registered global sources from ~/.mdm/.mdm.toml
mdm index --exclude "*.draft.md,research/**" # Exclude patterns (comma-separated)
mdm index --no-gitignore # Ignore .gitignore file
```
By default, mdm respects `.gitignore` and `.mdmignore` patterns. Use `--exclude` to add CLI-level patterns (highest priority).
### search
Search by meaning (semantic) or keyword (text match).
```bash
mdm search "how to authenticate" # Semantic search (if embeddings exist)
mdm search -k "auth.*flow" # Keyword search (text match)
mdm search -n 5 "setup" # Limit to 5 results
mdm search --threshold 0.25 "deploy" # Lower threshold for more results
```
#### Similarity Threshold
Semantic search filters results by similarity score (0-1). Default: **0.35** (35%).
- **0 results?** Content may exist below the threshold. Try `--threshold 0.25`
- **Typical scores**: Single-word queries score ~30-40%, multi-word phrases ~50-70%
- **Higher threshold** = stricter matching, fewer results
- **Lower threshold** = more results, possibly less relevant
```bash
mdm search "authentication" # Uses default 0.35 threshold
mdm search --threshold 0.25 "auth" # Lower threshold for broad queries
mdm search --threshold 0.6 "specific" # Higher threshold for precision
```
#### Context Lines
Show surrounding lines around matches (like grep):
```bash
mdm search "checkpoint" -C 3 # 3 lines before AND after each match
mdm search "error" -B 2 -A 5 # 2 lines before, 5 lines after
```
Auto-detection: Uses semantic search if embeddings exist and query looks like natural language. Use `-k` to force keyword search.
#### Advanced Search
**Quality Modes** - Control speed vs. accuracy tradeoff:
```bash
mdm search "query" --quality fast # 40% faster, good recall
mdm search "query" -q thorough # Best recall, 30% slower
```
**Re-ranking** - Boost precision by 20-35%:
```bash
mdm search "query" --rerank # First use downloads 90MB model
npm install @huggingface/transformers # Required dependency
```
**HyDE** - Better results for complex questions:
```bash
mdm search "how to implement auth" --hyde # Expands query semantically
```
#### AI Summarization
Generate AI-powered summaries of search results:
```bash
mdm search "authentication" --summarize # Get AI summary of results
mdm search "error handling" -s --yes # Skip cost confirmation
mdm search "database" -s --stream # Stream output in real-time
```
Uses your existing AI subscription (Claude Code, Copilot CLI) for free, or pay-per-use API providers. See [AI Summarization](#ai-summarization) for setup.
### context
Get LLM-ready summaries from one or more files.
```bash
mdm context README.md # Single file
mdm context README.md docs/api.md # Multiple files
mdm context docs/*.md # Glob patterns work
mdm context -t 500 README.md # Token budget
mdm context --brief README.md # Minimal output
mdm context --full README.md # Include full content
```
#### Section Filtering
Extract specific sections instead of entire files:
```bash
mdm context doc.md --sections # List available sections
mdm context doc.md --section "Setup" # Extract by section name
mdm context doc.md --section "2.1" # Extract by section number
mdm context doc.md --section "API*" # Glob pattern matching
mdm context doc.md --section "Config" --shallow # Top-level only (no nested subsections)
```
The `--sections` flag shows all sections with their numbers and token counts, helping you target exactly what you need.
### tree
Show file structure or document outline.
```bash
mdm tree # List markdown files in current directory
mdm tree ./docs # List files in specific directory
mdm tree README.md # Show document outline (heading hierarchy)
```
Auto-detection: Directory shows file list, file shows document outline.
### links / backlinks
Analyze link relationships.
```bash
mdm links README.md # What does this file link to?
mdm backlinks docs/api.md # What files link to this?
```
### stats
Show index statistics.
```bash
mdm stats # Current directory
mdm stats ./docs # Specific path
```
### duplicates
Detect duplicate content in markdown files.
```bash
mdm duplicates # Find duplicates in current directory
mdm duplicates docs/ # Find duplicates in specific directory
mdm duplicates --min-length 100 # Only flag sections over 100 characters
mdm duplicates -p "docs/**" # Filter by path pattern
```
### embeddings
Manage embedding providers and namespaces.
```bash
mdm embeddings list # List all embedding namespaces
mdm embeddings current # Show active namespace
mdm embeddings switch openai # Switch to OpenAI embeddings
mdm embeddings remove ollama # Remove Ollama embeddings
mdm embeddings remove openai -f # Force remove active namespace
```
Namespaces store embeddings separately by provider/model. Switching is instant without rebuild.
---
## Workflows
### Before Adding Context to LLM
```bash
mdm tree docs/ # See what's available
mdm tree docs/api.md # Check document structure
mdm context -t 500 docs/api.md # Get summary within token budget
```
### Finding Documentation
```bash
mdm search "authentication" # By meaning
mdm search -k "Setup|Install" # By keyword pattern
```
### Setting Up Semantic Search
mdm supports multiple embedding providers for semantic search:
- **OpenAI** (default) - Cloud-based, requires API key
- **Ollama** - Free, local, daemon-based
- **LM Studio** - Free, local, GUI-based (development only)
- **OpenRouter** - Multi-provider gateway
- **Voyage** - Premium quality, competitive pricing
Quick start with OpenAI:
```bash
export OPENAI_API_KEY=sk-...
mdm index --embed # Build embeddings
mdm search "how to deploy" # Now works semantically
```
Using Ollama (free, local):
```bash
ollama serve && ollama pull nomic-embed-text
mdm index --embed --provider ollama --provider-model nomic-embed-text
```
See [docs/CONFIG.md](./docs/CONFIG.md#embedding-providers) for complete provider setup, comparison, and configuration options.
---
## MCP Integration
For Claude Desktop, add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
"mcpServers": {
"mdm": {
"command": "mdm-mcp",
"args": []
}
}
}
```
For Claude Code, add to `.claude/settings.json`:
```json
{
"mcpServers": {
"mdm": {
"command": "mdm-mcp",
"args": []
}
}
}
```
### MCP Tools
| Tool | Description |
|------|-------------|
| `md_search` | Semantic search by meaning; returns relevant sections |
| `md_context` | Token-compressed file summaries at `brief`, `summary`, or `full` detail |
| `md_structure` | Heading hierarchy with token counts |
| `md_keyword_search` | Structural search by heading, code, list, or table presence |
| `md_index` | Build or rebuild the index |
| `md_links` | Outgoing links from a file |
| `md_backlinks` | Incoming links to a file |
---
## Configuration
mdm supports a layered configuration system for persistent settings:
```bash
# Create a config file
mdm config init
# Check your configuration
mdm config check
# Customize settings in .mdm.toml
```
```toml
# .mdm.toml
[index]
maxDepth = 10
excludePatterns = ["node_modules", ".git", "dist", "build"]
[search]
defaultLimit = 20
minSimilarity = 0.35
```
Configuration precedence: CLI flags > Environment variables > Config file > Defaults
**See [docs/CONFIG.md](./docs/CONFIG.md) for the complete configuration reference.**
### Index Location
Indexes are stored in `.mdm/` in your project root:
```
.mdm/
indexes/
documents.json # Document metadata
sections.json # Section index
links.json # Link graph
vectors.bin # Embeddings (if enabled)
```
### Environment Variables
| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | Required for OpenAI semantic search (default provider) |
| `OPENROUTER_API_KEY` | Required for OpenRouter semantic search |
| `MDM_*` | Configuration overrides (see [CONFIG.md](./docs/CONFIG.md)) |
---
## AI Summarization
Transform search results into actionable insights using AI.
### Quick Start
```bash
# Basic usage (auto-detects installed CLI tools)
mdm search "authentication" --summarize
# Skip confirmation for scripts
mdm search "error handling" --summarize --yes
# Stream output in real-time
mdm search "database" --summarize --stream
```
### First-Time Setup
On first use, mdm auto-detects available providers:
```
Using claude (subscription - FREE)
--- AI Summary ---
Based on the search results, here are the key findings...
```
### Providers
**CLI Providers (FREE with subscription):**
| Provider | Command | Subscription Required |
|----------|---------|----------------------|
| Claude Code | `claude` | Claude Pro/Team |
| GitHub Copilot | `copilot` | Copilot subscription |
| OpenCode | `opencode` | BYOK (any provider) |
**API Providers (pay-per-use):**
| Provider | Cost per 1M tokens | Notes |
|----------|-------------------|-------|
| DeepSeek | $0.14-0.56 | Ultra-cheap |
| Qwen | $0.03-0.12 | Budget option |
| Google Gemini | $0.30-2.50 | Balanced |
| OpenAI GPT | $1.75-14.00 | Premium |
| Anthropic Claude | $3.00-15.00 | Premium |
### Configuration
**Option 1: Auto-detection (recommended)**
Just run `--summarize` - mdm finds installed CLI tools automatically.
**Option 2: Config file**
```toml
# .mdm.toml
[aiSummarization]
mode = "cli" # 'cli' (free) or 'api' (paid)
provider = "claude" # Provider name
```
**Option 3: Environment variables**
```bash
export MDM_AISUMMARIZATION_MODE=api
export MDM_AISUMMARIZATION_PROVIDER=deepseek
export DEEPSEEK_API_KEY=sk-...
```
### CLI Flags
| Flag | Short | Description |
|------|-------|-------------|
| `--summarize` | `-s` | Enable AI summarization |
| `--yes` | `-y` | Skip cost confirmation |
| `--stream` | | Stream output in real-time |
### Cost Transparency
API providers show cost estimates before proceeding:
```
Cost Estimate:
Provider: deepseek
Input tokens: ~2,500
Output tokens: ~500
Estimated cost: $0.0007
Continue with summarization? [Y/n]:
```
CLI providers show free status:
```
Using claude (subscription - FREE)
```
See [docs/summarization.md](./docs/summarization.md) for architecture details and troubleshooting.
---
## Performance
| Metric | Raw Markdown | mdm | Savings |
|--------|--------------|---------|---------|
| Context for single doc | 2,500 tokens | 400 tokens | **84%** |
| Context for 10 docs | 25,000 tokens | 4,000 tokens | **84%** |
| Search latency | N/A | <100ms | - |
---
## License
MIT