{"id":48008459,"url":"https://github.com/srobinson/markdown-matters","last_synced_at":"2026-04-04T13:26:49.744Z","repository":{"id":344148417,"uuid":"1163347783","full_name":"srobinson/markdown-matters","owner":"srobinson","description":"Structural markdown intelligence for LLMs — search, index, and summarize with 80% fewer tokens","archived":false,"fork":false,"pushed_at":"2026-03-17T12:29:42.000Z","size":4455,"stargazers_count":2,"open_issues_count":6,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-17T23:53:04.107Z","etag":null,"topics":["ai-tools","cli","context-window","documentation","embeddings","llm","markdown","mcp","semantic-search","typescript"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/srobinson.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-21T13:47:57.000Z","updated_at":"2026-03-17T12:29:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/srobinson/markdown-matters","commit_stats":null,"previous_names":["srobinson/mdcontext","srobinson/markdown-matters"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/srobinson/markdown-matters","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srobinson%2Fmarkdown-matters","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srobinson%2F
markdown-matters/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srobinson%2Fmarkdown-matters/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srobinson%2Fmarkdown-matters/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/srobinson","download_url":"https://codeload.github.com/srobinson/markdown-matters/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srobinson%2Fmarkdown-matters/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31402263,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-tools","cli","context-window","documentation","embeddings","llm","markdown","mcp","semantic-search","typescript"],"created_at":"2026-04-04T13:26:49.557Z","updated_at":"2026-04-04T13:26:49.712Z","avatar_url":"https://github.com/srobinson.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# markdown-matters\n\n**Give LLMs exactly the markdown they need. 
Nothing more.**\n\n```bash\nQUICK REFERENCE\n  mdm init [options]              Initialize mdm in a directory\n  mdm index [path] [options]      Index markdown files (add --embed for semantic search)\n  mdm search \u003cquery\u003e [options]    Search by meaning or structure\n  mdm context \u003cfiles...\u003e          Get LLM-ready summary\n  mdm tree [path]                 Show files or document outline\n  mdm config \u003ccommand\u003e            Configuration management (init, show, check)\n  mdm duplicates [path]           Find duplicate content\n  mdm embeddings \u003ccommand\u003e        Manage embedding namespaces\n  mdm links \u003cfile\u003e                Outgoing links\n  mdm backlinks \u003cfile\u003e            Incoming links\n  mdm stats [path]                Index statistics\n```\n\n---\n\n## Why?\n\nYour documentation is 50K tokens of markdown. LLM context windows are limited. Raw markdown dumps waste tokens on structure, headers, and noise.\n\nmdm extracts *structure* instead of dumping *text*. The result: **80%+ fewer tokens** while preserving everything needed to understand your docs.\n\n```bash\nnpm install -g markdown-matters\nmdm index .                     # Index your docs\nmdm search \"authentication\"     # Find by meaning\nmdm context README.md           # Get LLM-ready summary\n```\n\n---\n\n## Installation\n\n```bash\nnpm install -g markdown-matters\n```\n\nRequires Node.js 18+. Semantic search requires an embedding provider (OpenAI, Ollama, LM Studio, OpenRouter, or Voyage). See [docs/CONFIG.md](./docs/CONFIG.md#embedding-providers) for provider setup.\n\n---\n\n## Commands\n\n### init\n\nInitialize mdm in a directory. 
Supports both local project setup and global shared indexing.\n\n```bash\nmdm init                        # Interactive setup (prompts for local or global)\nmdm init --local                # Initialize locally (.mdm/ in current directory)\nmdm init --global               # Initialize globally (~/.mdm/)\nmdm init --yes                  # Accept all defaults without prompting\n```\n\nLocal setup creates `.mdm/` and `.mdm.toml` in your project. Global setup creates `~/.mdm/` with source registration for multi-project indexing.\n\nConfig resolution: Local `.mdm.toml` takes precedence over `~/.mdm/.mdm.toml`, which falls back to built-in defaults.\n\n### index\n\nIndex markdown files for fast searching.\n\n```bash\nmdm index                       # Index current directory (prompts for semantic)\nmdm index ./docs                # Index specific path\nmdm index --embed               # Build embeddings for semantic search\nmdm index --no-embed            # Skip the semantic search prompt\nmdm index --watch               # Watch for changes and re-index automatically\nmdm index --force               # Bypass cache, re-process all files\nmdm index --all                 # Index all registered global sources from ~/.mdm/.mdm.toml\nmdm index --exclude \"*.draft.md,research/**\"  # Exclude patterns (comma-separated)\nmdm index --no-gitignore        # Ignore .gitignore file\n```\n\nBy default, mdm respects `.gitignore` and `.mdmignore` patterns. Use `--exclude` to add CLI-level patterns (highest priority).\n\n### search\n\nSearch by meaning (semantic) or keyword (text match).\n\n```bash\nmdm search \"how to authenticate\"        # Semantic search (if embeddings exist)\nmdm search -k \"auth.*flow\"              # Keyword search (text match)\nmdm search -n 5 \"setup\"                 # Limit to 5 results\nmdm search --threshold 0.25 \"deploy\"    # Lower threshold for more results\n```\n\n#### Similarity Threshold\n\nSemantic search filters results by similarity score (0-1). 
Default: **0.35** (35%).\n\n- **0 results?** Content may exist below the threshold. Try `--threshold 0.25`\n- **Typical scores**: Single-word queries score ~30-40%, multi-word phrases ~50-70%\n- **Higher threshold** = stricter matching, fewer results\n- **Lower threshold** = more results, possibly less relevant\n\n```bash\nmdm search \"authentication\"              # Uses default 0.35 threshold\nmdm search --threshold 0.25 \"auth\"       # Lower threshold for broad queries\nmdm search --threshold 0.6 \"specific\"    # Higher threshold for precision\n```\n\n#### Context Lines\n\nShow surrounding lines around matches (like grep):\n\n```bash\nmdm search \"checkpoint\" -C 3            # 3 lines before AND after each match\nmdm search \"error\" -B 2 -A 5            # 2 lines before, 5 lines after\n```\n\nAuto-detection: mdm uses semantic search if embeddings exist and the query looks like natural language. Use `-k` to force keyword search.\n\n#### Advanced Search\n\n**Quality Modes** - Control speed vs. accuracy tradeoff:\n```bash\nmdm search \"query\" --quality fast       # 40% faster, good recall\nmdm search \"query\" -q thorough          # Best recall, 30% slower\n```\n\n**Re-ranking** - Boost precision by 20-35%:\n```bash\nnpm install @huggingface/transformers   # Required dependency; install before first use\nmdm search \"query\" --rerank             # First use downloads 90MB model\n```\n\n**HyDE** - Better results for complex questions:\n```bash\nmdm search \"how to implement auth\" --hyde   # Expands query semantically\n```\n\n#### AI Summarization\n\nGenerate AI-powered summaries of search results:\n\n```bash\nmdm search \"authentication\" --summarize     # Get AI summary of results\nmdm search \"error handling\" -s --yes        # Skip cost confirmation\nmdm search \"database\" -s --stream           # Stream output in real-time\n```\n\nUses your existing AI subscription (Claude Code, Copilot CLI) for free, or pay-per-use API providers. 
See [AI Summarization](#ai-summarization) for setup.\n\n### context\n\nGet LLM-ready summaries from one or more files.\n\n```bash\nmdm context README.md                   # Single file\nmdm context README.md docs/api.md       # Multiple files\nmdm context docs/*.md                   # Glob patterns work\nmdm context -t 500 README.md            # Token budget\nmdm context --brief README.md           # Minimal output\nmdm context --full README.md            # Include full content\n```\n\n#### Section Filtering\n\nExtract specific sections instead of entire files:\n\n```bash\nmdm context doc.md --sections           # List available sections\nmdm context doc.md --section \"Setup\"    # Extract by section name\nmdm context doc.md --section \"2.1\"      # Extract by section number\nmdm context doc.md --section \"API*\"     # Glob pattern matching\nmdm context doc.md --section \"Config\" --shallow  # Top-level only (no nested subsections)\n```\n\nThe `--sections` flag shows all sections with their numbers and token counts, helping you target exactly what you need.\n\n### tree\n\nShow file structure or document outline.\n\n```bash\nmdm tree                        # List markdown files in current directory\nmdm tree ./docs                 # List files in specific directory\nmdm tree README.md              # Show document outline (heading hierarchy)\n```\n\nAuto-detection: Directory shows file list, file shows document outline.\n\n### links / backlinks\n\nAnalyze link relationships.\n\n```bash\nmdm links README.md             # What does this file link to?\nmdm backlinks docs/api.md       # What files link to this?\n```\n\n### stats\n\nShow index statistics.\n\n```bash\nmdm stats                       # Current directory\nmdm stats ./docs                # Specific path\n```\n\n### duplicates\n\nDetect duplicate content in markdown files.\n\n```bash\nmdm duplicates                  # Find duplicates in current directory\nmdm duplicates docs/            # Find duplicates in 
specific directory\nmdm duplicates --min-length 100 # Only flag sections over 100 characters\nmdm duplicates -p \"docs/**\"     # Filter by path pattern\n```\n\n### embeddings\n\nManage embedding providers and namespaces.\n\n```bash\nmdm embeddings list             # List all embedding namespaces\nmdm embeddings current          # Show active namespace\nmdm embeddings switch openai    # Switch to OpenAI embeddings\nmdm embeddings remove ollama    # Remove Ollama embeddings\nmdm embeddings remove openai -f # Force remove active namespace\n```\n\nNamespaces store embeddings separately by provider/model. Switching is instant without rebuild.\n\n---\n\n## Workflows\n\n### Before Adding Context to LLM\n\n```bash\nmdm tree docs/                          # See what's available\nmdm tree docs/api.md                    # Check document structure\nmdm context -t 500 docs/api.md          # Get summary within token budget\n```\n\n### Finding Documentation\n\n```bash\nmdm search \"authentication\"             # By meaning\nmdm search -k \"Setup|Install\"           # By keyword pattern\n```\n\n### Setting Up Semantic Search\n\nmdm supports multiple embedding providers for semantic search:\n\n- **OpenAI** (default) - Cloud-based, requires API key\n- **Ollama** - Free, local, daemon-based\n- **LM Studio** - Free, local, GUI-based (development only)\n- **OpenRouter** - Multi-provider gateway\n- **Voyage** - Premium quality, competitive pricing\n\nQuick start with OpenAI:\n```bash\nexport OPENAI_API_KEY=sk-...\nmdm index --embed                       # Build embeddings\nmdm search \"how to deploy\"              # Now works semantically\n```\n\nUsing Ollama (free, local):\n```bash\nollama serve \u0026\u0026 ollama pull nomic-embed-text\nmdm index --embed --provider ollama --provider-model nomic-embed-text\n```\n\nSee [docs/CONFIG.md](./docs/CONFIG.md#embedding-providers) for complete provider setup, comparison, and configuration options.\n\n---\n\n## MCP Integration\n\nFor Claude 
Desktop, add to `~/Library/Application Support/Claude/claude_desktop_config.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"mdm\": {\n      \"command\": \"mdm-mcp\",\n      \"args\": []\n    }\n  }\n}\n```\n\nFor Claude Code, add to `.claude/settings.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"mdm\": {\n      \"command\": \"mdm-mcp\",\n      \"args\": []\n    }\n  }\n}\n```\n\n### MCP Tools\n\n| Tool | Description |\n|------|-------------|\n| `md_search` | Semantic search by meaning; returns relevant sections |\n| `md_context` | Token-compressed file summaries at `brief`, `summary`, or `full` detail |\n| `md_structure` | Heading hierarchy with token counts |\n| `md_keyword_search` | Structural search by heading, code, list, or table presence |\n| `md_index` | Build or rebuild the index |\n| `md_links` | Outgoing links from a file |\n| `md_backlinks` | Incoming links to a file |\n\n---\n\n## Configuration\n\nmdm supports a layered configuration system for persistent settings:\n\n```bash\n# Create a config file\nmdm config init\n\n# Check your configuration\nmdm config check\n\n# Customize settings in .mdm.toml\n```\n\n```toml\n# .mdm.toml\n[index]\nmaxDepth = 10\nexcludePatterns = [\"node_modules\", \".git\", \"dist\", \"build\"]\n\n[search]\ndefaultLimit = 20\nminSimilarity = 0.35\n```\n\nConfiguration precedence: CLI flags \u003e Environment variables \u003e Config file \u003e Defaults\n\n**See [docs/CONFIG.md](./docs/CONFIG.md) for the complete configuration reference.**\n\n### Index Location\n\nIndexes are stored in `.mdm/` in your project root:\n\n```\n.mdm/\n  indexes/\n    documents.json    # Document metadata\n    sections.json     # Section index\n    links.json        # Link graph\n    vectors.bin       # Embeddings (if enabled)\n```\n\n### Environment Variables\n\n| Variable | Description |\n|----------|-------------|\n| `OPENAI_API_KEY` | Required for OpenAI semantic search (default provider) |\n| `OPENROUTER_API_KEY` | Required for OpenRouter 
semantic search |\n| `MDM_*` | Configuration overrides (see [CONFIG.md](./docs/CONFIG.md)) |\n\n---\n\n## AI Summarization\n\nTransform search results into actionable insights using AI.\n\n### Quick Start\n\n```bash\n# Basic usage (auto-detects installed CLI tools)\nmdm search \"authentication\" --summarize\n\n# Skip confirmation for scripts\nmdm search \"error handling\" --summarize --yes\n\n# Stream output in real-time\nmdm search \"database\" --summarize --stream\n```\n\n### First-Time Setup\n\nOn first use, mdm auto-detects available providers:\n\n```\nUsing claude (subscription - FREE)\n\n--- AI Summary ---\n\nBased on the search results, here are the key findings...\n```\n\n### Providers\n\n**CLI Providers (FREE with subscription):**\n\n| Provider | Command | Subscription Required |\n|----------|---------|----------------------|\n| Claude Code | `claude` | Claude Pro/Team |\n| GitHub Copilot | `copilot` | Copilot subscription |\n| OpenCode | `opencode` | BYOK (any provider) |\n\n**API Providers (pay-per-use):**\n\n| Provider | Cost per 1M tokens | Notes |\n|----------|-------------------|-------|\n| DeepSeek | $0.14-0.56 | Ultra-cheap |\n| Qwen | $0.03-0.12 | Budget option |\n| Google Gemini | $0.30-2.50 | Balanced |\n| OpenAI GPT | $1.75-14.00 | Premium |\n| Anthropic Claude | $3.00-15.00 | Premium |\n\n### Configuration\n\n**Option 1: Auto-detection (recommended)**\n\nJust run `--summarize` - mdm finds installed CLI tools automatically.\n\n**Option 2: Config file**\n\n```toml\n# .mdm.toml\n[aiSummarization]\nmode = \"cli\"        # 'cli' (free) or 'api' (paid)\nprovider = \"claude\" # Provider name\n```\n\n**Option 3: Environment variables**\n\n```bash\nexport MDM_AISUMMARIZATION_MODE=api\nexport MDM_AISUMMARIZATION_PROVIDER=deepseek\nexport DEEPSEEK_API_KEY=sk-...\n```\n\n### CLI Flags\n\n| Flag | Short | Description |\n|------|-------|-------------|\n| `--summarize` | `-s` | Enable AI summarization |\n| `--yes` | `-y` | Skip cost confirmation |\n| 
`--stream` | | Stream output in real-time |\n\n### Cost Transparency\n\nAPI providers show cost estimates before proceeding:\n\n```\nCost Estimate:\n  Provider: deepseek\n  Input tokens: ~2,500\n  Output tokens: ~500\n  Estimated cost: $0.0007\n\nContinue with summarization? [Y/n]:\n```\n\nCLI providers show free status:\n\n```\nUsing claude (subscription - FREE)\n```\n\nSee [docs/summarization.md](./docs/summarization.md) for architecture details and troubleshooting.\n\n---\n\n## Performance\n\n| Metric | Raw Markdown | mdm | Savings |\n|--------|--------------|---------|---------|\n| Context for single doc | 2,500 tokens | 400 tokens | **84%** |\n| Context for 10 docs | 25,000 tokens | 4,000 tokens | **84%** |\n| Search latency | N/A | \u003c100ms | - |\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrobinson%2Fmarkdown-matters","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsrobinson%2Fmarkdown-matters","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrobinson%2Fmarkdown-matters/lists"}