{"id":33347334,"url":"https://github.com/jpotter80/mcp","last_synced_at":"2026-04-16T17:02:20.516Z","repository":{"id":325541920,"uuid":"1101585930","full_name":"jpotter80/mcp","owner":"jpotter80","description":"A framework for building self-contained, searchable MCP servers from technical documentation. Create independent documentation servers with hybrid vector + keyword search, ready to distribute and deploy.","archived":false,"fork":false,"pushed_at":"2025-12-30T00:23:10.000Z","size":277553,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-02T07:45:47.562Z","etag":null,"topics":["docs","documentation","duckdb","mcp","mcp-server","mojo"],"latest_commit_sha":null,"homepage":"","language":"MDX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jpotter80.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-21T22:25:33.000Z","updated_at":"2025-12-30T00:23:13.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/jpotter80/mcp","commit_stats":null,"previous_names":["jpotter80/mcp"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jpotter80/mcp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpotter80%2Fmcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpotter80%2Fmcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/jpotter80%2Fmcp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpotter80%2Fmcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jpotter80","download_url":"https://codeload.github.com/jpotter80/mcp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpotter80%2Fmcp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31895650,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-16T11:36:10.202Z","status":"ssl_error","status_checked_at":"2026-04-16T11:36:09.652Z","response_time":69,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docs","documentation","duckdb","mcp","mcp-server","mojo"],"created_at":"2025-11-22T08:00:58.873Z","updated_at":"2026-04-16T17:02:20.467Z","avatar_url":"https://github.com/jpotter80.png","language":"MDX","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multi-Server MCP Documentation Search\n\nA framework for building self-contained, searchable MCP servers from technical documentation. Create independent documentation servers with hybrid vector + keyword search, ready to distribute and deploy.\n\n## 🎯 Project Overview\n\nThis framework enables you to:\n\n1. **Build** searchable MCP servers from any Markdown/MDX documentation source\n2. 
**Search** with hybrid semantic (vector) + keyword (FTS) search via Reciprocal Rank Fusion\n3. **Distribute** self-contained servers as standalone repositories or packages\n4. **Deploy** to VS Code, Claude Desktop, or any MCP-compatible host\n5. **Scale** to multiple documentation sources with automated tooling\n\n### Core Value Proposition\n\n- 🔍 **Hybrid Search**: Combines semantic similarity (HNSW) and keyword matching (BM25) intelligently\n- 📦 **Self-Contained Servers**: Each MCP server is fully standalone and distributable\n- 🚀 **Multi-Format Support**: Works with MDX, Markdown, and other documentation formats\n- 🎛️ **Config-Driven**: All paths and parameters controlled via YAML configuration\n- 💾 **Versioned Data**: DuckLake provides reproducible documentation snapshots\n- 🔄 **Automated Tooling**: Scripts for syncing, scaffolding, and building new servers\n\n## 📐 Multi-Server Architecture\n\nThis project supports multiple independent MCP servers, each serving different documentation sources:\n\n```\n/home/james/mcp/\n├── servers/                          # Standalone MCP servers\n│   ├── mojo-manual-mcp/              # Mojo documentation server\n│   │   ├── runtime/                  # Server code + indexed database\n│   │   │   ├── mojo_manual_mcp_server.py\n│   │   │   ├── search.py\n│   │   │   └── mojo_manual_mcp.db\n│   │   ├── config/                   # YAML configuration\n│   │   │   ├── processing_config.yaml\n│   │   │   └── server_config.yaml\n│   │   ├── requirements.txt\n│   │   └── README.md\n│   │\n│   └── [future-servers]/             # DuckDB, Python, etc.\n│\n├── shared/                           # Build-time infrastructure (dev only)\n│   ├── preprocessing/                # Document processing pipeline\n│   ├── embedding/                    # Embedding generation scripts\n│   ├── templates/                    # Templates for new servers\n│   └── build/                        # Ephemeral build artifacts\n│\n├── source-documentation/           
  # Documentation sources\n│   ├── mojo/manual/                  # Mojo docs (MDX files)\n│   └── [other-sources]/\n│\n└── tools/                            # Automation scripts\n    ├── sync_documentation.sh         # Sync from upstream repos\n    ├── scaffold_new_mcp.sh           # Create new server structure\n    └── build_mcp.sh                  # Build server database\n```\n\n**Key Design Principles**:\n- Each server in `/servers/{name}/` is completely self-contained and distributable\n- Shared build infrastructure in `/shared/` is for development only (not packaged with servers)\n- All configuration is YAML-based with variable substitution (no hardcoded paths)\n- Multi-format support via pluggable processor architecture\n- Works with or without pixi (pip + venv supported)\n\n## 🚀 Quick Start\n\nGet the Mojo documentation MCP server running in 3 steps:\n\n### Using Pixi (Recommended)\n\n#### 1. Clone and install dependencies\n\n```bash\ngit clone https://github.com/jpotter80/mcp\ncd mcp/servers/mojo-manual-mcp\npixi install\n```\n\n#### 2. Configure VS Code\n\nRegister the Mojo-Manual MCP server by adding the following configuration to your global VS Code `mcp.json`. Replace `/absolute/path/to/mojo-manual-mcp` with your actual server path.\n\n```json\n{\n  \"servers\": {\n    \"mojo-manual\": {\n      \"type\": \"stdio\",\n      \"command\": \"pixi\",\n      \"args\": [\"run\", \"serve\"],\n      \"cwd\": \"/absolute/path/to/mojo-manual-mcp\"\n    }\n  }\n}\n```\n\n#### 3. Start the server\n\nIf the configuration is valid, VS Code shows a start button in the `mcp.json` file; click it to launch the server. From then on, VS Code starts and stops the server as needed.\n\n**Note**: Pre-built databases are included in the repository. 
No build step required to run the server.\n\n📖 **Detailed guides**: See [`docs/QUICKSTART.md`](docs/QUICKSTART.md) for complete setup instructions.\n\n**Environment Variables**:\n- `MOJO_DB_PATH`: Path to the indexed database\n- `MAX_SERVER_URL`: Embedding server endpoint (automatically started if `AUTO_START_MAX=1`)\n- `EMBED_MODEL_NAME`: Sentence transformer model name\n- `AUTO_START_MAX`: Set to `1` to auto-start MAX server (recommended)\n\n## 🏗️ Building from Source\n\nIf you want to rebuild the database from scratch or create a new MCP server:\n\n### Rebuild Mojo Server\n\n```bash\n# Full pipeline (all steps)\npixi run mojo-build\n\n# Or step-by-step\npixi run mojo-process              # Process documentation\npixi run mojo-generate-embeddings  # Generate vectors\npixi run mojo-consolidate          # Consolidate data\npixi run mojo-load                 # Load to DuckLake\npixi run mojo-index                # Create indexes\n```\n\n### Create a New MCP Server\n\n```bash\n# 1. Scaffold new server structure\n./tools/scaffold_new_mcp.sh --name duckdb --doc-type docs --format markdown\n\n# 2. Add documentation to source-documentation/duckdb/docs/\n\n# 3. Build the server\n./tools/build_mcp.sh --mcp-name duckdb\n\n# 4. 
Test the server\npython servers/duckdb-docs-mcp/runtime/duckdb_docs_mcp_server.py\n```\n\n📖 **Developer guides**: \n- [`docs/CREATING_NEW_MCP.md`](docs/CREATING_NEW_MCP.md) - Create new servers\n\n## 📋 Available Servers\n\nCurrently implemented:\n\n| Server | Documentation Source | Format | Status |\n|--------|---------------------|--------|--------|\n| **mojo-manual-mcp** | [Mojo Manual](https://docs.modular.com/mojo/manual) | MDX | ✅ Production |\n\nComing soon:\n- **duckdb-docs-mcp** - DuckDB documentation\n\n## 🛠️ Key Technologies\n\n- **Python 3.12+** — Core language for preprocessing and runtime\n- **DuckDB** — Vector similarity search (HNSW) + full-text search (BM25)\n- **DuckLake** — Versioned data lake for reproducible builds\n- **MAX** — Local sentence-transformers embedding server\n- **MCP** — Model Context Protocol for AI agent integration\n- **Pixi** — Package management and task automation (optional)\n\n## 🎓 How It Works\n\n### Build Pipeline\n\n1. **Preprocessing**: MDX/Markdown → cleaned chunks (~350-400 tokens, preserving structure)\n2. **Embeddings**: Chunks → 768-dimensional vectors via sentence-transformers\n3. **Consolidation**: Merge chunks + embeddings into consolidated Parquet dataset\n4. **Versioning**: Load into DuckLake for version-controlled data lake\n5. **Indexing**: Materialize into DuckDB with HNSW (vector) + FTS (keyword) indexes\n\n### Runtime Search\n\n- **Vector Search (HNSW)**: Semantic similarity matching via cosine distance\n- **Keyword Search (FTS/BM25)**: Exact phrase and term matching with field weighting\n- **Hybrid Fusion (RRF)**: Reciprocal Rank Fusion combines both rankings intelligently\n- **Graceful Fallback**: If MAX server unavailable, falls back to keyword-only search\n\n### Example Query Flow\n\n\nUser: \"How do I declare a variable in Mojo?\"\n  ↓\n1. Query embedding generated via MAX server\n2. Vector search finds semantically similar chunks\n3. Keyword search finds chunks with \"declare\" + \"variable\"\n4. 
RRF fusion combines results\n5. Top 5 chunks returned with snippets + URLs\n  ↓\nResponse: Relevant documentation sections with context\n\n\n## 🔗 External Resources\n\n- [Model Context Protocol](https://modelcontextprotocol.io) - MCP specification\n- [DuckDB Documentation](https://duckdb.org/docs) - Database engine docs\n- [DuckDB VSS Extension](https://duckdb.org/docs/extensions/vss) - Vector similarity search\n- [MAX Documentation](https://docs.modular.com/max/intro) - Embedding server\n- [Mojo Documentation](https://docs.modular.com/mojo/manual) - Example documentation source\n\n## 📄 License\n\n Copyright 2025 James Potter\n \n    Licensed under the Apache License, Version 2.0 (the \"License\");\n    you may not use this file except in compliance with the License.\n    You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n    Unless required by applicable law or agreed to in writing, software\n    distributed under the License is distributed on an \"AS IS\" BASIS,\n    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n    See the License for the specific language governing permissions and\n    limitations under the License.\n\n## 🙏 Acknowledgments\n\nBuilt with inspiration from Modular, DuckDB, and the Model Context Protocol communities - powered by open-source tools.\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjpotter80%2Fmcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjpotter80%2Fmcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjpotter80%2Fmcp/lists"}