{"id":51072227,"url":"https://github.com/florextech/docs-to-mcp","last_synced_at":"2026-06-23T11:33:15.326Z","repository":{"id":354739393,"uuid":"1225026371","full_name":"florextech/docs-to-mcp","owner":"florextech","description":"Convert any documentation URL into a ready-to-run MCP server. 100% local by default — no API keys needed. Crawl → Markdown → Embeddings → Vector Store → MCP Server.","archived":false,"fork":false,"pushed_at":"2026-04-29T23:09:53.000Z","size":227,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-31T23:07:01.604Z","etag":null,"topics":["ai","chromadb","claude","cli","cursor","documentation","embeddings","llm","local-first","mcp","mcp-server","model-context-protocol","open-source","playwright","rag","semantic-search","transformers-js","typescript","vector-search","web-crawler"],"latest_commit_sha":null,"homepage":"https://www.npmjs.com/package/@florexlabs/docs-to-mcp","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/florextech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-29T21:49:29.000Z","updated_at":"2026-05-03T21:02:35.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/florextech/docs-to-mcp","commit_stats":null,"previous_names":["florextech/docs-to-mcp"],"tags_count":0,"template":false,"template_full_name":null,
"purl":"pkg:github/florextech/docs-to-mcp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/florextech%2Fdocs-to-mcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/florextech%2Fdocs-to-mcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/florextech%2Fdocs-to-mcp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/florextech%2Fdocs-to-mcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/florextech","download_url":"https://codeload.github.com/florextech/docs-to-mcp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/florextech%2Fdocs-to-mcp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34686727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","chromadb","claude","cli","cursor","documentation","embeddings","llm","local-first","mcp","mcp-server","model-context-protocol","open-source","playwright","rag","semantic-search","transformers-js","typescript","vector-search","web-crawler"],"created_at":"2026-06-23T11:33:12.607Z","updated_at":"2026-06-23T11:33:15.320Z","avatar_url":"https://github.com/florextech.png","language":"TypeScript",
"funding_links":[],"categories":[],"sub_categories":[],"readme":"# @florexlabs/docs-to-mcp\n\nConvert any documentation URL into a ready-to-run MCP server.\n\n**100% local by default** — no API keys needed. Embeddings run locally with [Transformers.js](https://huggingface.co/docs/transformers.js).\n\n```\nURL → crawl → clean HTML → markdown → chunks → embeddings → vector store → MCP server\n```\n\n## Prerequisites\n\n- **Node.js** \u003e= 18\n- **Docker** (for ChromaDB): `docker run -p 8000:8000 chromadb/chroma`\n- **Playwright browsers**: `npx playwright install chromium`\n- No API keys needed for default local embeddings\n\n## Quick Start\n\n```bash\n# Install Playwright browsers (one-time)\nnpx playwright install chromium\n\n# Start ChromaDB in the background (-d detaches the container so this shell stays free)\ndocker run -d -p 8000:8000 chromadb/chroma\n\n# Initialize a project from a docs URL\nnpx @florexlabs/docs-to-mcp init https://docs.example.com --out ./my-docs-to-mcp\n\ncd my-docs-to-mcp\nnpm install\n\n# Crawl, build, and start — no API keys needed!\nnpm run crawl\nnpm run build\nnpm run start\n```\n\n## Installation\n\n```bash\nnpm install -g @florexlabs/docs-to-mcp\n```\n\nOr run it directly with npx:\n\n```bash\nnpx @florexlabs/docs-to-mcp \u003ccommand\u003e\n```\n\n## Embedding Providers\n\n### Local (default)\n\nUses [Transformers.js](https://huggingface.co/docs/transformers.js) with the `Xenova/all-MiniLM-L6-v2` model. Runs 100% on your machine via ONNX Runtime. 
No API keys, no external services, no cost.\n\n```bash\ndocs-to-mcp build                                    # uses local by default\ndocs-to-mcp build --model Xenova/all-MiniLM-L6-v2    # explicit model\n```\n\nThe model is downloaded automatically on first use (~80 MB) and cached locally.\n\n### OpenAI (opt-in)\n\nFor higher-quality embeddings on large documentation sets, you can switch to OpenAI:\n\n```bash\nexport OPENAI_API_KEY=sk-...\ndocs-to-mcp build --provider openai\ndocs-to-mcp build --provider openai --model text-embedding-3-large\n```\n\n## Commands\n\n### `docs-to-mcp init \u003curl\u003e`\n\nGenerate a new MCP server project from a documentation URL.\n\n```bash\ndocs-to-mcp init https://docs.example.com --out ./my-docs-to-mcp\n```\n\nOptions:\n- `--out \u003cdir\u003e` — Output directory (default: `./docs-to-mcp-project`)\n- `--depth \u003cn\u003e` — Crawl depth (default: `3`)\n- `--limit \u003cn\u003e` — Max pages (default: `50`)\n- `--provider \u003cname\u003e` — Embedding provider: `local` or `openai` (default: `local`)\n- `--model \u003cname\u003e` — Embedding model\n- `--collection \u003cname\u003e` — Collection name (default: `docs`)\n\n### `docs-to-mcp crawl \u003curl\u003e`\n\nCrawl a documentation site, convert its HTML to markdown, and split the result into chunks.\n\n```bash\ndocs-to-mcp crawl https://docs.example.com --out ./data --depth 3 --limit 50\n```\n\nOptions:\n- `--out \u003cdir\u003e` — Output directory (default: `./data`)\n- `--depth \u003cn\u003e` — Crawl depth (default: `3`)\n- `--limit \u003cn\u003e` — Max pages (default: `50`)\n- `--verbose` — Verbose output\n\n### `docs-to-mcp build`\n\nEmbed the crawled chunks and upsert them into ChromaDB.\n\n```bash\ndocs-to-mcp build                          # local embeddings (default)\ndocs-to-mcp build --provider openai        # use OpenAI instead\n```\n\nOptions:\n- `--collection \u003cname\u003e` — Collection name (default: `docs`)\n- `--provider \u003cname\u003e` — `local` or `openai` (default: `local`)\n- `--model \u003cname\u003e` — Embedding model\n- `--data \u003cdir\u003e` — Data directory (default: `./data`)\n- `--force` — Force rebuild\n- `--verbose` — Verbose output\n\n### `docs-to-mcp start`\n\nStart the MCP server (stdio transport).\n\n```bash\ndocs-to-mcp start --collection docs\n```\n\n### `docs-to-mcp dev`\n\nStart the MCP server in development mode with logging.\n\n```bash\ndocs-to-mcp dev --collection docs\n```\n\n## MCP Tools\n\nThe server exposes three tools:\n\n| Tool | Description |\n|------|-------------|\n| `search_docs(query, topK?)` | Semantic search across indexed documentation |\n| `get_source(url)` | Get all chunks from a specific source URL |\n| `list_sources()` | List all indexed documentation sources |\n\n## Connecting to MCP Clients\n\n### Claude Desktop\n\nOn macOS, add to `~/Library/Application Support/Claude/claude_desktop_config.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"my-docs\": {\n      \"command\": \"npx\",\n      \"args\": [\"@florexlabs/docs-to-mcp\", \"start\", \"--collection\", \"docs\"],\n      \"env\": {\n        \"CHROMA_URL\": \"http://localhost:8000\"\n      }\n    }\n  }\n}\n```\n\n### Cursor\n\nAdd to `.cursor/mcp.json` in your project (or to `~/.cursor/mcp.json` to make the server available in every project):\n\n```json\n{\n  \"mcpServers\": {\n    \"my-docs\": {\n      \"command\": \"npx\",\n      \"args\": [\"@florexlabs/docs-to-mcp\", \"start\", \"--collection\", \"docs\"],\n      \"env\": {\n        \"CHROMA_URL\": \"http://localhost:8000\"\n      }\n    }\n  }\n}\n```\n\n## Environment Variables\n\n```\nCHROMA_URL=http://localhost:8000\n\n# Only needed with --provider openai:\nOPENAI_API_KEY=sk-...\nOPENAI_EMBEDDING_MODEL=text-embedding-3-small\n```\n\n## Architecture\n\n```\npackages/\n  cli/          — CLI commands (init, crawl, build, start, dev)\n  crawler/      — Playwright-based same-origin doc crawler\n  parser/       — HTML cleanup (Cheerio) + markdown conversion (Turndown)\n  chunker/      — Heading-aware markdown chunking\n  embeddings/   — Local (Transformers.js) + OpenAI providers\n  vector-store/ — ChromaDB adapter\n  mcp-server/   — MCP server with search tools\n```\n\n## Security Notes\n\n- Only crawls same-origin links by default\n- Never executes scraped content\n- URLs are sanitized and normalized\n- Local embeddings are computed on your machine, so no documentation text is sent to an embedding service\n- If using OpenAI, your documentation chunks are sent to OpenAI's API to be embedded\n- Do not crawl private documentation unless you understand where its content will be sent\n- No shell execution from user-controlled input\n\n## Development\n\n```bash\npnpm install\npnpm test\npnpm build\n```\n\n## License\n\nMIT\n",
"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflorextech%2Fdocs-to-mcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflorextech%2Fdocs-to-mcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflorextech%2Fdocs-to-mcp/lists"}