{"id":50954767,"url":"https://github.com/andrecrjr/raglike.md","last_synced_at":"2026-06-18T05:02:28.669Z","repository":{"id":361609848,"uuid":"1254875613","full_name":"andrecrjr/raglike.md","owner":"andrecrjr","description":"A high-performance, local semantic search engine for Markdown documentation. Built with Bun, PGlite (pgvector), External Postgres Support, and Xenova Transformers.","archived":false,"fork":false,"pushed_at":"2026-06-13T12:14:58.000Z","size":752,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-13T14:12:46.072Z","etag":null,"topics":["agent","bun","docker","documentation-tool","llm-tools","mcp-server","onnx","open-source","rag","webapp"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andrecrjr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-05-31T05:27:07.000Z","updated_at":"2026-06-13T12:15:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/andrecrjr/raglike.md","commit_stats":null,"previous_names":["andrecrjr/search-vector-mcp-api","andrecrjr/raglike.md"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/andrecrjr/raglike.md","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrecrjr%2Fraglike.md","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrecrjr%2Fraglike.md/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrecrjr%2Fraglike.md/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrecrjr%2Fraglike.md/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andrecrjr","download_url":"https://codeload.github.com/andrecrjr/raglike.md/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrecrjr%2Fraglike.md/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34476728,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-18T02:00:06.871Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","bun","docker","documentation-tool","llm-tools","mcp-server","onnx","open-source","rag","webapp"],"created_at":"2026-06-18T05:02:27.793Z","updated_at":"2026-06-18T05:02:28.664Z","avatar_url":"https://github.com/andrecrjr.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# raglike-md 🚀\n\n### **The Zero-UI Knowledge Engine for AI Agents**\n\n`raglike-md` is a high-performance, local semantic search engine specifically engineered for AI Agents (like Cursor, Claude Code, and Windsurf). It transforms your documentation and codebases into a high-fidelity knowledge base that agents can navigate with structural precision.\n\nBuilt with **Bun**, **PGlite (pgvector)**, **MDAST**, and **Xenova Transformers**.\n\n---\n\n## 🧠 Why raglike-md?\n\nMost RAG systems use generic character-based chunking that breaks context. `raglike-md` is **Code-Aware**:\n\n- **Structural AST Chunking:** Splits documents by Markdown headers (#, ##) instead of raw character counts, preserving logical context.\n- **Code-Block Bundling:** Strictly binds code blocks (```ts ... ```) to their preceding contextual paragraphs. Code is never split across chunks.\n- **Zero-UI Git Pipeline:** Ingest repositories automatically via Webhooks (GitHub/GitLab). No manual uploads required.\n- **Multi-Repo Scoping:** Search across multiple repositories with scoped queries.\n- **Agent-First Protocol:** Native **Model Context Protocol (MCP)** support via SSE, optimized for agentic workflows.\n\n---\n\n## ✨ Key Features\n\n- **4-way Hybrid Search:** Combines conceptual similarity (pgvector), English keyword matching (stemmed), literal matching (non-stemmed), and literal heading boosts using RRF.\n- **Cross-Encoder Reranking:** Secondary pass using `bge-reranker-base` for ultra-high precision.\n- **Secure \u0026 Team-Ready:** Bearer Token authentication and Webhook signature validation.\n- **HNSW Acceleration:** Sub-second searches across massive datasets.\n- **Local Embeddings:** All processing happens on your machine using `all-mpnet-base-v2`.\n\n---\n\n## 🚀 Quick Start (Docker SSE)\n\nThe most portable way to use `raglike-md` with AI tools (Cursor, Claude Code, etc.) is via **SSE (Server-Sent Events)** inside Docker.\n\n1. **Start the server:**\n   ```bash\n   docker run -d \\\n     -p 4321:4321 \\\n     -e ENABLE_MCP=true \\\n     -e API_TOKEN=your_secure_token \\\n     -e WEBHOOK_SECRET=your_webhook_secret \\\n     -v $(pwd)/.repos:/app/.repos \\\n     raglike-md\n   ```\n\n2. **Tool Configuration:**\n\n| Tool | Type | URL | Auth |\n| :--- | :--- | :--- | :--- |\n| **Cursor** | SSE | `http://localhost:4321/mcp` | Add `Authorization: Bearer your_token` |\n| **Claude Code** | SSE | `claude mcp add --transport sse raglike http://localhost:4321/mcp` | |\n| **Windsurf** | SSE | `url: http://localhost:4321/mcp` | |\n\n---\n\n## 🔄 Git Ingestion (Zero-UI)\n\nConfigure a Webhook on your GitHub/GitLab repository to trigger automatic re-indexing on every push.\n\n**Endpoint:** `POST http://localhost:4321/api/v1/sync/webhook`\n**Secret:** Matches your `WEBHOOK_SECRET` environment variable.\n\nSupported Events:\n- **GitHub:** `push` event (Signature: `x-hub-signature-256`)\n- **GitLab:** `Push Hook` (Token: `x-gitlab-token`)\n\n---\n\n## 💻 Terminal Sync (raglike-cli)\n\nSynchronize local folders directly from your terminal.\n\n```bash\n# Option 1: Run without installing (zero-install)\nbun x ./cli ./my-notes --server http://localhost:4321\n\n# Option 2: Install globally\nbun run install-cli\nraglike-cli ./my-notes\n```\n\n- **Delta Sync:** Only uploads new or changed files.\n- **Config Support:** Uses `.raglike` (JSON) files for persistent settings.\n- **Recursive:** Scans subdirectories for `.md` and `.pdf`.\n\n[Read the CLI Guide →](docs/guides/cli.md)\n\n---\n\n## 📚 MCP Tools\n\n### 1. `semantic_markdown_search`\nFind precise information across all ingested repositories.\n**Arguments:**\n- `query`: The conceptual query.\n- `limit`: Number of results (default: 3).\n- `rerank`: Use cross-encoder (default: false).\n- `repository`: Optional repo ID (e.g., \"owner-repo\") to scope search.\n\n### 2. `read_chunk_neighbors`\nGet the text before and after a result to expand context.\n\n### 3. `get_full_document`\nRetrieve the full raw markdown content of a file.\n\n---\n\n## 🌐 API Reference\n\n### Semantic Search\n`POST /search`\n```json\n{\n  \"query\": \"How does the protocol handle SSE?\",\n  \"limit\": 3,\n  \"repository\": \"my-org-project\"\n}\n```\n\n### Generate Embeddings\n`POST /api/v1/embeddings`\n```json\n{\n  \"texts\": [\"Hello world\", \"Semantic intelligence\"]\n}\n```\n*Returns `{\"success\": true, \"embeddings\": [[...], [...]]}`*\n\n---\n\n## 🛠️ Local Development\n\n```bash\nbun install\nbun run src/index.ts --mcp\n```\n\n### Environment Variables\n- `API_TOKEN`: Secure your MCP/REST endpoints.\n- `WEBHOOK_SECRET`: Secure your Git ingestion pipeline.\n- `API_EMBEDDING_URL`: (Optional) Offload embedding generation to an external API.\n- `API_EMBEDDING_TOKEN`: (Optional) Bearer token for the external embedding API.\n- `POSTGRES_URL`: (Optional) Use an external Postgres instance.\n\n---\n\n## ⚖️ License\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrecrjr%2Fraglike.md","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandrecrjr%2Fraglike.md","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrecrjr%2Fraglike.md/lists"}