{"id":47811336,"url":"https://github.com/brosnanyuen/opencodesearch","last_synced_at":"2026-04-08T23:01:12.460Z","repository":{"id":348638449,"uuid":"1199112190","full_name":"BrosnanYuen/opencodesearch","owner":"BrosnanYuen","description":"Largescale MCP server for codebase search with background indexing and automatic updating to git commits in rust","archived":false,"fork":false,"pushed_at":"2026-04-03T08:29:10.000Z","size":197,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-05T20:02:26.238Z","etag":null,"topics":["ai","claude-code","code","codex","llm","mcp","mcp-server","openai","openclaw","opencode","search","search-engine","vector-database"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BrosnanYuen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-02T03:59:41.000Z","updated_at":"2026-04-05T08:54:11.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/BrosnanYuen/opencodesearch","commit_stats":null,"previous_names":["brosnanyuen/opencodesearch"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/BrosnanYuen/opencodesearch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrosnanYuen%2Fopencodesearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrosnanYuen%2Fopen
codesearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrosnanYuen%2Fopencodesearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrosnanYuen%2Fopencodesearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BrosnanYuen","download_url":"https://codeload.github.com/BrosnanYuen/opencodesearch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrosnanYuen%2Fopencodesearch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31489427,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-06T17:22:55.647Z","status":"ssl_error","status_checked_at":"2026-04-06T17:22:54.741Z","response_time":112,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","claude-code","code","codex","llm","mcp","mcp-server","openai","openclaw","opencode","search","search-engine","vector-database"],"created_at":"2026-04-03T18:12:21.342Z","updated_at":"2026-04-06T21:00:37.377Z","avatar_url":"https://github.com/BrosnanYuen.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# opencodesearch\n\n`opencodesearch` is an asynchronous Rust code search system with a Model Context Protocol (MCP) server.\nIt indexes large repositories into vector + keyword backends and serves search results through MCP 
tools.\n\n## Features\n- Fully async runtime (`tokio`)\n- 4 isolated processes:\n  - orchestrator (state machine + supervision)\n  - background ingestor\n  - MCP server process\n  - git watchdog process\n- Required crates integrated and used in runtime code:\n  - `opencodesearchparser`\n  - `qdrant-client`\n  - `ollama-rs`\n  - `rmcp`\n- Hybrid retrieval:\n  - semantic search (Qdrant vectors)\n  - keyword search (Quickwit HTTP + local shadow fallback)\n- MCP server compatible with MCP clients using streamable HTTP and stdio transports\n\n## Architecture\nState machine in orchestrator:\n- `SPINUP`: load `config.json`\n- `NORMAL`: run `ingestor` + `mcp` + `watchdog`\n- `UPDATE`: keep `watchdog`, stop `ingestor` + `mcp` during update window\n- `CLOSING`: stop all children gracefully\n\nUpdate flow:\n- watchdog tracks git commits since last sync\n- when threshold (`commit_threshold`) is reached:\n  - send `UPDATE_START` to orchestrator\n  - pull + compute changed/deleted files\n  - remove stale docs\n  - reindex changed files\n  - send `UPDATE_END`\n\n## Requirements\n- Rust stable toolchain\n- Docker + Docker Compose\n- Local network access to:\n  - Ollama (`11434`)\n  - Qdrant (`6333` HTTP, `6334` gRPC)\n  - Quickwit (`7280`)\n\n## Configuration\n`config.json` schema:\n\n```json\n{\n  \"codebase\": {\n    \"directory_path\": \"/path/to/massive/repo\",\n    \"git_branch\": \"main\",\n    \"commit_threshold\": 50,\n    \"mcp_server_name\": \"My cool codebase\",\n    \"mcp_server_url\": \"http://localhost:9443\",\n    \"background_indexing_threads\": 2\n  },\n  \"ollama\": {\n    \"server_url\": \"http://localhost:11434\",\n    \"embedding_model\": \"qwen3-embedding:0.6b\",\n    \"context_size\": 2000\n  },\n  \"qdrant\": {\n    \"server_url\": \"http://localhost:6334\",\n    \"collection_name\": \"opencodesearch-code-chunks\",\n    \"api_key\": null\n  },\n  \"quickwit\": {\n    \"quickwit_url\": \"http://localhost:7280\",\n    \"quickwit_index_id\": 
\"opencodesearch-code-chunks\"\n  }\n}\n```\n\nImportant:\n- `qdrant.server_url` should target the gRPC endpoint port (`6334`) for `qdrant-client`.\n- `quickwit.quickwit_url` should target HTTP (`7280`).\n\n## Start Backend Services\nRun all local dependencies:\n\n```bash\ndocker compose up -d\n```\n\nCheck containers:\n\n```bash\ndocker ps\n```\n\n## Running the System\n\n### 1) Orchestrator mode (recommended)\nStarts and supervises all child processes.\n\n```bash\ncargo run -- orchestrator --config config.json\n```\n\n### 2) Individual process modes\nYou can run each process directly for debugging.\n\nIngestor:\n```bash\ncargo run -- ingestor --config config.json\n```\n\nMCP server:\n```bash\ncargo run -- mcp --config config.json\n```\n\nMCP server over stdio (for local MCP clients):\n```bash\ncargo run -- mcp-stdio --config config.json\n```\n\nWatchdog (requires orchestrator IPC env):\n```bash\nOPENCODESEARCH_IPC_SOCKET=/tmp/opencodesearch.sock cargo run -- watchdog --config config.json\n```\n\n## MCP Server Usage\nThe MCP server supports:\n- streamable HTTP via `cargo run -- mcp --config config.json`\n- stdio via `cargo run -- mcp-stdio --config config.json`\n\nImplemented MCP tool:\n- `search_code`\n  - input:\n    - `query: string`\n    - `limit?: number` (default 8, max 50)\n  - output (structured JSON): array of objects with\n    - `snippet`\n    - `path`\n    - `start_line`\n    - `end_line`\n    - `score`\n    - `source`\n\n### Example tool input\n\n```json\n{\n  \"query\": \"which function changes obj variable\",\n  \"limit\": 5\n}\n```\n\n### Result shape\n\n```json\n{\n  \"hits\": [\n    {\n      \"path\": \"/repo/module.py\",\n      \"snippet\": \"def mutate(obj): ...\",\n      \"start_line\": 10,\n      \"end_line\": 22,\n      \"score\": 0.92,\n      \"source\": \"qdrant\"\n    }\n  ]\n}\n```\n\n## Using With MCP Clients\nThis server supports both:\n- streamable HTTP (`cargo run -- mcp --config config.json`)\n- local stdio (`cargo run -- mcp-stdio 
--config config.json`)\n\n### OpenAI Codex\nCodex supports both stdio and streamable HTTP MCP servers.\n\nStdio (CLI):\n```bash\ncodex mcp add opencodesearch -- \\\n  cargo run --quiet --manifest-path /home/brosnan/opencodesearch/Cargo.toml -- \\\n  mcp-stdio --config /home/brosnan/opencodesearch/config.json\n```\n\nRemote HTTP (`~/.codex/config.toml` or `.codex/config.toml`):\n```toml\n[mcp_servers.opencodesearch]\nurl = \"http://localhost:9443/\"\n```\n\nThen verify:\n```bash\ncodex mcp list\n```\n\n### OpenCode\nOpenCode config uses the `mcp` section in `opencode.json` (or `opencode.jsonc`).\n\nRemote HTTP:\n```json\n{\n  \"$schema\": \"https://opencode.ai/config.json\",\n  \"mcp\": {\n    \"opencodesearch\": {\n      \"type\": \"remote\",\n      \"url\": \"http://localhost:9443/\",\n      \"enabled\": true\n    }\n  }\n}\n```\n\nLocal stdio:\n```json\n{\n  \"$schema\": \"https://opencode.ai/config.json\",\n  \"mcp\": {\n    \"opencodesearch\": {\n      \"type\": \"local\",\n      \"command\": [\n        \"cargo\",\n        \"run\",\n        \"--quiet\",\n        \"--manifest-path\",\n        \"/home/brosnan/opencodesearch/Cargo.toml\",\n        \"--\",\n        \"mcp-stdio\",\n        \"--config\",\n        \"/home/brosnan/opencodesearch/config.json\"\n      ],\n      \"enabled\": true\n    }\n  }\n}\n```\n\n### Claude Code\nClaude Code supports HTTP, SSE, and stdio MCP transports.\n\nRemote HTTP:\n```bash\nclaude mcp add --transport http opencodesearch http://localhost:9443/\n```\n\nLocal stdio:\n```bash\nclaude mcp add --transport stdio opencodesearch -- \\\n  cargo run --quiet --manifest-path /home/brosnan/opencodesearch/Cargo.toml -- \\\n  mcp-stdio --config /home/brosnan/opencodesearch/config.json\n```\n\nThen verify:\n```bash\nclaude mcp list\n```\n\n### TLS / HTTPS Notes\n- Default local config uses `http://localhost:9443`.\n- For `https://...`, provide a certificate trusted by your MCP client.\n- TLS cert and key defaults:\n  - 
`certs/localhost-cert.pem`\n  - `certs/localhost-key.pem`\n- Override TLS file paths with:\n  - `OPENCODESEARCH_TLS_CERT_PATH`\n  - `OPENCODESEARCH_TLS_KEY_PATH`\n- For Codex specifically, you can provide a custom CA bundle with `CODEX_CA_CERTIFICATE`.\n\nReferences:\n- Codex MCP docs: https://developers.openai.com/codex/mcp\n- OpenCode MCP docs: https://opencode.ai/docs/mcp-servers/\n- Claude Code MCP docs: https://code.claude.com/docs/en/mcp\n\n### Quick curl test\nUse the included script:\n\n```bash\n./test_mcp_curl.sh\n```\n\nOptional:\n- `MCP_URL=https://localhost:9443/ MCP_INSECURE=1 ./test_mcp_curl.sh`\n\nThe script performs the required MCP HTTP handshake steps:\n1. `initialize`\n2. extract `mcp-session-id` from response headers\n3. send `notifications/initialized` with the same `mcp-session-id`\n4. call `tools/call` for `search_code`\n\n### Manual curl sequence\nInitialize and capture session id:\n\n```bash\ncurl -sS -D headers.txt http://localhost:9443/ \\\n  -H 'Content-Type: application/json' \\\n  -H 'Accept: application/json, text/event-stream' \\\n  -d '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"initialize\",\"params\":{\"protocolVersion\":\"2025-03-26\",\"capabilities\":{},\"clientInfo\":{\"name\":\"curl-test\",\"version\":\"1.0\"}}}'\n```\n\nSend initialized notification:\n\n```bash\nSESSION_ID=\"$(awk 'tolower($1)==\"mcp-session-id:\"{print $2}' headers.txt | tr -d '\\r' | tail -n 1)\"\ncurl -sS http://localhost:9443/ \\\n  -H 'Content-Type: application/json' \\\n  -H 'Accept: application/json, text/event-stream' \\\n  -H \"mcp-session-id: ${SESSION_ID}\" \\\n  -d '{\"jsonrpc\":\"2.0\",\"method\":\"notifications/initialized\"}'\n```\n\nCall the MCP tool:\n\n```bash\ncurl -N http://localhost:9443/ \\\n  -H 'Content-Type: application/json' \\\n  -H 'Accept: application/json, text/event-stream' \\\n  -H \"mcp-session-id: ${SESSION_ID}\" \\\n  -d 
'{\"jsonrpc\":\"2.0\",\"id\":2,\"method\":\"tools/call\",\"params\":{\"name\":\"search_code\",\"arguments\":{\"query\":\"which function mutates obj\",\"limit\":5}}}'\n```\n\n## Rust API Documentation\nThe crate exposes reusable modules for embedding, indexing, MCP serving, and process control.\n\n### Modules\n- `config`: parse typed app config (`AppConfig`)\n- `chunking`: parse/split source files into chunks (`chunk_file`)\n- `indexing`: indexing runtime (`IndexingRuntime`)\n- `qdrant_store`: vector storage + semantic query (`QdrantStore`)\n- `quickwit`: keyword storage/query (`QuickwitStore`)\n- `mcp`: MCP server type (`OpenCodeSearchMcpServer`)\n- `watchdog`: git update monitor (`WatchdogProcess`)\n- `orchestrator`: multi-process supervisor (`Orchestrator`)\n\n### Minimal Rust indexing example\n\n```rust\nuse opencodesearch::config::AppConfig;\nuse opencodesearch::indexing::IndexingRuntime;\n\n#[tokio::main]\nasync fn main() -\u003e anyhow::Result\u003c()\u003e {\n    let config = AppConfig::from_path(\"config.json\")?;\n    let runtime = IndexingRuntime::from_config(config)?;\n\n    runtime.index_entire_codebase().await?;\n    Ok(())\n}\n```\n\n### Minimal Rust semantic search example\n\n```rust\nuse opencodesearch::config::AppConfig;\nuse opencodesearch::indexing::IndexingRuntime;\n\n#[tokio::main]\nasync fn main() -\u003e anyhow::Result\u003c()\u003e {\n    let config = AppConfig::from_path(\"config.json\")?;\n    let runtime = IndexingRuntime::from_config(config)?;\n\n    let query_vec = runtime.embed_query(\"where is object mutated\").await?;\n    let hits = runtime.qdrant.semantic_search(query_vec, 5).await?;\n\n    for hit in hits {\n        println!(\"{}:{}-{}\", hit.path, hit.start_line, hit.end_line);\n    }\n\n    Ok(())\n}\n```\n\n### Minimal Rust MCP server embedding\n\n```rust\nuse opencodesearch::config::AppConfig;\nuse opencodesearch::indexing::IndexingRuntime;\nuse opencodesearch::mcp::OpenCodeSearchMcpServer;\n\n#[tokio::main]\nasync fn main() 
-\u003e anyhow::Result\u003c()\u003e {\n    let config = AppConfig::from_path(\"config.json\")?;\n    let runtime = IndexingRuntime::from_config(config)?;\n    OpenCodeSearchMcpServer::new(runtime)\n        .run_streamable_http(\"http://localhost:9443\")\n        .await\n}\n```\n\n## Testing\n### Standard tests\n```bash\ncargo test\n```\n\n### Live container integration tests\nRequires running Docker services and local git:\n\n```bash\ncargo test -- --ignored\n```\n\nCurrent ignored integration tests validate:\n- Ollama connectivity\n- Quickwit + Qdrant connectivity\n- full indexing flow on generated Python project\n- retrieval through MCP search path with non-exact query phrasing\n- 100-commit refactor scenario for watchdog threshold behavior\n\n## Troubleshooting\n- Quickwit health endpoint: use `http://localhost:7280/health/livez`\n- If embeddings fail, confirm Ollama model availability:\n  - `qwen3-embedding:0.6b`\n- Qdrant client requires gRPC port (`6334`) in config\n- If integration tests fail on startup race, rerun after a short container warmup\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrosnanyuen%2Fopencodesearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrosnanyuen%2Fopencodesearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrosnanyuen%2Fopencodesearch/lists"}