{"id":50530003,"url":"https://github.com/nikolaiugelvik/pi-semsearch","last_synced_at":"2026-06-04T13:00:26.511Z","repository":{"id":361267150,"uuid":"1253137578","full_name":"NikolaiUgelvik/pi-semsearch","owner":"NikolaiUgelvik","description":"Semantic code search plugin for opencode","archived":false,"fork":false,"pushed_at":"2026-06-03T07:32:40.000Z","size":452,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-03T12:34:01.444Z","etag":null,"topics":["bm25","hyde-rag","llm-rag","llm-tools","pi-coding-agent","pi-extension","rag-pipeline","semantic-search"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NikolaiUgelvik.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-05-29T07:24:59.000Z","updated_at":"2026-06-03T07:32:44.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/NikolaiUgelvik/pi-semsearch","commit_stats":null,"previous_names":["nikolaiugelvik/opencode-plugin-cast","nikolaiugelvik/pi-semsearch"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/NikolaiUgelvik/pi-semsearch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NikolaiUgelvik%2Fpi-semsearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NikolaiUgelvik%2Fpi-semsearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NikolaiUgelvik%2Fpi-semsearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NikolaiUgelvik%2Fpi-semsearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NikolaiUgelvik","download_url":"https://codeload.github.com/NikolaiUgelvik/pi-semsearch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NikolaiUgelvik%2Fpi-semsearch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33905359,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-04T02:00:06.755Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bm25","hyde-rag","llm-rag","llm-tools","pi-coding-agent","pi-extension","rag-pipeline","semantic-search"],"created_at":"2026-06-03T12:30:35.998Z","updated_at":"2026-06-04T13:00:26.234Z","avatar_url":"https://github.com/NikolaiUgelvik.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pi-semsearch\n\nSemantic code search extension for [Pi](https://github.com/Earendil-Works/pi) using syntax-aware structural chunking and OpenAI-compatible embeddings.\n\nThe extension registers two Pi tools:\n\n- `semantic_search_code` — ranked semantic repository search with syntax-aware chunks, breadcrumbs, file/line ranges, and topology context. Results expose the matched unit as `topology.current`.\n- `semantic_get_chunk` — fetch exact context for a code chunk identified by any topology node ID from `semantic_search_code`, optionally including parents, siblings, and children.\n\n## How it works\n\npi-semsearch maintains a local SQLite index per worktree/configuration.\n\nAt index time, it scans the repository, parses supported languages with native Tree-sitter grammars, splits source into structural chunks, embeds those chunks with the configured embedding provider, and stores metadata plus vectors in `index.sqlite` using `sqlite-vec`.\n\nAt query time, it runs hybrid retrieval: vector search for semantic similarity, SQLite FTS/BM25 for lexical matches, and Reciprocal Rank Fusion (RRF) to merge the candidate lists. If the initial vector match is weak, HyDE uses Pi's active model to generate alternative search text, embeds that text, and merges those candidates into retrieval. Optional reranking can reorder the final candidate set.\n\nSearch results include source text, file/line ranges, symbol breadcrumbs, and structural topology. `semantic_get_chunk` can then hydrate any topology node by ID, such as `topology.current.id`, `topology.parent.id`, a sibling ID, or a child ID.\n\nRetrieval scores and debug details are hidden from `semantic_search_code` output by default. Set `PI_SEMSEARCH_DEBUG_RETRIEVAL=1` to expose `score`, `finalScore`, `retrieval`, and `status.bestScore` while debugging.\n\n## Installation\n\nInstall from GitHub as a Pi package:\n\n```bash\npi install git:github.com/NikolaiUgelvik/pi-semsearch\n```\n\nOr use a local checkout while developing:\n\n```bash\npi install /path/to/pi-semsearch\n```\n\nAfter changing extension code in a local checkout, reload Pi with `/reload`. If dependencies or `package.json` changed, restart Pi or reinstall/update the package.\n\n## Configuration\n\nCreate `.pi/semsearch.json` in the project root, `semsearch.pi.json` in the project root, or `~/.pi/semsearch.json` for global defaults:\n\n```json\n{\n  \"embedding\": {\n    \"baseURL\": \"https://api.openai.com/v1\",\n    \"apiKeyEnv\": \"OPENAI_API_KEY\",\n    \"model\": \"text-embedding-3-small\",\n    \"dimensions\": 1536\n  },\n  \"hyde\": {\n    \"threshold\": 0.35\n  }\n}\n```\n\nYou can also set `PI_SEMSEARCH_CONFIG=/path/to/semsearch.json`.\n\nMinimal environment-based configuration is supported:\n\n```bash\nexport OPENAI_API_KEY=...\nexport PI_SEMSEARCH_EMBEDDING_MODEL=text-embedding-3-small\n# optional; defaults to https://api.openai.com/v1\nexport PI_SEMSEARCH_EMBEDDING_BASE_URL=https://api.openai.com/v1\n```\n\nHyDE is enabled by default when embeddings are configured, and uses Pi's active model for query expansion. Disable it with `\"hyde\": { \"enabled\": false }`. You can also pin HyDE to an explicit OpenAI-compatible chat provider:\n\n```json\n{\n  \"embedding\": {\n    \"baseURL\": \"https://api.openai.com/v1\",\n    \"apiKeyEnv\": \"OPENAI_API_KEY\",\n    \"model\": \"text-embedding-3-small\"\n  },\n  \"hyde\": {\n    \"baseURL\": \"https://openrouter.ai/api/v1\",\n    \"apiKeyEnv\": \"OPENROUTER_API_KEY\",\n    \"model\": \"openai/gpt-4o-mini\",\n    \"threshold\": 0.35\n  }\n}\n```\n\n## Cache and database location\n\nIndexes are stored as SQLite databases outside the repository by default:\n\n```text\n${XDG_CACHE_HOME:-~/.cache}/pi/semsearch/\u003ccache-key\u003e/index.sqlite\n```\n\nOverride the cache root with either:\n\n```json\n{\n  \"cacheDir\": \"/some/path\"\n}\n```\n\nor:\n\n```bash\nexport PI_SEMSEARCH_CACHE_DIR=/some/path\n```\n\n## Commands\n\n- `/semsearch-refresh` — force-refresh the pi-semsearch index for the current project.\n\n## Development\n\nUse Node `24.x` and npm `11.x`:\n\n```bash\nnpm install\nnpm run check\nnpm run typecheck\nnpm test\nnpm run build\n```\n\nOn Node 24, if native `tree-sitter` builds from source, use:\n\n```bash\nCXXFLAGS='-std=c++20' npm install\n```\n\nThe package uses native Tree-sitter grammar bindings, `better-sqlite3`, `sqlite-vec`, and Vitest for tests.\n\n## Notes\n\nThe Pi package manifest loads extensions from `extensions/`; `extensions/pi-semsearch.ts` delegates to the implementation in `src/`. Runtime code uses Pi's extension API (`pi.registerTool`, `pi.registerCommand`, and lifecycle events) and Node file APIs so it can run under Pi's extension loader.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnikolaiugelvik%2Fpi-semsearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnikolaiugelvik%2Fpi-semsearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnikolaiugelvik%2Fpi-semsearch/lists"}