{"id":49250477,"url":"https://github.com/solomonneas/code-search-api","last_synced_at":"2026-04-25T00:02:58.664Z","repository":{"id":352742939,"uuid":"1186458556","full_name":"solomonneas/code-search-api","owner":"solomonneas","description":"Local semantic code search with Ollama embeddings, SQLite, and hybrid search. Index your codebase with language-aware chunking and find code by intent.","archived":false,"fork":false,"pushed_at":"2026-04-20T22:55:38.000Z","size":21,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-21T00:39:01.779Z","etag":null,"topics":["code-indexing","code-search","developer-tools","embeddings","fastapi","hybrid-search","local-first","ollama","semantic-search","sqlite"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/solomonneas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"solomonneas","ko_fi":"solomonneas","buy_me_a_coffee":"solomonneas"}},"created_at":"2026-03-19T16:39:56.000Z","updated_at":"2026-04-20T22:55:42.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/solomonneas/code-search-api","commit_stats":null,"previous_names":["solomonneas/code-search-api"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/solomonneas/code-search-api","repository_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/solomonneas%2Fcode-search-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solomonneas%2Fcode-search-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solomonneas%2Fcode-search-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solomonneas%2Fcode-search-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/solomonneas","download_url":"https://codeload.github.com/solomonneas/code-search-api/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solomonneas%2Fcode-search-api/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32245156,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T13:21:15.438Z","status":"ssl_error","status_checked_at":"2026-04-24T13:21:15.005Z","response_time":64,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code-indexing","code-search","developer-tools","embeddings","fastapi","hybrid-search","local-first","ollama","semantic-search","sqlite"],"created_at":"2026-04-25T00:02:57.915Z","updated_at":"2026-04-25T00:02:58.650Z","avatar_url":"https://github.com/solomonneas.png","language":"Python","funding_links":["https://github.com/sponsors/solomonneas","https://ko-fi.com/solomonneas","https://buymeacoffee.com/solomonneas"],"categories":[],"sub_categories":[],"readme":"# Code Search API\n\n**Local semantic code search powered by Ollama embeddings and SQLite.**\n\n[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-3776AB?logo=python\u0026logoColor=white)](https://python.org)\n[![FastAPI](https://img.shields.io/badge/FastAPI-009688?logo=fastapi\u0026logoColor=white)](https://fastapi.tiangolo.com)\n[![Ollama](https://img.shields.io/badge/Ollama-local--first-000000?logo=ollama)](https://ollama.com)\n[![License: MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE)\n\nIndex your codebase with language-aware chunking, generate LLM summaries per chunk, and search by intent instead of exact text. Everything runs locally. 
No cloud APIs, no vendor lock-in, no per-query costs.\n\n## How It Works\n\n```\nYour code repos\n      │\n      ▼\n File discovery ──► Language-aware chunking (Python, TS, Go, Rust, etc.)\n      │\n      ├──► Embedding via Ollama ──► packed float32 vectors in SQLite\n      │\n      └──► LLM summarization ──► summary + summary embedding in SQLite\n                                        │\n                                        ▼\n                              FastAPI search endpoint\n                                        │\n                              ┌─────────┴─────────┐\n                              │                   │\n                        Code vectors      Summary vectors\n                              │                   │\n                              └────── weighted ────┘\n                                        │\n                                        ▼\n                              Hybrid ranked results\n```\n\n1. **Chunking**: Files are split at logical boundaries (function/class definitions, not arbitrary line counts). Python, TypeScript, JavaScript, Go, Rust, Markdown, and config files are all handled with language-specific patterns.\n\n2. **Embedding**: Each chunk is embedded with your chosen Ollama model and stored as packed float32 BLOBs in SQLite. No vector database required.\n\n3. **Summarization**: An LLM generates a 1-2 sentence summary per chunk describing what the code *does*, not just what it *contains*. The summary gets its own embedding vector.\n\n4. **Hybrid search**: Queries match against both code embeddings (35% weight) and summary embeddings (65% weight). 
This means searching \"authentication flow\" finds auth code even if the word \"authentication\" never appears in variable names.\n\n## Quick Start\n\n```bash\ngit clone https://github.com/solomonneas/code-search-api.git\ncd code-search-api\npython3 -m venv .venv \u0026\u0026 source .venv/bin/activate\npip install -r requirements.txt\ncp .env.example .env    # edit CODE_SEARCH_WORKSPACE to point at your repos\n```\n\nWith Ollama running, pull an embedding model:\n\n```bash\nollama pull qwen3-embedding:8b\n```\n\nIndex your code, then start the server:\n\n```bash\nsource .env\npython3 run-index.py                          # first-time index\nuvicorn server:app --host 0.0.0.0 --port 5204\n```\n\nSearch:\n\n```bash\ncurl -s -X POST http://localhost:5204/api/search \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\": \"rate limiting middleware\", \"mode\": \"hybrid\"}'\n```\n\n## Embedding Models\n\nThe embedding model is the most important choice. It determines search quality.\n\n**Recommended: `qwen3-embedding:8b`** (what this project was built on)\n\n| Model | Params | VRAM | Quality | Speed | Best For |\n|-------|--------|------|---------|-------|----------|\n| **qwen3-embedding:8b** | 8B | ~6 GB | ★★★★★ | ★★★☆☆ | Best overall. Strong code + multilingual understanding. **Recommended.** |\n| qwen3-embedding:4b | 4B | ~3 GB | ★★★★☆ | ★★★★☆ | Good balance if VRAM is tight |\n| qwen3-embedding:0.6b | 0.6B | ~500 MB | ★★★☆☆ | ★★★★★ | Laptop/low-resource environments |\n| nomic-embed-text | 137M | ~300 MB | ★★★☆☆ | ★★★★★ | Lightweight, fast, proven. Good starter model. 
|\n| mxbai-embed-large | 335M | ~700 MB | ★★★½☆ | ★★★★☆ | Strong English performance |\n| bge-m3 | 567M | ~1 GB | ★★★★☆ | ★★★★☆ | Excellent multilingual support |\n| snowflake-arctic-embed2 | 568M | ~1 GB | ★★★★☆ | ★★★★☆ | Strong multilingual, good scaling |\n| nomic-embed-text-v2-moe | MoE | ~500 MB | ★★★★☆ | ★★★★☆ | Multilingual MoE, efficient |\n\nPull your chosen model:\n\n```bash\nollama pull qwen3-embedding:8b    # recommended\n# or\nollama pull nomic-embed-text      # lightweight alternative\n```\n\nSet it in `.env`:\n\n```\nCODE_SEARCH_EMBED_MODEL=qwen3-embedding:8b\n```\n\n\u003e **Note:** Changing the embedding model after indexing requires a full re-index since vector dimensions and similarity spaces differ between models.\n\n## Summary Models\n\nSummaries are what make hybrid search work. The summarizer reads each code chunk and writes a 1-2 sentence description of what it *does*. That summary gets its own embedding, so you can find code by describing behavior.\n\n**Be realistic about model quality here.** A tiny quantized local model will produce vague, useless summaries like \"This file contains code.\" That defeats the purpose. You need a model that can actually read code and explain it.\n\n### If you have Ollama Pro (cloud models via Ollama)\n\nBest option. Cloud-quality summaries with zero API key management:\n\n| Model | Quality | Speed | Notes |\n|-------|---------|-------|-------|\n| **qwen3-coder-next:cloud** | ★★★★★ | ★★★★☆ | Code specialist. Recommended. |\n| deepseek-v3.2:cloud | ★★★★½ | ★★★★★ | Fast, strong general coding |\n| glm-5:cloud | ★★★★★ | ★★★☆☆ | Best raw quality, slower |\n| minimax-m2.5:cloud | ★★★★☆ | ★★★★☆ | Good all-around |\n\n### If running local models only\n\nYou need a model of at least 14B parameters to get useful code summaries. 
Anything smaller will hallucinate function names and produce generic descriptions that don't help search.\n\n| Model | Params | VRAM | Quality | Notes |\n|-------|--------|------|---------|-------|\n| qwen3:32b | 32B | ~20 GB | ★★★★☆ | Best local option if you have the VRAM |\n| qwen3:14b | 14B | ~10 GB | ★★★½☆ | Minimum viable for code summaries |\n| codellama:34b | 34B | ~22 GB | ★★★★☆ | Strong code understanding |\n| deepseek-coder-v2:16b | 16B | ~11 GB | ★★★½☆ | Decent code summaries |\n\n**Models to avoid for summarization:**\n\n| Model | Why |\n|-------|-----|\n| Any model \u003c 7B | Summaries will be too vague to improve search |\n| Heavily quantized (Q2, Q3) | Quality degrades to the point of being worse than no summary |\n| Embedding models | These can't generate text, only vectors |\n\nSet your summary model in `.env`:\n\n```\nCODE_SEARCH_SUMMARY_MODEL=qwen3-coder-next:cloud    # Ollama Pro\n# or\nCODE_SEARCH_SUMMARY_MODEL=qwen3:32b                  # local, needs ~20GB VRAM\n```\n\n## API Endpoints\n\n| Method | Path | Auth | Description |\n|--------|------|------|-------------|\n| `GET` | `/health` | No | Liveness check |\n| `GET` | `/api/health` | No | Health + index stats (chunks, embedded, summarized) |\n| `POST` | `/api/search` | Yes | Semantic search with hybrid, code, or summary mode |\n| `POST` | `/api/index` | Yes | Trigger background indexing run |\n| `POST` | `/api/backfill-summaries` | Yes | Generate summaries for unsummarized chunks |\n| `GET` | `/api/projects` | Yes | Per-project chunk and summary counts |\n| `GET` | `/api/stats` | No | Chunk type breakdown and project coverage |\n| `GET` | `/api/summary-stats` | Yes | Summary counts by model |\n\n### Search request\n\n```json\n{\n  \"query\": \"websocket authentication middleware\",\n  \"mode\": \"hybrid\",\n  \"limit\": 10,\n  \"min_score\": 0.3,\n  \"project\": \"my-api\"\n}\n```\n\n**Modes:**\n- `hybrid` (default): Weighted combination of code + summary similarity. 
Best for most searches.\n- `code`: Raw code embedding match only. Use when searching for exact patterns.\n- `summary`: Summary embedding match only. Use when searching by high-level intent.\n\n## Configuration\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `CODE_SEARCH_WORKSPACE` | `./repos` | Root directory to scan for code |\n| `CODE_SEARCH_REFERENCE` | *(unset)* | Optional second directory for reference docs |\n| `CODE_SEARCH_DB` | `./code_index.db` | SQLite database path |\n| `CODE_SEARCH_API_KEY` | *(unset)* | API key for protected endpoints. Unset = no auth. |\n| `CODE_SEARCH_CORS_ORIGINS` | `*` | Comma-separated CORS origins |\n| `OLLAMA_URL` | `http://localhost:11434` | Ollama API base URL |\n| `CODE_SEARCH_EMBED_MODEL` | `qwen3-embedding:8b` | Embedding model |\n| `CODE_SEARCH_SUMMARY_MODEL` | `qwen3-coder-next:cloud` | Primary summarization model |\n| `CODE_SEARCH_SUMMARY_FALLBACK` | `qwen3-coder-next:cloud` | Fallback summarization model |\n| `CODE_SEARCH_SUMMARY_WORKERS` | `4` | Parallel summary generation workers |\n| `CODE_SEARCH_DB_BATCH_SIZE` | `100` | DB write batch size |\n| `CODE_SEARCH_CACHE_TTL_SECONDS` | `3600` | Query embedding cache TTL |\n\n## Helper Scripts\n\n| Script | Purpose |\n|--------|---------|\n| `run-index.py` | CLI indexer for first-time or batch re-indexing |\n| `index-then-summarize.sh` | Full pipeline: index new chunks, then summarize |\n| `backup-db.sh` | Rotated SQLite backup (configurable retention) |\n\n## Supported Languages\n\nChunking is language-aware for: Python, TypeScript/TSX, JavaScript/JSX, Go, Rust, Markdown, Astro, HTML, CSS, Shell, JSON, YAML, TOML.\n\nOther text files are indexed as flat chunks.\n\n## Requirements\n\n- Python 3.10+\n- [Ollama](https://ollama.com) running locally (or on a reachable host)\n- An embedding model pulled in Ollama\n- ~500 MB to 6 GB VRAM depending on embedding model choice\n\n## 
License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolomonneas%2Fcode-search-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsolomonneas%2Fcode-search-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolomonneas%2Fcode-search-api/lists"}
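The language-aware chunking step described in the README above (splitting at function/class boundaries rather than arbitrary line counts) can be sketched as follows. This is a minimal illustration for Python sources only; the boundary regex and the flat list-of-strings chunk shape are assumptions, not the project's actual implementation.

```python
# Minimal sketch: split a Python source file at top-level def/class boundaries
# instead of fixed line counts, so each chunk is one logical unit of code.
# The regex below is an illustrative assumption, not the project's real chunker.
import re

BOUNDARY = re.compile(r"^(def |class |async def )", re.MULTILINE)

def chunk_python(source: str) -> list[str]:
    starts = [m.start() for m in BOUNDARY.finditer(source)]
    if not starts:
        return [source]                 # no definitions: one flat chunk
    chunks = []
    if starts[0] > 0:
        chunks.append(source[:starts[0]])   # module-level preamble (imports etc.)
    for begin, end in zip(starts, starts[1:] + [len(source)]):
        chunks.append(source[begin:end])    # one chunk per top-level definition
    return chunks

src = "import os\n\ndef a():\n    pass\n\nclass B:\n    pass\n"
parts = chunk_python(src)   # preamble + one chunk per definition
```

Boundary-based chunks like these keep each embedded vector focused on a single function or class, which is what lets a query match one coherent unit of behavior.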
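The storage and ranking scheme the README describes — chunk vectors kept as packed float32 BLOBs in SQLite, with hybrid scoring at 35% code-embedding weight and 65% summary-embedding weight — can be sketched roughly like this. The `chunks` table layout and function names are assumptions for illustration; in the real system the query vector would come from an Ollama embedding call rather than the toy 3-dimensional vectors used here.

```python
# Sketch of hybrid search over packed float32 vectors in SQLite.
# Table layout and weights follow the README's description; everything
# else (names, schema) is assumed for this example.
import math
import sqlite3
import struct

CODE_WEIGHT, SUMMARY_WEIGHT = 0.35, 0.65   # weights quoted in the README

def pack(vec):
    """Pack a list of floats into a float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob):
    """Unpack a float32 BLOB back into a list of floats."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(db, query_vec, limit=10):
    """Rank chunks by a weighted blend of code and summary similarity."""
    scored = []
    for path, code_blob, summary_blob in db.execute(
            "SELECT path, code_vec, summary_vec FROM chunks"):
        score = (CODE_WEIGHT * cosine(query_vec, unpack(code_blob))
                 + SUMMARY_WEIGHT * cosine(query_vec, unpack(summary_blob)))
        scored.append((score, path))
    scored.sort(reverse=True)
    return scored[:limit]

# Tiny demo with an in-memory database and 3-dimensional toy vectors.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (path TEXT, code_vec BLOB, summary_vec BLOB)")
db.execute("INSERT INTO chunks VALUES (?, ?, ?)",
           ("auth.py", pack([1.0, 0.0, 0.0]), pack([0.9, 0.1, 0.0])))
db.execute("INSERT INTO chunks VALUES (?, ?, ?)",
           ("util.py", pack([0.0, 1.0, 0.0]), pack([0.0, 0.9, 0.1])))
results = hybrid_search(db, [1.0, 0.0, 0.0])
```

Because the summary vector carries most of the weight, a chunk whose code tokens differ from the query can still rank first when its generated summary describes the queried behavior — which is the point of the hybrid mode.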