{"id":44860400,"url":"https://github.com/cocoindex-io/cocoindex-code","last_synced_at":"2026-04-25T01:04:42.371Z","repository":{"id":338945815,"uuid":"1146989733","full_name":"cocoindex-io/cocoindex-code","owner":"cocoindex-io","description":"A super lightweight embedded code search engine CLI (AST-based) that just works - saves 70% of tokens and improves speed for coding agents  🌟 Star if you like it!","archived":false,"fork":false,"pushed_at":"2026-04-09T20:07:08.000Z","size":867,"stargazers_count":1289,"open_issues_count":15,"forks_count":94,"subscribers_count":11,"default_branch":"main","last_synced_at":"2026-04-09T21:28:13.217Z","etag":null,"topics":["agents","ast","cocoindex","code-search","coding-agent","context-engineering","indexing","mcp","python","tree-sitter"],"latest_commit_sha":null,"homepage":"https://cocoindex.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cocoindex-io.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-01T02:16:56.000Z","updated_at":"2026-04-09T20:50:18.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/cocoindex-io/cocoindex-code","commit_stats":null,"previous_names":["cocoindex-io/cocoindex-code"],"tags_count":29,"template":false,"template_full_name":null,"purl":"pkg:github/cocoindex-io/cocoindex-code","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cocoindex-
io%2Fcocoindex-code","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cocoindex-io%2Fcocoindex-code/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cocoindex-io%2Fcocoindex-code/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cocoindex-io%2Fcocoindex-code/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cocoindex-io","download_url":"https://codeload.github.com/cocoindex-io/cocoindex-code/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cocoindex-io%2Fcocoindex-code/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31785681,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T02:24:21.117Z","status":"ssl_error","status_checked_at":"2026-04-14T02:24:20.627Z","response_time":153,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ast","cocoindex","code-search","coding-agent","context-engineering","indexing","mcp","python","tree-sitter"],"created_at":"2026-02-17T09:06:35.023Z","updated_at":"2026-04-25T01:04:42.355Z","avatar_url":"https://github.com/cocoindex-io.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n\u003cimg width=\"2428\" alt=\"cocoindex code\" src=\"https://github.com/user-attachments/assets/d05961b4-0b7b-42ea-834a-59c3c01717ca\" 
/\u003e\n\u003c/p\u003e\n\n\n\u003ch1 align=\"center\"\u003eAST-based semantic code search that just works\u003c/h1\u003e\n\n![effect](https://github.com/user-attachments/assets/cb3a4cae-0e1f-49c4-890b-7bb93317ab60)\n\n\nA lightweight, effective **(AST-based)** semantic code search tool for your codebase. Built on [CocoIndex](https://github.com/cocoindex-io/cocoindex) — a Rust-based, ultra-performant data transformation engine. Use it from the CLI, or integrate with Claude, Codex, Cursor — any coding agent — via [Skill](#skill-recommended) or [MCP](#mcp-server).\n\n- Saves 70% of tokens instantly.\n- **1 min setup** — install and go, zero config needed!\n\n\u003cdiv align=\"center\"\u003e\n\n[![Discord](https://img.shields.io/discord/1314801574169673738?logo=discord\u0026color=5B5BD6\u0026logoColor=white)](https://discord.com/invite/zpA9S2DR7s)\n[![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)\n[![Documentation](https://img.shields.io/badge/Documentation-394e79?logo=readthedocs\u0026logoColor=00B9FF)](https://cocoindex.io/docs/getting_started/quickstart)\n[![License](https://img.shields.io/badge/license-Apache%202.0-5B5BD6?logoColor=white)](https://opensource.org/licenses/Apache-2.0)\n\u003c!--[![PyPI - Downloads](https://img.shields.io/pypi/dm/cocoindex)](https://pypistats.org/packages/cocoindex) --\u003e\n[![PyPI Downloads](https://static.pepy.tech/badge/cocoindex/month)](https://pepy.tech/projects/cocoindex)\n[![CI](https://github.com/cocoindex-io/cocoindex/actions/workflows/CI.yml/badge.svg?event=push\u0026color=5B5BD6)](https://github.com/cocoindex-io/cocoindex/actions/workflows/CI.yml)\n[![release](https://github.com/cocoindex-io/cocoindex/actions/workflows/release.yml/badge.svg?event=push\u0026color=5B5BD6)](https://github.com/cocoindex-io/cocoindex/actions/workflows/release.yml)\n\n\n🌟 Please help star [CocoIndex](https://github.com/cocoindex-io/cocoindex) if you like this 
project!\n\n[Deutsch](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=de) |\n[English](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=en) |\n[Español](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=es) |\n[français](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=fr) |\n[日本語](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=ja) |\n[한국어](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=ko) |\n[Português](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=pt) |\n[Русский](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=ru) |\n[中文](https://readme-i18n.com/cocoindex-io/cocoindex-code?lang=zh)\n\n\u003c/div\u003e\n\n\n## Get Started — zero config, let's go!\n\n### Install\n\nUsing [pipx](https://pipx.pypa.io/stable/installation/):\n```bash\npipx install 'cocoindex-code[full]'          # batteries included (local embeddings)\npipx upgrade cocoindex-code                  # upgrade\n```\n\nUsing [uv](https://docs.astral.sh/uv/getting-started/installation/):\n```bash\nuv tool install --upgrade 'cocoindex-code[full]'\n```\n\nTwo install styles — they mirror the Docker image variants of the same names:\n- `cocoindex-code[full]` — batteries-included. Pulls in `sentence-transformers` so local embeddings (no API key required) work out of the box. The `ccc init` interactive prompt defaults to [Snowflake/snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs).\n- `cocoindex-code` (slim) — LiteLLM-only; requires a cloud embedding provider and API key. 
Use when you don't want the local-embedding deps (~1 GB of torch + transformers).\n\nNext, set up your [coding agent integration](#coding-agent-integration) — or jump to [Manual CLI Usage](#manual-cli-usage) if you prefer direct control.\n\n## Coding Agent Integration\n\n### Skill (Recommended)\n\nInstall the `ccc` skill so your coding agent automatically uses semantic search when needed:\n\n```bash\nnpx skills add cocoindex-io/cocoindex-code\n```\n\nThat's it — no `ccc init` or `ccc index` needed. The skill teaches the agent to handle initialization, indexing, and searching on its own. It will automatically keep the index up to date as you work.\n\nThe agent uses semantic search automatically when it would be helpful. You can also nudge it explicitly — just ask it to search the codebase, e.g. *\"find how user sessions are managed\"*, or type `/ccc` to invoke the skill directly.\n\nWorks with [Claude Code](https://docs.anthropic.com/en/docs/claude-code) and other skill-compatible agents.\n\n### MCP Server\n\nAlternatively, use `ccc mcp` to run as an MCP server:\n\n\u003cdetails\u003e\n\u003csummary\u003eClaude Code\u003c/summary\u003e\n\n```bash\nclaude mcp add cocoindex-code -- ccc mcp\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eCodex\u003c/summary\u003e\n\n```bash\ncodex mcp add cocoindex-code -- ccc mcp\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eOpenCode\u003c/summary\u003e\n\n```bash\nopencode mcp add\n```\nEnter MCP server name: `cocoindex-code`\nSelect MCP server type: `local`\nEnter command to run: `ccc mcp`\n\nOr use opencode.json:\n```json\n{\n  \"$schema\": \"https://opencode.ai/config.json\",\n  \"mcp\": {\n    \"cocoindex-code\": {\n      \"type\": \"local\",\n      \"command\": [\n        \"ccc\", \"mcp\"\n      ]\n    }\n  }\n}\n```\n\u003c/details\u003e\n\nOnce configured, the agent automatically decides when semantic code search is helpful — finding code by description, exploring unfamiliar 
codebases, fuzzy/conceptual matches, or locating implementations without knowing exact names.\n\n\u003e **Note:** The `cocoindex-code` command (without subcommand) still works as an MCP server for backward compatibility. It auto-creates settings from environment variables on first run.\n\n\u003cdetails\u003e\n\u003csummary\u003eMCP Tool Reference\u003c/summary\u003e\n\nWhen running as an MCP server (`ccc mcp`), the following tool is exposed:\n\n**`search`** — Search the codebase using semantic similarity.\n\n```\nsearch(\n    query: str,                          # Natural language query or code snippet\n    limit: int = 5,                      # Maximum results (1-100)\n    offset: int = 0,                     # Pagination offset\n    refresh_index: bool = True,          # Refresh index before querying\n    languages: list[str] | None = None,  # Filter by language (e.g. [\"python\", \"typescript\"])\n    paths: list[str] | None = None,      # Filter by path glob (e.g. [\"src/utils/*\"])\n)\n```\n\nReturns matching code chunks with file path, language, code content, line numbers, and similarity score.\n\u003c/details\u003e\n\n## Manual CLI Usage\n\nYou can also use the CLI directly — useful for manual control, running indexing after changing settings, checking status, or searching outside an agent.\n\n```bash\nccc init                                # initialize project (creates settings)\nccc index                               # build the index\nccc search \"authentication logic\"       # search!\n```\n\nThe background daemon starts automatically on first use.\n\n\u003e **Tip:** `ccc index` auto-initializes if you haven't run `ccc init` yet, so you can skip straight to indexing.\n\n### CLI Reference\n\n| Command | Description |\n|---------|-------------|\n| `ccc init` | Initialize a project — creates settings files, adds `.cocoindex_code/` to `.gitignore` |\n| `ccc index` | Build or update the index (auto-inits if needed). Shows streaming progress. 
|\n| `ccc search \u003cquery\u003e` | Semantic search across the codebase |\n| `ccc status` | Show index stats (chunk count, file count, language breakdown) |\n| `ccc mcp` | Run as MCP server in stdio mode |\n| `ccc doctor` | Run diagnostics — checks settings, daemon, model, file matching, and index health |\n| `ccc reset` | Delete index databases. `--all` also removes settings. `-f` skips confirmation. |\n| `ccc daemon status` | Show daemon version, uptime, and loaded projects |\n| `ccc daemon restart` | Restart the background daemon |\n| `ccc daemon stop` | Stop the daemon |\n\n### Search Options\n\n```bash\nccc search database schema                           # basic search\nccc search --lang python --lang markdown schema      # filter by language\nccc search --path 'src/utils/*' query handler        # filter by path\nccc search --offset 10 --limit 5 database schema     # pagination\nccc search --refresh database schema                 # update index first, then search\n```\n\nBy default, `ccc search` scopes results to your current working directory (relative to the project root). Use `--path` to override.\n\n## Docker\n\nA Docker image is available for teams who want a reproducible, dependency-free\nsetup — no Python, `uv`, or system dependencies required on the host.\n\nThe recommended approach is a **persistent container**: start it once, and use\n`docker exec` to run CLI commands or connect MCP sessions to it. The daemon\ninside stays warm across sessions, so the embedding model is loaded only once.\n\n### Choosing an image\n\nTwo variants are published from each release:\n\n| Tag | Size | Embedding backends | When to pick |\n|---|---|---|---|\n| `cocoindex/cocoindex-code:latest` (slim, default) | ~450 MB | LiteLLM (cloud: OpenAI, Voyage, Gemini, Ollama, …) | Most users. Cloud-backed embeddings, smaller image, fast pulls. 
|\n| `cocoindex/cocoindex-code:full` | ~5 GB | sentence-transformers (local) + LiteLLM | When you want local embeddings without an API key, or an offline-ready container. Heavier because of torch + transformers. |\n\nThe rest of this section uses `:latest` — substitute `:full` in the `image:` /\n`docker run` commands if you want the full variant.\n\n\u003e **Mac users running the `:full` variant:** local embedding inference is\n\u003e CPU-only inside Docker, because Docker on macOS can't access Apple's Metal\n\u003e (MPS) GPU. If you want local embeddings and fast inference, install\n\u003e natively instead: `pipx install 'cocoindex-code[full]'`. The `:latest`\n\u003e (slim) variant is unaffected — LiteLLM runs the model on the provider's\n\u003e side, so Docker vs. native makes no difference.\n\n### Quick start — `docker compose up -d`\n\nBring it up in one line — no clone needed (bash / zsh):\n\n```bash\n# macOS / Windows\ndocker compose -f \u003c(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex-code/refs/heads/main/docker/docker-compose.yml) up -d\n\n# Linux (aligns file ownership on bind-mounted paths with your host user)\nPUID=$(id -u) PGID=$(id -g) docker compose -f \u003c(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex-code/refs/heads/main/docker/docker-compose.yml) up -d\n```\n\nOr grab [`docker/docker-compose.yml`](./docker/docker-compose.yml) and run `docker compose up -d` next to it (works on any shell, including Windows cmd / PowerShell).\n\nBy default your home directory is mounted into the container (set\n`COCOINDEX_HOST_WORKSPACE` to narrow this to a specific code folder). Index\ndata and the embedding model cache persist in a Docker volume across\nrestarts. Your global settings file at `$HOME/.cocoindex_code/global_settings.yml`\nis visible and editable on the host; edits take effect on your next `ccc` command.\n\n\u003e **Pick a different image:** set `COCOINDEX_CODE_IMAGE` to override the\n\u003e default. 
For example, the `:full` variant or GHCR:\n\u003e ```bash\n\u003e COCOINDEX_CODE_IMAGE=cocoindex/cocoindex-code:full docker compose up -d\n\u003e COCOINDEX_CODE_IMAGE=ghcr.io/cocoindex-io/cocoindex-code:latest docker compose up -d\n\u003e ```\n\n### Or: `docker run`\n\n\u003cdetails\u003e\n\u003csummary\u003eDocker Desktop (macOS / Windows)\u003c/summary\u003e\n\n```bash\ndocker run -d --name cocoindex-code \\\n  --volume \"$HOME:/workspace\" \\\n  --volume cocoindex-data:/var/cocoindex \\\n  -e COCOINDEX_CODE_HOST_PATH_MAPPING=\"/workspace=$HOME\" \\\n  cocoindex/cocoindex-code:latest\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eLinux (with \u003ccode\u003ePUID\u003c/code\u003e/\u003ccode\u003ePGID\u003c/code\u003e)\u003c/summary\u003e\n\n```bash\ndocker run -d --name cocoindex-code \\\n  -e PUID=$(id -u) -e PGID=$(id -g) \\\n  --volume \"$HOME:/workspace\" \\\n  --volume cocoindex-data:/var/cocoindex \\\n  -e COCOINDEX_CODE_HOST_PATH_MAPPING=\"/workspace=$HOME\" \\\n  cocoindex/cocoindex-code:latest\n```\n\u003c/details\u003e\n\n### Shell wrapper for `ccc` commands\n\nPaste this into `~/.bashrc` / `~/.zshrc` so `ccc` feels native on the host\nand picks up the right project based on your current directory:\n\n```bash\nccc() {\n  docker exec -it -e COCOINDEX_CODE_HOST_CWD=\"$PWD\" cocoindex-code ccc \"$@\"\n}\n```\n\nNow `cd` into any project under your workspace and run `ccc init`, `ccc index`,\n`ccc search ...`, `ccc status`, etc. 
— it just works.\n\n### Connect your coding agent\n\n\u003cdetails\u003e\n\u003csummary\u003eClaude Code\u003c/summary\u003e\n\nRegister MCP from inside the target project so `$PWD` points there:\n\n```bash\nclaude mcp add cocoindex-code -- docker exec -i \\\n  -e COCOINDEX_CODE_HOST_CWD=\"$PWD\" cocoindex-code ccc mcp\n```\n\nOr via `.mcp.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"cocoindex-code\": {\n      \"type\": \"stdio\",\n      \"command\": \"docker\",\n      \"args\": [\n        \"exec\",\n        \"-i\",\n        \"-e\",\n        \"COCOINDEX_CODE_HOST_CWD=${PWD}\",\n        \"cocoindex-code\",\n        \"ccc\",\n        \"mcp\"\n      ]\n    }\n  }\n}\n```\n\n\u003e Note: use `-i` (not `-it`). The `-t` flag allocates a terminal, which\n\u003e interferes with MCP's JSON messaging over stdin/stdout — only add it for\n\u003e interactive `ccc` commands like `ccc init`.\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eCodex\u003c/summary\u003e\n\n```bash\ncodex mcp add cocoindex-code -- docker exec -i \\\n  -e COCOINDEX_CODE_HOST_CWD=\"$PWD\" cocoindex-code ccc mcp\n```\n\u003c/details\u003e\n\n### Upgrading from an older image\n\nEarlier images used separate `cocoindex-db` and `cocoindex-model-cache`\nvolumes; the current image consolidates them into a single `cocoindex-data`\nvolume. Before pulling the new image, drop the old container and volumes —\nindexes rebuild on your next `ccc index`, and the embedding model is\nre-populated automatically on first start:\n\n```bash\ndocker rm -f cocoindex-code\ndocker volume rm cocoindex-db cocoindex-model-cache\n```\n\n### Configuration via environment variables\n\nPass configuration to `docker run` / compose with `-e`:\n\n```bash\n# Extra extensions (e.g. 
Typesafe Config, SBT build files)\n-e COCOINDEX_CODE_EXTRA_EXTENSIONS=\"conf,sbt\"\n\n# Exclude build artefacts (Scala/SBT example)\n-e COCOINDEX_CODE_EXCLUDE_PATTERNS='[\"**/target/**\",\"**/.bloop/**\",\"**/.metals/**\"]'\n\n# Set an API key\n-e VOYAGE_API_KEY=your-key\n```\n\n\u003e **Security note:** mounting `$HOME` gives the container read/write access\n\u003e to everything under it. If that's too broad, bind-mount a narrower\n\u003e directory instead (`COCOINDEX_HOST_WORKSPACE=/path/to/code`).\n\n### Build the image locally\n\n```bash\ndocker build -t cocoindex-code:local -f docker/Dockerfile .\n```\n\n## Features\n- **Semantic Code Search**: Find relevant code using natural language queries when grep doesn't work well, and save tokens immediately.\n- **Ultra Performant**: ⚡ Built on the ultra-performant [Rust indexing engine](https://github.com/cocoindex-io/cocoindex). Only re-indexes changed files for fast updates.\n- **Multi-Language Support**: Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, C#, SQL, Shell, and more.\n- **Embedded**: Portable and just works, no database setup required!\n- **Flexible Embeddings**: Local SentenceTransformers via the `[full]` extra (free, no API key!) or 100+ cloud providers via LiteLLM.\n\n## Configuration\n\nConfiguration lives in two YAML files, both created automatically by `ccc init`.\n\n### User Settings (`~/.cocoindex_code/global_settings.yml`)\n\nShared across all projects. 
Controls the embedding model and environment variables for the daemon.\n\n```yaml\nembedding:\n  provider: sentence-transformers                    # or \"litellm\"\n  model: Snowflake/snowflake-arctic-embed-xs\n  device: mps                                        # optional: cpu, cuda, mps (auto-detected if omitted)\n  min_interval_ms: 300                               # optional: pace LiteLLM embedding requests to reduce 429s; defaults to 5 for LiteLLM\n\n  # Optional extra kwargs passed to the embedder, separately for indexing vs query.\n  # `ccc init` auto-populates these for known models (e.g. Cohere, Voyage, Nvidia NIM,\n  # nomic-ai code-retrieval models, Snowflake arctic-embed).\n  # indexing_params:\n  #   input_type: search_document        # litellm: input_type, dimensions\n  # query_params:\n  #   input_type: search_query           # sentence-transformers: prompt_name\n\nenvs:                                                # extra environment variables for the daemon\n  OPENAI_API_KEY: your-key                           # only needed if not already in your shell environment\n```\n\n\u003e **Note:** The daemon inherits your shell environment. If an API key (e.g. `OPENAI_API_KEY`) is already set as an environment variable, you don't need to duplicate it in `envs`. The `envs` field is only for values that aren't in your environment.\n\n\u003e **Custom location:** set `COCOINDEX_CODE_DIR` to place `global_settings.yml` somewhere other than `~/.cocoindex_code/` — useful if you want the file to live alongside your projects (e.g. on a synced folder).\n\n#### `indexing_params` / `query_params`\n\nSome embedding models expose different modes for documents vs queries (asymmetric retrieval). For example, Cohere's v3 models want `input_type: search_document` when embedding corpus content and `input_type: search_query` when embedding a user query; several SentenceTransformers models use `prompt_name: passage` / `prompt_name: query` for the same purpose. 
These knobs live under `indexing_params` and `query_params`:\n\n```yaml\nembedding:\n  provider: litellm\n  model: cohere/embed-english-v3.0\n  indexing_params:\n    input_type: search_document\n  query_params:\n    input_type: search_query\n```\n\n`ccc init` populates these automatically for models it recognizes — including all Cohere v3, Voyage, Nvidia NIM, Gemini embedding (`gemini/gemini-embedding-*`, `gemini/text-embedding-*`, `gemini/embedding-*` — LiteLLM auto-maps `input_type` to Gemini's `task_type`), `nomic-ai/CodeRankEmbed`, `nomic-ai/nomic-embed-code`, `nomic-ai/nomic-embed-text-v1`/`v1.5`, `mixedbread-ai/mxbai-embed-large-v1`, and the `Snowflake/snowflake-arctic-embed-*` family — and prints the chosen defaults. For other models, it leaves a commented-out template under `embedding:` so you can fill it in by hand.\n\nOpenAI embeddings (`text-embedding-3-*`, `text-embedding-ada-002`) are intentionally not in the list: they're symmetric and have no equivalent knob.\n\n**Accepted keys:** `prompt_name` (sentence-transformers), `input_type` and `dimensions` (litellm). Other keys are rejected at daemon startup with a clear error.\n\n**Doctor checks both sides.** `ccc doctor` exercises the model once with `indexing_params` and once with `query_params`, reporting each as a separate `Model Check (indexing)` / `Model Check (query)` entry — so a misconfiguration on one side is diagnosable without hiding behind the other.\n\n**Legacy-bridge warning:** if you're upgrading from an earlier version and your `global_settings.yml` uses `nomic-ai/CodeRankEmbed` or `nomic-ai/nomic-embed-code` without `indexing_params` / `query_params`, the daemon continues to apply the previous behavior (`prompt_name: query` at query time) and prints a one-time warning asking you to make the setting explicit. You can silence the warning by adding an empty block such as `query_params: {}`.\n\n### Project Settings (`\u003cproject\u003e/.cocoindex_code/settings.yml`)\n\nPer-project. 
Controls which files to index.\n\n```yaml\ninclude_patterns:\n  - \"**/*.py\"\n  - \"**/*.js\"\n  - \"**/*.ts\"\n  - \"**/*.rs\"\n  - \"**/*.go\"\n  # ... (sensible defaults for 28+ file types)\n\nexclude_patterns:\n  - \"**/.*\"                # hidden directories\n  - \"**/__pycache__\"\n  - \"**/node_modules\"\n  - \"**/dist\"\n  # ...\n\nlanguage_overrides:\n  - ext: inc               # treat .inc files as PHP\n    lang: php\n\nchunkers:\n  - ext: toml              # use a custom chunker for .toml files\n    module: example_toml_chunker:toml_chunker\n```\n\n\u003e `.cocoindex_code/` is automatically added to `.gitignore` during init.\n\nUse `chunkers` when you want to control how a file type is split into chunks before indexing.\n\n`module: example_toml_chunker:toml_chunker` means:\n- `example_toml_chunker` is a local Python module\n- `toml_chunker` is the function inside that module\n\nIn practice, this usually means:\n- you create a Python file in your project, for example `example_toml_chunker.py`\n- you add a function in that file\n- you point `settings.yml` at it with `module.path:function_name`\n\nThe function should use this signature:\n\n```python\nfrom pathlib import Path\nfrom cocoindex_code.chunking import Chunk\n\ndef my_chunker(path: Path, content: str) -\u003e tuple[str | None, list[Chunk]]:\n    ...\n```\n\n- `path` is the file being indexed\n- `content` is the full text of that file\n- return `language_override` as a string like `\"toml\"` if you want to override language detection\n- return `None` as `language_override` if you want to keep the detected language\n- return a `list[Chunk]` with the chunks you want stored in the index\n\nSee [`src/cocoindex_code/chunking.py`](./src/cocoindex_code/chunking.py) for the public types and [`tests/example_toml_chunker.py`](./tests/example_toml_chunker.py) for a complete example.\n\n## Embedding Models\n\nWith the `[full]` extra installed, `ccc init` defaults to a local SentenceTransformers model 
([Snowflake/snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs)) — no API key required. To use a different model, edit `~/.cocoindex_code/global_settings.yml`.\n\n\u003e The `envs` entries below are only needed if the key isn't already in your shell environment — the daemon inherits your environment automatically.\n\n\u003cdetails\u003e\n\u003csummary\u003eOllama (Local)\u003c/summary\u003e\n\n```yaml\nembedding:\n  model: ollama/nomic-embed-text\n```\n\nSet `OLLAMA_API_BASE` in `envs:` if your Ollama server is not at `http://localhost:11434`.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eOpenAI\u003c/summary\u003e\n\n```yaml\nembedding:\n  model: text-embedding-3-small\n  min_interval_ms: 300                               # optional: override the 5ms LiteLLM default\nenvs:\n  OPENAI_API_KEY: your-api-key\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eAzure OpenAI\u003c/summary\u003e\n\n```yaml\nembedding:\n  model: azure/your-deployment-name\nenvs:\n  AZURE_API_KEY: your-api-key\n  AZURE_API_BASE: https://your-resource.openai.azure.com\n  AZURE_API_VERSION: \"2024-06-01\"\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eGemini\u003c/summary\u003e\n\n```yaml\nembedding:\n  model: gemini/gemini-embedding-001\nenvs:\n  GEMINI_API_KEY: your-api-key\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eMistral\u003c/summary\u003e\n\n```yaml\nembedding:\n  model: mistral/mistral-embed\nenvs:\n  MISTRAL_API_KEY: your-api-key\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eVoyage (Code-Optimized)\u003c/summary\u003e\n\n```yaml\nembedding:\n  model: voyage/voyage-code-3\nenvs:\n  VOYAGE_API_KEY: your-api-key\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eCohere\u003c/summary\u003e\n\n```yaml\nembedding:\n  model: cohere/embed-v4.0\nenvs:\n  COHERE_API_KEY: 
your-api-key\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eAWS Bedrock\u003c/summary\u003e\n\n```yaml\nembedding:\n  model: bedrock/amazon.titan-embed-text-v2:0\nenvs:\n  AWS_ACCESS_KEY_ID: your-access-key\n  AWS_SECRET_ACCESS_KEY: your-secret-key\n  AWS_REGION_NAME: us-east-1\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eNebius\u003c/summary\u003e\n\n```yaml\nembedding:\n  model: nebius/BAAI/bge-en-icl\nenvs:\n  NEBIUS_API_KEY: your-api-key\n```\n\n\u003c/details\u003e\n\nAny [LiteLLM-supported model](https://docs.litellm.ai/docs/embedding/supported_embedding) works. When using a LiteLLM model, set `provider: litellm` (or omit `provider` — LiteLLM is the default for non-`sentence-transformers` models).\n\n### Local SentenceTransformers Models\n\nSet `provider: sentence-transformers` and use any [SentenceTransformers](https://www.sbert.net/) model (no API key required).\n\n**Example — general purpose text model:**\n```yaml\nembedding:\n  provider: sentence-transformers\n  model: nomic-ai/nomic-embed-text-v1.5\n```\n\n**GPU-optimised code retrieval:**\n\n[`nomic-ai/CodeRankEmbed`](https://huggingface.co/nomic-ai/CodeRankEmbed) delivers significantly better code retrieval than the default model. 
It is 137M parameters, requires ~1 GB VRAM, and has an 8192-token context window.\n\n```yaml\nembedding:\n  provider: sentence-transformers\n  model: nomic-ai/CodeRankEmbed\n```\n\n**Note:** Switching models requires re-indexing your codebase (`ccc reset \u0026\u0026 ccc index`) since the vector dimensions differ.\n\n## Supported Languages\n\n| Language | Aliases | File Extensions |\n|----------|---------|-----------------|\n| c | | `.c` |\n| cpp | c++ | `.cpp`, `.cc`, `.cxx`, `.h`, `.hpp` |\n| csharp | csharp, cs | `.cs` |\n| css | | `.css`, `.scss` |\n| dtd | | `.dtd` |\n| fortran | f, f90, f95, f03 | `.f`, `.f90`, `.f95`, `.f03` |\n| go | golang | `.go` |\n| html | | `.html`, `.htm` |\n| java | | `.java` |\n| javascript | js | `.js` |\n| json | | `.json` |\n| kotlin | | `.kt`, `.kts` |\n| lua | | `.lua` |\n| markdown | md | `.md`, `.mdx` |\n| pascal | pas, dpr, delphi | `.pas`, `.dpr` |\n| php | | `.php` |\n| python | | `.py` |\n| r | | `.r` |\n| ruby | | `.rb` |\n| rust | rs | `.rs` |\n| scala | | `.scala` |\n| solidity | | `.sol` |\n| sql | | `.sql` |\n| swift | | `.swift` |\n| toml | | `.toml` |\n| tsx | | `.tsx` |\n| typescript | ts | `.ts` |\n| xml | | `.xml` |\n| yaml | | `.yaml`, `.yml` |\n\n### Custom Database Location\n\nBy default, index databases (`cocoindex.db` and `target_sqlite.db`) live alongside settings in `\u003cproject\u003e/.cocoindex_code/`. When running in Docker, you may want the databases on the container's native filesystem for performance (LMDB doesn't work well on mounted volumes) while keeping the source code and settings on a mounted volume.\n\nSet `COCOINDEX_CODE_DB_PATH_MAPPING` to remap database locations by path prefix:\n\n```bash\nCOCOINDEX_CODE_DB_PATH_MAPPING=/workspace=/db-files\n```\n\nWith this mapping, a project at `/workspace/myrepo` stores its databases in `/db-files/myrepo/` instead of `/workspace/myrepo/.cocoindex_code/`. 
Settings files remain in the original location.\n\nMultiple mappings are comma-separated and resolved in order (first match wins):\n\n```bash\nCOCOINDEX_CODE_DB_PATH_MAPPING=/workspace=/db-files,/workspace2=/db-files2\n```\n\nBoth source and target must be absolute paths. If no mapping matches, the default location is used.\n\n## Troubleshooting\n\nRun `ccc doctor` to diagnose common issues. It checks your settings, daemon health, embedding model, file matching, and index status — all in one command.\n\n### `sqlite3.Connection object has no attribute enable_load_extension`\n\nSome Python installations (e.g. the one pre-installed on macOS) ship with a SQLite library that doesn't enable extensions.\n\n**macOS fix:** Install Python through [Homebrew](https://brew.sh/):\n\n```bash\nbrew install python3\n```\n\nThen re-install cocoindex-code (see [Get Started](#get-started--zero-config-lets-go) for install options):\n\nUsing pipx:\n```bash\npipx install cocoindex-code       # first install\npipx upgrade cocoindex-code       # upgrade\n```\n\nUsing uv (install or upgrade):\n```bash\nuv tool install --upgrade cocoindex-code\n```\n\n## Legacy: Environment Variables\n\nIf you previously configured `cocoindex-code` via environment variables, the `cocoindex-code` MCP command still reads them and auto-migrates to YAML settings on first run. 
We recommend switching to the YAML settings for new setups.\n\n| Environment Variable | YAML Equivalent |\n|---------------------|-----------------|\n| `COCOINDEX_CODE_EMBEDDING_MODEL` | `embedding.model` in `global_settings.yml` |\n| `COCOINDEX_CODE_DEVICE` | `embedding.device` in `global_settings.yml` |\n| `COCOINDEX_CODE_ROOT_PATH` | Run `ccc init` in your project root instead |\n| `COCOINDEX_CODE_EXCLUDED_PATTERNS` | `exclude_patterns` in project `settings.yml` |\n| `COCOINDEX_CODE_EXTRA_EXTENSIONS` | `include_patterns` + `language_overrides` in project `settings.yml` |\n\n## Large Codebases / Enterprise\n[CocoIndex](https://github.com/cocoindex-io/cocoindex) is an ultra-efficient indexing engine that also scales to large enterprise codebases. In enterprise scenarios with large or numerous repos, it is far more efficient to share indexes across teammates. We also offer advanced features, such as branch dedupe, designed for enterprise users.\n\nIf you need help with a remote setup, please email our maintainer linghua@cocoindex.io — we're happy to help!\n\n## Contributing\n\nWe welcome contributions! 
Before you start, please install the [pre-commit](https://pre-commit.com/) hooks so that linting, formatting, type checking, and tests run automatically before each commit:\n\n```bash\npip install pre-commit\npre-commit install\n```\n\nThis catches common issues — trailing whitespace, lint errors (Ruff), type errors (mypy), and test failures — before they reach CI.\n\nFor more details, see our [contributing guide](https://cocoindex.io/docs/contributing/guide).\n\n## License\n\nApache-2.0\n","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcocoindex-io%2Fcocoindex-code","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcocoindex-io%2Fcocoindex-code","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcocoindex-io%2Fcocoindex-code/lists"}