https://github.com/zilliztech/codeindexer
An open-source code indexing and search tool implemented with Milvus vector database and popular embedding models. You can build your AI Coding IDE or code search plugin with it.
- Host: GitHub
- URL: https://github.com/zilliztech/codeindexer
- Owner: zilliztech
- License: mit
- Created: 2025-06-06T02:12:47.000Z (4 months ago)
- Default Branch: master
- Last Pushed: 2025-06-28T10:21:11.000Z (3 months ago)
- Last Synced: 2025-06-28T11:31:36.631Z (3 months ago)
- Topics: agent, agentic-rag, ai-coding, chrome-extension, code-generation, code-search, cursor, embedding, ide, mcp, merkle-tree, nodejs, openai, rag, semantic-search, typescript, vector-database, vibe-coding, voyage-ai, vscode-extension
- Language: TypeScript
- Homepage:
- Size: 1.51 MB
- Stars: 1
- Watchers: 0
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# CodeIndexer
An open-source code indexing and semantic search tool built on the Milvus vector database and popular embedding models. You can use it to build your own AI coding IDE or code search plugin.
## Why CodeIndexer?
In the **AI-first development era**, traditional keyword-based search is no longer sufficient for modern software development:
### **The AI Coding Revolution**
- **AI-Powered IDEs** like Cursor and GitHub Copilot are transforming development workflows
- **Growing demand** for intelligent code assistance and semantic understanding
- **Modern codebases** can contain millions of lines of code spread across many files, making manual navigation inefficient
### **Current Limitations**
- Regex and keyword-based search miss **contextual relationships**
- Developers waste time navigating large codebases manually
- Knowledge transfer between team members is inefficient
- Traditional search tools can't bridge the gap between **human intent** and **code implementation**
### **Our Solution**
CodeIndexer bridges the gap between human understanding and code discovery through:
- **Semantic search** with natural language queries like *"find authentication functions"*
- **AI-powered understanding** of code meaning and relationships
- **Universal integration** across multiple platforms and development environments

> **Find code by describing functionality, not just keywords** - Discover existing solutions before writing duplicate code.
## Features
- **Semantic Code Search**: Ask questions like *"find functions that handle user authentication"* instead of guessing keywords
- **Intelligent Indexing**: Automatically index entire codebases and build semantic vector databases with contextual understanding
- **Context-Aware Discovery**: Find related code snippets based on meaning, not just text matching
- **Incremental File Synchronization**: Efficient change detection using Merkle trees, so only modified files are re-indexed
- **Smart Chunking**: AST-based code splitting that preserves context and structure
- **Developer Productivity**: Significantly reduce time spent searching for relevant code and discovering existing solutions
- **Embedding Providers**: Supports OpenAI, VoyageAI, and Ollama as embedding providers
- **Vector Storage**: Integrated with Milvus/Zilliz Cloud for efficient storage and retrieval
- **VSCode Integration**: Built-in VSCode extension for a seamless development workflow
- **MCP Support**: Model Context Protocol integration for AI agent interactions
- **Progress Tracking**: Real-time progress feedback during indexing operations
- **Customizable**: Configurable file extensions, ignore patterns, and embedding models
## Architecture
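
Within the core engine, the incremental file synchronization listed under Features relies on Merkle-tree change detection. The following is a minimal sketch of that general technique, not the `@code-indexer/core` implementation: it assumes a snapshot is an in-memory map from relative path to file contents, hashes files with SHA-256, and hashes each directory over its sorted children, so a single hash comparison is enough to skip an unchanged subtree. The names `buildTree` and `changedPaths` are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch, not the @code-indexer/core API.
// Files carry a hash of their contents; directories carry a hash over their sorted children.
interface MerkleNode {
  hash: string;
  children?: Map<string, MerkleNode>; // undefined for files
}

const sha256 = (data: string): string => createHash("sha256").update(data).digest("hex");

// Build the node for one directory from [pathRelativeToThisDir, contents] pairs.
function buildNode(entries: Array<[string, string]>): MerkleNode {
  const files = new Map<string, string>();
  const subdirs = new Map<string, Array<[string, string]>>();
  for (const [path, contents] of entries) {
    const slash = path.indexOf("/");
    if (slash === -1) {
      files.set(path, contents);
    } else {
      const head = path.slice(0, slash);
      const group: Array<[string, string]> = subdirs.get(head) ?? [];
      group.push([path.slice(slash + 1), contents]);
      subdirs.set(head, group);
    }
  }
  const names = Array.from(new Set(Array.from(files.keys()).concat(Array.from(subdirs.keys())))).sort();
  const children = new Map<string, MerkleNode>();
  for (const name of names) {
    const contents = files.get(name);
    // "file:" and "dir:" prefixes keep file and directory hash inputs in separate domains.
    children.set(name, contents !== undefined ? { hash: sha256("file:" + contents) } : buildNode(subdirs.get(name)!));
  }
  // The directory hash covers every child name and hash, so any change below bubbles up to the root.
  const summary = names.map((name) => `${name}:${children.get(name)!.hash}`).join("\n");
  return { hash: sha256("dir:" + summary), children };
}

const buildTree = (files: Map<string, string>): MerkleNode => buildNode(Array.from(files.entries()));

// Walk two trees in parallel, descending only where hashes differ.
// Returns the relative paths of files that were added, removed, or modified.
function changedPaths(a: MerkleNode | undefined, b: MerkleNode | undefined, prefix = ""): string[] {
  if (a && b && a.hash === b.hash) return []; // identical subtree: nothing below needs re-indexing
  const isFile = (n: MerkleNode | undefined): boolean => n !== undefined && n.children === undefined;
  const out: string[] = isFile(a) || isFile(b) ? [prefix] : [];
  const aKids = a?.children ?? new Map<string, MerkleNode>();
  const bKids = b?.children ?? new Map<string, MerkleNode>();
  const names = Array.from(new Set(Array.from(aKids.keys()).concat(Array.from(bKids.keys())))).sort();
  for (const name of names) {
    const childPath = prefix ? `${prefix}/${name}` : name;
    out.push(...changedPaths(aKids.get(name), bKids.get(name), childPath));
  }
  return out;
}

// Usage: two in-memory snapshots (relative path -> contents).
const before = buildTree(new Map([["README.md", "hi"], ["src/a.ts", "A"], ["src/b.ts", "B"]]));
const after = buildTree(new Map([["docs/c.md", "C"], ["src/a.ts", "A2"], ["src/b.ts", "B"]]));
console.log(changedPaths(before, after)); // [ 'README.md', 'docs/c.md', 'src/a.ts' ]
```
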
CodeIndexer is a monorepo containing three main packages:
### Core Components
- **`@code-indexer/core`**: Core indexing engine with embedding and vector database integration
- **VSCode Extension**: Semantic Code Search extension for Visual Studio Code
- **`@code-indexer/mcp`**: Model Context Protocol server for AI agent integration
### Supported Technologies
- **Embedding Providers**: [OpenAI](https://openai.com), [VoyageAI](https://voyageai.com), [Ollama](https://ollama.ai)
- **Vector Databases**: [Milvus](https://milvus.io) or [Zilliz Cloud](https://zilliz.com/cloud) (fully managed vector database as a service)
- **Code Splitters**: AST-based splitter (with automatic fallback), LangChain character-based splitter
- **Languages**: TypeScript, JavaScript, Python, Java, C++, C#, Go, Rust, PHP, Ruby, Swift, Kotlin, Scala, Markdown
- **Development Tools**: VSCode, Model Context Protocol
## Quick Start (for the core package)
### Prerequisites
- Node.js >= 20.0.0
- pnpm >= 10.0.0
- A running Milvus instance (self-hosted) or a Zilliz Cloud cluster
- An OpenAI or VoyageAI API key
### Installation
```bash
# Using npm
npm install @code-indexer/core

# Using pnpm
pnpm add @code-indexer/core

# Using yarn
yarn add @code-indexer/core
```
### Prepare Environment Variables
#### OpenAI API key
See the [OpenAI documentation](https://platform.openai.com/docs/api-reference) for details on getting your API key.
```bash
OPENAI_API_KEY=your-openai-api-key
```
#### Milvus configuration
**Option 1**: Self-hosted Milvus

See the [Milvus documentation](https://milvus.io/docs/install_standalone-docker-compose.md) for installation instructions.
- `MILVUS_ADDRESS` is the address of your Milvus instance.
- (Optional) `MILVUS_TOKEN` is the token for your Milvus instance; leave it empty if you don't use token-based authentication.
```bash
MILVUS_ADDRESS=localhost:19530
MILVUS_TOKEN=your-milvus-token
```
**Option 2**: Zilliz Cloud (fully managed vector database as a service; you can [use it for free](https://zilliz.com/cloud))
- `MILVUS_ADDRESS` is the Public Endpoint of your Zilliz Cloud instance.
- `MILVUS_TOKEN` is the token of your Zilliz Cloud instance.
```bash
MILVUS_ADDRESS=https://xxx-xxxxxxxxxxxx.serverless.gcp-us-west1.cloud.zilliz.com
MILVUS_TOKEN=xxxxxxx
```
### Basic Usage
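
The example that follows falls back to placeholder strings when a variable is unset, so a missing key only shows up later as a connection or authentication error. If you would rather fail fast, a small loader can check the variables from the previous section first. This is a sketch; `loadConfig` and `IndexerEnv` are hypothetical names, not part of `@code-indexer/core`.

```typescript
// Hypothetical helper (not part of @code-indexer/core): validate the environment variables described above.
interface IndexerEnv {
  openaiApiKey: string;
  milvusAddress: string;
  milvusToken: string; // empty when token-based authentication is not used
}

function loadConfig(env: Record<string, string | undefined>): IndexerEnv {
  const required = ["OPENAI_API_KEY", "MILVUS_ADDRESS"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variable(s): ${missing.join(", ")}`);
  }
  return {
    openaiApiKey: env["OPENAI_API_KEY"] as string,
    milvusAddress: env["MILVUS_ADDRESS"] as string,
    milvusToken: env["MILVUS_TOKEN"] ?? "",
  };
}

// Usage in a Node.js program: const config = loadConfig(process.env);
// then pass config.openaiApiKey, config.milvusAddress, and config.milvusToken to the constructors.
```
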
[@code-indexer/core](packages/core/README.md)
Core indexing engine that provides the fundamental functionality for code indexing and semantic search. Handles embedding generation, vector storage, and search operations.
```typescript
// Note: the top-level `await` calls below require an ES module context (e.g. "type": "module" in package.json).
import { CodeIndexer, MilvusVectorDatabase, OpenAIEmbedding } from '@code-indexer/core';

// Initialize embedding provider
const embedding = new OpenAIEmbedding({
    apiKey: process.env.OPENAI_API_KEY || 'your-openai-api-key',
    model: 'text-embedding-3-small'
});

// Initialize vector database
const vectorDatabase = new MilvusVectorDatabase({
    address: process.env.MILVUS_ADDRESS || 'localhost:19530',
    token: process.env.MILVUS_TOKEN || ''
});

// Create indexer instance
const indexer = new CodeIndexer({
    embedding,
    vectorDatabase
});

// Index your codebase with progress tracking
const stats = await indexer.indexCodebase('./your-project', (progress) => {
    console.log(`${progress.phase} - ${progress.percentage}%`);
});
console.log(`Indexed ${stats.indexedFiles} files, ${stats.totalChunks} chunks`);

// Perform semantic search
const results = await indexer.semanticSearch('./your-project', 'vector database operations', 5);
results.forEach(result => {
    console.log(`File: ${result.relativePath}:${result.startLine}-${result.endLine}`);
    console.log(`Score: ${(result.score * 100).toFixed(2)}%`);
    console.log(`Content: ${result.content.substring(0, 100)}...`);
});
```
## Built on Core
All the following packages are built on top of the `@code-indexer/core` engine, extending its capabilities to different platforms and use cases. They leverage the core's semantic search and indexing functionality to provide specialized interfaces and integrations.
> Each package has its own detailed documentation and usage examples. Click the links below to learn more.
### [@code-indexer/mcp](packages/mcp/README.md)
Model Context Protocol (MCP) server that enables AI assistants and agents to interact with CodeIndexer through a standardized protocol. Exposes indexing and search capabilities via MCP tools.
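
Under the hood, MCP messages are JSON-RPC 2.0 requests and responses. With the stdio transport, the client starts the server process (for example via `npx`), writes one JSON message per line to its stdin, and reads replies line by line from its stdout; after the `initialize` handshake it can discover tools with `tools/list` and invoke one with `tools/call`. The sketch below only builds and parses such frames. The tool name `search_code` and its argument keys are hypothetical placeholders, not the server's documented tool schema.

```typescript
// Minimal JSON-RPC 2.0 request type (the message format MCP is built on).
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Over stdio, each message is one line of JSON terminated by "\n".
// JSON.stringify without an indent argument escapes newlines inside strings, so the output stays on one line.
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

// Split buffered stdout text into complete messages; keep any trailing partial line for the next read.
function parseFrames(buffer: string): { messages: unknown[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? "";
  const messages = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
  return { messages, rest };
}

// Hypothetical call: "search_code" and its argument names are placeholders, not the server's confirmed tools.
// A real client would first send "initialize", then "tools/list" to discover the actual tool names.
const request: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: {
    name: "search_code",
    arguments: { path: "/absolute/path/to/project", query: "vector database operations" },
  },
};
const line = frame(request); // write this string to the server process's stdin
```
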
#### Cursor
Go to: `Settings` -> `Cursor Settings` -> `MCP` -> `Add new global MCP server`
Pasting the following configuration into your Cursor `~/.cursor/mcp.json` file is the recommended approach. You may also install in a specific project by creating `.cursor/mcp.json` in your project folder. See [Cursor MCP docs](https://docs.cursor.com/context/model-context-protocol) for more info.
```json
{
"mcpServers": {
"code-indexer": {
"command": "npx",
"args": ["-y", "@code-indexer/mcp@latest"],
"env": {
"OPENAI_API_KEY": "your-openai-api-key",
"MILVUS_ADDRESS": "localhost:19530"
}
}
}
}
```
#### Claude Desktop
Add the following to your Claude Desktop configuration file (`claude_desktop_config.json`):
```json
{
"mcpServers": {
"code-indexer": {
"command": "npx",
"args": ["@code-indexer/mcp@latest"],
"env": {
"OPENAI_API_KEY": "your-openai-api-key",
"MILVUS_ADDRESS": "localhost:19530"
}
}
}
}
```
#### Claude Code
Use the command line interface to add the CodeIndexer MCP server:
```bash
# Add the CodeIndexer MCP server
claude mcp add code-indexer -e OPENAI_API_KEY=your-openai-api-key -e MILVUS_ADDRESS=localhost:19530 -- npx @code-indexer/mcp@latest
```
See the [Claude Code MCP documentation](https://docs.anthropic.com/en/docs/claude-code/mcp) for more details about MCP server management.
#### Windsurf
Windsurf supports MCP configuration through a JSON file. Add the following configuration to your Windsurf MCP settings:
```json
{
"mcpServers": {
"code-indexer": {
"command": "npx",
"args": ["-y", "@code-indexer/mcp@latest"],
"env": {
"OPENAI_API_KEY": "your-openai-api-key",
"MILVUS_ADDRESS": "localhost:19530"
}
}
}
}
```
#### VS Code
The CodeIndexer MCP server can be used with VS Code through MCP-compatible extensions. Add the following configuration to your VS Code MCP settings:
```json
{
"mcpServers": {
"code-indexer": {
"command": "npx",
"args": ["-y", "@code-indexer/mcp@latest"],
"env": {
"OPENAI_API_KEY": "your-openai-api-key",
"MILVUS_ADDRESS": "localhost:19530"
}
}
}
}
```
#### Cherry Studio
Cherry Studio allows for visual MCP server configuration through its settings interface. While it doesn't directly support manual JSON configuration, you can add a new server via the GUI:
1. Navigate to **Settings → MCP Servers → Add Server**.
2. Fill in the server details:
   - **Name**: `code-indexer`
   - **Type**: `STDIO`
   - **Command**: `npx`
   - **Arguments**: `["@code-indexer/mcp@latest"]`
   - **Environment Variables**:
     - `OPENAI_API_KEY`: `your-openai-api-key`
     - `MILVUS_ADDRESS`: `localhost:19530`
3. Save the configuration to activate the server.
#### Cline
Cline uses a JSON configuration file to manage MCP servers. To integrate the provided MCP server configuration:
1. Open Cline and click on the **MCP Servers** icon in the top navigation bar.
2. Select the **Installed** tab, then click **Advanced MCP Settings**.
3. In the `cline_mcp_settings.json` file, add the following configuration:
```json
{
"mcpServers": {
"code-indexer": {
"command": "npx",
"args": ["@code-indexer/mcp@latest"],
"env": {
"OPENAI_API_KEY": "your-openai-api-key",
"MILVUS_ADDRESS": "localhost:19530"
}
}
}
}
```
4. Save the file.
#### Augment
To configure Code Indexer MCP in Augment Code, you can use either the graphical interface or manual configuration.
##### **A. Using the Augment Code UI**
1. Click the hamburger menu.
2. Select **Settings**.
3. Navigate to the **Tools** section.
4. Click the **+ Add MCP** button.
5. Enter the following command:
```bash
npx @code-indexer/mcp@latest
```
6. Name the MCP: **Code Indexer**.
7. Click the **Add** button.
------
##### **B. Manual Configuration**
1. Press Cmd/Ctrl+Shift+P or open the hamburger menu in the Augment panel.
2. Select **Edit Settings**.
3. Under **Advanced**, click **Edit in settings.json**.
4. Add the server configuration to the `mcpServers` array in the `augment.advanced` object:
```json
"augment.advanced": {
"mcpServers": [
{
"name": "code-indexer",
"command": "npx",
"args": ["-y", "@code-indexer/mcp@latest"]
}
]
}
```
#### Gemini CLI
Gemini CLI requires manual configuration through a JSON file:
1. Create or edit the `~/.gemini/settings.json` file.
2. Add the following configuration:
```json
{
"mcpServers": {
"code-indexer": {
"command": "npx",
"args": ["@code-indexer/mcp@latest"],
"env": {
"OPENAI_API_KEY": "your-openai-api-key",
"MILVUS_ADDRESS": "localhost:19530"
}
}
}
}
```
3. Save the file and restart Gemini CLI to apply the changes.
#### Roo Code
Roo Code utilizes a JSON configuration file for MCP servers:
1. Open Roo Code and navigate to **Settings → MCP Servers → Edit Global Config**.
2. In the `mcp_settings.json` file, add the following configuration:
```json
{
"mcpServers": {
"code-indexer": {
"command": "npx",
"args": ["@code-indexer/mcp@latest"],
"env": {
"OPENAI_API_KEY": "your-openai-api-key",
"MILVUS_ADDRESS": "localhost:19530"
}
}
}
}
```
3. Save the file to activate the server.
#### Other MCP Clients
The server uses the stdio transport and follows the standard Model Context Protocol, so it can be integrated with any MCP-compatible client by running:
```bash
npx @code-indexer/mcp@latest
```
### [VSCode Extension](packages/vscode-extension/README.md)
Visual Studio Code extension that integrates CodeIndexer directly into your IDE. Provides an intuitive interface for semantic code search and navigation.
1. **Direct Link**: [Install from VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=zilliz.semanticcodesearch)
2. **Manual Search**:
   - Open the Extensions view in VSCode (Ctrl+Shift+X, or Cmd+Shift+X on Mac)
   - Search for "Semantic Code Search"
   - Click Install
## Development
### Setup Development Environment
```bash
# Clone repository
git clone https://github.com/zilliztech/CodeIndexer.git
cd CodeIndexer

# Install dependencies
pnpm install

# Build all packages
pnpm build

# Start development mode
pnpm dev
```
### Building
```bash
# Build all packages
pnpm build

# Build specific package
pnpm build:core
pnpm build:vscode
pnpm build:mcp
```
### Running Examples
```bash
# Development with file watching
cd examples/basic-usage
pnpm dev
```
### Supported File Extensions
By default, CodeIndexer supports:
- Programming languages: `.ts`, `.tsx`, `.js`, `.jsx`, `.py`, `.java`, `.cpp`, `.c`, `.h`, `.hpp`, `.cs`, `.go`, `.rs`, `.php`, `.rb`, `.swift`, `.kt`, `.scala`, `.m`, `.mm`
- Documentation: `.md`, `.markdown`
### Ignore Patterns
Common directories and files are automatically ignored:
- `node_modules/**`, `dist/**`, `build/**`
- `.git/**`, `.vscode/**`, `.idea/**`
- `*.log`, `*.min.js`, `*.map`
## Examples
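
As a first example, the default extension list and ignore patterns above can be turned into a small path filter. This sketch copies both lists verbatim but assumes simple glob semantics (`**` crosses directories, `*` stays within one path segment, and a pattern may match at any directory depth); CodeIndexer's own matching rules may differ, and `shouldIndex` is a hypothetical name.

```typescript
// Hypothetical sketch: decide whether a relative path would be indexed under the defaults listed above.
// The lists are copied from this README; the glob semantics are an assumption, not CodeIndexer's matcher.
const DEFAULT_EXTENSIONS = [
  ".ts", ".tsx", ".js", ".jsx", ".py", ".java", ".cpp", ".c", ".h", ".hpp", ".cs",
  ".go", ".rs", ".php", ".rb", ".swift", ".kt", ".scala", ".m", ".mm", ".md", ".markdown",
];
const DEFAULT_IGNORE = [
  "node_modules/**", "dist/**", "build/**", ".git/**", ".vscode/**", ".idea/**", "*.log", "*.min.js", "*.map",
];

// "**" matches across directories, "*" matches within a single path segment; everything else is literal.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    if (glob[i] === "*" && glob[i + 1] === "*") {
      source += ".*";
      i++; // skip the second "*"
    } else if (glob[i] === "*") {
      source += "[^/]*";
    } else {
      source += glob[i].replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${source}$`);
}

const IGNORE_REGEXPS = DEFAULT_IGNORE.map(globToRegExp);

// Assumption: a pattern may match at any depth, i.e. against the path or any suffix starting after a "/".
function isIgnored(relativePath: string): boolean {
  const segments = relativePath.split("/");
  for (let start = 0; start < segments.length; start++) {
    const suffix = segments.slice(start).join("/");
    if (IGNORE_REGEXPS.some((re) => re.test(suffix))) return true;
  }
  return false;
}

function shouldIndex(relativePath: string): boolean {
  const base = relativePath.slice(relativePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  const ext = dot > 0 ? base.slice(dot) : ""; // dotfiles such as ".env" count as having no extension here
  return DEFAULT_EXTENSIONS.indexOf(ext) !== -1 && !isIgnored(relativePath);
}

console.log(shouldIndex("src/index.ts"), shouldIndex("packages/core/dist/index.js")); // true false
```
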
Check the `/examples` directory for complete usage examples:
- **Basic Usage**: Simple indexing and search example
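
The ranking step behind a semantic search can also be shown without any external service. This sketch scores chunks by cosine similarity to a query embedding and keeps the top k by brute force. The vectors are toy 3-dimensional values rather than real model output, the `Chunk` fields only mirror the ones printed in the core example, and a vector database such as Milvus replaces the linear scan with indexed search; the metric CodeIndexer actually configures may differ.

```typescript
// Hypothetical chunk record; the field names mirror those printed in the core example.
interface Chunk {
  relativePath: string;
  startLine: number;
  endLine: number;
  vector: number[]; // embedding of the chunk's text (toy values below)
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]; returns 0 if either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Brute-force top-k: score every chunk against the query and keep the k best.
function topK(query: number[], chunks: Chunk[], k: number): Array<Chunk & { score: number }> {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosine(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const chunks: Chunk[] = [
  { relativePath: "src/db.ts", startLine: 1, endLine: 40, vector: [0.9, 0.1, 0.0] },
  { relativePath: "src/auth.ts", startLine: 10, endLine: 55, vector: [0.0, 1.0, 0.2] },
  { relativePath: "README.md", startLine: 1, endLine: 20, vector: [0.5, 0.5, 0.5] },
];
for (const r of topK([1, 0, 0], chunks, 2)) {
  console.log(`${r.relativePath}:${r.startLine}-${r.endLine} score=${r.score.toFixed(3)}`);
}
// src/db.ts:1-40 score=0.994
// README.md:1-20 score=0.577
```
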
## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details on how to get started.
**Package-specific contributing guides:**
- [Core Package Contributing](packages/core/CONTRIBUTING.md)
- [MCP Server Contributing](packages/mcp/CONTRIBUTING.md)
- [VSCode Extension Contributing](packages/vscode-extension/CONTRIBUTING.md)
## Roadmap
- [x] AST-based code analysis for improved understanding
- [x] Support for additional embedding providers
- [ ] Agent-based interactive search mode
- [ ] Enhanced code chunking strategies
- [ ] Search result ranking optimization
- [ ] Robust Chrome Extension
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Links
- [GitHub Repository](https://github.com/zilliztech/CodeIndexer)
- [VSCode Marketplace](https://marketplace.visualstudio.com/items?itemName=zilliz.semanticcodesearch)
- [Milvus Documentation](https://milvus.io/docs)
- [Zilliz Cloud](https://zilliz.com/cloud)