
# h-codex

A semantic code search tool for intelligent, cross-repo context retrieval.

## ✨ Features

- **AST-Based Chunking**: Intelligent code parsing using Abstract Syntax Trees for optimal chunk boundaries
- **Embedding & Semantic Search**: Uses OpenAI's `text-embedding-3-small` model (support for `voyage-code-3` is planned)
- **Vector Database**: PostgreSQL with pgvector extension for efficient similarity search
- **Multi-Language Support**: TypeScript and JavaScript, extensible to other languages
- **Multi-Project Support**: Index and search multiple projects
- **MCP Integration**: Seamlessly connects with AI coding assistants through Model Context Protocol
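
To make the semantic search feature concrete, here is a minimal in-memory sketch of ranking chunks by cosine similarity. It is an illustration only: h-codex delegates similarity search to PostgreSQL with pgvector, and the `IndexedChunk` shape, `cosineSimilarity`, and `searchChunks` are hypothetical names that do not come from the project. The default threshold and limit mirror the documented `SIMILARITY_THRESHOLD` and `SEARCH_RESULTS_LIMIT` defaults.

```typescript
// Hypothetical shape of an indexed chunk; the real h-codex schema may differ.
interface IndexedChunk {
  filePath: string;
  content: string;
  embedding: number[];
}

interface SearchHit {
  chunk: IndexedChunk;
  similarity: number;
}

// Cosine similarity between two vectors of the same dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep hits at or above the threshold, best first, capped at `limit`.
function searchChunks(
  queryEmbedding: number[],
  chunks: IndexedChunk[],
  threshold: number = 0.5,
  limit: number = 10
): SearchHit[] {
  return chunks
    .map((chunk) => ({
      chunk,
      similarity: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .filter((hit) => hit.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

In a real deployment the query embedding would come from the embedding model and the ranking would run inside the database, so this sketch is mainly useful for reasoning about what the threshold and limit settings do.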

## 🚀 Demo

![demo](./assets/demo-1.gif)

## 💻 Getting Started

h-codex can be integrated with AI assistants through the Model Context Protocol.

### Example with Claude Desktop

Edit your Claude Desktop MCP configuration file (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "h-codex": {
      "command": "npx",
      "args": ["@hpbyte/h-codex-mcp"],
      "env": {
        "LLM_API_KEY": "your_llm_api_key_here",
        "LLM_BASE_URL": "https://api.openai.com/v1",
        "DB_CONNECTION_STRING": "postgresql://postgres:password@localhost:5432/h-codex"
      }
    }
  }
}
```

If `LLM_BASE_URL` is omitted, it defaults to the OpenAI base URL (`https://api.openai.com/v1`).

## 🛠️ Development

### Prerequisites

- [Node.js](https://nodejs.org/) (v18+)
- [pnpm](https://pnpm.io/) - Package manager
- [Docker](https://www.docker.com/) - For running PostgreSQL with pgvector
- OpenAI API key for embeddings

### Getting Started

1. **Clone the repository**

```bash
git clone https://github.com/hpbyte/h-codex.git
cd h-codex
```

2. **Set up environment variables**

```bash
cp packages/core/.env.example packages/core/.env
```

Edit `packages/core/.env` with your OpenAI API key and any other configuration options.

3. **Install dependencies**

```bash
pnpm install
```

4. **Start PostgreSQL database**

```bash
cd dev && docker compose up -d
```

5. **Set up the database**

```bash
pnpm run db:migrate
```

6. **Start development server**

```bash
pnpm dev
```

## 🔧 Configuration Options

| Environment Variable   | Description                                | Default                                                 |
| ---------------------- | ------------------------------------------ | ------------------------------------------------------- |
| `LLM_API_KEY`          | API key used to generate embeddings        | Required                                                |
| `LLM_BASE_URL`         | Base URL of the LLM API used for embeddings | `https://api.openai.com/v1`                             |
| `EMBEDDING_MODEL`      | OpenAI model used for embeddings           | `text-embedding-3-small`                                |
| `CHUNK_SIZE`           | Maximum chunk size in characters           | `1000`                                                  |
| `SEARCH_RESULTS_LIMIT` | Maximum number of search results returned  | `10`                                                    |
| `SIMILARITY_THRESHOLD` | Minimum similarity score for results       | `0.5`                                                   |
| `DB_CONNECTION_STRING` | PostgreSQL connection string               | `postgresql://postgres:password@localhost:5432/h-codex` |
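
As a rough illustration of how these variables and their defaults could be resolved, the sketch below reads them from a plain key/value map. The `HCodexConfig` type and the `loadConfig` and `readNumber` helpers are hypothetical and are not taken from the h-codex source; only the variable names and default values come from the table above.

```typescript
// Hypothetical config shape; field names are illustrative, not from h-codex.
interface HCodexConfig {
  llmApiKey: string;
  llmBaseUrl: string;
  embeddingModel: string;
  chunkSize: number;
  searchResultsLimit: number;
  similarityThreshold: number;
  dbConnectionString: string;
}

type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back to the documented default when unset.
function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!isFinite(value)) {
    throw new Error(name + " must be a number, got: " + raw);
  }
  return value;
}

// Resolve every documented variable, applying the defaults from the table.
function loadConfig(env: Env): HCodexConfig {
  const llmApiKey = env["LLM_API_KEY"];
  if (!llmApiKey) {
    throw new Error("LLM_API_KEY is required");
  }
  return {
    llmApiKey,
    llmBaseUrl: env["LLM_BASE_URL"] || "https://api.openai.com/v1",
    embeddingModel: env["EMBEDDING_MODEL"] || "text-embedding-3-small",
    chunkSize: readNumber(env, "CHUNK_SIZE", 1000),
    searchResultsLimit: readNumber(env, "SEARCH_RESULTS_LIMIT", 10),
    similarityThreshold: readNumber(env, "SIMILARITY_THRESHOLD", 0.5),
    dbConnectionString:
      env["DB_CONNECTION_STRING"] ||
      "postgresql://postgres:password@localhost:5432/h-codex",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. Taking the map as a parameter keeps the function easy to test without touching the real environment.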

## 🏗️ Architecture

```mermaid
graph TD
    subgraph "Core Package"
        subgraph "Ingestion Pipeline"
            Explorer["Explorer<br/>(file discovery)"]
            Chunker["Chunker<br/>(AST parsing & chunking)"]
            Embedder["Embedder<br/>(semantic embeddings)"]
            Indexer["Indexer<br/>(orchestration)"]

            Explorer --> Chunker
            Chunker --> Embedder
            Embedder --> Indexer
        end

        subgraph "Storage Layer"
            Repository["Repository"]
        end

        Indexer --> Repository
        Repository --> Database[(PostgreSQL Vector Database)]
    end

    subgraph "MCP Package"
        MCPServer["MCP Server"]
        CodeIndexTool["Code Index Tool"]
        CodeSearchTool["Code Search Tool"]

        MCPServer --> CodeIndexTool
        MCPServer --> CodeSearchTool
    end

    CodeIndexTool --> Indexer
    CodeSearchTool --> Repository
```
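
The ingestion flow in the diagram can be sketched as a small orchestration function. This is a simplified illustration under assumptions, not the project's actual code: the `IngestionDeps` interface, its method names (`discover`, `chunk`, `embed`, `save`), and `indexProject` are hypothetical stand-ins for the Explorer, Chunker, Embedder, Repository, and Indexer components.

```typescript
interface CodeChunk {
  filePath: string;
  content: string;
}

interface EmbeddedChunk extends CodeChunk {
  embedding: number[];
}

// Hypothetical component contracts, one per box in the diagram.
interface IngestionDeps {
  explorer: { discover(rootDir: string): Promise<string[]> };
  chunker: { chunk(filePath: string): Promise<CodeChunk[]> };
  embedder: { embed(texts: string[]): Promise<number[][]> };
  repository: { save(project: string, chunks: EmbeddedChunk[]): Promise<void> };
}

// Indexer role: discover files, chunk each one, embed the chunks, persist them.
// Returns the number of chunks stored.
async function indexProject(
  deps: IngestionDeps,
  project: string,
  rootDir: string
): Promise<number> {
  const files = await deps.explorer.discover(rootDir);
  let stored = 0;
  for (const file of files) {
    const chunks = await deps.chunker.chunk(file);
    if (chunks.length === 0) continue; // nothing indexable in this file
    const vectors = await deps.embedder.embed(chunks.map((c) => c.content));
    if (vectors.length !== chunks.length) {
      throw new Error("Embedder returned a different number of vectors than chunks");
    }
    const embedded: EmbeddedChunk[] = chunks.map((c, i) => ({
      filePath: c.filePath,
      content: c.content,
      embedding: vectors[i],
    }));
    await deps.repository.save(project, embedded);
    stored += embedded.length;
  }
  return stored;
}
```

Passing the components in as one dependency object keeps the orchestration independent of any particular parser, embedding provider, or database, which is also what makes it straightforward to exercise with in-memory fakes.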

## 🗺️ Roadmap

- Support for additional embedding providers (Voyage AI)
- Enhanced language support with more tree-sitter parsers

## 📄 License

This project is licensed under the MIT License.