https://github.com/hpbyte/h-codex

A semantic code search tool for intelligent, cross-repo context retrieval.
abstract-syntax-tree agent ai claude-code code-indexing code-search cursor embedding java mcp nodejs openai postgresql rag semantic-search tree-sitter typescript vector-database vibe-coding

Last synced: 7 months ago

Host: GitHub
URL: https://github.com/hpbyte/h-codex
Owner: hpbyte
License: mit
Created: 2025-06-21T07:46:10.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-08-07T08:18:52.000Z (7 months ago)
Last Synced: 2025-08-07T08:29:34.646Z (7 months ago)
Topics: abstract-syntax-tree, agent, ai, claude-code, code-indexing, code-search, cursor, embedding, java, mcp, nodejs, openai, postgresql, rag, semantic-search, tree-sitter, typescript, vector-database, vibe-coding
Language: TypeScript
Homepage:
Size: 1.69 MB
Stars: 15
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# h-codex

A semantic code search tool for intelligent, cross-repo context retrieval.

## ✨ Features

- **AST-Based Chunking**: Parses code into Abstract Syntax Trees with tree-sitter so that chunk boundaries follow syntactic units rather than arbitrary character offsets
- **Embedding & Semantic Search**: Using OpenAI's `text-embedding-3-small` model (support for `voyage-code-3` planned)
- **Vector Database**: PostgreSQL with pgvector extension for efficient similarity search
- **Multi-Language Support**: TypeScript and JavaScript, extensible to other languages
- **Multi-Project Support**: Index and search multiple projects
- **MCP Integration**: Seamlessly connects with AI coding assistants through Model Context Protocol
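
To make the semantic search feature concrete, here is a minimal in-memory sketch of ranking chunks by cosine similarity, using the documented defaults for `SIMILARITY_THRESHOLD` (`0.5`) and `SEARCH_RESULTS_LIMIT` (`10`). It is an illustration only: h-codex stores vectors in PostgreSQL with pgvector, so the real ranking presumably runs inside the database. The names `IndexedChunk`, `SearchHit`, `cosineSimilarity`, and `searchChunks` are hypothetical and not part of the project's API.

```typescript
// Hypothetical shape of an indexed chunk; not h-codex's actual schema.
interface IndexedChunk {
  filePath: string;
  content: string;
  embedding: number[];
}

interface SearchHit {
  chunk: IndexedChunk;
  similarity: number;
}

// Cosine similarity between two equal-length vectors, in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk, drop those below the threshold, and return the
// best matches first, capped at `limit`.
function searchChunks(
  queryEmbedding: number[],
  chunks: IndexedChunk[],
  threshold: number = 0.5,
  limit: number = 10
): SearchHit[] {
  return chunks
    .map((chunk) => ({
      chunk,
      similarity: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .filter((hit) => hit.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

A linear scan like this is only practical for small collections; a vector index in the database avoids scoring every chunk on each query.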

## 🚀 Demo

![demo](./assets/demo-1.gif)

## 💻 Getting Started

h-codex can be integrated with AI assistants through the Model Context Protocol.

### Example with Claude Desktop

Add the server to your Claude Desktop MCP configuration file (typically `claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "h-codex": {
      "command": "npx",
      "args": ["@hpbyte/h-codex-mcp"],
      "env": {
        "LLM_API_KEY": "your_llm_api_key_here",
        "LLM_BASE_URL": "your_llm_base_url_here (default is openai baseurl: https://api.openai.com/v1)",
        "DB_CONNECTION_STRING": "postgresql://postgres:password@localhost:5432/h-codex"
      }
    }
  }
}
```

## 🛠️ Development

### Prerequisites

- [Node.js](https://nodejs.org/) (v18+)
- [pnpm](https://pnpm.io/) - Package manager
- [Docker](https://www.docker.com/) - For running PostgreSQL with pgvector
- OpenAI API key for embeddings

### Getting Started

1. **Clone the repository**

```bash
git clone https://github.com/hpbyte/h-codex.git
cd h-codex
```

2. **Set up environment variables**

```bash
cp packages/core/.env.example packages/core/.env
```

Edit `packages/core/.env` to set your OpenAI API key and any other configuration options.

3. **Install dependencies**

```bash
pnpm install
```

4. **Start PostgreSQL database**

```bash
cd dev && docker compose up -d
```

5. **Set up the database**

```bash
pnpm run db:migrate
```

6. **Start development server**

```bash
pnpm dev
```

## 🔧 Configuration Options

| Environment Variable   | Description                       | Default                                                 |
| ---------------------- | --------------------------------- | ------------------------------------------------------- |
| `LLM_API_KEY`          | LLM API key for embeddings        | Required                                                |
| `LLM_BASE_URL`         | LLM base URL for embeddings       | `https://api.openai.com/v1`                             |
| `EMBEDDING_MODEL`      | OpenAI model for embeddings       | `text-embedding-3-small`                                |
| `CHUNK_SIZE`           | Maximum chunk size in characters  | `1000`                                                  |
| `SEARCH_RESULTS_LIMIT` | Maximum search results returned   | `10`                                                    |
| `SIMILARITY_THRESHOLD` | Minimum similarity for results    | `0.5`                                                   |
| `DB_CONNECTION_STRING` | PostgreSQL connection string      | `postgresql://postgres:password@localhost:5432/h-codex` |
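
The table above can be read as a typed configuration with defaults. The following is a rough sketch of how such values might be loaded and validated, not the project's actual loader. The `HCodexConfig` interface and the `loadConfig` and `readNumber` helpers are hypothetical names; only the variable names and default values come from the table.

```typescript
// Hypothetical config shape mirroring the documented environment variables.
interface HCodexConfig {
  llmApiKey: string;
  llmBaseUrl: string;
  embeddingModel: string;
  chunkSize: number;
  searchResultsLimit: number;
  similarityThreshold: number;
  dbConnectionString: string;
}

type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back to the default when it is unset or empty.
function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${name} must be a number, got "${raw}"`);
  }
  return value;
}

// Build the config from an environment-like object, applying the documented defaults.
function loadConfig(env: Env): HCodexConfig {
  const llmApiKey = env["LLM_API_KEY"];
  if (!llmApiKey) {
    throw new Error("LLM_API_KEY is required");
  }
  return {
    llmApiKey,
    llmBaseUrl: env["LLM_BASE_URL"] ?? "https://api.openai.com/v1",
    embeddingModel: env["EMBEDDING_MODEL"] ?? "text-embedding-3-small",
    chunkSize: readNumber(env, "CHUNK_SIZE", 1000),
    searchResultsLimit: readNumber(env, "SEARCH_RESULTS_LIMIT", 10),
    similarityThreshold: readNumber(env, "SIMILARITY_THRESHOLD", 0.5),
    dbConnectionString:
      env["DB_CONNECTION_STRING"] ??
      "postgresql://postgres:password@localhost:5432/h-codex",
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.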

## 🏗️ Architecture

```mermaid
graph TD
    subgraph "Core Package"
        subgraph "Ingestion Pipeline"
            Explorer["Explorer<br/>(file discovery)"]
            Chunker["Chunker<br/>(AST parsing & chunking)"]
            Embedder["Embedder<br/>(semantic embeddings)"]
            Indexer["Indexer<br/>(orchestration)"]

            Explorer --> Chunker
            Chunker --> Embedder
            Embedder --> Indexer
        end

        subgraph "Storage Layer"
            Repository["Repository"]
        end

        Indexer --> Repository
        Repository --> Database[(PostgreSQL Vector Database)]
    end

    subgraph "MCP Package"
        MCPServer["MCP Server"]
        CodeIndexTool["Code Index Tool"]
        CodeSearchTool["Code Search Tool"]

        MCPServer --> CodeIndexTool
        MCPServer --> CodeSearchTool
    end

    CodeIndexTool --> Indexer
    CodeSearchTool --> Repository
```
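
The ingestion flow in the diagram can be sketched as a set of small interfaces wired together by an orchestrating function. This is an assumption-laden outline, not the code in `packages/core`. The interface names mirror the diagram's boxes, but every method signature here (`discover`, `chunk`, `embed`, `save`) and the `indexProject` function are hypothetical.

```typescript
// Hypothetical data shapes; the real h-codex types may differ.
interface SourceFile {
  path: string;
  content: string;
}

interface Chunk {
  filePath: string;
  content: string;
}

interface EmbeddedChunk extends Chunk {
  embedding: number[];
}

// One interface per box in the diagram (method names are assumptions).
interface Explorer {
  discover(root: string): Promise<SourceFile[]>;
}
interface Chunker {
  chunk(file: SourceFile): Chunk[];
}
interface Embedder {
  embed(texts: string[]): Promise<number[][]>;
}
interface Repository {
  save(chunks: EmbeddedChunk[]): Promise<void>;
}

interface IndexerDeps {
  explorer: Explorer;
  chunker: Chunker;
  embedder: Embedder;
  repository: Repository;
}

// Orchestrates the stages: discover files, chunk them, embed the chunks,
// then persist the results. Returns the number of chunks stored.
async function indexProject(root: string, deps: IndexerDeps): Promise<number> {
  const files = await deps.explorer.discover(root);

  const chunks: Chunk[] = [];
  for (const file of files) {
    chunks.push(...deps.chunker.chunk(file));
  }
  if (chunks.length === 0) return 0;

  const embeddings = await deps.embedder.embed(chunks.map((c) => c.content));
  if (embeddings.length !== chunks.length) {
    throw new Error("Embedder returned a different number of vectors than chunks");
  }

  const embedded: EmbeddedChunk[] = chunks.map((c, i) => ({
    ...c,
    embedding: embeddings[i],
  }));
  await deps.repository.save(embedded);
  return embedded.length;
}
```

Passing the stages in as dependencies lets each one be replaced with an in-memory stub in tests, and would let a different embedding provider be swapped in without touching the orchestration.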

## 🗺️ Roadmap

- Support for additional embedding providers (Voyage AI)
- Enhanced language support with more tree-sitter parsers

## 📄 License

This project is licensed under the MIT License.
