https://github.com/hpbyte/h-codex
A semantic code search tool for intelligent, cross-repo context retrieval.
- Host: GitHub
- URL: https://github.com/hpbyte/h-codex
- Owner: hpbyte
- License: mit
- Created: 2025-06-21T07:46:10.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-08-07T08:18:52.000Z (about 2 months ago)
- Last Synced: 2025-08-07T08:29:34.646Z (about 2 months ago)
- Topics: abstract-syntax-tree, agent, ai, claude-code, code-indexing, code-search, cursor, embedding, java, mcp, nodejs, openai, postgresql, rag, semantic-search, tree-sitter, typescript, vector-database, vibe-coding
- Language: TypeScript
- Homepage:
- Size: 1.69 MB
- Stars: 15
- Watchers: 1
- Forks: 1
- Open Issues: 0
# h-codex
A semantic code search tool for intelligent, cross-repo context retrieval.
## Features
- **AST-Based Chunking**: Intelligent code parsing using Abstract Syntax Trees for optimal chunk boundaries
- **Embedding & Semantic Search**: Using OpenAI's `text-embedding-3-small` model (support for `voyage-code-3` planned)
- **Vector Database**: PostgreSQL with pgvector extension for efficient similarity search
- **Multi-Language Support**: TypeScript and JavaScript, extensible to other languages
- **Multi-Project Support**: Index and search multiple projects
- **MCP Integration**: Seamlessly connects with AI coding assistants through Model Context Protocol
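
To make the similarity-search feature concrete, here is a minimal in-memory sketch of ranking chunks by cosine similarity. It is not h-codex's implementation: the project delegates this work to PostgreSQL with pgvector, and the `IndexedChunk` type, the `searchChunks` function, and their parameters are hypothetical names chosen for illustration. The default threshold of 0.5 and limit of 10 mirror the `SIMILARITY_THRESHOLD` and `SEARCH_RESULTS_LIMIT` defaults listed under Configuration Options.

```typescript
// Hypothetical shape of an indexed chunk; the real h-codex schema may differ.
interface IndexedChunk {
  filePath: string;
  content: string;
  embedding: number[];
}

interface SearchHit {
  chunk: IndexedChunk;
  similarity: number;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query, drop weak matches, keep the top `limit`.
function searchChunks(
  queryEmbedding: number[],
  chunks: IndexedChunk[],
  threshold = 0.5,
  limit = 10,
): SearchHit[] {
  return chunks
    .map((chunk) => ({
      chunk,
      similarity: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .filter((hit) => hit.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

A linear scan like this is only practical for small collections; a vector index inside the database avoids comparing the query against every stored chunk.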

## Getting Started
h-codex can be integrated with AI assistants through the Model Context Protocol.
### Example with Claude Desktop
Edit your `claude_mcp_settings.json` file:
```json
{
  "mcpServers": {
    "h-codex": {
      "command": "npx",
      "args": ["@hpbyte/h-codex-mcp"],
      "env": {
        "LLM_API_KEY": "your_llm_api_key_here",
        "LLM_BASE_URL": "your_llm_base_url_here (default is openai baseurl: https://api.openai.com/v1)",
        "DB_CONNECTION_STRING": "postgresql://postgres:password@localhost:5432/h-codex"
      }
    }
  }
}
```

## Development
### Prerequisites
- [Node.js](https://nodejs.org/) (v18+)
- [pnpm](https://pnpm.io/) - Package manager
- [Docker](https://www.docker.com/) - For running PostgreSQL with pgvector
- OpenAI API key for embeddings

### Getting Started
1. **Clone the repository**

   ```bash
   git clone https://github.com/hpbyte/h-codex.git
   cd h-codex
   ```

2. **Set up environment variables**

   ```bash
   cp packages/core/.env.example packages/core/.env
   ```

   Edit the `.env` file with your OpenAI API key and other configuration options.

3. **Install dependencies**

   ```bash
   pnpm install
   ```

4. **Start PostgreSQL database**

   ```bash
   cd dev && docker compose up -d
   ```

5. **Set up the database**

   ```bash
   pnpm run db:migrate
   ```

6. **Start development server**

   ```bash
   pnpm dev
   ```

## Configuration Options
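As a rough illustration of how the environment variables in the table below could be read with their documented defaults, here is a minimal sketch. The `HCodexConfig` interface and the `loadConfig` and `readNumber` helpers are hypothetical names, not part of the h-codex source; only the variable names and default values come from this README. In a Node.js process it would be called as `loadConfig(process.env)`.

```typescript
// Hypothetical config shape; field names are illustrative, not taken from h-codex.
interface HCodexConfig {
  llmApiKey: string;
  llmBaseUrl: string;
  embeddingModel: string;
  chunkSize: number;
  searchResultsLimit: number;
  similarityThreshold: number;
  dbConnectionString: string;
}

type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back to the documented default when unset.
function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${name} must be a number, got "${raw}"`);
  }
  return value;
}

// Build a config object; LLM_API_KEY is the only variable without a default.
function loadConfig(env: Env): HCodexConfig {
  const llmApiKey = env.LLM_API_KEY;
  if (!llmApiKey) {
    throw new Error("LLM_API_KEY is required");
  }
  return {
    llmApiKey,
    llmBaseUrl: env.LLM_BASE_URL ?? "https://api.openai.com/v1",
    embeddingModel: env.EMBEDDING_MODEL ?? "text-embedding-3-small",
    chunkSize: readNumber(env, "CHUNK_SIZE", 1000),
    searchResultsLimit: readNumber(env, "SEARCH_RESULTS_LIMIT", 10),
    similarityThreshold: readNumber(env, "SIMILARITY_THRESHOLD", 0.5),
    dbConnectionString:
      env.DB_CONNECTION_STRING ??
      "postgresql://postgres:password@localhost:5432/h-codex",
  };
}
```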
| Environment Variable   | Description                                 | Default                                                 |
| ---------------------- | ------------------------------------------- | ------------------------------------------------------- |
| `LLM_API_KEY`          | LLM API key for embeddings                  | Required                                                |
| `LLM_BASE_URL`         | Base URL of the LLM API used for embeddings | `https://api.openai.com/v1`                             |
| `EMBEDDING_MODEL`      | OpenAI model for embeddings                 | `text-embedding-3-small`                                |
| `CHUNK_SIZE`           | Maximum chunk size in characters            | `1000`                                                  |
| `SEARCH_RESULTS_LIMIT` | Max search results returned                 | `10`                                                    |
| `SIMILARITY_THRESHOLD` | Minimum similarity for results              | `0.5`                                                   |
| `DB_CONNECTION_STRING` | PostgreSQL connection string                | `postgresql://postgres:password@localhost:5432/h-codex` |

## Architecture
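The ingestion flow in this section (Explorer, then Chunker, then Embedder, then Indexer) can be pictured as a chain of stages. The sketch below is a simplified illustration under stated assumptions, not the h-codex code: every type and function name is hypothetical, the embedder and the store are injected as plain functions, and the chunker splits on blank lines with a size cap as a stand-in for real AST-based chunking with tree-sitter.

```typescript
// All names below are hypothetical stand-ins for the pipeline stages;
// they are not the h-codex API.
interface SourceFile {
  path: string;
  content: string;
}

interface Chunk {
  path: string;
  text: string;
}

interface EmbeddedChunk extends Chunk {
  embedding: number[];
}

type Embedder = (texts: string[]) => Promise<number[][]>;
type Store = (chunks: EmbeddedChunk[]) => Promise<void>;

// Stand-in for AST-based chunking: group blank-line-separated blocks until
// the size cap would be exceeded. A real chunker would cut at syntax-node
// boundaries. A single block longer than maxChars is kept whole here.
function chunkFile(file: SourceFile, maxChars = 1000): Chunk[] {
  const chunks: Chunk[] = [];
  let current = "";
  for (const block of file.content.split(/\n\s*\n/)) {
    if (current !== "" && current.length + block.length + 2 > maxChars) {
      chunks.push({ path: file.path, text: current });
      current = "";
    }
    current = current === "" ? block : `${current}\n\n${block}`;
  }
  if (current.trim() !== "") {
    chunks.push({ path: file.path, text: current });
  }
  return chunks;
}

// Orchestration: chunk every file, embed the chunk texts in one batch,
// then hand the embedded chunks to the storage layer.
async function indexFiles(
  files: SourceFile[],
  embed: Embedder,
  store: Store,
): Promise<number> {
  const chunks: Chunk[] = [];
  for (const file of files) {
    chunks.push(...chunkFile(file));
  }
  if (chunks.length === 0) return 0;
  const vectors = await embed(chunks.map((c) => c.text));
  const embedded = chunks.map((c, i) => ({ ...c, embedding: vectors[i] }));
  await store(embedded);
  return embedded.length;
}
```

Injecting the embedder and the store as functions keeps the orchestration step testable without a network call or a database.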
```mermaid
graph TD
    subgraph "Core Package"
        subgraph "Ingestion Pipeline"
            Explorer["Explorer<br/>(file discovery)"]
            Chunker["Chunker<br/>(AST parsing & chunking)"]
            Embedder["Embedder<br/>(semantic embeddings)"]
            Indexer["Indexer<br/>(orchestration)"]

            Explorer --> Chunker
            Chunker --> Embedder
            Embedder --> Indexer
        end

        subgraph "Storage Layer"
            Repository["Repository"]
        end

        Indexer --> Repository
        Repository --> Database[(PostgreSQL Vector Database)]
    end

    subgraph "MCP Package"
        MCPServer["MCP Server"]
        CodeIndexTool["Code Index Tool"]
        CodeSearchTool["Code Search Tool"]

        MCPServer --> CodeIndexTool
        MCPServer --> CodeSearchTool
    end

    CodeIndexTool --> Indexer
    CodeSearchTool --> Repository
```

## Roadmap
- Support for additional embedding providers (Voyage AI)
- Enhanced language support with more tree-sitter parsers

## License
This project is licensed under the MIT License.