https://github.com/lambdamechanic/groma
https://github.com/lambdamechanic/groma
Last synced: 5 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/lambdamechanic/groma
- Owner: lambdamechanic
- Created: 2025-04-24T06:06:42.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-12-31T01:35:46.000Z (5 months ago)
- Last Synced: 2026-01-04T02:20:17.090Z (5 months ago)
- Language: Rust
- Size: 888 KB
- Stars: 5
- Watchers: 1
- Forks: 2
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome - lambdamechanic/groma - (Rust)
README
# Groma
A semantic code search tool for Git repositories that uses vector embeddings to find relevant files based on natural language queries.
## Two Versions Available
### 1. `groma` - Cloud-based with Qdrant
- Uses **Qdrant** vector database (requires Docker)
- Uses **OpenAI API** for embeddings (requires API key, costs money)
- Higher quality embeddings
- Needs internet connection
### 2. `groma-lancedb` - Fully Local & Free
- Uses **LanceDB** (embedded, no server needed)
- Uses **local fastembed model** (AllMiniLML6V2)
- 100% offline, no API calls
- Completely free
- Your code never leaves your machine
## Installation
```bash
# Clone the repository
git clone https://github.com/yourusername/groma.git
cd groma
# Build both versions
cargo build --release --features qdrant --bin groma
cargo build --release --features lancedb --bin groma-lancedb
# Install to your PATH
cp target/release/groma ~/.local/bin/
cp target/release/groma-lancedb ~/.local/bin/
```
## Usage
Both versions use the same command-line interface:
```bash
# Basic usage - pipe your query through stdin
echo "authentication logic" | groma /path/to/repo --cutoff 0.3
# Or use the LanceDB version (no setup needed!)
echo "authentication logic" | groma-lancedb /path/to/repo --cutoff 0.3
```
### Options
- `--cutoff` - Similarity threshold (0.0-1.0, default: 0.7)
- `--suppress-updates` - Skip indexing, query existing data only
- `--debug` - Enable debug logging
## Setup Requirements
### For `groma` (Qdrant version)
1. **Start Qdrant Docker container:**
```bash
docker run -p 6334:6334 -v ~/.qdrant_data:/qdrant/storage qdrant/qdrant
```
2. **Set OpenAI API key:**
```bash
export OPENAI_API_KEY='your-api-key-here'
```
3. **Optional - Set custom Qdrant URL:**
```bash
export QDRANT_URL='http://your-qdrant-host:6334'
```
### For `groma-lancedb` (Local version)
**No setup required!** Just run it. The first run will download the embedding model (~80MB) automatically.
## MCP Server Mode (LanceDB only)
`groma-lancedb` can run as an MCP (Model Context Protocol) server:
```bash
# Run as MCP server
groma-lancedb mcp
# With debug logging (logs to /tmp/groma.log)
groma-lancedb mcp --debug
```
### MCP Configuration
Add to your MCP client config (e.g., Claude Desktop):
```json
{
"mcpServers": {
"groma": {
"command": "/path/to/groma-lancedb",
"args": ["mcp"]
}
}
}
```
The MCP server provides a `query` tool for semantic code search:
- **query**: Search query string
- **folder**: Repository path to search
- **cutoff**: Similarity threshold (0.0-1.0, default 0.3)
## How It Works
1. **Indexing**: On first run, Groma scans your Git repository and creates embeddings for all tracked files
2. **Incremental Updates**: Subsequent runs only process changed files
3. **Semantic Search**: Your query is embedded and compared against the indexed files
4. **Results**: Returns relevant file paths and content snippets in JSON format
## File Filtering
Both versions respect:
- `.gitignore` - Files ignored by Git are not indexed
- `.gromaignore` - Additional patterns to exclude from indexing
- Only Git-tracked files are processed
- Binary files are automatically skipped
## Output Format
Results are returned as JSON for easy integration with other tools:
```json
{
"path": "src/auth.rs",
"score": 0.82,
"content": "impl Authentication {\n pub fn verify_token..."
}
```
## Integration with Aider
Groma works great with [aider](https://aider.chat) for AI-assisted coding:
```bash
# Use with aider's --read flag
aider --read $(echo "authentication" | groma . --cutoff 0.3 | jq -r '.path')
# Or use the helper script
aider --read $(groma-files "authentication logic" .)
```
## Performance Comparison
| Feature | `groma` (Qdrant) | `groma-lancedb` (Local) |
|---------|------------------|-------------------------|
| Setup Required | Docker + API Key | None |
| Internet Required | Yes | No |
| Cost | OpenAI API fees | Free |
| Privacy | API calls | 100% local |
| Embedding Quality | Higher | Good |
| Speed | Fast after indexing | Fast after indexing |
| Storage | External (Qdrant) | Local (.groma_lancedb) |
## Why Groma?
The name comes from the [groma](https://en.wikipedia.org/wiki/Groma_(surveying)), a surveying instrument used in the Roman Empire. Just as the ancient groma helped surveyors find structure in the physical landscape, this tool helps you find relevant files within your codebase.
## License
MIT