An open API service indexing awesome lists of open source software.

https://github.com/aruntemme/go-rag

Advanced RAG System with Go featuring intelligent adaptive chunking, hierarchical document processing, semantic search, and flexible LLM integration
https://github.com/aruntemme/go-rag

golang llm localllm openai rag sqlite-vec sqlite3

Last synced: about 2 months ago
JSON representation

Advanced RAG System with Go featuring intelligent adaptive chunking, hierarchical document processing, semantic search, and flexible LLM integration

Awesome Lists containing this project

README

          

# πŸ€– Advanced RAG System with Go

[![Go Version](https://img.shields.io/badge/Go-1.19+-blue)](https://golang.org/)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![SQLite-vec](https://img.shields.io/badge/Vector%20DB-SQLite--vec-orange)](https://github.com/asg017/sqlite-vec)

A sophisticated **Retrieval Augmented Generation (RAG)** system built with Go, featuring intelligent adaptive chunking, hierarchical document processing, semantic search, flexible LLM integration, and command-line configuration management.

## ✨ Key Features

### 🧠 Intelligent Adaptive Chunking System
- **Document-Size Aware**: Automatically adapts chunking strategy based on document characteristics
- **5-Tier Classification**: VerySmall β†’ Small β†’ Medium β†’ Large β†’ VeryLarge with tailored strategies
- **Context Preservation**: Smart thresholds prevent fragmentation while maintaining semantic coherence
- **50% Better Performance**: Fewer chunks with 100% better context preservation

### πŸ” Advanced Search & Retrieval
- **Search-Only Endpoint**: Pure retrieval without LLM overhead (500x faster)
- **Full RAG Pipeline**: Complete question-answering with context generation
- **Semantic Thresholding**: Filter results by similarity scores
- **Metadata Filtering**: Precise targeting with custom filters
- **Query Expansion**: Automatic synonym and related term expansion

### πŸ“Š Multiple Chunking Strategies
- **Structural Chunking**: Intelligent section and paragraph detection
- **Fixed-Size Chunking**: Traditional character-based with overlap
- **Semantic Chunking**: Content-aware based on meaning
- **Sentence Window**: Overlapping sentence-based chunks
- **Parent-Child Relationships**: Hierarchical organization for multi-level context

### πŸš€ Performance & Flexibility
- **SQLite-vec Integration**: High-performance vector storage
- **Concurrent Processing**: Efficient batch embedding generation
- **Dimension Auto-Detection**: Automatic model compatibility
- **RESTful API**: Clean, well-documented endpoints
- **External LLM Support**: Use any OpenAI-compatible service
- **Command-Line Interface**: Flexible configuration with CLI arguments
- **Cross-Platform Builds**: Single build script for all platforms

## πŸ—οΈ Architecture

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Documents │───▢│ Adaptive Chunking │───▢│ Vector Store β”‚
β”‚ β”‚ β”‚ System β”‚ β”‚ (SQLite-vec) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Search API │◀───│ Embedding │◀───│ Raw Search β”‚
β”‚ (/search) β”‚ β”‚ Service β”‚ β”‚ Results β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ β”‚
β–Ό β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ External LLM β”‚ β”‚ Full RAG API β”‚
β”‚ Processing β”‚ β”‚ (/query) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## πŸ“‹ Prerequisites

- **Go 1.19+**
- **OpenAI-compatible API Server** (LlamaCPP, OpenAI, Ollama, or any v1/embeddings endpoint)
- **Embedding Model** (Nomic, OpenAI, or compatible)

## πŸš€ Quick Start

### 1. Clone & Install
```bash
git clone https://github.com/aruntemme/go-rag.git
cd go-rag
go mod tidy
```

### 2. Build (Optional but Recommended)
```bash
# Quick build for current platform
go build -ldflags="-s -w" -o rag-server .

# Or build for all platforms
chmod +x build.sh && ./build.sh
```

### 3. Configure
Create `config.json`:
```json
{
"server_port": "8080",
"llamacpp_base_url": "http://localhost:8091/v1",
"embedding_model": "nomic-embed-text-v1.5",
"chat_model": "qwen3:8b",
"vector_db_path": "./rag_database.db",
"default_top_k": 3
}
```

### 4. Start Embedding Server
```bash
# Example with llama.cpp
./server -m your-model.gguf --host 0.0.0.0 --port 8091

# Or use OpenAI API
# Set OPENAI_API_KEY and use https://api.openai.com/v1

# Or use Ollama
ollama serve
```

### 5. Run the Application

#### Development Mode
```bash
go run main.go
```

#### Build & Run (Recommended)
```bash
# Build optimized executable
go build -ldflags="-s -w" -o rag-server .

# Run with default config
./rag-server

# Run with custom config
./rag-server -config=production.json

# Show help and options
./rag-server -help

# Show version
./rag-server -version
```

πŸŽ‰ Server starts on `http://localhost:8080` (or configured port)

## πŸ“š Usage Examples

### Basic Document Upload & Search
```bash
# 1. Create a collection
curl -X POST http://localhost:8080/api/v1/collections \
-H "Content-Type: application/json" \
-d '{"name": "my_docs", "description": "My documents"}'

# 2. Add a document (adaptive chunking automatically applied)
curl -X POST http://localhost:8080/api/v1/documents \
-H "Content-Type: application/json" \
-d '{
"collection_name": "my_docs",
"content": "Your document content here...",
"source": "document.txt"
}'

# 3. Search without LLM (fast retrieval)
curl -X POST http://localhost:8080/api/v1/search \
-H "Content-Type: application/json" \
-d '{
"collection_name": "my_docs",
"query": "What is this about?",
"top_k": 5
}'

# 4. Full RAG query (with answer generation)
curl -X POST http://localhost:8080/api/v1/query \
-H "Content-Type: application/json" \
-d '{
"collection_name": "my_docs",
"query": "What is this about?",
"top_k": 5
}'
```

### Advanced Search Features
```bash
# Search with semantic filtering and metadata
curl -X POST http://localhost:8080/api/v1/search \
-H "Content-Type: application/json" \
-d '{
"collection_name": "my_docs",
"query": "machine learning experience",
"top_k": 10,
"semantic_threshold": 0.3,
"metadata_filters": {
"section": "experience",
"chunk_type": "job_entry"
}
}'
```

## πŸ”Œ API Endpoints

| Endpoint | Method | Purpose | Speed |
|----------|--------|---------|-------|
| `/health` | GET | Health check | ⚑ Instant |
| `/api/v1/collections` | POST/GET/DELETE | Manage collections | ⚑ Fast |
| `/api/v1/documents` | POST/GET/DELETE | Manage documents | 🐒 Processing |
| `/api/v1/search` | POST | **Retrieval only** | ⚑ Fast |
| `/api/v1/query` | POST | **Full RAG** | 🐒 LLM dependent |
| `/api/v1/analyze` | POST | Detailed analysis | 🐒 LLM dependent |

> πŸ“– **Full API documentation**: [API_REFERENCE.md](API_REFERENCE.md)

## 🧠 Adaptive Chunking System

Our intelligent chunking system automatically optimizes based on document characteristics:

### Document Size Categories
- **VerySmall** (<1KB): Single chunk or max 2-3 chunks
- **Small** (1-3KB): 3-5 meaningful chunks, 400+ char minimum
- **Medium** (3-10KB): Structural/semantic chunking
- **Large** (10-50KB): Hierarchical parent-child chunks
- **VeryLarge** (50KB+): Aggressive hierarchical chunking

### Performance Benefits
- **50% Fewer Chunks**: Reduces noise and improves relevance
- **100% Better Context**: Maintains semantic coherence
- **Universal Compatibility**: Works with any document type
- **Automatic Optimization**: No manual tuning required

> πŸ“– **Detailed explanation**: [ADAPTIVE_CHUNKING.md](ADAPTIVE_CHUNKING.md)

## πŸ” Search vs Query Endpoints

### `/api/v1/search` - Pure Retrieval
```json
{
"chunks_found": 3,
"chunks": [/* detailed chunk data */],
"context": "ready-to-use context string",
"similarity_scores": [0.95, 0.87, 0.82],
"processing_time": 0.056
}
```
**Perfect for**: External LLM processing, custom pipelines, debugging

### `/api/v1/query` - Full RAG
```json
{
"answer": "Generated answer based on retrieved context",
"retrieved_context": ["context chunks"],
"enhanced_chunks": [/* chunks with metadata */],
"processing_time": 2.34
}
```
**Perfect for**: Complete question-answering, integrated solutions

> πŸ“– **Search endpoint guide**: [SEARCH_ENDPOINT.md](SEARCH_ENDPOINT.md)

## πŸƒβ€β™‚οΈ Performance

| Operation | Time | Description |
|-----------|------|-------------|
| Document Upload | ~1-5s | Depends on size & chunking |
| Search Query | ~0.05s | Pure retrieval |
| Full RAG Query | ~2-30s | Includes LLM generation |
| Embedding Batch | ~0.1s/chunk | Concurrent processing |

## πŸ› οΈ Development

### Project Structure
```
go-rag/
β”œβ”€β”€ main.go # Application entry point
β”œβ”€β”€ config.json # Configuration file
β”œβ”€β”€ go.mod & go.sum # Go dependencies
β”œβ”€β”€ api/ # HTTP handlers and routing
β”œβ”€β”€ core/ # Core business logic
β”œβ”€β”€ models/ # Data structures
β”œβ”€β”€ config/ # Configuration management
└── docs/ # Documentation
```

### Key Components
- **`core/document_processor.go`**: Adaptive chunking engine
- **`core/vector_db.go`**: SQLite-vec integration
- **`core/rag_service.go`**: RAG pipeline orchestration
- **`api/handlers.go`**: HTTP API handlers

## πŸš€ Building & Deployment

### Command-Line Options
The application supports flexible configuration through command-line arguments:

```bash
Usage: ./rag-server [options]

Options:
-config string
Path to configuration file (default "config.json")
-help
Show help information
-version
Show version information

Examples:
./rag-server # Use default config.json
./rag-server -config=prod.json # Use custom config file
./rag-server -config=/path/to/config # Use absolute path
./rag-server -help # Show help
./rag-server -version # Show version
```

### Build Options

#### Single Platform Build
```bash
# Development build
go build -o rag-server .

# Optimized production build
go build -ldflags="-s -w" -o rag-server .
```

#### Cross-Platform Build
```bash
# Use provided build script for all platforms
chmod +x build.sh
./build.sh

# Manual cross-compilation (note: CGO required for sqlite-vec)
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o rag-server-linux .
CGO_ENABLED=1 GOOS=windows GOARCH=amd64 go build -ldflags="-s -w" -o rag-server.exe .
CGO_ENABLED=1 GOOS=darwin GOARCH=arm64 go build -ldflags="-s -w" -o rag-server-macos-arm64 .
```

> **⚠️ Note**: Cross-platform builds require appropriate CGO toolchains for each target platform due to sqlite-vec dependency. Build script will attempt all platforms but may fail for platforms without proper CGO setup.

### Deployment Configurations

#### Development
```json
{
"server_port": "8080",
"llamacpp_base_url": "http://localhost:8091/v1",
"embedding_model": "nomic-embed-text-v1.5",
"chat_model": "qwen3:8b",
"vector_db_path": "./rag_database.db",
"default_top_k": 3
}
```

#### Production
```json
{
"server_port": "80",
"llamacpp_base_url": "https://your-llm-api.com/v1",
"embedding_model": "text-embedding-ada-002",
"chat_model": "gpt-4",
"vector_db_path": "/data/rag_database.db",
"default_top_k": 5
}
```

### Docker Deployment (Optional)
```dockerfile
FROM golang:1.23-alpine AS builder
RUN apk add --no-cache gcc musl-dev sqlite-dev
WORKDIR /app
COPY . .
RUN CGO_ENABLED=1 go build -ldflags="-s -w" -o rag-server .

FROM alpine:latest
RUN apk --no-cache add ca-certificates sqlite
WORKDIR /root/
COPY --from=builder /app/rag-server .
COPY configs/ ./configs/
EXPOSE 8080
CMD ["./rag-server", "-config=configs/production.json"]
```

```bash
# Build and run with custom config
docker build -t rag-server .
docker run -p 8080:8080 -v $(pwd)/data:/data rag-server ./rag-server -config=/data/custom.json
```

### Environment-Specific Deployments
```bash
# Development
./rag-server -config=configs/dev.json

# Staging
./rag-server -config=configs/staging.json

# Production
./rag-server -config=configs/production.json
```

## 🀝 Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## πŸ“„ License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## πŸ™ Acknowledgments

- [SQLite-vec](https://github.com/asg017/sqlite-vec) for high-performance vector storage
- [Gin](https://github.com/gin-gonic/gin) for the web framework
- [LlamaCPP](https://github.com/ggerganov/llama.cpp) for embedding and LLM services

## πŸ“ž Support

- πŸ“– **Documentation**: Check [API_REFERENCE.md](API_REFERENCE.md)
- πŸ› **Issues**: [GitHub Issues](https://github.com/aruntemme/go-rag/issues)
- πŸ’¬ **Discussions**: [GitHub Discussions](https://github.com/aruntemme/go-rag/discussions)

---

Built with ❀️ using Go and modern RAG techniques