https://github.com/aruntemme/go-rag
Advanced RAG System with Go featuring intelligent adaptive chunking, hierarchical document processing, semantic search, and flexible LLM integration
https://github.com/aruntemme/go-rag
golang llm localllm openai rag sqlite-vec sqlite3
Last synced: about 2 months ago
JSON representation
Advanced RAG System with Go featuring intelligent adaptive chunking, hierarchical document processing, semantic search, and flexible LLM integration
- Host: GitHub
- URL: https://github.com/aruntemme/go-rag
- Owner: aruntemme
- License: mit
- Created: 2025-05-25T14:13:40.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-25T14:52:11.000Z (about 1 year ago)
- Last Synced: 2025-06-03T07:01:08.551Z (about 1 year ago)
- Topics: golang, llm, localllm, openai, rag, sqlite-vec, sqlite3
- Language: Go
- Homepage:
- Size: 4.31 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# π€ Advanced RAG System with Go
[](https://golang.org/)
[](LICENSE)
[](https://github.com/asg017/sqlite-vec)
A sophisticated **Retrieval Augmented Generation (RAG)** system built with Go, featuring intelligent adaptive chunking, hierarchical document processing, semantic search, flexible LLM integration, and command-line configuration management.
## β¨ Key Features
### π§ Intelligent Adaptive Chunking System
- **Document-Size Aware**: Automatically adapts chunking strategy based on document characteristics
- **5-Tier Classification**: VerySmall β Small β Medium β Large β VeryLarge with tailored strategies
- **Context Preservation**: Smart thresholds prevent fragmentation while maintaining semantic coherence
- **50% Better Performance**: Fewer chunks with 100% better context preservation
### π Advanced Search & Retrieval
- **Search-Only Endpoint**: Pure retrieval without LLM overhead (500x faster)
- **Full RAG Pipeline**: Complete question-answering with context generation
- **Semantic Thresholding**: Filter results by similarity scores
- **Metadata Filtering**: Precise targeting with custom filters
- **Query Expansion**: Automatic synonym and related term expansion
### π Multiple Chunking Strategies
- **Structural Chunking**: Intelligent section and paragraph detection
- **Fixed-Size Chunking**: Traditional character-based with overlap
- **Semantic Chunking**: Content-aware based on meaning
- **Sentence Window**: Overlapping sentence-based chunks
- **Parent-Child Relationships**: Hierarchical organization for multi-level context
### π Performance & Flexibility
- **SQLite-vec Integration**: High-performance vector storage
- **Concurrent Processing**: Efficient batch embedding generation
- **Dimension Auto-Detection**: Automatic model compatibility
- **RESTful API**: Clean, well-documented endpoints
- **External LLM Support**: Use any OpenAI-compatible service
- **Command-Line Interface**: Flexible configuration with CLI arguments
- **Cross-Platform Builds**: Single build script for all platforms
## ποΈ Architecture
```
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β Documents βββββΆβ Adaptive Chunking βββββΆβ Vector Store β
β β β System β β (SQLite-vec) β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β Search API ββββββ Embedding ββββββ Raw Search β
β (/search) β β Service β β Results β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β β
βΌ βΌ
βββββββββββββββββββ ββββββββββββββββββββ
β External LLM β β Full RAG API β
β Processing β β (/query) β
βββββββββββββββββββ ββββββββββββββββββββ
```
## π Prerequisites
- **Go 1.19+**
- **OpenAI-compatible API Server** (LlamaCPP, OpenAI, Ollama, or any v1/embeddings endpoint)
- **Embedding Model** (Nomic, OpenAI, or compatible)
## π Quick Start
### 1. Clone & Install
```bash
git clone https://github.com/aruntemme/go-rag.git
cd go-rag
go mod tidy
```
### 2. Build (Optional but Recommended)
```bash
# Quick build for current platform
go build -ldflags="-s -w" -o rag-server .
# Or build for all platforms
chmod +x build.sh && ./build.sh
```
### 3. Configure
Create `config.json`:
```json
{
"server_port": "8080",
"llamacpp_base_url": "http://localhost:8091/v1",
"embedding_model": "nomic-embed-text-v1.5",
"chat_model": "qwen3:8b",
"vector_db_path": "./rag_database.db",
"default_top_k": 3
}
```
### 4. Start Embedding Server
```bash
# Example with llama.cpp
./server -m your-model.gguf --host 0.0.0.0 --port 8091
# Or use OpenAI API
# Set OPENAI_API_KEY and use https://api.openai.com/v1
# Or use Ollama
ollama serve
```
### 5. Run the Application
#### Development Mode
```bash
go run main.go
```
#### Build & Run (Recommended)
```bash
# Build optimized executable
go build -ldflags="-s -w" -o rag-server .
# Run with default config
./rag-server
# Run with custom config
./rag-server -config=production.json
# Show help and options
./rag-server -help
# Show version
./rag-server -version
```
π Server starts on `http://localhost:8080` (or configured port)
## π Usage Examples
### Basic Document Upload & Search
```bash
# 1. Create a collection
curl -X POST http://localhost:8080/api/v1/collections \
-H "Content-Type: application/json" \
-d '{"name": "my_docs", "description": "My documents"}'
# 2. Add a document (adaptive chunking automatically applied)
curl -X POST http://localhost:8080/api/v1/documents \
-H "Content-Type: application/json" \
-d '{
"collection_name": "my_docs",
"content": "Your document content here...",
"source": "document.txt"
}'
# 3. Search without LLM (fast retrieval)
curl -X POST http://localhost:8080/api/v1/search \
-H "Content-Type: application/json" \
-d '{
"collection_name": "my_docs",
"query": "What is this about?",
"top_k": 5
}'
# 4. Full RAG query (with answer generation)
curl -X POST http://localhost:8080/api/v1/query \
-H "Content-Type: application/json" \
-d '{
"collection_name": "my_docs",
"query": "What is this about?",
"top_k": 5
}'
```
### Advanced Search Features
```bash
# Search with semantic filtering and metadata
curl -X POST http://localhost:8080/api/v1/search \
-H "Content-Type: application/json" \
-d '{
"collection_name": "my_docs",
"query": "machine learning experience",
"top_k": 10,
"semantic_threshold": 0.3,
"metadata_filters": {
"section": "experience",
"chunk_type": "job_entry"
}
}'
```
## π API Endpoints
| Endpoint | Method | Purpose | Speed |
|----------|--------|---------|-------|
| `/health` | GET | Health check | β‘ Instant |
| `/api/v1/collections` | POST/GET/DELETE | Manage collections | β‘ Fast |
| `/api/v1/documents` | POST/GET/DELETE | Manage documents | π’ Processing |
| `/api/v1/search` | POST | **Retrieval only** | β‘ Fast |
| `/api/v1/query` | POST | **Full RAG** | π’ LLM dependent |
| `/api/v1/analyze` | POST | Detailed analysis | π’ LLM dependent |
> π **Full API documentation**: [API_REFERENCE.md](API_REFERENCE.md)
## π§ Adaptive Chunking System
Our intelligent chunking system automatically optimizes based on document characteristics:
### Document Size Categories
- **VerySmall** (<1KB): Single chunk or max 2-3 chunks
- **Small** (1-3KB): 3-5 meaningful chunks, 400+ char minimum
- **Medium** (3-10KB): Structural/semantic chunking
- **Large** (10-50KB): Hierarchical parent-child chunks
- **VeryLarge** (50KB+): Aggressive hierarchical chunking
### Performance Benefits
- **50% Fewer Chunks**: Reduces noise and improves relevance
- **100% Better Context**: Maintains semantic coherence
- **Universal Compatibility**: Works with any document type
- **Automatic Optimization**: No manual tuning required
> π **Detailed explanation**: [ADAPTIVE_CHUNKING.md](ADAPTIVE_CHUNKING.md)
## π Search vs Query Endpoints
### `/api/v1/search` - Pure Retrieval
```json
{
"chunks_found": 3,
"chunks": [/* detailed chunk data */],
"context": "ready-to-use context string",
"similarity_scores": [0.95, 0.87, 0.82],
"processing_time": 0.056
}
```
**Perfect for**: External LLM processing, custom pipelines, debugging
### `/api/v1/query` - Full RAG
```json
{
"answer": "Generated answer based on retrieved context",
"retrieved_context": ["context chunks"],
"enhanced_chunks": [/* chunks with metadata */],
"processing_time": 2.34
}
```
**Perfect for**: Complete question-answering, integrated solutions
> π **Search endpoint guide**: [SEARCH_ENDPOINT.md](SEARCH_ENDPOINT.md)
## πββοΈ Performance
| Operation | Time | Description |
|-----------|------|-------------|
| Document Upload | ~1-5s | Depends on size & chunking |
| Search Query | ~0.05s | Pure retrieval |
| Full RAG Query | ~2-30s | Includes LLM generation |
| Embedding Batch | ~0.1s/chunk | Concurrent processing |
## π οΈ Development
### Project Structure
```
go-rag/
βββ main.go # Application entry point
βββ config.json # Configuration file
βββ go.mod & go.sum # Go dependencies
βββ api/ # HTTP handlers and routing
βββ core/ # Core business logic
βββ models/ # Data structures
βββ config/ # Configuration management
βββ docs/ # Documentation
```
### Key Components
- **`core/document_processor.go`**: Adaptive chunking engine
- **`core/vector_db.go`**: SQLite-vec integration
- **`core/rag_service.go`**: RAG pipeline orchestration
- **`api/handlers.go`**: HTTP API handlers
## π Building & Deployment
### Command-Line Options
The application supports flexible configuration through command-line arguments:
```bash
Usage: ./rag-server [options]
Options:
-config string
Path to configuration file (default "config.json")
-help
Show help information
-version
Show version information
Examples:
./rag-server # Use default config.json
./rag-server -config=prod.json # Use custom config file
./rag-server -config=/path/to/config # Use absolute path
./rag-server -help # Show help
./rag-server -version # Show version
```
### Build Options
#### Single Platform Build
```bash
# Development build
go build -o rag-server .
# Optimized production build
go build -ldflags="-s -w" -o rag-server .
```
#### Cross-Platform Build
```bash
# Use provided build script for all platforms
chmod +x build.sh
./build.sh
# Manual cross-compilation (note: CGO required for sqlite-vec)
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o rag-server-linux .
CGO_ENABLED=1 GOOS=windows GOARCH=amd64 go build -ldflags="-s -w" -o rag-server.exe .
CGO_ENABLED=1 GOOS=darwin GOARCH=arm64 go build -ldflags="-s -w" -o rag-server-macos-arm64 .
```
> **β οΈ Note**: Cross-platform builds require appropriate CGO toolchains for each target platform due to sqlite-vec dependency. Build script will attempt all platforms but may fail for platforms without proper CGO setup.
### Deployment Configurations
#### Development
```json
{
"server_port": "8080",
"llamacpp_base_url": "http://localhost:8091/v1",
"embedding_model": "nomic-embed-text-v1.5",
"chat_model": "qwen3:8b",
"vector_db_path": "./rag_database.db",
"default_top_k": 3
}
```
#### Production
```json
{
"server_port": "80",
"llamacpp_base_url": "https://your-llm-api.com/v1",
"embedding_model": "text-embedding-ada-002",
"chat_model": "gpt-4",
"vector_db_path": "/data/rag_database.db",
"default_top_k": 5
}
```
### Docker Deployment (Optional)
```dockerfile
FROM golang:1.23-alpine AS builder
RUN apk add --no-cache gcc musl-dev sqlite-dev
WORKDIR /app
COPY . .
RUN CGO_ENABLED=1 go build -ldflags="-s -w" -o rag-server .
FROM alpine:latest
RUN apk --no-cache add ca-certificates sqlite
WORKDIR /root/
COPY --from=builder /app/rag-server .
COPY configs/ ./configs/
EXPOSE 8080
CMD ["./rag-server", "-config=configs/production.json"]
```
```bash
# Build and run with custom config
docker build -t rag-server .
docker run -p 8080:8080 -v $(pwd)/data:/data rag-server ./rag-server -config=/data/custom.json
```
### Environment-Specific Deployments
```bash
# Development
./rag-server -config=configs/dev.json
# Staging
./rag-server -config=configs/staging.json
# Production
./rag-server -config=configs/production.json
```
## π€ Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## π License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## π Acknowledgments
- [SQLite-vec](https://github.com/asg017/sqlite-vec) for high-performance vector storage
- [Gin](https://github.com/gin-gonic/gin) for the web framework
- [LlamaCPP](https://github.com/ggerganov/llama.cpp) for embedding and LLM services
## π Support
- π **Documentation**: Check [API_REFERENCE.md](API_REFERENCE.md)
- π **Issues**: [GitHub Issues](https://github.com/aruntemme/go-rag/issues)
- π¬ **Discussions**: [GitHub Discussions](https://github.com/aruntemme/go-rag/discussions)
---
Built with β€οΈ using Go and modern RAG techniques