https://github.com/chicogong/stream-relay-go
A lightweight Go streaming relay for LLM/TTS APIs with production-grade observability and policy controls
https://github.com/chicogong/stream-relay-go
anthropic api-gateway docker gin golang grafana llm llm-proxy observability openai prometheus proxy rate-limiting siliconflow sse streaming tts
Last synced: 4 months ago
JSON representation
A lightweight Go streaming relay for LLM/TTS APIs with production-grade observability and policy controls
- Host: GitHub
- URL: https://github.com/chicogong/stream-relay-go
- Owner: chicogong
- License: mit
- Created: 2025-12-21T06:18:55.000Z (6 months ago)
- Default Branch: master
- Last Pushed: 2025-12-24T06:39:37.000Z (6 months ago)
- Last Synced: 2025-12-25T01:33:19.880Z (6 months ago)
- Topics: anthropic, api-gateway, docker, gin, golang, grafana, llm, llm-proxy, observability, openai, prometheus, proxy, rate-limiting, siliconflow, sse, streaming, tts
- Language: Go
- Size: 529 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
# Stream Relay Go
[](https://github.com/chicogong/stream-relay-go/actions/workflows/ci.yml)
[](https://goreportcard.com/report/github.com/chicogong/stream-relay-go)
[](https://codecov.io/gh/chicogong/stream-relay-go)
[](https://golang.org)
[](LICENSE)
[](CONTRIBUTING.md)
English | [简体中文](README_zh.md)
A lightweight, high-performance streaming relay for LLM and TTS APIs with built-in observability.
## ✨ Features
- **🚀 Low-latency Streaming** - Token-by-token SSE streaming with immediate flush
- **🔐 Auto Authentication** - Automatic Bearer token injection for upstream APIs
- **📊 Real-time Monitoring** - Prometheus metrics + beautiful Grafana dashboards
- **🎯 Multi-provider Support** - SiliconFlow, OpenAI, Anthropic, Azure TTS
- **⚡ Zero Dependency** - Optional Redis/ClickHouse, works standalone
- **🛡️ Production Ready** - Rate limiting, health checks, graceful shutdown
## 🏗️ Architecture
```
Client Request
↓
API Key Auth
↓
Rate Limiting
↓
Upstream Auth Injection
↓
SSE Streaming Proxy ← → Upstream API
↓
Metrics Collection
↓
Client Response
```
## 🚀 Quick Start
### Prerequisites
- Go 1.21+
- (Optional) Docker for monitoring stack
### Installation
```bash
# Clone the repository
git clone https://github.com/chicogong/stream-relay-go.git
cd stream-relay-go
# Build
make build
# Run
./bin/relay -config configs/config.yaml
```
### Configuration
1. Copy the example environment file:
```bash
cp .env.example .env
```
2. Add your API keys to `.env`:
```bash
SILICONFLOW_API_KEY=sk-your-key-here
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
```
3. Start the relay:
```bash
make dev
```
The relay will start on `http://localhost:8080`
### Testing
```bash
# Health check
curl http://localhost:8080/healthz
# Streaming request
curl -N http://localhost:8080/v1/chat/completions \
-H 'Authorization: Bearer sk-relay-test-key-123' \
-H 'Content-Type: application/json' \
-d '{
"model": "Qwen/Qwen2.5-7B-Instruct",
"messages": [{"role": "user", "content": "Hello"}],
"stream": true,
"max_tokens": 20
}'
```
## 📊 Monitoring
### Start Grafana + Prometheus
```bash
cd deployments/grafana
docker-compose up -d
```
### Access Dashboards
- **Grafana**: http://localhost:3000 (admin/admin)
- **Prometheus**: http://localhost:9090
- **Metrics Endpoint**: http://localhost:8080/metrics
### Beautiful Dashboard

The dashboard provides real-time insights:
- 📊 **Total Requests** - Cumulative request count
- ✅ **Success Rate** - Real-time success percentage (color-coded: 🟢 >99%, 🟡 >95%, 🟠 >90%, 🔴 <90%)
- 📈 **Request Rate** - Requests per minute with smooth curves
- ⏱️ **Response Time** - p50/p95/p99 latency percentiles
- 🔥 **Heatmap** - Visual latency distribution
- 🚨 **Error Monitoring** - Instant error detection with alerts
### 🚀 Enhanced Dashboard with Logs

The enhanced dashboard (`enhanced-dashboard.json`) includes **15 comprehensive panels** with integrated log viewing:
**Metrics Panels:**
- 📊 Total Requests, Success Rate, Avg Response Time
- 🔗 Active Connections, Error Count, Storage Latency
- 📈 Request Rate Trend & Response Time Percentiles (p50/p95/p99)
- 🎯 Requests by Route (donut chart)
- 📊 Status Code Distribution (2xx/4xx/5xx bar gauge)
- 🚨 Error Types Table & Active Connections Over Time
- 🔥 Request Latency Heatmap
- 📋 Recent Activity Log Table
**Log Integration (Loki):**
- 📝 Live application logs with filtering
- 🔍 Search logs by level (ERROR, INFO, DEBUG)
- 📊 Unified metrics + logs view for faster debugging
**Setup:**
The enhanced monitoring stack includes Loki + Promtail for log aggregation. See [deployments/grafana/README.md](deployments/grafana/README.md) for full setup instructions.
### Generate Demo Traffic
```bash
# Run the test script to generate sample requests
./test_relay.sh
# Or manually send requests
for i in {1..10}; do
curl -N http://localhost:8080/v1/chat/completions \
-H 'Authorization: Bearer sk-relay-test-key-123' \
-H 'Content-Type: application/json' \
-d "{\"model\": \"Qwen/Qwen2.5-7B-Instruct\", \"messages\": [{\"role\": \"user\", \"content\": \"Count to $i\"}], \"stream\": true, \"max_tokens\": 20}"
done
```
Watch the metrics update in real-time at http://localhost:3000
> 💡 **Tip**: Use `./scripts/generate-demo.sh` to populate the dashboard with demo traffic!
## 📁 Project Structure
```
stream-relay-go/
├── cmd/relay/ # Application entry point
├── internal/ # Core implementation
│ ├── config.go # Configuration management
│ ├── proxy.go # Streaming proxy logic
│ ├── server.go # HTTP server setup
│ ├── metrics.go # Prometheus metrics
│ ├── limiter.go # Rate limiting
│ └── storage.go # Optional storage layer
├── configs/ # Configuration files
├── deployments/ # Docker & Grafana configs
└── docs/ # Documentation
```
## ⚙️ Configuration
### Server
```yaml
server:
port: 8080
timeout: 300s
max_body_size: 10485760 # 10MB
```
### Routes
```yaml
routes:
- name: siliconflow
path: /v1/chat/completions
upstream: https://api.siliconflow.cn
auth_header: Authorization
auth_env: SILICONFLOW_API_KEY
kind: sse
```
### Rate Limiting
```yaml
rate_limit:
enabled: true
default: 100 # requests per minute per tenant
burst: 20
```
## 🔧 Advanced Usage
### Custom Routes
Add custom routes in `configs/config.yaml`:
```yaml
routes:
- name: custom-provider
path: /custom/path
upstream: https://api.custom.com
auth_header: X-API-Key
auth_env: CUSTOM_API_KEY
kind: sse
```
### Storage Backend
Enable optional storage for detailed logging:
```yaml
storage:
redis:
addr: localhost:6379
password: ""
db: 0
```
## 📈 Metrics
The relay exposes comprehensive Prometheus metrics at `/metrics` endpoint:
### Core Metrics
| Metric Name | Type | Description | Labels |
|-------------|------|-------------|--------|
| `relay_requests_total` | Counter | Total number of requests processed | `route`, `status` (2xx/4xx/5xx) |
| `relay_duration_ms` | Histogram | Request duration in milliseconds | `route` |
| `relay_errors_total` | Counter | Total number of errors | `route`, `type` |
| `relay_active_connections` | Gauge | Current number of active connections | `route` |
| `relay_storage_write_ms` | Histogram | Storage write latency in milliseconds | - |
### Histogram Buckets
- **Duration Buckets**: 100ms, 500ms, 1s, 2s, 5s, 10s, 30s, 60s
- **Storage Write Buckets**: 1ms, 5ms, 10ms, 50ms, 100ms, 500ms, 1s
### Example Queries
```promql
# Request rate (requests per minute)
rate(relay_requests_total[1m]) * 60
# Average latency
rate(relay_duration_ms_sum[1m]) / rate(relay_duration_ms_count[1m])
# P95 latency
histogram_quantile(0.95, rate(relay_duration_ms_bucket[1m]))
# Success rate
sum(relay_requests_total{status="2xx"}) / sum(relay_requests_total) * 100
# Error rate
rate(relay_errors_total[1m])
# Active connections by route
relay_active_connections
```
### Grafana Dashboard
Import `deployments/grafana/beautiful-dashboard.json` for a pre-configured dashboard with:
- Real-time request rate
- Latency percentiles (p50, p95, p99)
- Success rate gauge
- Error monitoring
- Request heatmap
- Recent activity table
## 🤝 Contributing
Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details.
## 📝 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- Built with [Gin](https://github.com/gin-gonic/gin)
- Monitoring powered by [Prometheus](https://prometheus.io) and [Grafana](https://grafana.com)
- Inspired by best practices in API gateway design
## 📮 Support
- 🐛 [Report Bug](https://github.com/chicogong/stream-relay-go/issues)
- 💡 [Request Feature](https://github.com/chicogong/stream-relay-go/issues)
- 📧 Email: your-email@example.com
---
**Made with ❤️ for the LLM community**