https://github.com/pmbstyle/openrouter-proxy
Nodejs OpenRouter proxy inference that provides all nessary endpoints for your LLM application.
https://github.com/pmbstyle/openrouter-proxy
llm-inference llm-proxy llm-webservice nodejs openrouter-api
Last synced: about 2 months ago
JSON representation
Nodejs OpenRouter proxy inference that provides all nessary endpoints for your LLM application.
- Host: GitHub
- URL: https://github.com/pmbstyle/openrouter-proxy
- Owner: pmbstyle
- Created: 2025-10-19T13:35:24.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2026-01-12T20:53:59.000Z (5 months ago)
- Last Synced: 2026-01-12T22:58:56.780Z (5 months ago)
- Topics: llm-inference, llm-proxy, llm-webservice, nodejs, openrouter-api
- Language: TypeScript
- Homepage:
- Size: 235 KB
- Stars: 2
- Watchers: 0
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# OpenRouter Proxy Service
A high-performance, stateless OpenRouter proxy service built with Node.js, TypeScript, and Express. Provides REST API and WebSocket streaming capabilities for LLM inference.
## Features
- **Dual Interface**: REST API for standard requests and WebSocket for streaming
- **Comprehensive Parameter Support**: System prompts, model/provider selection, temperature, tools, etc.
- **Multi-modal Support**: Text, audio, and image generation capabilities
- **Robust Error Handling**: Graceful failure recovery and informative error responses
- **High Performance**: Optimized for speed and low latency
- **IP-based Rate Limiting**: Protection against abuse while maintaining simplicity
- **Usage tracking**: Global and per API key tracking requests, tokens, cost, and models
## Quick Start
### Prerequisites
- Node.js 20+
- npm or yarn
- OpenRouter API key
### Installation
1. Clone the repository:
```bash
git clone https://github.com/pmbstyle/openrouter-proxy.git
cd openrouter-proxy
```
2. Install dependencies:
```bash
npm install
```
3. Set up environment variables:
```bash
cp .env.example .env
# Edit .env with your OpenRouter API key
```
4. Build the project:
```bash
npm run build
```
5. Start the server:
```bash
npm start
```
For development:
```bash
npm run dev
```
## Configuration
The service uses environment variables for configuration. See `.env.example` for all available options:
### Required Variables
- `OPENROUTER_API_KEY`: Your OpenRouter API key
### Optional Variables
- `PORT`: Server port (default: 3000)
- `HOST`: Server host (default: 0.0.0.0)
- `NODE_ENV`: Environment (development/production/test)
- `LOG_LEVEL`: Logging level (debug/info/warn/error)
- `RATE_LIMIT_WINDOW_MS`: Rate limit window in milliseconds (default: 900000)
- `RATE_LIMIT_MAX_REQUESTS`: Max requests per window (default: 100)
- `WS_MAX_CONNECTIONS`: Max WebSocket connections (default: 1000)
- `WS_HEARTBEAT_INTERVAL`: WebSocket heartbeat interval (default: 30000)
- `MAX_CONCURRENT_REQUESTS`: Max concurrent requests (default: 100)
- `REQUEST_TIMEOUT`: Request timeout in milliseconds (default: 30000)
- `ALLOWED_ORIGINS`: Comma-separated list of allowed CORS origins (e.g., `http://localhost:3001,https://example.com`). Defaults to `http://localhost:3000,http://localhost:3001` in development mode. Leave empty to block all cross-origin requests in production.
- `CORS_CREDENTIALS`: Enable CORS credentials (default: true)
## API Endpoints
### Health Check
#### `GET /health`
Check service health status.
**Response:**
```json
{
"status": "healthy",
"timestamp": "2024-01-01T00:00:00.000Z",
"uptime": 123.45,
"version": "1.0.0",
"environment": "production"
}
```
### Inference Endpoints
#### `POST /api/v1/inference`
Create a completion using the specified model.
**Request Body:**
```json
{
"model": "openai/gpt-4o",
"messages": [
{
"role": "user",
"content": "Hello, world!"
}
],
"temperature": 0.7,
"max_tokens": 100,
"stream": false
}
```
**Response:**
```json
{
"id": "chatcmpl-123",
"choices": [
{
"finish_reason": "stop",
"message": {
"content": "Hello! How can I help you today?",
"role": "assistant"
}
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 15,
"total_tokens": 25
},
"model": "openai/gpt-4o",
"created": 1704067200,
"object": "chat.completion"
}
```
#### `POST /api/v1/inference/stream`
Create a streaming completion using the specified model.
**Request Body:**
```json
{
"model": "openai/gpt-4o",
"messages": [
{
"role": "user",
"content": "Tell me a story"
}
],
"temperature": 0.7,
"max_tokens": 500,
"stream": true
}
```
**Response:** Server-Sent Events stream
```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}
data: [DONE]
```
### Model Endpoints
#### `GET /api/v1/models`
List all available models with optional filtering and pagination.
**Query Parameters:**
- `provider` (optional): Filter by provider (e.g., "openai", "anthropic")
- `search` (optional): Search in model name or description
- `limit` (optional): Number of models to return (default: 50, max: 100)
- `offset` (optional): Number of models to skip (default: 0)
**Example:**
```
GET /api/v1/models?provider=openai&search=gpt&limit=10&offset=0
```
**Response:**
```json
{
"data": [
{
"id": "openai/gpt-4o",
"name": "GPT-4o",
"description": "Most advanced GPT-4 model",
"context_length": 128000,
"pricing": {
"prompt": "0.005",
"completion": "0.015"
},
"supported_parameters": ["temperature", "max_tokens", "top_p"],
"is_moderated": true,
"max_completion_tokens": 4096
}
],
"pagination": {
"total": 150,
"limit": 10,
"offset": 0,
"hasMore": true
}
}
```
#### `GET /api/v1/models/:id`
Get detailed information about a specific model.
**Example:**
```
GET /api/v1/models/openai/gpt-4o
```
**Response:**
```json
{
"data": {
"id": "openai/gpt-4o",
"name": "GPT-4o",
"description": "Most advanced GPT-4 model",
"context_length": 128000,
"pricing": {
"prompt": "0.005",
"completion": "0.015"
},
"supported_parameters": ["temperature", "max_tokens", "top_p", "frequency_penalty", "presence_penalty"],
"is_moderated": true,
"max_completion_tokens": 4096
}
}
```
#### `GET /api/v1/models/:id/parameters`
Get supported parameters for a specific model.
**Example:**
```
GET /api/v1/models/openai/gpt-4o/parameters
```
**Response:**
```json
{
"data": {
"model": "openai/gpt-4o",
"supported_parameters": [
"temperature",
"max_tokens",
"top_p",
"frequency_penalty",
"presence_penalty",
"stop",
"stream"
]
}
}
```
#### `GET /api/v1/models/:id/pricing`
Get pricing information for a specific model.
**Example:**
```
GET /api/v1/models/openai/gpt-4o/pricing
```
**Response:**
```json
{
"data": {
"model": "openai/gpt-4o",
"pricing": {
"prompt": "0.005",
"completion": "0.015"
}
}
}
```
#### `GET /api/v1/models/top`
Get top models by context length.
**Query Parameters:**
- `limit` (optional): Number of models to return (default: 10)
**Example:**
```
GET /api/v1/models/top?limit=5
```
**Response:**
```json
{
"data": [
{
"id": "anthropic/claude-3-5-sonnet-20241022",
"name": "Claude 3.5 Sonnet",
"context_length": 200000,
"pricing": {
"prompt": "0.003",
"completion": "0.015"
}
}
]
}
```
#### `GET /api/v1/models/search`
Search models by query.
**Query Parameters:**
- `q` (required): Search query
- `limit` (optional): Number of results to return (default: 20)
**Example:**
```
GET /api/v1/models/search?q=code&limit=5
```
**Response:**
```json
{
"data": [
{
"id": "openai/gpt-4o",
"name": "GPT-4o",
"description": "Most advanced GPT-4 model with code capabilities"
}
],
"query": "code",
"total": 25
}
```
#### `GET /api/v1/models/providers`
Get all available providers.
**Response:**
```json
{
"data": [
"openai",
"anthropic",
"google",
"meta",
"mistral"
]
}
```
#### `GET /api/v1/models/providers/:provider`
Get models by provider.
**Example:**
```
GET /api/v1/models/providers/openai
```
**Response:**
```json
{
"data": [
{
"id": "openai/gpt-4o",
"name": "GPT-4o",
"context_length": 128000
}
],
"provider": "openai"
}
```
## WebSocket API
### Connection
Connect to the WebSocket endpoint:
```
ws://localhost:3000/ws
```
### Message Types
#### Inference Request
```json
{
"type": "inference_request",
"id": "req-123",
"data": {
"model": "openai/gpt-4o",
"messages": [
{
"role": "user",
"content": "Hello, world!"
}
],
"temperature": 0.7,
"max_tokens": 100
}
}
```
#### Inference Response
```json
{
"type": "inference_response",
"id": "req-123",
"data": {
"content": "Hello! How can I help you today?",
"finish_reason": "stop",
"usage": {
"prompt_tokens": 10,
"completion_tokens": 15,
"total_tokens": 25
},
"model": "openai/gpt-4o",
"created": 1704067200
}
}
```
#### Heartbeat
```json
{
"type": "heartbeat",
"timestamp": 1704067200000
}
```
#### Error
```json
{
"type": "error",
"id": "req-123",
"error": {
"code": 400,
"message": "Invalid model",
"type": "validation"
}
}
```
#### Close
```json
{
"type": "close",
"reason": "Client requested close",
"code": 1000
}
```
## Usage Examples
### REST API Example (JavaScript)
```javascript
// Standard completion
const response = await fetch('http://localhost:3000/api/v1/inference', {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'openai/gpt-4o',
messages: [
{ role: 'user', content: 'Hello, world!' }
],
temperature: 0.7,
max_tokens: 100
})
});
const data = await response.json();
console.log(data.choices[0].message.content);
```
### Streaming Example (JavaScript)
```javascript
// Streaming completion
const response = await fetch('http://localhost:3000/api/v1/inference/stream', {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'openai/gpt-4o',
messages: [
{ role: 'user', content: 'Tell me a story' }
],
temperature: 0.7,
max_tokens: 500,
stream: true
})
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n');
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') break;
try {
const parsed = JSON.parse(data);
if (parsed.choices?.[0]?.delta?.content) {
console.log(parsed.choices[0].delta.content);
}
} catch (e) {
// Ignore invalid JSON
}
}
}
}
```
### WebSocket Example (JavaScript)
```javascript
const ws = new WebSocket('ws://localhost:3000/ws');
ws.onopen = () => {
// Send inference request
ws.send(JSON.stringify({
type: 'inference_request',
id: 'req-123',
data: {
model: 'openai/gpt-4o',
messages: [
{ role: 'user', content: 'Hello, world!' }
],
temperature: 0.7
}
}));
};
ws.onmessage = (event) => {
const message = JSON.parse(event.data);
switch (message.type) {
case 'inference_response':
if (message.data.content) {
console.log(message.data.content);
}
if (message.data.finish_reason) {
console.log('Finished:', message.data.finish_reason);
}
break;
case 'error':
console.error('Error:', message.error.message);
break;
case 'heartbeat':
console.log('Heartbeat received');
break;
}
};
ws.onclose = () => {
console.log('WebSocket connection closed');
};
```
### Python Example
```python
import requests
import json
# Standard completion
response = requests.post('http://localhost:3000/api/v1/inference',
json={
'model': 'openai/gpt-4o',
'messages': [
{'role': 'user', 'content': 'Hello, world!'}
],
'temperature': 0.7,
'max_tokens': 100
}
)
data = response.json()
print(data['choices'][0]['message']['content'])
```
### cURL Examples
```bash
# Standard completion
curl -X POST http://localhost:3000/api/v1/inference \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o",
"messages": [
{"role": "user", "content": "Hello, world!"}
],
"temperature": 0.7,
"max_tokens": 100
}'
# List models
curl http://localhost:3000/api/v1/models
# Get model details
curl http://localhost:3000/api/v1/models/openai/gpt-4o
# Search models
curl "http://localhost:3000/api/v1/models/search?q=gpt&limit=5"
```
## Error Handling
### Error Response Format
All errors follow this format:
```json
{
"error": {
"code": 400,
"message": "Validation error",
"type": "validation",
"details": {
"field": "model",
"message": "Model is required"
}
}
}
```
### Error Types
- `validation`: Request validation failed
- `rate_limit`: Rate limit exceeded
- `openrouter`: OpenRouter API error
- `internal`: Internal server error
### Common Error Codes
- `400`: Bad Request - Invalid request data
- `404`: Not Found - Model or endpoint not found
- `429`: Too Many Requests - Rate limit exceeded
- `500`: Internal Server Error - Server error
- `502`: Bad Gateway - OpenRouter API error
- `503`: Service Unavailable - Service temporarily unavailable
## Rate Limiting
The service implements IP-based rate limiting:
- **Default**: 100 requests per 15 minutes per IP
- **Inference endpoints**: 50 requests per 15 minutes per IP
- **WebSocket**: 5 connections per minute per IP
Rate limit headers are included in responses:
- `X-RateLimit-Limit`: Maximum requests allowed
- `X-RateLimit-Remaining`: Requests remaining in current window
- `X-RateLimit-Reset`: Time when the rate limit resets
## Development
### Scripts
- `npm run dev` - Start development server with hot reload
- `npm run build` - Build the project
- `npm start` - Start production server
- `npm test` - Run tests
- `npm run test:watch` - Run tests in watch mode
- `npm run test:coverage` - Run tests with coverage
- `npm run lint` - Run ESLint
- `npm run lint:fix` - Fix ESLint errors
### Project Structure
```
src/
├── controllers/ # Request handlers
├── services/ # Business logic
├── middleware/ # Express middleware
├── routes/ # API routes
├── types/ # TypeScript definitions
├── utils/ # Utility functions
├── app.ts # Express app setup
└── server.ts # Server entry point
```
## Testing
The project includes comprehensive tests:
- **Unit tests**: Test individual functions and classes
- **Integration tests**: Test complete request/response cycles
- **Load tests**: Test performance under load
Run tests:
```bash
npm test
```
## Docker
### Build and Run
```bash
# Build the image
docker build -f docker/Dockerfile -t llm-proxy .
# Run the container
docker run -p 3000:3000 -e OPENROUTER_API_KEY=your-key llm-proxy
```
### Docker Compose
```bash
# Start all services
docker-compose -f docker/docker-compose.yml up -d
# Stop all services
docker-compose -f docker/docker-compose.yml down
```
## Monitoring
The service provides monitoring endpoints:
- `GET /health` - Health check with uptime and version info
## Security
- IP-based rate limiting
- Input validation and sanitization
- CORS protection
- Security headers (Helmet)
- No authentication required (stateless design)
### CORS Configuration
The service uses Cross-Origin Resource Sharing (CORS) to control which domains can access the API. By default:
- **Development mode**: Allows `http://localhost:3000` and `http://localhost:3001`
- **Production mode**: Blocks all cross-origin requests unless explicitly configured
To configure allowed origins, set the `ALLOWED_ORIGINS` environment variable:
```bash
# Single origin
ALLOWED_ORIGINS=http://localhost:3001
# Multiple origins (comma-separated)
ALLOWED_ORIGINS=http://localhost:3001,http://localhost:3000,https://yourdomain.com
# Allow all origins (not recommended for production)
ALLOWED_ORIGINS=*
```
**Common CORS issues:**
If you see CORS errors in your browser console:
1. Ensure your client's origin is in `ALLOWED_ORIGINS`
2. Check that the origin includes the protocol (http/https) and port
3. For local development, either:
- Set `NODE_ENV=development` to use default localhost origins
- Or explicitly set `ALLOWED_ORIGINS=http://localhost:YOUR_PORT`
## Performance
- Connection pooling for OpenRouter API
- Efficient WebSocket handling
- Memory-optimized streaming
- Request/response compression
- Caching for model information
- Stateless design for horizontal scaling
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request
## License
MIT License - see LICENSE file for details.
## Support
For issues and questions, please open an issue on GitHub.