https://github.com/ukrocks007/ai-gateway-kit
Provider-agnostic AI gateway with capability-based routing, in-memory rate limiting, and observability hooks.
ai capability-routing fallback gateway gemini github-models llm observability openai rate-limiting
- Host: GitHub
- URL: https://github.com/ukrocks007/ai-gateway-kit
- Owner: ukrocks007
- License: mit
- Created: 2026-01-01T17:44:56.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2026-01-02T10:37:05.000Z (5 months ago)
- Last Synced: 2026-01-07T00:36:22.027Z (5 months ago)
- Topics: ai, capability-routing, fallback, gateway, gemini, github-models, llm, observability, openai, rate-limiting
- Language: TypeScript
- Homepage:
- Size: 44.9 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ai-gateway-kit
A boring, provider-agnostic AI Gateway for Node.js.
This library exists to solve the “production gateway” problems around LLM usage:
- **Capability-based routing** (agents request *capabilities*, not models)
- **Ordered fallback** (graceful degradation, never silent failure)
- **In-memory rate limiting** (instance-scoped by design)
- **Observability hooks** (you choose logging/metrics/tracing)
## Why capability-based routing?
Model names change, providers change, and quotas fluctuate.
A gateway that routes by *capability* lets your agents stay stable while the model fleet evolves.
Example capabilities:
- `fast_text`
- `deep_reasoning`
- `search`
- `speech_to_text`
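
To make the idea concrete, here is a minimal, self-contained sketch of selecting models by capability. It does not use this library's internals: `ModelEntry`, `registry`, `candidatesFor`, and the model and provider names are hypothetical and exist only for illustration.

```ts
// Hypothetical types for illustration; not the library's actual API.
type Capability = "fast_text" | "deep_reasoning" | "search" | "speech_to_text";

interface ModelEntry {
  id: string;
  provider: string;
  capabilities: Capability[];
}

// In this sketch, registration order doubles as preference order.
const registry: ModelEntry[] = [
  { id: "model-a", provider: "provider-x", capabilities: ["fast_text"] },
  { id: "model-b", provider: "provider-y", capabilities: ["fast_text", "deep_reasoning"] },
  { id: "model-c", provider: "provider-y", capabilities: ["search"] },
];

// Return every model that advertises the capability, in preference order.
function candidatesFor(capability: Capability, models: ModelEntry[]): ModelEntry[] {
  return models.filter((m) => m.capabilities.some((c) => c === capability));
}

// The caller names a capability and never mentions a model id.
console.log(candidatesFor("fast_text", registry).map((m) => m.id).join(", "));
// prints "model-a, model-b"
```

Swapping `model-a` for a newer model only changes the registry; code that asks for `fast_text` stays the same.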
## Why in-memory state?
This kit intentionally uses **in-memory** rate limit state.
- Works in serverless environments (Vercel-compatible)
- No shared storage dependency
- Predictable failure modes
Trade-off: **multi-instance deployments do not share quotas**. Each instance enforces limits based on its own in-memory view.
If you need cross-instance coordination, you can replace the in-memory `RateLimitManager` with your own implementation.
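
As a rough illustration of what instance-scoped state means, the sketch below keeps a fixed one-minute request window and an in-flight counter in process memory. It is not the library's `RateLimitManager` (whose interface is not shown here); `InMemoryLimiter`, `Limits`, and `Decision` are hypothetical names, and a real limiter would also track daily and token budgets.

```ts
// Hypothetical sketch; the library's actual RateLimitManager may look different.
interface Limits {
  rpm: number;         // max requests per one-minute window
  concurrency: number; // max in-flight requests
}

type Decision =
  | { allowed: true }
  | { allowed: false; reason: "rpm" | "concurrency" };

class InMemoryLimiter {
  private readonly limits: Limits;
  private windowStart = 0;
  private countInWindow = 0;
  private inFlight = 0;

  constructor(limits: Limits) {
    this.limits = limits;
  }

  // Try to reserve a slot. `now` is injectable so the logic is easy to test.
  tryAcquire(now: number = Date.now()): Decision {
    if (now - this.windowStart >= 60000) {
      // A new one-minute window begins; reset the request counter.
      this.windowStart = now;
      this.countInWindow = 0;
    }
    if (this.countInWindow >= this.limits.rpm) {
      return { allowed: false, reason: "rpm" };
    }
    if (this.inFlight >= this.limits.concurrency) {
      return { allowed: false, reason: "concurrency" };
    }
    this.countInWindow += 1;
    this.inFlight += 1;
    return { allowed: true };
  }

  // Call when a request finishes, whether it succeeded or failed.
  release(): void {
    this.inFlight = Math.max(0, this.inFlight - 1);
  }
}

const limiter = new InMemoryLimiter({ rpm: 2, concurrency: 1 });
const first = limiter.tryAcquire(0);      // allowed
const second = limiter.tryAcquire(10);    // rejected: first is still in flight
limiter.release();
const third = limiter.tryAcquire(20);     // allowed
limiter.release();
const fourth = limiter.tryAcquire(30);    // rejected: 2 requests already in this window
const fifth = limiter.tryAcquire(60000);  // allowed: a new window has started
console.log(first, second, third, fourth, fifth);
```

Two `InMemoryLimiter` objects living in separate processes would each count from zero, which is exactly the multi-instance trade-off described above.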
## This is not a chat wrapper
This library is infrastructure:
- routing
- backoff
- fallbacks
- hooks
It does **not** provide prompt templates, product policies, UI, or agent logic.
## Install
```bash
npm install ai-gateway-kit
```
## Quick start
```ts
import { createAIGateway, createGitHubModelsProvider } from "ai-gateway-kit";

const gateway = createAIGateway({
  // Each model declares the capabilities it serves and its own limits.
  models: [
    {
      id: "gpt-4o-mini",
      provider: "github",
      capabilities: ["fast_text"],
      limits: { rpm: 15, rpd: 150, tpmInput: 150000, tpmOutput: 20000, concurrency: 3 }
    }
  ],
  // Keys here match the `provider` field of the models above.
  providers: {
    github: createGitHubModelsProvider({
      token: process.env.GITHUB_TOKEN!
    })
  }
});

// Request a capability, not a model. Top-level await needs an ES module context.
const result = await gateway.execute({
  capability: "fast_text",
  input: {
    kind: "chat",
    messages: [{ role: "user", content: "Say hi." }]
  }
});

console.log(result.output);
```
**📚 [See more examples →](./examples/)**
## Core Features
### Capability-based routing
Route requests by capability, not model names. See [examples/02-capability-routing.ts](./examples/02-capability-routing.ts).
### Automatic fallback
Graceful degradation across models. See [examples/03-fallback-handling.ts](./examples/03-fallback-handling.ts).
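
The following is a minimal sketch of ordered fallback, written independently of this library. `Attempt`, `FallbackEvent`, and `executeWithFallback` are hypothetical names; only the `fromModelId` and `toModelId` field names are borrowed from the `onFallback` hook shown later in this README. It tries each candidate in order and throws only when every candidate has failed, so a failure is never silent.

```ts
// Hypothetical shapes for illustration; not the library's actual types.
interface Attempt<T> {
  modelId: string;
  run: () => Promise<T>;
}

interface FallbackEvent {
  fromModelId: string;
  toModelId: string;
}

// Try each attempt in order; surface an error only if all of them fail.
async function executeWithFallback<T>(
  attempts: Attempt<T>[],
  onFallback?: (event: FallbackEvent) => void
): Promise<{ modelId: string; output: T }> {
  const errors: string[] = [];
  let failedModelId: string | undefined;

  for (const attempt of attempts) {
    // Announce the hop before trying the next candidate.
    if (failedModelId !== undefined && onFallback) {
      onFallback({ fromModelId: failedModelId, toModelId: attempt.modelId });
    }
    try {
      const output = await attempt.run();
      return { modelId: attempt.modelId, output };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${attempt.modelId}: ${message}`);
      failedModelId = attempt.modelId;
    }
  }

  // Never fail silently: report everything that was tried.
  throw new Error(`All models failed (${errors.join("; ")})`);
}

// Usage: the primary rejects, so the secondary answers.
executeWithFallback<string>(
  [
    { modelId: "primary", run: async () => { throw new Error("rate limited"); } },
    { modelId: "secondary", run: async () => "hello" },
  ],
  (event) => console.log(`Fallback: ${event.fromModelId} → ${event.toModelId}`)
).then((result) => console.log(`${result.modelId}: ${result.output}`));
```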
### Rate limiting
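In-memory, per-model limits: requests per minute (`rpm`), requests per day (`rpd`), input and output tokens per minute (`tpmInput`, `tpmOutput`), and `concurrency`. See [examples/03-fallback-handling.ts](./examples/03-fallback-handling.ts).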
### Multiple providers
GitHub Models, Gemini, or custom providers. See [examples/04-multi-provider.ts](./examples/04-multi-provider.ts).
### Advanced features
- JSON mode: [examples/06-json-mode.ts](./examples/06-json-mode.ts)
- Web search: [examples/07-search-capability.ts](./examples/07-search-capability.ts)
- Temperature control: [examples/08-temperature-control.ts](./examples/08-temperature-control.ts)
- Request cancellation: [examples/11-abort-requests.ts](./examples/11-abort-requests.ts)
- Dynamic registration: [examples/12-dynamic-registration.ts](./examples/12-dynamic-registration.ts)
## Providers
- **GitHub Models**: OpenAI models via GitHub ([example](./examples/04-multi-provider.ts))
- **Gemini**: Google Gemini models with search ([example](./examples/07-search-capability.ts))
- **Custom provider**: Implement `ProviderAdapter` interface
## Observability hooks
You can subscribe to lifecycle events without taking a dependency on any logging stack:
- `onRequestStart` - When a request begins
- `onRequestEnd` - When a request completes (success or failure)
- `onRateLimit` - When rate limits are encountered
- `onFallback` - When falling back to another model
- `onError` - When errors occur
**Example:** [examples/09-observability-hooks.ts](./examples/09-observability-hooks.ts)
```ts
import { createAIGateway, createGitHubModelsProvider, type GatewayHooks } from "ai-gateway-kit";

const hooks: GatewayHooks = {
  onRequestStart: (event) => {
    console.log(`Starting: ${event.modelId}`);
  },
  onRequestEnd: (event) => {
    // Fires on success and on failure; `event.ok` tells them apart.
    const duration = event.endedAt - event.startedAt;
    console.log(`${event.ok ? 'Success' : 'Failed'}: ${event.modelId} (${duration}ms)`);
  },
  onRateLimit: (event) => {
    console.log(`Rate limit: ${event.modelId} - ${event.decision.reason}`);
  },
  onFallback: (event) => {
    console.log(`Fallback: ${event.fromModelId} → ${event.toModelId}`);
  },
  onError: (event) => {
    console.error(`Error: ${event.modelId} - ${event.error.message}`);
  }
};

const gateway = createAIGateway({
  models: [/* ... */], // model definitions elided; same shape as in Quick start
  providers: {
    github: createGitHubModelsProvider({ token: process.env.GITHUB_TOKEN! })
  },
  hooks
});
```
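
Because hooks are plain callbacks, they can feed any metrics store. The sketch below is one possible way to aggregate per-model request counts, failures, and average latency. It is self-contained: `RequestEndEvent` is a local stand-in that assumes only the four fields used in the example above (`modelId`, `ok`, `startedAt`, `endedAt`, with timestamps in milliseconds), and `createStatsCollector` is a hypothetical helper, not part of the library.

```ts
// Local stand-in for the event shape; only these four fields are assumed.
interface RequestEndEvent {
  modelId: string;
  ok: boolean;
  startedAt: number; // milliseconds
  endedAt: number;   // milliseconds
}

interface ModelStats {
  requests: number;
  failures: number;
  totalMs: number;
}

// Hypothetical helper: accumulates per-model stats in memory.
function createStatsCollector() {
  const stats: Record<string, ModelStats> = {};

  return {
    // Closes over `stats` (no `this`), so it can be passed around as a bare callback.
    onRequestEnd(event: RequestEndEvent): void {
      const current = stats[event.modelId] || { requests: 0, failures: 0, totalMs: 0 };
      current.requests += 1;
      if (!event.ok) current.failures += 1;
      current.totalMs += event.endedAt - event.startedAt;
      stats[event.modelId] = current;
    },

    // Return a copy of the stats with the average latency filled in.
    snapshot(): Record<string, ModelStats & { avgMs: number }> {
      const out: Record<string, ModelStats & { avgMs: number }> = {};
      Object.keys(stats).forEach((id) => {
        const s = stats[id];
        if (s) out[id] = { ...s, avgMs: s.totalMs / s.requests };
      });
      return out;
    },
  };
}

const collector = createStatsCollector();
collector.onRequestEnd({ modelId: "gpt-4o-mini", ok: true, startedAt: 1000, endedAt: 1200 });
collector.onRequestEnd({ modelId: "gpt-4o-mini", ok: false, startedAt: 2000, endedAt: 2100 });
console.log(collector.snapshot());
```

Assuming the library's event carries these fields, as the hook example above suggests, `collector.onRequestEnd` could be supplied as the `onRequestEnd` hook.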
## Examples
The [examples](./examples/) directory contains comprehensive examples for all features:
| Example | Description |
|---------|-------------|
| [01-basic-setup.ts](./examples/01-basic-setup.ts) | Minimal setup to get started |
| [02-capability-routing.ts](./examples/02-capability-routing.ts) | Route by capability, not model name |
| [03-fallback-handling.ts](./examples/03-fallback-handling.ts) | Automatic fallback when rate limited |
| [04-multi-provider.ts](./examples/04-multi-provider.ts) | Use GitHub + Gemini together |
| [05-custom-routing.ts](./examples/05-custom-routing.ts) | Implement custom routing logic |
| [06-json-mode.ts](./examples/06-json-mode.ts) | Request structured JSON output |
| [07-search-capability.ts](./examples/07-search-capability.ts) | Web search with Gemini |
| [08-temperature-control.ts](./examples/08-temperature-control.ts) | Control creativity with temperature |
| [09-observability-hooks.ts](./examples/09-observability-hooks.ts) | Monitor with lifecycle hooks |
| [10-agent-tracking.ts](./examples/10-agent-tracking.ts) | Track multi-agent systems |
| [11-abort-requests.ts](./examples/11-abort-requests.ts) | Cancel in-flight requests |
| [12-dynamic-registration.ts](./examples/12-dynamic-registration.ts) | Add models at runtime |
**[View all examples →](./examples/)**
## License
MIT