https://github.com/resk-security/resk-caching
Resk-Caching is a Bun-based backend library and server designed for secure caching, embeddings orchestration, and vector database access. It prioritizes security (keeping secrets out of the frontend), high performance, and deep observability.
https://github.com/resk-security/resk-caching
ai bun caching chatbot llm redis
Last synced: 5 months ago
JSON representation
Resk-Caching is a Bun-based backend library and server designed for secure caching, embeddings orchestration, and vector database access. It prioritizes security (keeping secrets out of the frontend), high performance, and deep observability.
- Host: GitHub
- URL: https://github.com/resk-security/resk-caching
- Owner: Resk-Security
- License: apache-2.0
- Created: 2025-08-13T15:29:59.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-08-13T16:17:49.000Z (10 months ago)
- Last Synced: 2025-08-13T17:37:27.996Z (10 months ago)
- Topics: ai, bun, caching, chatbot, llm, redis
- Language: TypeScript
- Homepage:
- Size: 46.9 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
Awesome Lists containing this project
README
[](https://www.npmjs.com/package/resk-caching)
[](https://github.com/Resk-Security/resk-caching/blob/main/LICENSE)
[](https://www.npmjs.com/package/resk-caching)
[](https://github.com/Resk-Security/resk-caching/issues)
[](https://github.com/Resk-Security/resk-caching/stargazers)
[](https://github.com/Resk-Security/resk-caching/commits/main)
[](https://www.typescriptlang.org/)
[](https://github.com/Resk-Security/resk-caching)
## Full documentation
We provide a full documentation site (MkDocs). See `docs/` and the published site: [Resk-Caching Docs](https://resk-caching.readthedocs.io/en/latest/).
## Resk-Caching β LLM Response Caching with Vector Database Integration
Resk-Caching is a Bun-based backend library/server designed to **cache Large Language Model (LLM) responses using vector databases**, significantly reducing API costs while maintaining response quality and relevance.
### π― **Four Key GPTCache-Style Benefits**
Resk-Caching delivers the complete value proposition of intelligent LLM caching with four core benefits that transform how you build and scale AI applications:
#### π° **1. Massive Cost Reduction**
- **Up to 90% reduction** in LLM API costs through intelligent semantic caching
- **Real-time cost tracking** with provider-specific pricing (OpenAI, Anthropic, Google, etc.)
- **ROI analysis** showing exact savings from cache hits vs API calls
- **Cost breakdown** by provider, model, and time period
- **Automatic savings calculation** for every cached response
#### π **2. Performance Optimization**
- **Sub-5ms response times** for cached queries vs 500ms+ for API calls
- **Intelligent cache warming** strategies (popular, recent, predictive)
- **Real-time performance monitoring** with benchmarking and optimization recommendations
- **Slow query detection** with automated performance suggestions
- **Cache hit rate optimization** through advanced similarity algorithms
#### π§ͺ **3. Development & Testing Environment**
- **OpenAI-compatible API** for offline development without API costs
- **Mock LLM provider** with customizable responses and scenarios
- **Automated testing scenarios** with validation and metrics
- **Zero-cost development workflows** with realistic API simulation
- **Circuit breaker patterns** for resilient application development
#### π‘οΈ **4. Scalability & Availability**
- **Enhanced rate limiting bypass** with cache-first approach reducing API pressure
- **Circuit breaker patterns** with automatic failover and recovery
- **Health monitoring** and real-time system status
- **Automatic scaling** with proactive cache warming for traffic spikes
- **Graceful degradation** when external services fail
### π **How It Works**
1. **Pre-populated Response Database**: You maintain a database of high-quality LLM responses to common queries, stored as vector embeddings
2. **Semantic Matching**: When a new query arrives, the system finds the most semantically similar cached response
3. **Cost Savings**: Returns cached responses instead of making new API calls
4. **Response Selection**: Advanced algorithms allow you to choose specific responses based on business logic, user preferences, or A/B testing strategies
### π **Key Benefits**
β
**All Four GPTCache-Style Benefits Implemented:**
- **π° Massive Cost Reduction**: Up to 90% savings with real-time ROI tracking
- **π Performance Optimization**: Sub-5ms responses with intelligent cache warming
- **π§ͺ Development Environment**: OpenAI-compatible API for offline testing
- **π‘οΈ Scalability & Availability**: Circuit breakers and automatic failover
### Features
- **LLM Response Caching**: Store and retrieve LLM responses using vector similarity matching
- **Multiple Cache Backends**: in-memory, SQLite (local persistence), Redis (multi-instance)
- **Advanced Response Selection**: Deterministic, weighted, and randomized response selection algorithms
- **Vector Database Integration**: Optimized for semantic search and similarity matching
- **AES-GCM Encryption**: Secure cache-at-rest protection (optional via env key)
- **JWT-Protected API**: Secure access with rate limiting and abuse prevention
- **OpenAPI 3.1**: Auto-generated API documentation from Zod schemas
- **Performance Monitoring**: Prometheus metrics and OpenTelemetry tracing
- **Real-time Updates**: WebSockets for instant response distribution
### How we're different from other semantic caches
- **GPTCache**: Great Python-first cache. Resk-Caching focuses on Bun/TypeScript, ships with JWT-secured HTTP API, OpenAPI generation, built-in Prometheus/OTEL, and optional authenticated-at-rest encryption out of the box.
- **ModelCache**: Provides a semantic cache layer. Resk-Caching adds production concerns (rate-limit, security wrapper, metrics, tracing, OpenAPI, WebSockets) and pluggable backends with zero-code switching via `CACHE_BACKEND`.
- **Upstash Semantic Cache**: Managed vector-backed cache. Resk-Caching is open-source, self-hosted by default, and can run fully local with SQLite or purely in-memory while retaining encryption and observability.
- **Redis LangCache**: Managed Redis-based semantic cache. Resk-Caching supports Redis natively via Bun's RESP3 client while also offering SQLite and in-memory modes for portability and offline development.
- **SemantiCache (FAISS)**: FAISS-native library. Resk-Caching prioritizes a secure, observable HTTP surface with variant selection strategies and can integrate external vector DBs; no GPU dependency required.
If you need a secure, auditable cache service with operational tooling for teams, Resk-Caching is purpose-built for that surface.
### What each module is for
- **LLM Response Storage**: Store pre-computed LLM responses with their vector embeddings for fast retrieval
- **Caching Backends**: Choose between low-latency memory, local persistence (SQLite), or distributed (Redis) based on your scale
- **Response Selection Algorithms**: Implement deterministic, weighted, or randomized response selection based on business logic
- **Vector Similarity Matching**: Find the most semantically similar cached response to incoming queries
- **AES-GCM Encryption**: Protect sensitive LLM responses at rest with authenticated encryption
- **JWT + Rate Limiting**: Secure API access and prevent abuse while maintaining performance
- **Zod + OpenAPI**: Ensure data validation and provide always-in-sync API documentation
- **Performance Monitoring**: Track cache hit rates, response times, and cost savings in real-time
- **Real-time Distribution**: Instantly distribute responses across multiple instances and clients
## Prerequisites
### Vector Database Setup
Before using Resk-Caching, you need to have a **vector database** ready with pre-computed LLM responses. This is the foundation of the caching system:
1. **Response Database**: Create a collection of high-quality LLM responses to common queries
2. **Vector Embeddings**: Generate vector embeddings for each response using your preferred embedding model
3. **Metadata Storage**: Store additional context like response quality scores, categories, or business rules
4. **Similarity Index**: Ensure your vector database has proper indexing for fast similarity search
**Recommended Vector Databases:**
- **Pinecone**: Excellent for production use with high performance
- **Weaviate**: Open-source with great similarity search capabilities
- **Qdrant**: Fast and efficient for real-time applications
- **Chroma**: Simple local development and testing
## Install
```bash
# as a library (npm)
npm install resk-caching
# as a library (bun)
bun add resk-caching
```
## Quick Start
### Server Setup
```bash
# Install dependencies
bun install
# Start the server
bun run dev
# The server will be available at http://localhost:3000
```
### Step-by-step setup
1. Choose your key-value cache backend:
- `CACHE_BACKEND=memory` for local/dev
- `CACHE_BACKEND=sqlite` for single-node durability
- `CACHE_BACKEND=redis` for distributed/multi-instance
2. Choose your vector search strategy for semantic features:
- Default: in-memory vector store (process-local)
- Production: external vector DB (Pinecone/Qdrant/Weaviate/Chroma)
- Alternative: Redis RediSearch vectors or SQLite vector extensions
3. Ingest responses and embeddings (see Ingestion or `scripts/ingest-example.ts`).
4. Call `/api/semantic/store` and `/api/semantic/search`.
By default, semantic embeddings live in memory. To power vector search with Redis or SQLite, see the guides below.
### Vector search with Redis (RediSearch)
Use Redis Stack with RediSearch for vector similarity.
Example index and KNN search (1536-dim float32 cosine):
```bash
# Create index
redis-cli FT.CREATE idx:llm ON HASH PREFIX 1 llm: SCHEMA \
query TEXT \
embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE \
category TAG SORTABLE \
metadata TEXT
# Insert (embedding must be raw float32 bytes)
redis-cli HSET llm:thank-you query "thank you" category "gratitude" \
embedding "$BINARY_FLOAT32" metadata "{\"tone\":\"friendly\"}"
# KNN search
redis-cli FT.SEARCH idx:llm "*=>[KNN 5 @embedding $vec AS score]" \
PARAMS 2 vec "$QUERY_EMBED_FLOAT32" \
SORTBY score DIALECT 2 RETURN 3 query category score
```
Notes:
- Convert `number[]` β `Float32Array` β bytes for `embedding` field.
- Keep response variants in a secondary key (e.g., `llm::responses`) and run variant selection after KNN.
### Vector search with SQLite (sqlite-vss/sqlite-vec)
Ship SQLite with a vector extension, then create a VSS table and join with metadata:
```sql
CREATE VIRTUAL TABLE vss_entries USING vss0(
id TEXT PRIMARY KEY,
embedding(1536)
);
CREATE TABLE llm_entries (
id TEXT PRIMARY KEY,
query TEXT NOT NULL,
category TEXT,
metadata TEXT
);
```
Insert and search:
```sql
-- insert: embedding blob is Float32 (vss_f32)
INSERT INTO vss_entries(id, embedding) VALUES (?, vss_f32(?));
INSERT INTO llm_entries(id, query, category, metadata) VALUES(?, ?, ?, ?);
-- KNN
SELECT e.id, l.query, vss_distance(e.embedding, vss_f32(?)) AS score
FROM vss_entries e
JOIN llm_entries l ON l.id = e.id
ORDER BY score ASC
LIMIT 5;
```
Notes:
- Convert `number[]` to Float32 blob for inserts and query embedding.
- Join back to your stored responses via `id` or `query`, then apply variant selection.
### Basic Usage Examples
#### 1. Store LLM Responses with Vector Embeddings
```bash
# Store multiple "thank you" responses with different tones
curl -X POST http://localhost:3000/api/semantic/store \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"query": "thank you",
"query_embedding": {
"vector": [0.1, 0.2, 0.3],
"dimension": 3
},
"responses": [
{
"id": "resp1",
"text": "You're welcome!",
"metadata": { "tone": "friendly", "formality": "casual" },
"quality_score": 0.95,
"category": "gratitude",
"tags": ["polite", "casual"]
},
{
"id": "resp2",
"text": "My pleasure!",
"metadata": { "tone": "professional", "formality": "formal" },
"quality_score": 0.92,
"category": "gratitude",
"tags": ["polite", "professional"]
},
{
"id": "resp3",
"text": "No problem at all!",
"metadata": { "tone": "casual", "formality": "informal" },
"quality_score": 0.88,
"category": "gratitude",
"tags": ["casual", "friendly"]
}
],
"variant_strategy": "weighted",
"weights": [3, 2, 1],
"seed": "user:123"
}'
```
**Response:**
```json
{
"success": true,
"message": "LLM responses stored successfully",
"entry_id": "thank you",
"responses_count": 3
}
```
#### 2. Semantic Search for Similar Queries
```bash
# Search for responses to "merci" (French thank you)
curl -X POST http://localhost:3000/api/semantic/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"query": "merci",
"query_embedding": {
"vector": [0.11, 0.19, 0.29],
"dimension": 3
},
"limit": 2,
"similarity_threshold": 0.8
}'
```
**Response:**
```json
{
"success": true,
"search_result": {
"query": "merci",
"query_embedding": {
"vector": [0.11, 0.19, 0.29],
"dimension": 3
},
"matches": [
{
"entry": {
"query": "thank you",
"responses": [...],
"variant_strategy": "weighted",
"weights": [3, 2, 1]
},
"similarity_score": 0.997,
"selected_response": {
"id": "resp1",
"text": "You're welcome!",
"metadata": { "tone": "friendly" }
}
}
],
"total_matches": 1,
"search_time_ms": 2
}
}
```
#### 3. Get All Responses for a Query
```bash
# Retrieve all stored responses for "thank you"
curl -X GET "http://localhost:3000/api/semantic/responses?query=thank%20you" \
-H "Authorization: Bearer YOUR_API_KEY"
```
**Response:**
```json
{
"success": true,
"entry": {
"query": "thank you",
"query_embedding": {
"vector": [0.1, 0.2, 0.3],
"dimension": 3
},
"responses": [
{
"id": "resp1",
"text": "You're welcome!",
"metadata": { "tone": "friendly" }
},
{
"id": "resp2",
"text": "My pleasure!",
"metadata": { "tone": "professional" }
},
{
"id": "resp3",
"text": "No problem at all!",
"metadata": { "tone": "casual" }
}
],
"variant_strategy": "weighted",
"weights": [3, 2, 1],
"created_at": "2024-01-15T10:30:00.000Z",
"last_accessed": "2024-01-15T10:35:00.000Z"
}
}
```
#### 4. Get Cache Statistics
```bash
# View cache performance metrics
curl -X GET http://localhost:3000/api/semantic/stats \
-H "Authorization: Bearer YOUR_API_KEY"
```
**Response:**
```json
{
"success": true,
"cache_type": "InMemoryVectorCache",
"message": "Stats endpoint - implementation needed"
}
```
### Advanced Usage Examples
#### Store Multiple Query Types
```bash
# Store responses for different types of greetings
curl -X POST http://localhost:3000/api/semantic/store \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"query": "hello",
"query_embedding": {
"vector": [0.9, 0.8, 0.7],
"dimension": 3
},
"responses": [
{
"id": "hello1",
"text": "Hi there!",
"metadata": { "tone": "friendly", "time_of_day": "any" }
},
{
"id": "hello2",
"text": "Hello! How are you?",
"metadata": { "tone": "polite", "time_of_day": "morning" }
}
],
"variant_strategy": "round-robin"
}'
```
#### Search with Different Similarity Thresholds
```bash
# Strict similarity matching (only very similar queries)
curl -X POST http://localhost:3000/api/semantic/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"query": "thanks a lot",
"query_embedding": {
"vector": [0.15, 0.25, 0.35],
"dimension": 3
},
"limit": 1,
"similarity_threshold": 0.95
}'
```
### Metrics and Monitoring
The system automatically tracks comprehensive metrics for all semantic operations:
- **Semantic Searches**: Total count, duration, and success rates
- **Vector Similarity**: Distribution of similarity scores
- **Response Storage**: Count of stored LLM responses by strategy
- **Cache Performance**: Entry counts and access patterns
- **Response Selection**: Variant strategy usage and performance
Access metrics at `/api/metrics` endpoint (Prometheus format).
### Performance Characteristics
- **Search Speed**: Typical semantic searches complete in <5ms
- **Memory Usage**: Efficient in-memory storage with configurable TTL
- **Scalability**: Designed for thousands of cached responses
- **Accuracy**: High-precision vector similarity using cosine distance
### Best Practices
1. **Vector Dimensions**: Use consistent embedding dimensions across your system
2. **Similarity Thresholds**: Start with 0.7-0.8 for production use
3. **Response Variety**: Store 3-5 responses per query for good variant selection
4. **Metadata**: Include rich metadata for better response selection
5. **TTL Management**: Set appropriate expiration times for dynamic content
## Environment variables
- PORT (default 3000)
- JWT_SECRET
- CACHE_BACKEND = memory | sqlite | redis
- REDIS_URL (for Redis backend)
- CACHE_ENCRYPTION_KEY (base64, 32 bytes)
- RATE_LIMIT_WINDOW_MS (default 900000)
- RATE_LIMIT_MAX (default 1000)
- OTEL_EXPORTER_OTLP_ENDPOINT (traces), OTEL_SERVICE_NAME
### Cache backends explained
- **In-memory** (`CACHE_BACKEND=memory`):
- Fastest single-process store (Map-based), ideal for development and ephemeral caches
- Per-key TTL stored alongside values; expired entries are lazily evicted on access
- No cross-process sharing and no durability
- **SQLite** (`CACHE_BACKEND=sqlite`):
- Local durability using Bun's SQLite; table `kv(key TEXT PRIMARY KEY, value TEXT, expiresAt INTEGER)`
- Upsert semantics on `set`, TTL computed client-side and stored in `expiresAt`
- Expired rows are pruned lazily on `get`; `clear()` wipes the table
- File path defaults to `resk-cache.sqlite`
- **Redis** (`CACHE_BACKEND=redis`, `REDIS_URL=...`):
- Distributed, multi-instance cache using Bun's native RESP3 client
- Values are JSON-serialized with optional TTL via `EXPIRE`
- Prefix isolation via `rc:`; `clear()` scans and deletes only `rc:*` keys
- Helpers for experiments (round-robin counters, sets/lists for variants, optional pub/sub)
## π API Endpoints - Complete Reference
### Core Cache Endpoints
- `GET /health` - Health check endpoint
- `POST /api/cache` (JWT) - Store simple key-value pairs
- `POST /api/cache/query` (JWT) - Retrieve cached values
- `DELETE /api/cache` (JWT) - Clear all cache
- `GET /api/openapi.json` - OpenAPI 3.1 specification from Zod schemas
- `GET /api/metrics` - Prometheus metrics exposition
### π° Cost Tracking Endpoints (NEW!)
- `POST /api/cost/record` (JWT) - Record LLM API cost for a request
- `GET /api/cost/analysis` (JWT) - Get comprehensive cost analysis and ROI
- `GET /api/cost/breakdown` (JWT) - Cost breakdown by provider and model
- `GET /api/cost/recent` (JWT) - Get recent cost entries
- `POST /api/cost/pricing` (JWT) - Add custom pricing for provider/model
- `GET /api/cost/pricing` (JWT) - Get all configured pricing
### π Performance Optimization Endpoints (NEW!)
- `POST /api/performance/record` (JWT) - Record performance metrics
- `GET /api/performance/benchmarks` (JWT) - Get performance benchmarks
- `GET /api/performance/slow-queries` (JWT) - Detect slow queries
- `GET /api/performance/recommendations` (JWT) - Get optimization recommendations
- `POST /api/performance/warming/start` (JWT) - Start cache warming strategy
- `GET /api/performance/warming/progress` (JWT) - Get cache warming progress
- `GET /api/performance/metrics` (JWT) - Get recent performance metrics
### π§ͺ Development & Testing Endpoints (NEW!)
- `POST /api/testing/chat/completions` (JWT) - OpenAI-compatible chat completions
- `POST /api/testing/mock/responses` (JWT) - Add custom mock responses
- `GET /api/testing/mock/responses` (JWT) - Get all mock responses
- `POST /api/testing/scenarios` (JWT) - Add test scenarios
- `GET /api/testing/scenarios` (JWT) - Get all test scenarios
- `POST /api/testing/scenarios/run` (JWT) - Run specific test scenario
- `POST /api/testing/scenarios/run-all` (JWT) - Run all test scenarios
- `GET /api/testing/history` (JWT) - Get request history
- `POST /api/testing/scenarios/defaults` (JWT) - Load default test scenarios
- `GET /api/testing/health` (JWT) - Get system health status
- `GET /api/testing/circuit-breakers` (JWT) - Get circuit breaker statistics
### Semantic Search Endpoints
- `POST /api/semantic/store` (JWT) - Store LLM responses with vector embeddings
- `POST /api/semantic/search` (JWT) - Search for similar queries using semantic similarity
- `GET /api/semantic/responses` (JWT) - Get all responses for a specific query
- `GET /api/semantic/stats` (JWT) - Get cache statistics and performance metrics
## Semantic Search & Response Selection
### How It Works
1. **Store Responses**: First, store your pre-computed LLM responses with their vector embeddings
2. **User Query**: When a user sends a message (e.g., "merci", "merci pour ta rΓ©ponse")
3. **Vector Search**: The system finds semantically similar queries in your database
4. **Response Selection**: Uses advanced algorithms to choose the most appropriate response
5. **Return Result**: Sends back a varied, contextually relevant response
### Example: Thank You Responses
Store multiple responses for "thank you" queries:
```json
{
"query": "thank you",
"query_embedding": {
"vector": [0.1, 0.2, 0.3, 0.4, 0.5],
"dimension": 5
},
"responses": [
{
"id": "thank_1",
"text": "You're welcome! I'm glad I could help.",
"metadata": {"tone": "friendly", "formality": "casual"},
"quality_score": 0.9,
"category": "gratitude"
},
{
"id": "thank_2",
"text": "My pleasure! Feel free to ask if you need anything else.",
"metadata": {"tone": "professional", "formality": "formal"},
"quality_score": 0.85,
"category": "gratitude"
}
],
"variant_strategy": "weighted",
"weights": [3, 2],
"seed": "user:123"
}
```
### Response Selection Strategies
- **random**: Uniform random selection for variety
- **round-robin**: Cycles through responses systematically
- **deterministic**: Stable selection based on seed (user ID, conversation ID)
- **weighted**: Probability-based selection according to quality scores or preferences
### Search for Similar Queries
When a user sends "merci pour ta rΓ©ponse", the system:
1. Converts the message to a vector embedding
2. Finds similar queries in the database (e.g., "thank you", "thanks", "merci")
3. Selects the best match based on similarity score
4. Applies the variant strategy to choose a response
5. Returns the selected response with metadata
This approach ensures users get varied, contextually appropriate responses while maintaining the high quality of pre-approved LLM outputs.
## Library usage (TypeScript)
```ts
import { selectCache, globalCostTracker, globalPerformanceOptimizer } from "resk-caching";
// Basic cache usage
const cache = selectCache();
await cache.set("key", { payload: true }, 60);
const val = await cache.get("key");
// Cost tracking integration
const cacheResult = await cache.search(query);
if (cacheResult) {
// Cache hit - record savings
globalCostTracker.recordCost({
provider: "openai",
model: "gpt-4",
inputTokens: 150,
outputTokens: 200,
cacheHit: true
});
} else {
// Cache miss - record actual cost
const response = await llmApi.createCompletion(query);
globalCostTracker.recordCost({
provider: "openai",
model: "gpt-4",
inputTokens: response.usage.prompt_tokens,
outputTokens: response.usage.completion_tokens,
cacheHit: false
});
}
// Performance monitoring
globalPerformanceOptimizer.recordMetric({
operation: 'search',
duration: responseTime,
cacheHit: !!cacheResult,
backend: 'redis'
});
```
## π Comprehensive Examples
### π° Cost Tracking Example
```typescript
// examples/cost-tracking-example.ts
import { CostTracker } from "resk-caching";
const tracker = new CostTracker();
// Record API costs
tracker.recordCost({
provider: "openai",
model: "gpt-4",
inputTokens: 150,
outputTokens: 300,
cacheHit: false
});
// Get ROI analysis
const analysis = tracker.getCostAnalysis(30); // 30 days
console.log(`Total Savings: $${analysis.totalSavings}`);
console.log(`ROI: ${analysis.roiPercentage}%`);
```
### π Performance Optimization Example
```typescript
// examples/performance-optimization-example.ts
import { PerformanceOptimizer } from "resk-caching";
const optimizer = new PerformanceOptimizer();
// Start cache warming
await optimizer.startCacheWarming({
strategy: 'popular',
batchSize: 20,
maxEntries: 1000
});
// Get optimization recommendations
const recommendations = optimizer.getOptimizationRecommendations();
recommendations.forEach(rec => {
console.log(`${rec.type}: ${rec.description}`);
});
```
### π§ͺ Development & Testing Example
```typescript
// examples/development-testing-example.ts
import { MockLLMProvider } from "resk-caching";
const mockProvider = new MockLLMProvider();
// OpenAI-compatible API for development
const response = await mockProvider.createChatCompletion({
model: "gpt-3.5-turbo",
messages: [{ role: "user", content: "Hello!" }]
});
// Run automated test scenarios
const testResults = await mockProvider.runAllTestScenarios();
console.log(`Tests passed: ${testResults.filter(r => r.passed).length}`);
```
### π Complete Demo
```bash
# Run the comprehensive demo showcasing all four benefits
npm run example:demo
# Or run individual examples
npm run example:cost-tracking
npm run example:performance
npm run example:development
```
## OpenAPI and clients
- Fetch the spec: GET /api/openapi.json
- Use your preferred OpenAPI generator to produce clients/SDKs
## Observability
- Prometheus metrics at /api/metrics
- OpenTelemetry tracing via OTLP HTTP exporter (configure OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_SERVICE_NAME)
- Correlation-ID header propagated for easier debugging
## Security model (summary)
- Secrets only on the server (env/secret manager). No secrets in frontend
- TLS transport; JWT short-lived; per-user/IP rate-limit
- Optional AES-GCM encryption at rest for persisted cache entries
- Structured logs with correlation-id; metrics and traces for forensics
## License
Apache-2.0 β see LICENSE
## Vector Database Integration
### Overview
Resk-Caching supports multiple vector database backends for similarity search and semantic caching. The system can ingest documents, compute embeddings, and store them in vector databases for efficient retrieval.
### Supported Vector Databases
- **Chroma**: Local or hosted ChromaDB instances
- **Pinecone**: Managed vector database service
- **Weaviate**: Open-source vector database
- **Milvus**: High-performance vector database
- **Custom adapters**: Extend for your specific needs
### Environment Configuration
```bash
# Vector Database Type
export VECTORDB_TYPE=pinecone # or chroma, weaviate, milvus
# Embedding Provider
export EMBEDDING_PROVIDER=openai # or huggingface, sentence-transformers
export EMBEDDING_MODEL=text-embedding-ada-002 # OpenAI model name
export OPENAI_API_KEY=your_openai_key_here
# Pinecone Configuration
export PINECONE_API_KEY=your_pinecone_key
export PINECONE_INDEX_HOST=https://your-index.pinecone.io
export PINECONE_INDEX_NAME=your-index-name
# Chroma Configuration
export CHROMA_HOST=localhost
export CHROMA_PORT=8000
export CHROMA_COLLECTION_NAME=documents
# Weaviate Configuration
export WEAVIATE_URL=http://localhost:8080
export WEAVIATE_API_KEY=your_weaviate_key
export WEAVIATE_CLASS_NAME=Document
# Milvus Configuration
export MILVUS_HOST=localhost
export MILVUS_PORT=19530
export MILVUS_COLLECTION_NAME=documents
# Batch Processing
export BATCH_SIZE=100 # Documents per batch for embeddings
export UPSERT_BATCH=50 # Documents per batch for vector DB
```
### Ingestion Script
Use the provided ingestion script to batch process documents:
```bash
# Run ingestion
bun run scripts/ingest-example.ts
```
The script will:
1. Read documents from your source
2. Compute embeddings in batches
3. Store vectors in the configured database
4. Handle retries and error recovery
### Example Ingestion Code
```typescript
import { createVectorDBAdapter } from 'resk-caching';
import { createEmbeddingProvider } from 'resk-caching';
async function ingestDocuments(documents: Document[]) {
const vectorDB = createVectorDBAdapter();
const embeddings = createEmbeddingProvider();
// Process in batches
for (let i = 0; i < documents.length; i += BATCH_SIZE) {
const batch = documents.slice(i, i + BATCH_SIZE);
// Compute embeddings
const vectors = await embeddings.embedBatch(
batch.map(doc => doc.content)
);
// Prepare for storage
const vectorsWithMetadata = batch.map((doc, idx) => ({
id: doc.id,
vector: vectors[idx],
metadata: {
title: doc.title,
source: doc.source,
timestamp: doc.timestamp
}
}));
// Store in vector database
await vectorDB.upsertBatch(vectorsWithMetadata);
}
}
```
### Vector Search
```typescript
import { createVectorDBAdapter } from 'resk-caching';
async function searchSimilar(query: string, k: number = 5) {
const vectorDB = createVectorDBAdapter();
const embeddings = createEmbeddingProvider();
// Get query embedding
const queryVector = await embeddings.embed(query);
// Search for similar vectors
const results = await vectorDB.search(queryVector, {
k,
threshold: 0.7, // Similarity threshold
filters: {
source: 'knowledge_base',
timestamp: { $gte: '2024-01-01' }
}
});
return results;
}
```
### Performance Considerations
- **Batch sizes**: Larger batches (100-500) for embeddings, smaller (50-100) for vector DB operations
- **Parallel processing**: Use worker threads for CPU-intensive embedding computation
- **Caching**: Cache frequently accessed embeddings and search results
- **Indexing**: Ensure proper vector database indexes are created for your use case
### Monitoring and Metrics
The system provides metrics for:
- Embedding computation latency and throughput
- Vector database operation success rates
- Search query performance
- Cache hit rates for vector operations
Access metrics at `/api/metrics` endpoint.
## Next steps
- Docker image and multi-stage build for slim runtimes
- LangChain integration helper (middleware to consult cache before LLM calls)
- LlamaIndex and Vercel AI SDK adapters
- Pluggable vector stores (Qdrant, Weaviate, Pinecone) with adapters
- Background refresh policies and stale-while-revalidate
- Eviction strategies (LRU/LFU) and cache warming CLI
- Upstash Redis & Redis Cloud deployment templates
- Benchmarks and load-test recipes (k6/Artillery)