https://github.com/resk-security/resk-caching

Resk-Caching is a Bun-based backend library and server designed for secure caching, embeddings orchestration, and vector database access. It prioritizes security (keeping secrets out of the frontend), high performance, and deep observability.
Host: GitHub
URL: https://github.com/resk-security/resk-caching
Owner: Resk-Security
License: apache-2.0
Created: 2025-08-13T15:29:59.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-08-13T16:17:49.000Z (10 months ago)
Last Synced: 2025-08-13T17:37:27.996Z (10 months ago)
Topics: ai, bun, caching, chatbot, llm, redis
Language: TypeScript
Homepage:
Size: 46.9 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md

[![NPM version](https://img.shields.io/npm/v/resk-caching.svg)](https://www.npmjs.com/package/resk-caching)

[![NPM License](https://img.shields.io/npm/l/resk-caching.svg)](https://github.com/Resk-Security/resk-caching/blob/main/LICENSE)

[![NPM Downloads](https://img.shields.io/npm/dt/resk-caching.svg)](https://www.npmjs.com/package/resk-caching)

[![GitHub issues](https://img.shields.io/github/issues/Resk-Security/resk-caching.svg)](https://github.com/Resk-Security/resk-caching/issues)

[![GitHub stars](https://img.shields.io/github/stars/Resk-Security/resk-caching.svg)](https://github.com/Resk-Security/resk-caching/stargazers)

[![GitHub last commit](https://img.shields.io/github/last-commit/Resk-Security/resk-caching.svg)](https://github.com/Resk-Security/resk-caching/commits/main)

[![TypeScript](https://img.shields.io/badge/TypeScript-^5.4.5-blue.svg)](https://www.typescriptlang.org/)

[![LLM Security](https://img.shields.io/badge/LLM-Security-red)](https://github.com/Resk-Security/resk-caching)

## Full documentation

We provide a full documentation site (MkDocs). See `docs/` and the published site: [Resk-Caching Docs](https://resk-caching.readthedocs.io/en/latest/).

## Resk-Caching — LLM Response Caching with Vector Database Integration

Resk-Caching is a Bun-based backend library/server designed to **cache Large Language Model (LLM) responses using vector databases**, significantly reducing API costs while maintaining response quality and relevance.

### 🎯 **Four Key GPTCache-Style Benefits**

Resk-Caching delivers the complete value proposition of intelligent LLM caching with four core benefits that transform how you build and scale AI applications:

#### 💰 **1. Massive Cost Reduction**

- **Up to 90% reduction** in LLM API costs through intelligent semantic caching

- **Real-time cost tracking** with provider-specific pricing (OpenAI, Anthropic, Google, etc.)

- **ROI analysis** showing exact savings from cache hits vs API calls

- **Cost breakdown** by provider, model, and time period

- **Automatic savings calculation** for every cached response

#### 🚀 **2. Performance Optimization**

- **Sub-5ms response times** for cached queries vs 500ms+ for API calls

- **Intelligent cache warming** strategies (popular, recent, predictive)

- **Real-time performance monitoring** with benchmarking and optimization recommendations

- **Slow query detection** with automated performance suggestions

- **Cache hit rate optimization** through advanced similarity algorithms

#### 🧪 **3. Development & Testing Environment**

- **OpenAI-compatible API** for offline development without API costs

- **Mock LLM provider** with customizable responses and scenarios

- **Automated testing scenarios** with validation and metrics

- **Zero-cost development workflows** with realistic API simulation

- **Circuit breaker patterns** for resilient application development

#### 🛡️ **4. Scalability & Availability**

- **Reduced rate-limit pressure**: a cache-first approach lowers the number of calls that reach the upstream LLM API

- **Circuit breaker patterns** with automatic failover and recovery

- **Health monitoring** and real-time system status

- **Automatic scaling** with proactive cache warming for traffic spikes

- **Graceful degradation** when external services fail
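
The circuit breaker bullet above can be illustrated with a small state machine. This is a minimal sketch and not Resk-Caching's implementation: the class name `CircuitBreaker`, the failure threshold, and the cooldown are assumptions chosen for the example.

```typescript
// Hypothetical sketch: none of these names come from resk-caching.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private failureThreshold: number;
  private cooldownMs: number;
  private now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Returns true when a call to the upstream service may be attempted.
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // allow a trial call once the cooldown has elapsed
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call, or too many failures in a row, opens the breaker.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

While the breaker is open, a caller would serve cached responses only and skip the upstream API, which is one way to read "graceful degradation" here.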

### 🔍 **How It Works**

1. **Pre-populated Response Database**: You maintain a database of high-quality LLM responses to common queries, stored as vector embeddings

2. **Semantic Matching**: When a new query arrives, the system finds the most semantically similar cached response

3. **Cost Savings**: Returns cached responses instead of making new API calls

4. **Response Selection**: Advanced algorithms allow you to choose specific responses based on business logic, user preferences, or A/B testing strategies
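
The semantic matching step can be sketched with plain cosine similarity over an in-memory list. This is an illustration under stated assumptions: the `CachedEntry` shape and the `findBestMatch` helper are hypothetical and are not the library's internal types or API.

```typescript
// Hypothetical shapes for illustration; not the library's internal types.
interface CachedEntry {
  query: string;
  embedding: number[];
  response: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Returns the closest entry at or above the threshold, or null (a cache miss).
function findBestMatch(
  queryEmbedding: number[],
  entries: CachedEntry[],
  threshold: number,
): CachedEntry | null {
  let best: CachedEntry | null = null;
  let bestScore = threshold;
  for (const entry of entries) {
    const score = cosineSimilarity(queryEmbedding, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}
```

A `null` result is the signal to fall through to a real LLM call; any non-null result is served from the cache.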

### 🚀 **Key Benefits**

✅ **All Four GPTCache-Style Benefits Implemented:**

- **💰 Massive Cost Reduction**: Up to 90% savings with real-time ROI tracking

- **🚀 Performance Optimization**: Sub-5ms responses with intelligent cache warming

- **🧪 Development Environment**: OpenAI-compatible API for offline testing

- **🛡️ Scalability & Availability**: Circuit breakers and automatic failover

### Features

- **LLM Response Caching**: Store and retrieve LLM responses using vector similarity matching

- **Multiple Cache Backends**: in-memory, SQLite (local persistence), Redis (multi-instance)

- **Advanced Response Selection**: Deterministic, weighted, and randomized response selection algorithms

- **Vector Database Integration**: Optimized for semantic search and similarity matching

- **AES-GCM Encryption**: Secure cache-at-rest protection (optional via env key)

- **JWT-Protected API**: Secure access with rate limiting and abuse prevention

- **OpenAPI 3.1**: Auto-generated API documentation from Zod schemas

- **Performance Monitoring**: Prometheus metrics and OpenTelemetry tracing

- **Real-time Updates**: WebSockets for instant response distribution

### How we're different from other semantic caches

- **GPTCache**: Great Python-first cache. Resk-Caching focuses on Bun/TypeScript and ships with a JWT-secured HTTP API, OpenAPI generation, built-in Prometheus/OTEL, and optional authenticated encryption at rest out of the box.

- **ModelCache**: Provides a semantic cache layer. Resk-Caching adds production concerns (rate limiting, security wrapper, metrics, tracing, OpenAPI, WebSockets) and pluggable backends with zero-code switching via `CACHE_BACKEND`.

- **Upstash Semantic Cache**: Managed vector-backed cache. Resk-Caching is open-source, self-hosted by default, and can run fully local with SQLite or purely in-memory while retaining encryption and observability.

- **Redis LangCache**: Managed Redis-based semantic cache. Resk-Caching supports Redis natively via Bun's RESP3 client while also offering SQLite and in-memory modes for portability and offline development.

- **SemantiCache (FAISS)**: FAISS-native library. Resk-Caching prioritizes a secure, observable HTTP surface with variant selection strategies and can integrate external vector DBs; no GPU dependency required.

If you need a secure, auditable cache service with operational tooling for teams, Resk-Caching is purpose-built for that surface.

### What each module is for

- **LLM Response Storage**: Store pre-computed LLM responses with their vector embeddings for fast retrieval

- **Caching Backends**: Choose between low-latency memory, local persistence (SQLite), or distributed (Redis) based on your scale

- **Response Selection Algorithms**: Implement deterministic, weighted, or randomized response selection based on business logic

- **Vector Similarity Matching**: Find the most semantically similar cached response to incoming queries

- **AES-GCM Encryption**: Protect sensitive LLM responses at rest with authenticated encryption

- **JWT + Rate Limiting**: Secure API access and prevent abuse while maintaining performance

- **Zod + OpenAPI**: Ensure data validation and provide always-in-sync API documentation

- **Performance Monitoring**: Track cache hit rates, response times, and cost savings in real-time

- **Real-time Distribution**: Instantly distribute responses across multiple instances and clients
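
To make the AES-GCM bullet concrete, here is a minimal sketch of authenticated encryption for a cached value using Node's `node:crypto` module (also available in Bun). The envelope layout (IV, then auth tag, then ciphertext, base64-encoded) and the function names are assumptions for this example; the README does not document the format Resk-Caching actually stores.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical envelope layout: 12-byte IV | 16-byte auth tag | ciphertext, base64-encoded.
function encryptValue(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh nonce per message; never reuse one with the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decryptValue(envelope: string, key: Buffer): string {
  const raw = Buffer.from(envelope, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  // final() throws if the ciphertext or the tag has been modified
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
}
```

A 32-byte key such as the one described for `CACHE_ENCRYPTION_KEY` could be decoded with `Buffer.from(value, "base64")` before being passed in.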

## Prerequisites

### Vector Database Setup

Before using Resk-Caching, you need to have a **vector database** ready with pre-computed LLM responses. This is the foundation of the caching system:

1. **Response Database**: Create a collection of high-quality LLM responses to common queries

2. **Vector Embeddings**: Generate vector embeddings for each response using your preferred embedding model

3. **Metadata Storage**: Store additional context like response quality scores, categories, or business rules

4. **Similarity Index**: Ensure your vector database has proper indexing for fast similarity search

**Recommended Vector Databases:**

- **Pinecone**: Excellent for production use with high performance

- **Weaviate**: Open-source with great similarity search capabilities

- **Qdrant**: Fast and efficient for real-time applications

- **Chroma**: Simple local development and testing

## Install

```bash

# as a library (npm)

npm install resk-caching

# as a library (bun)

bun add resk-caching

```

## Quick Start

### Server Setup

```bash

# Install dependencies

bun install

# Start the server

bun run dev

# The server will be available at http://localhost:3000

```

### Step-by-step setup

1. Choose your key-value cache backend:

   - `CACHE_BACKEND=memory` for local/dev

   - `CACHE_BACKEND=sqlite` for single-node durability

   - `CACHE_BACKEND=redis` for distributed/multi-instance

2. Choose your vector search strategy for semantic features:

   - Default: in-memory vector store (process-local)

   - Production: external vector DB (Pinecone/Qdrant/Weaviate/Chroma)

   - Alternative: Redis RediSearch vectors or SQLite vector extensions

3. Ingest responses and embeddings (see Ingestion or `scripts/ingest-example.ts`).

4. Call `/api/semantic/store` and `/api/semantic/search`.

By default, semantic embeddings live in memory. To power vector search with Redis or SQLite, see the guides below.

### Vector search with Redis (RediSearch)

Use Redis Stack with RediSearch for vector similarity.

Example index and KNN search (1536-dim float32 cosine):

```bash

# Create index
redis-cli FT.CREATE idx:llm ON HASH PREFIX 1 llm: SCHEMA \
  query TEXT \
  embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE \
  category TAG SORTABLE \
  metadata TEXT

# Insert (embedding must be raw float32 bytes)
redis-cli HSET llm:thank-you query "thank you" category "gratitude" \
  embedding "$BINARY_FLOAT32" metadata "{\"tone\":\"friendly\"}"

# KNN search (single quotes stop the shell from expanding $vec, which is a RediSearch parameter)
redis-cli FT.SEARCH idx:llm '*=>[KNN 5 @embedding $vec AS score]' \
  PARAMS 2 vec "$QUERY_EMBED_FLOAT32" \
  SORTBY score DIALECT 2 RETURN 3 query category score

```

Notes:

- Convert `number[]` → `Float32Array` → bytes for `embedding` field.

- Keep response variants in a secondary key (e.g., `llm::responses`) and run variant selection after KNN.
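
The `number[]` to `Float32Array` to bytes conversion from the first note can be sketched as follows. The helper names are hypothetical; the byte order is the platform's (little-endian on x86 and ARM), which is what `Float32Array` produces.

```typescript
// Encode an embedding as raw float32 bytes (4 bytes per dimension, platform byte order).
function toFloat32Bytes(vector: number[]): Uint8Array {
  const floats = new Float32Array(vector);
  return new Uint8Array(floats.buffer, floats.byteOffset, floats.byteLength);
}

// Decode the bytes back into numbers, e.g. to verify what was written.
function fromFloat32Bytes(bytes: Uint8Array): number[] {
  if (bytes.byteLength % 4 !== 0) throw new Error("byte length must be a multiple of 4");
  const copy = new Uint8Array(bytes); // copy so the float view starts at an aligned offset
  return Array.from(new Float32Array(copy.buffer, 0, copy.byteLength / 4));
}
```

Values lose precision when narrowed to float32, so a decoded `0.1` will not compare equal to the original double; compare with a tolerance instead.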

### Vector search with SQLite (sqlite-vss/sqlite-vec)

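Ship SQLite with a vector extension, then create a vector table and join it with a metadata table. The statements here follow sqlite-vss naming; table and function syntax differ between sqlite-vss and sqlite-vec, so check them against the extension version you load: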

```sql

CREATE VIRTUAL TABLE vss_entries USING vss0(

  id TEXT PRIMARY KEY,

  embedding(1536)

);

CREATE TABLE llm_entries (

  id TEXT PRIMARY KEY,

  query TEXT NOT NULL,

  category TEXT,

  metadata TEXT

);

```

Insert and search:

```sql

-- insert: embedding blob is Float32 (vss_f32)

INSERT INTO vss_entries(id, embedding) VALUES (?, vss_f32(?));

INSERT INTO llm_entries(id, query, category, metadata) VALUES(?, ?, ?, ?);

-- KNN

SELECT e.id, l.query, vss_distance(e.embedding, vss_f32(?)) AS score

FROM vss_entries e

JOIN llm_entries l ON l.id = e.id

ORDER BY score ASC

LIMIT 5;

```

Notes:

- Convert `number[]` to Float32 blob for inserts and query embedding.

- Join back to your stored responses via `id` or `query`, then apply variant selection.

### Basic Usage Examples

#### 1. Store LLM Responses with Vector Embeddings

```bash

# Store multiple "thank you" responses with different tones
curl -X POST http://localhost:3000/api/semantic/store \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "query": "thank you",
    "query_embedding": {
      "vector": [0.1, 0.2, 0.3],
      "dimension": 3
    },
    "responses": [
      {
        "id": "resp1",
        "text": "You'\''re welcome!",
        "metadata": { "tone": "friendly", "formality": "casual" },
        "quality_score": 0.95,
        "category": "gratitude",
        "tags": ["polite", "casual"]
      },
      {
        "id": "resp2",
        "text": "My pleasure!",
        "metadata": { "tone": "professional", "formality": "formal" },
        "quality_score": 0.92,
        "category": "gratitude",
        "tags": ["polite", "professional"]
      },
      {
        "id": "resp3",
        "text": "No problem at all!",
        "metadata": { "tone": "casual", "formality": "informal" },
        "quality_score": 0.88,
        "category": "gratitude",
        "tags": ["casual", "friendly"]
      }
    ],
    "variant_strategy": "weighted",
    "weights": [3, 2, 1],
    "seed": "user:123"
  }'

```

**Response:**

```json

{

  "success": true,

  "message": "LLM responses stored successfully",

  "entry_id": "thank you",

  "responses_count": 3

}

```
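
Before sending a store request, a client can check the payload for internal consistency. The README says the server validates input with Zod but does not list the rules, so the checks in this sketch are assumptions, and `validateStoreRequest` is a hypothetical client-side helper rather than part of the library.

```typescript
// Field names mirror the JSON payload above; the helper itself is hypothetical.
interface StoreRequest {
  query: string;
  query_embedding: { vector: number[]; dimension: number };
  responses: { id: string; text: string }[];
  variant_strategy?: string;
  weights?: number[];
}

// Returns a list of problems; an empty list means the payload looks consistent.
function validateStoreRequest(req: StoreRequest): string[] {
  const problems: string[] = [];
  const { vector, dimension } = req.query_embedding;
  if (vector.length !== dimension) {
    problems.push(`dimension is ${dimension} but vector has ${vector.length} values`);
  }
  if (req.responses.length === 0) {
    problems.push("responses must not be empty");
  }
  const ids = new Set(req.responses.map((r) => r.id));
  if (ids.size !== req.responses.length) {
    problems.push("response ids must be unique");
  }
  if (req.weights && req.weights.length !== req.responses.length) {
    problems.push("weights must have one entry per response");
  }
  if (req.weights && req.weights.some((w) => !(w >= 0))) {
    problems.push("weights must be non-negative numbers");
  }
  return problems;
}
```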

#### 2. Semantic Search for Similar Queries

```bash

# Search for responses to "merci" (French thank you)
curl -X POST http://localhost:3000/api/semantic/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "query": "merci",
    "query_embedding": {
      "vector": [0.11, 0.19, 0.29],
      "dimension": 3
    },
    "limit": 2,
    "similarity_threshold": 0.8
  }'

```

**Response:**

```json

{

  "success": true,

  "search_result": {

    "query": "merci",

    "query_embedding": {

      "vector": [0.11, 0.19, 0.29],

      "dimension": 3

    },

    "matches": [

      {

        "entry": {

          "query": "thank you",

          "responses": [...],

          "variant_strategy": "weighted",

          "weights": [3, 2, 1]

        },

        "similarity_score": 0.997,

        "selected_response": {

          "id": "resp1",

          "text": "You're welcome!",

          "metadata": { "tone": "friendly" }

        }

      }

    ],

    "total_matches": 1,

    "search_time_ms": 2

  }

}

```

#### 3. Get All Responses for a Query

```bash

# Retrieve all stored responses for "thank you"
curl -X GET "http://localhost:3000/api/semantic/responses?query=thank%20you" \
  -H "Authorization: Bearer YOUR_API_KEY"

```

**Response:**

```json

{

  "success": true,

  "entry": {

    "query": "thank you",

    "query_embedding": {

      "vector": [0.1, 0.2, 0.3],

      "dimension": 3

    },

    "responses": [

      {

        "id": "resp1",

        "text": "You're welcome!",

        "metadata": { "tone": "friendly" }

      },

      {

        "id": "resp2",

        "text": "My pleasure!",

        "metadata": { "tone": "professional" }

      },

      {

        "id": "resp3",

        "text": "No problem at all!",

        "metadata": { "tone": "casual" }

      }

    ],

    "variant_strategy": "weighted",

    "weights": [3, 2, 1],

    "created_at": "2024-01-15T10:30:00.000Z",

    "last_accessed": "2024-01-15T10:35:00.000Z"

  }

}

```

#### 4. Get Cache Statistics

```bash

# View cache performance metrics
curl -X GET http://localhost:3000/api/semantic/stats \
  -H "Authorization: Bearer YOUR_API_KEY"

```

**Response:**

```json

{

  "success": true,

  "cache_type": "InMemoryVectorCache",

  "message": "Stats endpoint - implementation needed"

}

```

### Advanced Usage Examples

#### Store Multiple Query Types

```bash

# Store responses for different types of greetings
curl -X POST http://localhost:3000/api/semantic/store \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "query": "hello",
    "query_embedding": {
      "vector": [0.9, 0.8, 0.7],
      "dimension": 3
    },
    "responses": [
      {
        "id": "hello1",
        "text": "Hi there!",
        "metadata": { "tone": "friendly", "time_of_day": "any" }
      },
      {
        "id": "hello2",
        "text": "Hello! How are you?",
        "metadata": { "tone": "polite", "time_of_day": "morning" }
      }
    ],
    "variant_strategy": "round-robin"
  }'

```

#### Search with Different Similarity Thresholds

```bash

# Strict similarity matching (only very similar queries)
curl -X POST http://localhost:3000/api/semantic/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "query": "thanks a lot",
    "query_embedding": {
      "vector": [0.15, 0.25, 0.35],
      "dimension": 3
    },
    "limit": 1,
    "similarity_threshold": 0.95
  }'

```
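
The effect of `similarity_threshold` and `limit` can be sketched as a filter, sort, and slice over scored candidates. This is an assumption about the semantics (for instance, whether the server treats the threshold as inclusive is not documented), and `applySearchOptions` is a hypothetical helper.

```typescript
// Hypothetical shape: a candidate query with its similarity to the incoming query.
interface ScoredMatch {
  query: string;
  similarity_score: number;
}

// Keep matches at or above the threshold, best first, capped at `limit`.
function applySearchOptions(
  matches: ScoredMatch[],
  similarityThreshold: number,
  limit: number,
): ScoredMatch[] {
  return matches
    .filter((m) => m.similarity_score >= similarityThreshold) // filter() copies, so the input is not reordered
    .sort((a, b) => b.similarity_score - a.similarity_score)
    .slice(0, limit);
}
```

A stricter threshold such as 0.95 trades recall for precision: fewer cache hits, but each hit is closer to the stored query.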

### Metrics and Monitoring

The system automatically tracks comprehensive metrics for all semantic operations:

- **Semantic Searches**: Total count, duration, and success rates

- **Vector Similarity**: Distribution of similarity scores

- **Response Storage**: Count of stored LLM responses by strategy

- **Cache Performance**: Entry counts and access patterns

- **Response Selection**: Variant strategy usage and performance

Access metrics at `/api/metrics` endpoint (Prometheus format).
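
For readers unfamiliar with the Prometheus text exposition format, the following sketch renders a single counter the way a scrape of such an endpoint would present it. The metric and label names used in the usage note are made up for illustration; the README does not list the names Resk-Caching actually exports.

```typescript
type Sample = { labels: Record<string, string>; value: number };

// Escape backslashes, double quotes, and newlines in label values.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one counter with HELP and TYPE lines followed by one line per sample.
function renderCounter(name: string, help: string, samples: Sample[]): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const pairs = Object.entries(labels).map(([k, v]) => `${k}="${escapeLabelValue(v)}"`);
    lines.push(pairs.length > 0 ? `${name}{${pairs.join(",")}} ${value}` : `${name} ${value}`);
  }
  return lines.join("\n") + "\n";
}
```

For example, `renderCounter("demo_searches_total", "Semantic searches", [{ labels: { strategy: "weighted" }, value: 3 }])` yields a `# HELP` line, a `# TYPE` line, and `demo_searches_total{strategy="weighted"} 3`.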

### Performance Characteristics

- **Search Speed**: Typical semantic searches complete in <5ms

- **Memory Usage**: Efficient in-memory storage with configurable TTL

- **Scalability**: Designed for thousands of cached responses

- **Accuracy**: High-precision vector matching using cosine similarity

### Best Practices

1. **Vector Dimensions**: Use consistent embedding dimensions across your system

2. **Similarity Thresholds**: Start with 0.7-0.8 for production use

3. **Response Variety**: Store 3-5 responses per query for good variant selection

4. **Metadata**: Include rich metadata for better response selection

5. **TTL Management**: Set appropriate expiration times for dynamic content

## Environment variables

- PORT (default 3000)

- JWT_SECRET

- CACHE_BACKEND = memory | sqlite | redis

- REDIS_URL (for Redis backend)

- CACHE_ENCRYPTION_KEY (base64, 32 bytes)

- RATE_LIMIT_WINDOW_MS (default 900000)

- RATE_LIMIT_MAX (default 1000)

- OTEL_EXPORTER_OTLP_ENDPOINT (traces), OTEL_SERVICE_NAME
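
As an illustration of how these variables might be read and validated at startup, here is a minimal sketch. `loadConfig` and `ServerConfig` are hypothetical names, and the `memory` fallback for `CACHE_BACKEND` is an assumption, since no default is stated for it; the numeric defaults are the ones listed above.

```typescript
type CacheBackend = "memory" | "sqlite" | "redis";

interface ServerConfig {
  port: number;
  cacheBackend: CacheBackend;
  redisUrl?: string;
  rateLimitWindowMs: number;
  rateLimitMax: number;
}

// Parse a positive integer, falling back to the documented default when unset.
function positiveInt(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) throw new Error(`${name} must be a positive integer`);
  return n;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const backend = env.CACHE_BACKEND ?? "memory"; // assumed default; not stated in the list above
  if (backend !== "memory" && backend !== "sqlite" && backend !== "redis") {
    throw new Error(`unsupported CACHE_BACKEND: ${backend}`);
  }
  if (backend === "redis" && !env.REDIS_URL) {
    throw new Error("REDIS_URL is required when CACHE_BACKEND=redis");
  }
  return {
    port: positiveInt(env.PORT, 3000, "PORT"),
    cacheBackend: backend,
    redisUrl: env.REDIS_URL,
    rateLimitWindowMs: positiveInt(env.RATE_LIMIT_WINDOW_MS, 900000, "RATE_LIMIT_WINDOW_MS"),
    rateLimitMax: positiveInt(env.RATE_LIMIT_MAX, 1000, "RATE_LIMIT_MAX"),
  };
}
```

In a Bun or Node process this would typically be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.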

### Cache backends explained

- **In-memory** (`CACHE_BACKEND=memory`):

  - Fastest single-process store (Map-based), ideal for development and ephemeral caches

  - Per-key TTL stored alongside values; expired entries are lazily evicted on access

  - No cross-process sharing and no durability

- **SQLite** (`CACHE_BACKEND=sqlite`):

  - Local durability using Bun's SQLite; table `kv(key TEXT PRIMARY KEY, value TEXT, expiresAt INTEGER)`

  - Upsert semantics on `set`, TTL computed client-side and stored in `expiresAt`

  - Expired rows are pruned lazily on `get`; `clear()` wipes the table

  - File path defaults to `resk-cache.sqlite`

- **Redis** (`CACHE_BACKEND=redis`, `REDIS_URL=...`):

  - Distributed, multi-instance cache using Bun's native RESP3 client

  - Values are JSON-serialized with optional TTL via `EXPIRE`

  - Prefix isolation via `rc:`; `clear()` scans and deletes only `rc:*` keys

  - Helpers for experiments (round-robin counters, sets/lists for variants, optional pub/sub)
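
The in-memory behavior described above (a Map-based store, a per-key TTL kept next to the value, and lazy eviction on access) can be sketched in a few lines. `LazyTtlCache` is a hypothetical class written for this illustration, and treating the TTL as seconds is an assumption.

```typescript
// Hypothetical sketch of a Map-based cache with lazy TTL eviction.
class LazyTtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number | null }>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // ttlSeconds is optional; entries without a TTL never expire.
  set(key: string, value: V, ttlSeconds?: number): void {
    const expiresAt = ttlSeconds === undefined ? null : this.now() + ttlSeconds * 1000;
    this.store.set(key, { value, expiresAt });
  }

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt !== null && entry.expiresAt <= this.now()) {
      this.store.delete(key); // lazy eviction: expired entries are removed when read
      return undefined;
    }
    return entry.value;
  }

  get size(): number {
    return this.store.size;
  }
}
```

One consequence of lazy eviction is that expired keys which are never read again stay in memory, so long-running processes may want a periodic sweep as well.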

## 🔗 API Endpoints - Complete Reference

### Core Cache Endpoints

- `GET /health` - Health check endpoint

- `POST /api/cache` (JWT) - Store simple key-value pairs

- `POST /api/cache/query` (JWT) - Retrieve cached values

- `DELETE /api/cache` (JWT) - Clear all cache

- `GET /api/openapi.json` - OpenAPI 3.1 specification from Zod schemas

- `GET /api/metrics` - Prometheus metrics exposition

### 💰 Cost Tracking Endpoints (NEW!)

- `POST /api/cost/record` (JWT) - Record LLM API cost for a request

- `GET /api/cost/analysis` (JWT) - Get comprehensive cost analysis and ROI

- `GET /api/cost/breakdown` (JWT) - Cost breakdown by provider and model

- `GET /api/cost/recent` (JWT) - Get recent cost entries

- `POST /api/cost/pricing` (JWT) - Add custom pricing for provider/model

- `GET /api/cost/pricing` (JWT) - Get all configured pricing

### 🚀 Performance Optimization Endpoints (NEW!)

- `POST /api/performance/record` (JWT) - Record performance metrics

- `GET /api/performance/benchmarks` (JWT) - Get performance benchmarks

- `GET /api/performance/slow-queries` (JWT) - Detect slow queries

- `GET /api/performance/recommendations` (JWT) - Get optimization recommendations

- `POST /api/performance/warming/start` (JWT) - Start cache warming strategy

- `GET /api/performance/warming/progress` (JWT) - Get cache warming progress

- `GET /api/performance/metrics` (JWT) - Get recent performance metrics

### 🧪 Development & Testing Endpoints (NEW!)

- `POST /api/testing/chat/completions` (JWT) - OpenAI-compatible chat completions

- `POST /api/testing/mock/responses` (JWT) - Add custom mock responses

- `GET /api/testing/mock/responses` (JWT) - Get all mock responses

- `POST /api/testing/scenarios` (JWT) - Add test scenarios

- `GET /api/testing/scenarios` (JWT) - Get all test scenarios

- `POST /api/testing/scenarios/run` (JWT) - Run specific test scenario

- `POST /api/testing/scenarios/run-all` (JWT) - Run all test scenarios

- `GET /api/testing/history` (JWT) - Get request history

- `POST /api/testing/scenarios/defaults` (JWT) - Load default test scenarios

- `GET /api/testing/health` (JWT) - Get system health status

- `GET /api/testing/circuit-breakers` (JWT) - Get circuit breaker statistics

### Semantic Search Endpoints

- `POST /api/semantic/store` (JWT) - Store LLM responses with vector embeddings

- `POST /api/semantic/search` (JWT) - Search for similar queries using semantic similarity

- `GET /api/semantic/responses` (JWT) - Get all responses for a specific query

- `GET /api/semantic/stats` (JWT) - Get cache statistics and performance metrics

## Semantic Search & Response Selection

### How It Works

1. **Store Responses**: First, store your pre-computed LLM responses with their vector embeddings

2. **User Query**: A user sends a message (e.g., "merci", "merci pour ta réponse")

3. **Vector Search**: The system finds semantically similar queries in your database

4. **Response Selection**: Uses advanced algorithms to choose the most appropriate response

5. **Return Result**: Sends back a varied, contextually relevant response

### Example: Thank You Responses

Store multiple responses for "thank you" queries:

```json

{

  "query": "thank you",

  "query_embedding": {

    "vector": [0.1, 0.2, 0.3, 0.4, 0.5],

    "dimension": 5

  },

  "responses": [

    {

      "id": "thank_1",

      "text": "You're welcome! I'm glad I could help.",

      "metadata": {"tone": "friendly", "formality": "casual"},

      "quality_score": 0.9,

      "category": "gratitude"

    },

    {

      "id": "thank_2",

      "text": "My pleasure! Feel free to ask if you need anything else.",

      "metadata": {"tone": "professional", "formality": "formal"},

      "quality_score": 0.85,

      "category": "gratitude"

    }

  ],

  "variant_strategy": "weighted",

  "weights": [3, 2],

  "seed": "user:123"

}

```

### Response Selection Strategies

- **random**: Uniform random selection for variety

- **round-robin**: Cycles through responses systematically

- **deterministic**: Stable selection based on seed (user ID, conversation ID)

- **weighted**: Probability-based selection according to quality scores or preferences
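
The four strategies can be sketched as a single index-picking function. This is one plausible reading of the list above, not the library's code: `selectIndex`, its options, and the seed hash (an FNV-1a-style hash over UTF-16 code units) are all assumptions. In particular, how Resk-Caching combines `seed` with the `weighted` strategy is not documented, so this sketch draws weighted picks from the random source only.

```typescript
type Strategy = "random" | "round-robin" | "deterministic" | "weighted";

interface SelectOptions {
  weights?: number[];
  seed?: string;
  counter?: number; // number of previous selections, used by round-robin
  rng?: () => number; // injectable random source in [0, 1)
}

// FNV-1a-style hash: turns a seed string into a stable unsigned 32-bit number.
function hashSeed(seed: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < seed.length; i++) {
    h ^= seed.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Walk the cumulative weights until the scaled random value is used up.
function pickWeighted(weights: number[], r: number): number {
  const total = weights.reduce((sum, w) => sum + w, 0);
  let cursor = r * total;
  for (let i = 0; i < weights.length; i++) {
    cursor -= weights[i];
    if (cursor < 0) return i;
  }
  return weights.length - 1;
}

function selectIndex(strategy: Strategy, count: number, opts: SelectOptions = {}): number {
  const rng = opts.rng ?? Math.random;
  switch (strategy) {
    case "random":
      return Math.floor(rng() * count);
    case "round-robin":
      return (opts.counter ?? 0) % count;
    case "deterministic":
      return hashSeed(opts.seed ?? "") % count;
    case "weighted":
      return pickWeighted(opts.weights ?? Array.from({ length: count }, () => 1), rng());
  }
}
```

With weights `[3, 2, 1]`, the first response is chosen for random values in the first half of the range, the second for the next third, and the third for the final sixth.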

### Search for Similar Queries

When a user sends "merci pour ta réponse", the system:

1. Converts the message to a vector embedding

2. Finds similar queries in the database (e.g., "thank you", "thanks", "merci")

3. Selects the best match based on similarity score

4. Applies the variant strategy to choose a response

5. Returns the selected response with metadata

This approach ensures users get varied, contextually appropriate responses while maintaining the high quality of pre-approved LLM outputs.

## Library usage (TypeScript)

```ts

import { selectCache, globalCostTracker, globalPerformanceOptimizer } from "resk-caching";

// Basic cache usage: store a value with a TTL of 60, then read it back
const cache = selectCache();
await cache.set("key", { payload: true }, 60);
const val = await cache.get("key");

// Cost tracking integration
// NOTE: `query` and `llmApi` are placeholders for your own prompt and LLM client.
const startedAt = performance.now();
const cacheResult = await cache.search(query);
if (cacheResult) {
  // Cache hit - record savings
  globalCostTracker.recordCost({
    provider: "openai",
    model: "gpt-4",
    inputTokens: 150,
    outputTokens: 200,
    cacheHit: true
  });
} else {
  // Cache miss - record actual cost
  const response = await llmApi.createCompletion(query);
  globalCostTracker.recordCost({
    provider: "openai",
    model: "gpt-4",
    inputTokens: response.usage.prompt_tokens,
    outputTokens: response.usage.completion_tokens,
    cacheHit: false
  });
}
const responseTime = performance.now() - startedAt; // elapsed milliseconds

// Performance monitoring
globalPerformanceOptimizer.recordMetric({
  operation: 'search',
  duration: responseTime,
  cacheHit: !!cacheResult,
  backend: 'redis'
});

```

## 📚 Comprehensive Examples

### 💰 Cost Tracking Example

```typescript

// examples/cost-tracking-example.ts

import { CostTracker } from "resk-caching";

const tracker = new CostTracker();

// Record API costs

tracker.recordCost({

  provider: "openai",

  model: "gpt-4",

  inputTokens: 150,

  outputTokens: 300,

  cacheHit: false

});

// Get ROI analysis

const analysis = tracker.getCostAnalysis(30); // 30 days

console.log(`Total Savings: $${analysis.totalSavings}`);

console.log(`ROI: ${analysis.roiPercentage}%`);

```
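
The arithmetic behind a savings figure can be sketched independently of the library. In this illustration a cache hit counts as money saved (the call that was avoided) and a miss counts as money spent. The `Pricing` shape and the helper names are hypothetical, and the prices in the usage note are placeholders rather than real provider rates; how `CostTracker` computes `totalSavings` and `roiPercentage` internally is not documented here.

```typescript
// Prices are in USD per 1,000 tokens.
interface Pricing {
  inputPer1K: number;
  outputPer1K: number;
}

interface CostEntry {
  inputTokens: number;
  outputTokens: number;
  cacheHit: boolean;
}

// What a single request costs (or would have cost) at the given prices.
function requestCost(entry: CostEntry, pricing: Pricing): number {
  return (entry.inputTokens / 1000) * pricing.inputPer1K
    + (entry.outputTokens / 1000) * pricing.outputPer1K;
}

// Misses add to `spent`; hits add the avoided cost to `saved`.
function summarizeCosts(entries: CostEntry[], pricing: Pricing) {
  let spent = 0;
  let saved = 0;
  let hits = 0;
  for (const entry of entries) {
    const cost = requestCost(entry, pricing);
    if (entry.cacheHit) {
      saved += cost;
      hits += 1;
    } else {
      spent += cost;
    }
  }
  return { spent, saved, hitRate: entries.length === 0 ? 0 : hits / entries.length };
}
```

With placeholder prices of 1 USD per 1K input tokens and 2 USD per 1K output tokens, a cache hit for a request of 1,000 input and 500 output tokens counts as 2 USD saved.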

### 🚀 Performance Optimization Example

```typescript

// examples/performance-optimization-example.ts

import { PerformanceOptimizer } from "resk-caching";

const optimizer = new PerformanceOptimizer();

// Start cache warming

await optimizer.startCacheWarming({

  strategy: 'popular',

  batchSize: 20,

  maxEntries: 1000

});

// Get optimization recommendations

const recommendations = optimizer.getOptimizationRecommendations();

recommendations.forEach(rec => {

  console.log(`${rec.type}: ${rec.description}`);

});

```

### 🧪 Development & Testing Example

```typescript

// examples/development-testing-example.ts

import { MockLLMProvider } from "resk-caching";

const mockProvider = new MockLLMProvider();

// OpenAI-compatible API for development

const response = await mockProvider.createChatCompletion({

  model: "gpt-3.5-turbo",

  messages: [{ role: "user", content: "Hello!" }]

});

// Run automated test scenarios

const testResults = await mockProvider.runAllTestScenarios();

console.log(`Tests passed: ${testResults.filter(r => r.passed).length}`);

```

### 🌟 Complete Demo

```bash

# Run the comprehensive demo showcasing all four benefits

npm run example:demo

# Or run individual examples

npm run example:cost-tracking

npm run example:performance

npm run example:development

```

## OpenAPI and clients

- Fetch the spec: `GET /api/openapi.json`

- Use your preferred OpenAPI generator to produce clients/SDKs

## Observability

- Prometheus metrics at `/api/metrics`

- OpenTelemetry tracing via OTLP HTTP exporter (configure `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_SERVICE_NAME`)

- Correlation-ID header propagated for easier debugging

## Security model (summary)

- Secrets live only on the server (env/secret manager); no secrets in the frontend

- TLS in transit, short-lived JWTs, and per-user/IP rate limiting

- Optional AES-GCM encryption at rest for persisted cache entries

- Structured logs with correlation-id; metrics and traces for forensics
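
Per-user or per-IP rate limiting can be illustrated with a fixed-window counter. This is a sketch under assumptions: the README does not say which algorithm Resk-Caching uses, and `FixedWindowRateLimiter` is a hypothetical class. The default window and limit mirror the documented `RATE_LIMIT_WINDOW_MS` (900000) and `RATE_LIMIT_MAX` (1000) defaults.

```typescript
// Hypothetical fixed-window limiter keyed by user ID or IP address.
class FixedWindowRateLimiter {
  private windows = new Map<string, { startedAt: number; count: number }>();
  private windowMs: number;
  private max: number;
  private now: () => number;

  constructor(windowMs = 900000, max = 1000, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.max = max;
    this.now = now;
  }

  // Returns true if the request identified by `key` is allowed.
  allow(key: string): boolean {
    const t = this.now();
    const current = this.windows.get(key);
    if (!current || t - current.startedAt >= this.windowMs) {
      // First request from this key, or the previous window has ended.
      this.windows.set(key, { startedAt: t, count: 1 });
      return true;
    }
    if (current.count >= this.max) return false;
    current.count += 1;
    return true;
  }
}
```

Fixed windows are simple but allow short bursts around a window boundary; sliding-window or token-bucket schemes smooth that out at the cost of more bookkeeping.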

## License

Apache-2.0 — see LICENSE

## Vector Database Integration

### Overview

Resk-Caching supports multiple vector database backends for similarity search and semantic caching. The system can ingest documents, compute embeddings, and store them in vector databases for efficient retrieval.

### Supported Vector Databases

- **Chroma**: Local or hosted ChromaDB instances

- **Pinecone**: Managed vector database service

- **Weaviate**: Open-source vector database

- **Milvus**: High-performance vector database

- **Custom adapters**: Extend for your specific needs

### Environment Configuration

```bash

# Vector Database Type

export VECTORDB_TYPE=pinecone  # or chroma, weaviate, milvus

# Embedding Provider

export EMBEDDING_PROVIDER=openai  # or huggingface, sentence-transformers

export EMBEDDING_MODEL=text-embedding-ada-002  # OpenAI model name

export OPENAI_API_KEY=your_openai_key_here

# Pinecone Configuration

export PINECONE_API_KEY=your_pinecone_key

export PINECONE_INDEX_HOST=https://your-index.pinecone.io

export PINECONE_INDEX_NAME=your-index-name

# Chroma Configuration

export CHROMA_HOST=localhost

export CHROMA_PORT=8000

export CHROMA_COLLECTION_NAME=documents

# Weaviate Configuration

export WEAVIATE_URL=http://localhost:8080

export WEAVIATE_API_KEY=your_weaviate_key

export WEAVIATE_CLASS_NAME=Document

# Milvus Configuration

export MILVUS_HOST=localhost

export MILVUS_PORT=19530

export MILVUS_COLLECTION_NAME=documents

# Batch Processing

export BATCH_SIZE=100  # Documents per batch for embeddings

export UPSERT_BATCH=50  # Documents per batch for vector DB

```

### Ingestion Script

Use the provided ingestion script to batch process documents:

```bash

# Run ingestion

bun run scripts/ingest-example.ts

```

The script will:

1. Read documents from your source

2. Compute embeddings in batches

3. Store vectors in the configured database

4. Handle retries and error recovery

### Example Ingestion Code

```typescript

import { createVectorDBAdapter, createEmbeddingProvider } from 'resk-caching';

// Shape of the documents being ingested (illustrative; adapt to your source)
interface SourceDocument {
  id: string;
  content: string;
  title: string;
  source: string;
  timestamp: string;
}

// Documents per embedding batch; mirrors the BATCH_SIZE environment variable
const BATCH_SIZE = Number(process.env.BATCH_SIZE ?? 100);

async function ingestDocuments(documents: SourceDocument[]) {
  const vectorDB = createVectorDBAdapter();
  const embeddings = createEmbeddingProvider();

  // Process in batches
  for (let i = 0; i < documents.length; i += BATCH_SIZE) {
    const batch = documents.slice(i, i + BATCH_SIZE);

    // Compute embeddings
    const vectors = await embeddings.embedBatch(
      batch.map(doc => doc.content)
    );

    // Prepare for storage: pair each vector with its document metadata
    const vectorsWithMetadata = batch.map((doc, idx) => ({
      id: doc.id,
      vector: vectors[idx],
      metadata: {
        title: doc.title,
        source: doc.source,
        timestamp: doc.timestamp
      }
    }));

    // Store in vector database
    await vectorDB.upsertBatch(vectorsWithMetadata);
  }
}

```

### Vector Search

```typescript

import { createVectorDBAdapter, createEmbeddingProvider } from 'resk-caching';

async function searchSimilar(query: string, k: number = 5) {
  const vectorDB = createVectorDBAdapter();
  const embeddings = createEmbeddingProvider();

  // Get query embedding
  const queryVector = await embeddings.embed(query);

  // Search for similar vectors
  const results = await vectorDB.search(queryVector, {
    k,
    threshold: 0.7,  // Similarity threshold
    filters: {
      source: 'knowledge_base',
      timestamp: { $gte: '2024-01-01' }
    }
  });

  return results;
}

```

### Performance Considerations

- **Batch sizes**: Larger batches (100-500) for embeddings, smaller (50-100) for vector DB operations

- **Parallel processing**: Use worker threads for CPU-intensive embedding computation

- **Caching**: Cache frequently accessed embeddings and search results

- **Indexing**: Ensure proper vector database indexes are created for your use case
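
The batch-size guidance can be sketched as two levels of chunking: a larger batch for the embedding call and smaller batches for vector DB writes. `chunk` is a hypothetical helper; the sizes in the usage note follow the `BATCH_SIZE=100` and `UPSERT_BATCH=50` values from the environment configuration.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

For 250 documents, `chunk(docs, 100)` gives three embedding batches (100, 100, 50), and each embedded batch can then be split again with `chunk(batch, 50)` before upserting.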

### Monitoring and Metrics

The system provides metrics for:

- Embedding computation latency and throughput

- Vector database operation success rates

- Search query performance

- Cache hit rates for vector operations

Access metrics at `/api/metrics` endpoint.

## Next steps

- Docker image and multi-stage build for slim runtimes

- LangChain integration helper (middleware to consult cache before LLM calls)

- LlamaIndex and Vercel AI SDK adapters

- Pluggable vector stores (Qdrant, Weaviate, Pinecone) with adapters

- Background refresh policies and stale-while-revalidate

- Eviction strategies (LRU/LFU) and cache warming CLI

- Upstash Redis & Redis Cloud deployment templates

- Benchmarks and load-test recipes (k6/Artillery)