{"id":30453860,"url":"https://github.com/resk-security/resk-caching","last_synced_at":"2026-01-20T16:54:42.439Z","repository":{"id":309753704,"uuid":"1037447415","full_name":"Resk-Security/resk-caching","owner":"Resk-Security","description":"Resk-Caching is a Bun-based backend library and server designed for secure caching, embeddings orchestration, and vector database access. It prioritizes security (keeping secrets out of the frontend), high performance, and deep observability.","archived":false,"fork":false,"pushed_at":"2025-08-13T16:17:49.000Z","size":48,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-13T17:37:27.996Z","etag":null,"topics":["ai","bun","caching","chatbot","llm","redis"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Resk-Security.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-13T15:29:59.000Z","updated_at":"2025-08-13T16:17:53.000Z","dependencies_parsed_at":"2025-08-13T17:51:23.715Z","dependency_job_id":null,"html_url":"https://github.com/Resk-Security/resk-caching","commit_stats":null,"previous_names":["resk-security/resk-caching"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/Resk-Security/resk-caching","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Resk-Security%2Fresk-caching","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Resk-S
ecurity%2Fresk-caching/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Resk-Security%2Fresk-caching/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Resk-Security%2Fresk-caching/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Resk-Security","download_url":"https://codeload.github.com/Resk-Security/resk-caching/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Resk-Security%2Fresk-caching/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271755406,"owners_count":24815398,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","bun","caching","chatbot","llm","redis"],"created_at":"2025-08-23T16:01:24.789Z","updated_at":"2026-01-20T16:54:42.432Z","avatar_url":"https://github.com/Resk-Security.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![NPM version](https://img.shields.io/npm/v/resk-caching.svg)](https://www.npmjs.com/package/resk-caching)\n[![NPM License](https://img.shields.io/npm/l/resk-caching.svg)](https://github.com/Resk-Security/resk-caching/blob/main/LICENSE)\n[![NPM 
Downloads](https://img.shields.io/npm/dt/resk-caching.svg)](https://www.npmjs.com/package/resk-caching)\n[![GitHub issues](https://img.shields.io/github/issues/Resk-Security/resk-caching.svg)](https://github.com/Resk-Security/resk-caching/issues)\n[![GitHub stars](https://img.shields.io/github/stars/Resk-Security/resk-caching.svg)](https://github.com/Resk-Security/resk-caching/stargazers)\n[![GitHub last commit](https://img.shields.io/github/last-commit/Resk-Security/resk-caching.svg)](https://github.com/Resk-Security/resk-caching/commits/main)\n[![TypeScript](https://img.shields.io/badge/TypeScript-^5.4.5-blue.svg)](https://www.typescriptlang.org/)\n[![LLM Security](https://img.shields.io/badge/LLM-Security-red)](https://github.com/Resk-Security/resk-caching)\n\n## Full documentation\n\nThe full documentation is built with MkDocs. See `docs/` in the repository, or the published site: [Resk-Caching Docs](https://resk-caching.readthedocs.io/en/latest/).\n\n## Resk-Caching — LLM Response Caching with Vector Database Integration\n\nResk-Caching is a Bun-based backend library/server designed to **cache Large Language Model (LLM) responses using vector databases**. Repeated or semantically similar queries are answered from the cache instead of triggering new, billable API calls, which reduces API costs while maintaining response quality and relevance.\n\n### 🎯 **Four Key GPTCache-Style Benefits**\n\nResk-Caching provides four core benefits, modeled on those of GPTCache:\n\n#### 💰 **1. Massive Cost Reduction**\n- **Up to 90% reduction** in LLM API costs through semantic caching; actual savings depend on your cache hit rate\n- **Real-time cost tracking** with provider-specific pricing (OpenAI, Anthropic, Google, etc.)\n- **ROI analysis** showing estimated savings from cache hits vs API calls\n- **Cost breakdown** by provider, model, and time period\n- **Automatic savings calculation** for every cache hit\n\n#### 🚀 **2. 
Performance Optimization**\n- **Sub-5ms typical response times** for cached queries vs 500ms+ for LLM API calls\n- **Cache warming** strategies (popular, recent, predictive)\n- **Real-time performance monitoring** with benchmarking and optimization recommendations\n- **Slow query detection** with automated performance suggestions\n- **Cache hit rate optimization** through configurable similarity thresholds\n\n#### 🧪 **3. Development \u0026 Testing Environment**\n- **OpenAI-compatible API** for offline development without API costs\n- **Mock LLM provider** with customizable responses and scenarios\n- **Automated testing scenarios** with validation and metrics\n- **Zero-cost development workflows** with realistic API simulation\n- **Circuit breaker patterns** for resilient application development\n\n#### 🛡️ **4. Scalability \u0026 Availability**\n- **Lower rate-limit pressure**: a cache-first approach sends fewer requests to upstream providers\n- **Circuit breaker patterns** with automatic failover and recovery\n- **Health monitoring** and real-time system status\n- **Proactive cache warming** to absorb traffic spikes\n- **Graceful degradation** when external services fail\n\n### 🔍 **How It Works**\n\n1. **Pre-populated Response Database**: You maintain a database of high-quality LLM responses to common queries, stored with vector embeddings of those queries\n2. **Semantic Matching**: When a new query arrives, the system finds the cached query that is most semantically similar to it\n3. **Cost Savings**: Returns the cached response instead of making a new API call\n4. 
**Response Selection**: A selection strategy (random, round-robin, deterministic, or weighted) picks one of the matched entry's stored responses, so you can apply business logic, user preferences, or A/B testing\n\n### 🚀 **Key Benefits**\n\n✅ **All Four GPTCache-Style Benefits Implemented:**\n- **💰 Massive Cost Reduction**: Up to 90% savings (hit-rate dependent) with real-time ROI tracking\n- **🚀 Performance Optimization**: Sub-5ms responses with cache warming\n- **🧪 Development Environment**: OpenAI-compatible API for offline testing\n- **🛡️ Scalability \u0026 Availability**: Circuit breakers and automatic failover\n\n### Features\n- **LLM Response Caching**: Store and retrieve LLM responses using vector similarity matching\n- **Multiple Cache Backends**: in-memory, SQLite (local persistence), Redis (multi-instance)\n- **Response Selection Strategies**: Deterministic, weighted, round-robin, and random selection among stored responses\n- **Vector Database Integration**: Optimized for semantic search and similarity matching\n- **AES-GCM Encryption**: Optional authenticated encryption of cached values at rest, enabled by setting `CACHE_ENCRYPTION_KEY`\n- **JWT-Protected API**: Authenticated access with rate limiting and abuse prevention\n- **OpenAPI 3.1**: Auto-generated API documentation from Zod schemas\n- **Performance Monitoring**: Prometheus metrics and OpenTelemetry tracing\n- **Real-time Updates**: WebSockets for instant response distribution\n\n### How we're different from other semantic caches\n\n- **GPTCache**: Great Python-first cache. Resk-Caching focuses on Bun/TypeScript and ships with a JWT-secured HTTP API, OpenAPI generation, built-in Prometheus/OTEL, and optional authenticated encryption at rest out of the box.\n- **ModelCache**: Provides a semantic cache layer. Resk-Caching adds production concerns (rate limiting, security wrapper, metrics, tracing, OpenAPI, WebSockets) and pluggable backends that can be switched without code changes via `CACHE_BACKEND`.\n- **Upstash Semantic Cache**: Managed vector-backed cache. 
Resk-Caching is open-source, self-hosted by default, and can run fully locally with SQLite or purely in-memory while retaining encryption and observability.\n- **Redis LangCache**: Managed Redis-based semantic cache. Resk-Caching supports Redis natively via Bun's RESP3 client while also offering SQLite and in-memory modes for portability and offline development.\n- **SemantiCache (FAISS)**: FAISS-native library. Resk-Caching prioritizes a secure, observable HTTP surface with variant selection strategies and can integrate external vector DBs; no GPU dependency required.\n\nIf you need a secure, auditable cache service with operational tooling for teams, Resk-Caching is purpose-built for that use case.\n\n### What each module is for\n- **LLM Response Storage**: Store pre-computed LLM responses together with the vector embeddings of the queries they answer\n- **Caching Backends**: Choose between low-latency memory, local persistence (SQLite), or distributed (Redis) based on your scale\n- **Response Selection Algorithms**: Implement deterministic, weighted, round-robin, or random response selection based on business logic\n- **Vector Similarity Matching**: Find the cached query most semantically similar to an incoming query\n- **AES-GCM Encryption**: Protect sensitive LLM responses at rest with authenticated encryption\n- **JWT + Rate Limiting**: Secure API access and prevent abuse while maintaining performance\n- **Zod + OpenAPI**: Ensure data validation and provide always-in-sync API documentation\n- **Performance Monitoring**: Track cache hit rates, response times, and cost savings in real time\n- **Real-time Distribution**: Distribute responses across multiple instances and clients\n\n## Prerequisites\n\n### Vector Database Setup\nFor production semantic caching, you need a **vector database** populated with pre-computed LLM responses (the default in-memory vector store is enough for local development). This is the foundation of the caching system:\n\n1. **Response Database**: Create a collection of high-quality LLM responses to common queries\n2. 
**Vector Embeddings**: Generate an embedding for each query that your responses answer, using the same embedding model you will use for incoming queries\n3. **Metadata Storage**: Store additional context like response quality scores, categories, or business rules\n4. **Similarity Index**: Ensure your vector database has proper indexing for fast similarity search\n\n**Recommended Vector Databases:**\n- **Pinecone**: Managed service suited to high-performance production use\n- **Weaviate**: Open-source with strong similarity search capabilities\n- **Qdrant**: Fast and efficient for real-time applications\n- **Chroma**: Simple to run for local development and testing\n\n## Install\n```bash\n# as a library (npm)\nnpm install resk-caching\n\n# as a library (bun)\nbun add resk-caching\n```\n\n## Quick Start\n\n### Server Setup\n\n```bash\n# Install dependencies (from a clone of the repository)\nbun install\n\n# Start the server\nbun run dev\n\n# The server will be available at http://localhost:3000\n```\n\n### Step-by-step setup\n\n1. Choose your key-value cache backend:\n   - `CACHE_BACKEND=memory` for local development\n   - `CACHE_BACKEND=sqlite` for single-node durability\n   - `CACHE_BACKEND=redis` for distributed/multi-instance deployments\n2. Choose your vector search strategy for semantic features:\n   - Default: in-memory vector store (process-local)\n   - Production: external vector DB (Pinecone/Qdrant/Weaviate/Chroma)\n   - Alternative: Redis RediSearch vectors or SQLite vector extensions\n3. Ingest responses and embeddings (see the Ingestion Script section or `scripts/ingest-example.ts`).\n4. Call the `/api/semantic/store` and `/api/semantic/search` endpoints.\n\nBy default, semantic embeddings live in memory. 
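To power vector search with Redis or SQLite, see the guides below.\n\n### Vector search with Redis (RediSearch)\n\nUse Redis Stack with RediSearch for vector similarity.\n\nExample index and KNN search (1536-dimensional FLOAT32 vectors, cosine distance):\n```bash\n# Create index\nredis-cli FT.CREATE idx:llm ON HASH PREFIX 1 llm: SCHEMA \\\n  query TEXT \\\n  embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE \\\n  category TAG SORTABLE \\\n  metadata TEXT\n\n# Insert (embedding must be raw float32 bytes)\nredis-cli HSET llm:thank-you query \"thank you\" category \"gratitude\" \\\n  embedding \"$BINARY_FLOAT32\" metadata \"{\\\"tone\\\":\\\"friendly\\\"}\"\n\n# KNN search (single quotes stop the shell from expanding $vec)\nredis-cli FT.SEARCH idx:llm '*=\u003e[KNN 5 @embedding $vec AS score]' \\\n  PARAMS 2 vec \"$QUERY_EMBED_FLOAT32\" \\\n  SORTBY score DIALECT 2 RETURN 3 query category score\n```\n\nNotes:\n- Convert `number[]` → `Float32Array` → bytes for the `embedding` field. `$BINARY_FLOAT32` and `$QUERY_EMBED_FLOAT32` are placeholders: shell variables cannot hold the NUL bytes that raw vectors usually contain, so in practice send vectors from a Redis client library.\n- With `DISTANCE_METRIC COSINE`, `score` is a cosine distance (lower means more similar), so `SORTBY score` returns the closest matches first.\n- Keep response variants in a secondary key (e.g., `llm:\u003cid\u003e:responses`) and run variant selection after KNN.\n\n### Vector search with SQLite (sqlite-vss/sqlite-vec)\n\nShip SQLite with a vector extension, then create a vector table and join it with a metadata table. The example below uses sqlite-vec's `vec0` virtual table; sqlite-vec is the successor to sqlite-vss, whose `vss0` tables and `vss_search()` function use a different syntax.\n\n```sql\n-- vectors live in a vec0 virtual table, keyed by rowid\nCREATE VIRTUAL TABLE vec_entries USING vec0(\n  embedding float[1536]\n);\n\nCREATE TABLE llm_entries (\n  id INTEGER PRIMARY KEY, -- same value as vec_entries.rowid\n  query TEXT NOT NULL,\n  category TEXT,\n  metadata TEXT\n);\n```\n\nInsert and search:\n```sql\n-- insert: the embedding is a Float32 blob (or JSON array) wrapped in vec_f32()\nINSERT INTO vec_entries(rowid, embedding) VALUES (?, vec_f32(?));\nINSERT INTO llm_entries(id, query, category, metadata) VALUES (?, ?, ?, ?);\n\n-- KNN: run the vector match first, then join the metadata\nWITH knn AS (\n  SELECT rowid, distance\n  FROM vec_entries\n  WHERE embedding MATCH vec_f32(?)\n    AND k = 5\n)\nSELECT l.id, l.query, knn.distance AS score\nFROM knn\nJOIN llm_entries l ON l.id = knn.rowid\nORDER BY knn.distance;\n```\n\nNotes:\n- Convert `number[]` to a Float32 blob for both inserts and the query embedding.\n- `vec0` uses L2 distance by default (lower is closer); L2-normalize stored and query embeddings if you want the same ranking as cosine similarity.\n- Join back to your stored responses via `id`, then apply variant selection.\n\n### Basic 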
Usage Examples\n\n#### 1. Store LLM Responses with Vector Embeddings\n\n```bash\n# Store multiple \"thank you\" responses with different tones\ncurl -X POST http://localhost:3000/api/semantic/store \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Authorization: Bearer YOUR_API_KEY\" \\\n  -d '{\n    \"query\": \"thank you\",\n    \"query_embedding\": {\n      \"vector\": [0.1, 0.2, 0.3],\n      \"dimension\": 3\n    },\n    \"responses\": [\n      {\n        \"id\": \"resp1\",\n        \"text\": \"You're welcome!\",\n        \"metadata\": { \"tone\": \"friendly\", \"formality\": \"casual\" },\n        \"quality_score\": 0.95,\n        \"category\": \"gratitude\",\n        \"tags\": [\"polite\", \"casual\"]\n      },\n      {\n        \"id\": \"resp2\", \n        \"text\": \"My pleasure!\",\n        \"metadata\": { \"tone\": \"professional\", \"formality\": \"formal\" },\n        \"quality_score\": 0.92,\n        \"category\": \"gratitude\",\n        \"tags\": [\"polite\", \"professional\"]\n      },\n      {\n        \"id\": \"resp3\",\n        \"text\": \"No problem at all!\",\n        \"metadata\": { \"tone\": \"casual\", \"formality\": \"informal\" },\n        \"quality_score\": 0.88,\n        \"category\": \"gratitude\",\n        \"tags\": [\"casual\", \"friendly\"]\n      }\n    ],\n    \"variant_strategy\": \"weighted\",\n    \"weights\": [3, 2, 1],\n    \"seed\": \"user:123\"\n  }'\n```\n\n**Response:**\n```json\n{\n  \"success\": true,\n  \"message\": \"LLM responses stored successfully\",\n  \"entry_id\": \"thank you\",\n  \"responses_count\": 3\n}\n```\n\n#### 2. 
Semantic Search for Similar Queries\n\n```bash\n# Search for responses to \"merci\" (French thank you)\ncurl -X POST http://localhost:3000/api/semantic/search \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Authorization: Bearer YOUR_API_KEY\" \\\n  -d '{\n    \"query\": \"merci\",\n    \"query_embedding\": {\n      \"vector\": [0.11, 0.19, 0.29],\n      \"dimension\": 3\n    },\n    \"limit\": 2,\n    \"similarity_threshold\": 0.8\n  }'\n```\n\n**Response:**\n```json\n{\n  \"success\": true,\n  \"search_result\": {\n    \"query\": \"merci\",\n    \"query_embedding\": {\n      \"vector\": [0.11, 0.19, 0.29],\n      \"dimension\": 3\n    },\n    \"matches\": [\n      {\n        \"entry\": {\n          \"query\": \"thank you\",\n          \"responses\": [...],\n          \"variant_strategy\": \"weighted\",\n          \"weights\": [3, 2, 1]\n        },\n        \"similarity_score\": 0.997,\n        \"selected_response\": {\n          \"id\": \"resp1\",\n          \"text\": \"You're welcome!\",\n          \"metadata\": { \"tone\": \"friendly\" }\n        }\n      }\n    ],\n    \"total_matches\": 1,\n    \"search_time_ms\": 2\n  }\n}\n```\n\n#### 3. 
Get All Responses for a Query\n\n```bash\n# Retrieve all stored responses for \"thank you\"\ncurl -X GET \"http://localhost:3000/api/semantic/responses?query=thank%20you\" \\\n  -H \"Authorization: Bearer YOUR_API_KEY\"\n```\n\n**Response:**\n```json\n{\n  \"success\": true,\n  \"entry\": {\n    \"query\": \"thank you\",\n    \"query_embedding\": {\n      \"vector\": [0.1, 0.2, 0.3],\n      \"dimension\": 3\n    },\n    \"responses\": [\n      {\n        \"id\": \"resp1\",\n        \"text\": \"You're welcome!\",\n        \"metadata\": { \"tone\": \"friendly\" }\n      },\n      {\n        \"id\": \"resp2\",\n        \"text\": \"My pleasure!\",\n        \"metadata\": { \"tone\": \"professional\" }\n      },\n      {\n        \"id\": \"resp3\",\n        \"text\": \"No problem at all!\",\n        \"metadata\": { \"tone\": \"casual\" }\n      }\n    ],\n    \"variant_strategy\": \"weighted\",\n    \"weights\": [3, 2, 1],\n    \"created_at\": \"2024-01-15T10:30:00.000Z\",\n    \"last_accessed\": \"2024-01-15T10:35:00.000Z\"\n  }\n}\n```\n\n#### 4. 
Get Cache Statistics\n\n```bash\n# Request cache statistics\ncurl -X GET http://localhost:3000/api/semantic/stats \\\n  -H \"Authorization: Bearer YOUR_API_KEY\"\n```\n\n**Response:**\n```json\n{\n  \"success\": true,\n  \"cache_type\": \"InMemoryVectorCache\",\n  \"message\": \"Stats endpoint - implementation needed\"\n}\n```\n\nAs the sample response shows, this endpoint currently returns a placeholder rather than real statistics; cache metrics are available from `/api/metrics` (see Metrics and Monitoring).\n\n### Advanced Usage Examples\n\n#### Store Multiple Query Types\n\n```bash\n# Store responses for different types of greetings\ncurl -X POST http://localhost:3000/api/semantic/store \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Authorization: Bearer YOUR_API_KEY\" \\\n  -d '{\n    \"query\": \"hello\",\n    \"query_embedding\": {\n      \"vector\": [0.9, 0.8, 0.7],\n      \"dimension\": 3\n    },\n    \"responses\": [\n      {\n        \"id\": \"hello1\",\n        \"text\": \"Hi there!\",\n        \"metadata\": { \"tone\": \"friendly\", \"time_of_day\": \"any\" }\n      },\n      {\n        \"id\": \"hello2\",\n        \"text\": \"Hello! How are you?\",\n        \"metadata\": { \"tone\": \"polite\", \"time_of_day\": \"morning\" }\n      }\n    ],\n    \"variant_strategy\": \"round-robin\"\n  }'\n```\n\n#### Search with Different Similarity Thresholds\n\n```bash\n# Strict similarity matching (only very similar queries)\ncurl -X POST http://localhost:3000/api/semantic/search \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Authorization: Bearer YOUR_API_KEY\" \\\n  -d '{\n    \"query\": \"thanks a lot\",\n    \"query_embedding\": {\n      \"vector\": [0.15, 0.25, 0.35],\n      \"dimension\": 3\n    },\n    \"limit\": 1,\n    \"similarity_threshold\": 0.95\n  }'\n```\n\n### Metrics and Monitoring\n\nThe system automatically tracks the following metrics for all semantic operations:\n\n- **Semantic Searches**: Total count, duration, and success rates\n- **Vector Similarity**: Distribution of similarity scores\n- **Response Storage**: Count of stored LLM responses by strategy\n- **Cache Performance**: Entry counts and access 
patterns\n- **Response Selection**: Variant strategy usage and performance\n\nAccess metrics at the `/api/metrics` endpoint (Prometheus exposition format).\n\n### Performance Characteristics\n\n- **Search Speed**: Typical semantic searches complete in \u003c5ms\n- **Memory Usage**: Efficient in-memory storage with configurable TTL\n- **Scalability**: Designed for thousands of cached responses\n- **Accuracy**: Matches are ranked by cosine similarity between query embeddings\n\n### Best Practices\n\n1. **Vector Dimensions**: Use the same embedding model (and therefore the same dimension) for stored and incoming queries; vectors from different models are not comparable\n2. **Similarity Thresholds**: Start with 0.7-0.8 for production use, then tune against real traffic (higher values return fewer, closer matches)\n3. **Response Variety**: Store 3-5 responses per query for good variant selection\n4. **Metadata**: Include rich metadata for better response selection\n5. **TTL Management**: Set appropriate expiration times for dynamic content\n\n## Environment variables\n- PORT (default 3000)\n- JWT_SECRET\n- CACHE_BACKEND = memory | sqlite | redis\n- REDIS_URL (for Redis backend)\n- CACHE_ENCRYPTION_KEY (base64-encoded 32-byte key)\n- RATE_LIMIT_WINDOW_MS (default 900000, i.e. 15 minutes)\n- RATE_LIMIT_MAX (default 1000 requests per window)\n- OTEL_EXPORTER_OTLP_ENDPOINT (traces), OTEL_SERVICE_NAME\n\n### Cache backends explained\n\n- **In-memory** (`CACHE_BACKEND=memory`):\n  - Fastest single-process store (Map-based), ideal for development and ephemeral caches\n  - Per-key TTL stored alongside values; expired entries are lazily evicted on access\n  - No cross-process sharing and no durability\n\n- **SQLite** (`CACHE_BACKEND=sqlite`):\n  - Local durability using Bun's SQLite; table `kv(key TEXT PRIMARY KEY, value TEXT, expiresAt INTEGER)`\n  - Upsert semantics on `set`; the expiry time is computed in the application and stored in `expiresAt`\n  - Expired rows are pruned lazily on `get`; `clear()` wipes the table\n  - File path defaults to `resk-cache.sqlite`\n\n- **Redis** (`CACHE_BACKEND=redis`, `REDIS_URL=...`):\n  - Distributed, multi-instance cache using Bun's native RESP3 client\n  - Values are JSON-serialized with 
optional TTL via `EXPIRE`\n  - Prefix isolation via `rc:`; `clear()` scans and deletes only `rc:*` keys\n  - Helpers for experiments (round-robin counters, sets/lists for variants, optional pub/sub)\n\n## 🔗 API Endpoints - Complete Reference\n\n### Core Cache Endpoints\n- `GET /health` - Health check endpoint\n- `POST /api/cache` (JWT) - Store simple key-value pairs\n- `POST /api/cache/query` (JWT) - Retrieve cached values\n- `DELETE /api/cache` (JWT) - Clear the entire cache\n- `GET /api/openapi.json` - OpenAPI 3.1 specification generated from Zod schemas\n- `GET /api/metrics` - Prometheus metrics exposition\n\n### 💰 Cost Tracking Endpoints (NEW!)\n- `POST /api/cost/record` (JWT) - Record LLM API cost for a request\n- `GET /api/cost/analysis` (JWT) - Get comprehensive cost analysis and ROI\n- `GET /api/cost/breakdown` (JWT) - Get cost breakdown by provider and model\n- `GET /api/cost/recent` (JWT) - Get recent cost entries\n- `POST /api/cost/pricing` (JWT) - Add custom pricing for a provider/model\n- `GET /api/cost/pricing` (JWT) - Get all configured pricing\n\n### 🚀 Performance Optimization Endpoints (NEW!)\n- `POST /api/performance/record` (JWT) - Record performance metrics\n- `GET /api/performance/benchmarks` (JWT) - Get performance benchmarks\n- `GET /api/performance/slow-queries` (JWT) - Get detected slow queries\n- `GET /api/performance/recommendations` (JWT) - Get optimization recommendations\n- `POST /api/performance/warming/start` (JWT) - Start a cache warming strategy\n- `GET /api/performance/warming/progress` (JWT) - Get cache warming progress\n- `GET /api/performance/metrics` (JWT) - Get recent performance metrics\n\n### 🧪 Development \u0026 Testing Endpoints (NEW!)\n- `POST /api/testing/chat/completions` (JWT) - OpenAI-compatible chat completions\n- `POST /api/testing/mock/responses` (JWT) - Add custom mock responses\n- `GET /api/testing/mock/responses` (JWT) - Get all mock responses\n- `POST /api/testing/scenarios` (JWT) - Add test scenarios\n- `GET /api/testing/scenarios` (JWT) - 
Get all test scenarios\n- `POST /api/testing/scenarios/run` (JWT) - Run a specific test scenario\n- `POST /api/testing/scenarios/run-all` (JWT) - Run all test scenarios\n- `GET /api/testing/history` (JWT) - Get request history\n- `POST /api/testing/scenarios/defaults` (JWT) - Load default test scenarios\n- `GET /api/testing/health` (JWT) - Get system health status\n- `GET /api/testing/circuit-breakers` (JWT) - Get circuit breaker statistics\n\n### Semantic Search Endpoints\n- `POST /api/semantic/store` (JWT) - Store LLM responses with the vector embedding of their query\n- `POST /api/semantic/search` (JWT) - Search for similar queries using semantic similarity\n- `GET /api/semantic/responses` (JWT) - Get all responses for a specific query\n- `GET /api/semantic/stats` (JWT) - Get cache statistics (currently returns a placeholder response)\n\n## Semantic Search \u0026 Response Selection\n\n### How It Works\n\n1. **Store Responses**: First, store your pre-computed LLM responses with the vector embedding of the query they answer\n2. **User Query**: A user sends a message (e.g., \"merci\" or \"merci pour ta réponse\", French for \"thanks\" and \"thanks for your answer\"), and your application submits it with its embedding as `query_embedding`\n3. **Vector Search**: The system finds semantically similar stored queries\n4. **Response Selection**: The matched entry's variant strategy chooses one of its responses\n5. **Return Result**: Sends back a varied, contextually relevant response\n\n### Example: Thank You Responses\n\nStore multiple responses for \"thank you\" queries:\n\n```json\n{\n  \"query\": \"thank you\",\n  \"query_embedding\": {\n    \"vector\": [0.1, 0.2, 0.3, 0.4, 0.5],\n    \"dimension\": 5\n  },\n  \"responses\": [\n    {\n      \"id\": \"thank_1\",\n      \"text\": \"You're welcome! I'm glad I could help.\",\n      \"metadata\": {\"tone\": \"friendly\", \"formality\": \"casual\"},\n      \"quality_score\": 0.9,\n      \"category\": \"gratitude\"\n    },\n    {\n      \"id\": \"thank_2\",\n      \"text\": \"My pleasure! 
Feel free to ask if you need anything else.\",\n      \"metadata\": {\"tone\": \"professional\", \"formality\": \"formal\"},\n      \"quality_score\": 0.85,\n      \"category\": \"gratitude\"\n    }\n  ],\n  \"variant_strategy\": \"weighted\",\n  \"weights\": [3, 2],\n  \"seed\": \"user:123\"\n}\n```\n\n### Response Selection Strategies\n\n- **random**: Uniform random selection for variety\n- **round-robin**: Cycles through responses systematically\n- **deterministic**: Stable selection based on seed (user ID, conversation ID)\n- **weighted**: Probability-based selection according to the `weights` array (with `[3, 2]`, the first response is chosen 3 times out of 5 on average)\n\n### Search for Similar Queries\n\nWhen a user sends \"merci pour ta réponse\", the system:\n\n1. Receives the message together with its vector embedding (supplied by the caller as `query_embedding`)\n2. Finds similar queries in the database (e.g., \"thank you\", \"thanks\", \"merci\")\n3. Selects the best match based on similarity score\n4. Applies the variant strategy to choose a response\n5. Returns the selected response with metadata\n\nThis approach gives users varied, contextually appropriate responses while preserving the quality of pre-approved LLM outputs.\n\n## Library usage (TypeScript)\n```ts\nimport { selectCache, globalCostTracker, globalPerformanceOptimizer } from \"resk-caching\";\n\n// Hypothetical LLM client standing in for your provider SDK (not part of resk-caching)\ndeclare const llmApi: {\n  createCompletion(prompt: string): Promise\u003c{\n    text: string;\n    usage: { prompt_tokens: number; completion_tokens: number };\n  }\u003e;\n};\n\n// Basic cache usage (the third set() argument is the TTL)\nconst cache = selectCache();\nawait cache.set(\"key\", { payload: true }, 60);\nconst val = await cache.get(\"key\");\n\n// Cost tracking integration: check the cache (exact-key lookup) before calling the LLM\nconst query = \"thank you\";\nconst start = performance.now();\nconst cacheResult = await cache.get(query);\nif (cacheResult) {\n  // Cache hit - record the avoided call (token counts are estimates)\n  globalCostTracker.recordCost({\n    provider: \"openai\",\n    model: \"gpt-4\",\n    inputTokens: 150,\n    outputTokens: 200,\n    cacheHit: true\n  });\n} else {\n  // Cache miss - call the LLM, cache the answer, and record the actual cost\n  const response = await llmApi.createCompletion(query);\n  await cache.set(query, response.text, 3600);\n  globalCostTracker.recordCost({\n    provider: \"openai\",\n    model: \"gpt-4\",\n    inputTokens: response.usage.prompt_tokens,\n    outputTokens: response.usage.completion_tokens,\n    cacheHit: false\n  });\n}\nconst responseTime = performance.now() - start; // milliseconds\n\n// Performance monitoring\nglobalPerformanceOptimizer.recordMetric({\n  operation: 'search',\n  duration: responseTime,\n  cacheHit: !!cacheResult,\n  backend: 'redis'\n});\n```\n\n## 📚 Comprehensive Examples\n\n### 💰 Cost Tracking Example\n```typescript\n// examples/cost-tracking-example.ts\nimport { CostTracker } from \"resk-caching\";\n\nconst tracker = new CostTracker();\n\n// Record API costs\ntracker.recordCost({\n  provider: \"openai\",\n  model: \"gpt-4\",\n  inputTokens: 150,\n  outputTokens: 300,\n  cacheHit: false\n});\n\n// Get ROI analysis\nconst analysis = tracker.getCostAnalysis(30); // 30 days\nconsole.log(`Total Savings: $${analysis.totalSavings}`);\nconsole.log(`ROI: ${analysis.roiPercentage}%`);\n```\n\n### 🚀 Performance Optimization Example\n```typescript\n// examples/performance-optimization-example.ts\nimport { PerformanceOptimizer } from \"resk-caching\";\n\nconst optimizer = new PerformanceOptimizer();\n\n// Start cache warming\nawait optimizer.startCacheWarming({\n  strategy: 'popular',\n  batchSize: 20,\n  maxEntries: 1000\n});\n\n// Get optimization recommendations\nconst recommendations = optimizer.getOptimizationRecommendations();\nrecommendations.forEach(rec =\u003e {\n  console.log(`${rec.type}: ${rec.description}`);\n});\n```\n\n### 🧪 Development \u0026 Testing Example\n```typescript\n// examples/development-testing-example.ts\nimport { MockLLMProvider } from \"resk-caching\";\n\nconst mockProvider = new MockLLMProvider();\n\n// OpenAI-compatible API for development\nconst response = await mockProvider.createChatCompletion({\n  model: \"gpt-3.5-turbo\",\n  messages: [{ role: \"user\", content: \"Hello!\" }]\n});\n\n// Run automated test scenarios\nconst testResults = await mockProvider.runAllTestScenarios();\nconsole.log(`Tests passed: ${testResults.filter(r =\u003e r.passed).length}`);\n```\n\n### 🌟 Complete Demo\n```bash\n# Run the comprehensive demo showcasing all 
four benefits\nnpm run example:demo\n\n# Or run individual examples\nnpm run example:cost-tracking\nnpm run example:performance\nnpm run example:development\n```\n\n## OpenAPI and clients\n- Fetch the spec: `GET /api/openapi.json`\n- Use your preferred OpenAPI generator to produce clients/SDKs\n\n## Observability\n- Prometheus metrics at `/api/metrics`\n- OpenTelemetry tracing via the OTLP HTTP exporter (configure `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_SERVICE_NAME`)\n- Correlation-ID header propagated for easier debugging\n\n## Security model (summary)\n- Secrets stay on the server (env/secret manager), never in the frontend\n- TLS transport, short-lived JWTs, and per-user/IP rate limiting\n- Optional AES-GCM encryption at rest for persisted cache entries\n- Structured logs with correlation IDs; metrics and traces for forensics\n\n## License\nApache-2.0 — see LICENSE\n\n## Vector Database Integration\n\n### Overview\nResk-Caching supports multiple vector database backends for similarity search and semantic caching. 
The system can ingest documents, compute embeddings, and store them in vector databases for efficient retrieval.\n\n### Supported Vector Databases\n- **Chroma**: Local or hosted ChromaDB instances\n- **Pinecone**: Managed vector database service\n- **Weaviate**: Open-source vector database\n- **Milvus**: High-performance vector database\n- **Custom adapters**: Extend for your specific needs\n\n### Environment Configuration\n```bash\n# Vector Database Type\nexport VECTORDB_TYPE=pinecone  # or chroma, weaviate, milvus\n\n# Embedding Provider\nexport EMBEDDING_PROVIDER=openai  # or huggingface, sentence-transformers\nexport EMBEDDING_MODEL=text-embedding-ada-002  # OpenAI model name\nexport OPENAI_API_KEY=your_openai_key_here\n\n# Pinecone Configuration\nexport PINECONE_API_KEY=your_pinecone_key\nexport PINECONE_INDEX_HOST=https://your-index.pinecone.io\nexport PINECONE_INDEX_NAME=your-index-name\n\n# Chroma Configuration\nexport CHROMA_HOST=localhost\nexport CHROMA_PORT=8000\nexport CHROMA_COLLECTION_NAME=documents\n\n# Weaviate Configuration\nexport WEAVIATE_URL=http://localhost:8080\nexport WEAVIATE_API_KEY=your_weaviate_key\nexport WEAVIATE_CLASS_NAME=Document\n\n# Milvus Configuration\nexport MILVUS_HOST=localhost\nexport MILVUS_PORT=19530\nexport MILVUS_COLLECTION_NAME=documents\n\n# Batch Processing\nexport BATCH_SIZE=100  # Documents per batch for embeddings\nexport UPSERT_BATCH=50  # Documents per batch for vector DB\n```\n\n### Ingestion Script\nUse the provided ingestion script to batch process documents:\n\n```bash\n# Run ingestion\nbun run scripts/ingest-example.ts\n```\n\nThe script will:\n1. Read documents from your source\n2. Compute embeddings in batches\n3. Store vectors in the configured database\n4. 
Handle retries and error recovery\n\n### Example Ingestion Code\n```typescript\nimport { createVectorDBAdapter } from 'resk-caching';\nimport { createEmbeddingProvider } from 'resk-caching';\n\n// Embedding batch size; mirrors the BATCH_SIZE env var above (falls back to 100 if unset or invalid)\nconst BATCH_SIZE = Number(process.env.BATCH_SIZE) || 100;\n\n// Assumed document shape for this example; adapt it to your source\ninterface Document {\n  id: string;\n  content: string;\n  title: string;\n  source: string;\n  timestamp: string;\n}\n\nasync function ingestDocuments(documents: Document[]) {\n  const vectorDB = createVectorDBAdapter();\n  const embeddings = createEmbeddingProvider();\n  \n  // Process in batches of BATCH_SIZE documents\n  for (let i = 0; i \u003c documents.length; i += BATCH_SIZE) {\n    const batch = documents.slice(i, i + BATCH_SIZE);\n    \n    // Compute one embedding per document in the batch\n    const vectors = await embeddings.embedBatch(\n      batch.map(doc =\u003e doc.content)\n    );\n    \n    // Pair each vector with its document ID and metadata\n    const vectorsWithMetadata = batch.map((doc, idx) =\u003e ({\n      id: doc.id,\n      vector: vectors[idx],\n      metadata: {\n        title: doc.title,\n        source: doc.source,\n        timestamp: doc.timestamp\n      }\n    }));\n    \n    // Store in vector database\n    await vectorDB.upsertBatch(vectorsWithMetadata);\n  }\n}\n```\n\n### Vector Search\n```typescript\nimport { createVectorDBAdapter, createEmbeddingProvider } from 'resk-caching';\n\nasync function searchSimilar(query: string, k: number = 5) {\n  const vectorDB = createVectorDBAdapter();\n  const embeddings = createEmbeddingProvider();\n  \n  // Get query embedding\n  const queryVector = await embeddings.embed(query);\n  \n  // Search for the k most similar vectors\n  const results = await vectorDB.search(queryVector, {\n    k,\n    threshold: 0.7,  // Minimum similarity score for a result to be returned\n    filters: {\n      source: 'knowledge_base',\n      timestamp: { $gte: '2024-01-01' }\n    }\n  });\n  \n  return results;\n}\n```\n\n### Performance Considerations\n- **Batch sizes**: Larger batches (100-500) for embeddings, smaller (50-100) for vector DB operations\n- **Parallel processing**: Use worker threads when embeddings are computed locally (CPU-bound); hosted embedding APIs are network-bound, so concurrent requests within the provider's rate limits help more\n- **Caching**: Cache frequently accessed embeddings and search results\n- **Indexing**: Ensure proper vector database indexes are created for your use 
case\n\n### Monitoring and Metrics\nThe system provides metrics for:\n- Embedding computation latency and throughput\n- Vector database operation success rates\n- Search query performance\n- Cache hit rates for vector operations\n\nAccess these metrics at the `/api/metrics` endpoint.\n\n## Next steps\n\n- Docker image and multi-stage build for slim runtimes\n- LangChain integration helper (middleware to consult cache before LLM calls)\n- LlamaIndex and Vercel AI SDK adapters\n- Pluggable vector stores (Qdrant, Weaviate, Pinecone) with adapters\n- Background refresh policies and stale-while-revalidate\n- Eviction strategies (LRU/LFU) and cache-warming CLI\n- Upstash Redis \u0026 Redis Cloud deployment templates\n- Benchmarks and load-test recipes (k6/Artillery)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fresk-security%2Fresk-caching","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fresk-security%2Fresk-caching","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fresk-security%2Fresk-caching/lists"}