https://github.com/sensoris/semcache
Semantic caching layer for your LLM applications. Reuse responses and reduce token usage.
- Host: GitHub
- URL: https://github.com/sensoris/semcache
- Owner: sensoris
- License: mit
- Created: 2025-05-09T12:52:10.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-06-18T17:40:45.000Z (4 months ago)
- Last Synced: 2025-06-18T18:27:23.105Z (4 months ago)
- Topics: anthropic, gemini, genai, llm, openai, semantic-cache
- Language: Rust
- Homepage: https://semcache.io
- Size: 387 KB
- Stars: 67
- Watchers: 2
- Forks: 4
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE
# ⚡ semcache
`semcache` is a semantic caching layer for your LLM applications.
- [semcache website](https://semcache.io)
- [semcache docs](https://docs.semcache.io)

## Quick Start
Start the Semcache Docker image:
```bash
docker run -p 8080:8080 semcache/semcache:latest
```

Configure your application, e.g. with the OpenAI Python SDK:
```python
from openai import OpenAI

# Point to your Semcache host instead of OpenAI
client = OpenAI(base_url="http://localhost:8080", api_key="your-key")

# Cache miss - continues to OpenAI
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

# Cache hit - returns instantly
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me France's capital city"}]
)
```

Node.js follows the same pattern of changing the base URL to point to your Semcache host:
```js
const OpenAI = require('openai');

// Point to your Semcache host instead of OpenAI
const openai = new OpenAI({baseURL: 'http://localhost:8080', apiKey: 'your-key'});
```

## Features
- **🧠 Completely in-memory** - Prompts, responses and the vector database are stored in-memory
- **🎯 Flexible by design** - Can work with your custom or private LLM APIs
- **🔌 Support for major LLM APIs** - OpenAI, Anthropic, Gemini, and more
- **⚡ HTTP proxy mode** - Drop-in replacement that reduces costs and latency
- **📈 Prometheus metrics** - Full observability out of the box
- **📊 Built-in dashboard** - Monitor cache performance at `/admin`
- **📤 Smart eviction** - LRU cache eviction policy

**Semcache is still in beta and being actively developed.**
## How it works
Semcache accelerates LLM applications by caching responses based on semantic similarity.
When you make a request, Semcache first searches for previously cached answers to similar prompts and delivers them immediately. This eliminates redundant API calls, reducing both latency and costs.
Semcache also operates in a "cache-aside" mode, allowing you to load prompts and responses yourself.
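To make that concrete, here is a minimal sketch (assuming a local Semcache instance from the Quick Start proxying to OpenAI, and a placeholder API key) that sends two semantically similar prompts and times them; the second should be answered from the cache without an upstream call:

```python
import time

from openai import OpenAI

# Assumes Semcache is running locally on port 8080 (see Quick Start)
client = OpenAI(base_url="http://localhost:8080", api_key="your-key")

def timed_ask(prompt: str) -> float:
    """Send a chat completion through Semcache and return the elapsed seconds."""
    start = time.perf_counter()
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return time.perf_counter() - start

miss = timed_ask("What is the capital of France?")  # forwarded to OpenAI
hit = timed_ask("Tell me France's capital city")    # served from the semantic cache
print(f"cache miss: {miss:.2f}s, cache hit: {hit:.2f}s")
```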
## Example Integrations
For comprehensive provider configuration and detailed code examples, visit our [LLM Providers & Tools documentation](https://docs.semcache.io/docs/llm-providers-tools).
### HTTP Proxy
Point your existing SDK to Semcache instead of the provider's endpoint.
**OpenAI**
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080", api_key="your-key")
```

**Anthropic**
```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8080",  # Semcache endpoint
    api_key="your-key"
)
```

**LangChain**
```python
from langchain.llms import OpenAI

llm = OpenAI(
    openai_api_base="http://localhost:8080",
    openai_api_key="your-key"
)
```

**LiteLLM**
```python
import litellm

litellm.api_base = "http://localhost:8080"
```

### Cache-aside

Install the Python SDK with:
```bash
pip install semcache
```

```python
from semcache import Semcache

# Initialize the client
client = Semcache(base_url="http://localhost:8080")

# Store a key-data pair
client.put("What is the capital of France?", "Paris")

# Retrieve data by semantic similarity
response = client.get("Tell me France's capital city.")
print(response) # "Paris"
```

Or in Node.js, install with:
```bash
npm install semcache
```
Use the SDK in your service:

```javascript
const SemcacheClient = require('semcache');

const client = new SemcacheClient('http://localhost:8080');
(async () => {
  await client.put('What is the capital of France?', 'Paris');
  const result = await client.get('What is the capital of France?');
  console.log(result); // => 'Paris'
})();
```

## Configuration
Configure via environment variables or `config.yaml`:
```yaml
log_level: info
port: 8080
```

Environment variables (prefix with `SEMCACHE_`):
```bash
SEMCACHE_PORT=8080
SEMCACHE_LOG_LEVEL=debug
```

## Monitoring
### Prometheus Metrics
Semcache emits comprehensive Prometheus metrics for production monitoring.
Check out our `/monitoring` directory for our custom Grafana dashboard.
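As a quick sanity check, something like the following can confirm metrics are being emitted (a minimal sketch; it assumes the metrics are served at the conventional Prometheus `/metrics` path on the proxy port, so adjust the URL to match your deployment):

```python
import requests

# Assumes Semcache serves Prometheus metrics at the conventional /metrics
# path on the proxy port; adjust the URL if your deployment differs.
resp = requests.get("http://localhost:8080/metrics", timeout=5)
resp.raise_for_status()

# Print the metric samples, skipping Prometheus HELP/TYPE comment lines
for line in resp.text.splitlines():
    if line and not line.startswith("#"):
        print(line)
```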
### Built-in Dashboard
Access the admin dashboard at `/admin` to monitor cache performance.
## Enterprise
Our managed version of Semcache provides you with semantic caching as a service.
Features we offer:
- **Custom text embedding models** for your specific business
- **Persistent storage** allowing you to build application memory over time
- **In-depth analysis** of your LLM responses
- **SLA support** and dedicated engineering resources

Contact us at [contact@semcache.io](mailto:contact@semcache.io)
## Contributing
Contributions to Semcache are welcome! Feel free to open a PR.
---
Built with ❤️ in Rust • [Documentation](https://docs.semcache.io) • [GitHub Issues](https://github.com/sensoris/semcache/issues)