https://github.com/rawcontext/reflex
Episodic memory and semantic cache proxy for LLM APIs with ~40% token savings
- Host: GitHub
- URL: https://github.com/rawcontext/reflex
- Owner: rawcontext
- License: agpl-3.0
- Created: 2025-12-29T00:43:20.000Z (4 months ago)
- Default Branch: master
- Last Pushed: 2025-12-29T01:47:20.000Z (4 months ago)
- Last Synced: 2026-01-04T23:17:34.030Z (4 months ago)
- Topics: agent-orchestration, ai-agents, context-graph, developer-tools, knowledge-graph, llm-proxy, semantic-cache, token-optimization
- Language: Rust
- Homepage: https://rawcontext.com/projects/reflex/
- Size: 236 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
> **Under Construction**: This project is actively being developed and is not yet ready for production use. APIs and features may change without notice.
Episodic Memory & Semantic Cache for LLM Responses
Because nobody likes paying for the same token twice.
```
██████╗ ███████╗███████╗██╗     ███████╗██╗  ██╗
██╔══██╗██╔════╝██╔════╝██║     ██╔════╝╚██╗██╔╝
██████╔╝█████╗  █████╗  ██║     █████╗   ╚███╔╝
██╔══██╗██╔══╝  ██╔══╝  ██║     ██╔══╝   ██╔██╗
██║  ██║███████╗██║     ███████╗███████╗██╔╝ ██╗
╚═╝  ╚═╝╚══════╝╚═╝     ╚══════╝╚══════╝╚═╝  ╚═╝
```
---
## What It Is
Reflex is an **OpenAI-compatible HTTP cache** for LLM responses: it sits between your agent/app and the provider, returning cached answers instantly and storing misses for later reuse. Cached responses are returned in [Tauq](https://github.com/epistates/tauq) format to reduce token overhead.
---
## Quick Start (Server)
```bash
# 1. Start Qdrant (vector database)
docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant
# 2. Run Reflex (HTTP server)
cargo run -p reflex-server --release
# 3. Point your agent to localhost:8080
export OPENAI_BASE_URL=http://localhost:8080/v1
```
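Once `OPENAI_BASE_URL` points at the proxy, any OpenAI-compatible client should keep working unchanged. As a minimal sketch, assuming Reflex exposes the standard `/v1/chat/completions` route (the model name and the `reqwest`/`tokio`/`serde_json` dependencies below are illustrative choices, not part of this repo):

```rust
// Sketch: send a standard OpenAI-style chat completion through the local Reflex proxy.
// Assumed Cargo deps: reqwest = { version = "0.12", features = ["json"] },
// tokio = { version = "1", features = ["full"] }, serde_json = "1".
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();

    // Standard OpenAI-style request body; the model name is a placeholder.
    let body = json!({
        "model": "gpt-4o-mini",
        "messages": [{ "role": "user", "content": "Summarize this repo in one line." }]
    });

    let resp = client
        // The proxy, not api.openai.com; Reflex forwards cache misses to the provider.
        .post("http://localhost:8080/v1/chat/completions")
        // Whether a key is required here depends on your upstream provider setup.
        .bearer_auth(std::env::var("OPENAI_API_KEY").unwrap_or_default())
        .json(&body)
        .send()
        .await?;

    println!("status: {}", resp.status());
    println!("{}", resp.text().await?);
    Ok(())
}
```

Repeating the identical request should then be answered from the cache rather than the provider.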
---
## Quick Start (Library)
```bash
# Run the library example (no HTTP server)
cargo run -p reflex-cache --example basic_lookup --features mock
```
Embed in your own app:
```toml
[dependencies]
reflex = { package = "reflex-cache", version = "x.x.x" }
```
---
## Crates In This Repo
- **Server + binary (`reflex`)**: [crates/reflex-server](crates/reflex-server/README.md)
- **Core library (embedded use)**: [crates/reflex-cache](crates/reflex-cache/README.md) (docs.rs: https://docs.rs/reflex-cache)
---
## How It Works (High Level)
```
Request → L1 (exact) → L2 (semantic) → L3 (rerank/verify) → Provider
```
- **L1**: exact match (fast, in-memory)
- **L2**: semantic retrieval (Qdrant vector search)
- **L3**: verification (cross-encoder rerank to avoid false positives)
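Purely as a conceptual sketch of that cascade, not Reflex's actual implementation (Reflex uses Qdrant for L2 and a cross-encoder rerank for L3; the types, similarity threshold, and `verify` stand-in below are made up for illustration):

```rust
use std::collections::HashMap;

/// Toy stand-in for the three-tier lookup; names and fields are illustrative only.
struct TieredCache {
    l1: HashMap<String, String>,         // L1: prompt -> cached response (exact match)
    l2: Vec<(Vec<f32>, String, String)>, // L2: (embedding, prompt, cached response)
}

/// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Stand-in for the L3 verification step (a cross-encoder rerank in Reflex).
fn verify(_query: &str, _candidate_prompt: &str) -> bool {
    true
}

impl TieredCache {
    fn lookup(&self, prompt: &str, embedding: &[f32]) -> Option<&str> {
        // L1: exact match, cheapest check first.
        if let Some(hit) = self.l1.get(prompt) {
            return Some(hit.as_str());
        }
        // L2: semantic retrieval; pick the nearest stored prompt by embedding similarity.
        let (score, cached_prompt, response) = self
            .l2
            .iter()
            .map(|(e, p, r)| (cosine(e, embedding), p.as_str(), r.as_str()))
            .max_by(|a, b| a.0.total_cmp(&b.0))?;
        // L3: verify the candidate before trusting it, to avoid false positives.
        // The 0.90 floor is an arbitrary illustrative threshold.
        if score > 0.90 && verify(prompt, cached_prompt) {
            return Some(response);
        }
        None // miss: forward the request to the provider and store the new response
    }
}

fn main() {
    let cache = TieredCache { l1: HashMap::new(), l2: Vec::new() };
    // An empty cache always misses; a real deployment fills both tiers as responses arrive.
    assert!(cache.lookup("hello", &[0.1, 0.2, 0.3]).is_none());
}
```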
---
## Development
```bash
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt -- --check
```
---
Reflex: Stop paying for the same token twice.
Built with Rust, Qdrant, and a healthy disdain for redundant API calls.