https://github.com/rawcontext/reflex

Episodic memory and semantic cache proxy for LLM APIs with ~40% token savings

agent-orchestration ai-agents context-graph developer-tools knowledge-graph llm-proxy semantic-cache token-optimization


> **Under Construction**: This project is actively being developed and is not yet ready for production use. APIs and features may change without notice.


Episodic Memory & Semantic Cache for LLM Responses




Because nobody likes paying for the same token twice.



██████╗ ███████╗███████╗██╗ ███████╗██╗ ██╗
██╔══██╗██╔════╝██╔════╝██║ ██╔════╝╚██╗██╔╝
██████╔╝█████╗ █████╗ ██║ █████╗ ╚███╔╝
██╔══██╗██╔══╝ ██╔══╝ ██║ ██╔══╝ ██╔██╗
██║ ██║███████╗██║ ███████╗███████╗██╔╝ ██╗
╚═╝ ╚═╝╚══════╝╚═╝ ╚══════╝╚══════╝╚═╝ ╚═╝

---

## What It Is

Reflex is an **OpenAI-compatible HTTP cache** for LLM responses. It sits between your agent or app and the provider, serves cached answers without a round trip to the provider, and stores misses for later reuse. Cached responses are returned in [Tauq](https://github.com/epistates/tauq) format to reduce token overhead.

---

## Quick Start (Server)

```bash
# 1. Start Qdrant (vector database)
docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant

# 2. Run Reflex (HTTP server)
cargo run -p reflex-server --release

# 3. Point your agent to localhost:8080
export OPENAI_BASE_URL=http://localhost:8080/v1
```
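
If your agent is written in Rust and builds request URLs by hand, step 3 can be sketched roughly as follows. This is a minimal illustration, not part of Reflex: the `endpoint` and `base_url` helpers are hypothetical names, and the `/chat/completions` path is assumed from the OpenAI API convention rather than confirmed from the Reflex server's route list.

```rust
use std::env;

/// Fallback address, taken from the Quick Start above.
const DEFAULT_BASE_URL: &str = "http://localhost:8080/v1";

/// Join a base URL and an endpoint path without doubling slashes.
pub fn endpoint(base: &str, path: &str) -> String {
    format!(
        "{}/{}",
        base.trim_end_matches('/'),
        path.trim_start_matches('/')
    )
}

/// Read OPENAI_BASE_URL, falling back to the local Reflex address.
pub fn base_url() -> String {
    env::var("OPENAI_BASE_URL").unwrap_or_else(|_| DEFAULT_BASE_URL.to_string())
}
```

Calling `endpoint(&base_url(), "chat/completions")` then yields the URL the client should send requests to. Most OpenAI SDKs read `OPENAI_BASE_URL` themselves, so this is only needed for hand-rolled clients.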

---

## Quick Start (Library)

```bash
# Run the library example (no HTTP server)
cargo run -p reflex-cache --example basic_lookup --features mock
```

Embed in your own app:

```toml
[dependencies]
reflex = { package = "reflex-cache", version = "x.x.x" }
```

---

## Crates In This Repo

- **Server + binary (`reflex`)**: [crates/reflex-server](crates/reflex-server/README.md)
- **Core library (embedded use)**: [crates/reflex-cache](crates/reflex-cache/README.md) (docs.rs: https://docs.rs/reflex-cache)

---

## How It Works (High Level)

```
Request → L1 (exact) → L2 (semantic) → L3 (rerank/verify) → Provider
```

- **L1**: exact match (fast, in-memory)
- **L2**: semantic retrieval (Qdrant vector search)
- **L3**: verification (cross-encoder rerank to avoid false positives)
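
The tier order above can be sketched in plain Rust. This is a toy model under stated assumptions, not the `reflex-cache` API: every name here (`TieredCache`, `Outcome`, `embed`, `verify_score`) is hypothetical, a bag-of-words vector with a linear scan stands in for Qdrant vector search, and a word-set Jaccard score stands in for the cross-encoder.

```rust
use std::collections::{HashMap, HashSet};

/// Which tier answered a lookup.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Exact(String),    // L1 hit
    Semantic(String), // L2 candidate that passed L3 verification
    Miss,             // caller would forward the request to the provider
}

/// Lowercase a prompt and split it into alphanumeric words.
fn words(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(str::to_string)
        .collect()
}

/// Toy "embedding": a sparse bag-of-words count vector (assumption).
fn embed(text: &str) -> HashMap<String, f64> {
    let mut counts = HashMap::new();
    for w in words(text) {
        *counts.entry(w).or_insert(0.0) += 1.0;
    }
    counts
}

/// Cosine similarity between two sparse vectors (L2 scoring).
fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a.iter().filter_map(|(k, x)| b.get(k).map(|y| x * y)).sum();
    let norm = |v: &HashMap<String, f64>| v.values().map(|x| x * x).sum::<f64>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot / denom }
}

/// Jaccard overlap of word sets, standing in for a cross-encoder score (L3).
fn verify_score(a: &str, b: &str) -> f64 {
    let sa: HashSet<String> = words(a).into_iter().collect();
    let sb: HashSet<String> = words(b).into_iter().collect();
    let union = sa.union(&sb).count();
    if union == 0 { 0.0 } else { sa.intersection(&sb).count() as f64 / union as f64 }
}

pub struct TieredCache {
    exact: HashMap<String, String>,                       // L1 store
    entries: Vec<(String, HashMap<String, f64>, String)>, // L2 store: (prompt, vector, response)
    l2_min: f64,                                          // minimum cosine to become a candidate
    l3_min: f64,                                          // minimum verification score to be served
}

impl TieredCache {
    pub fn new(l2_min: f64, l3_min: f64) -> Self {
        Self { exact: HashMap::new(), entries: Vec::new(), l2_min, l3_min }
    }

    /// Store a provider response after a miss.
    pub fn insert(&mut self, prompt: &str, response: &str) {
        self.exact.insert(prompt.to_string(), response.to_string());
        self.entries.push((prompt.to_string(), embed(prompt), response.to_string()));
    }

    pub fn lookup(&self, prompt: &str) -> Outcome {
        // L1: exact match on the raw prompt string.
        if let Some(resp) = self.exact.get(prompt) {
            return Outcome::Exact(resp.clone());
        }
        // L2: best stored entry whose cosine similarity clears the threshold.
        let query = embed(prompt);
        let best = self
            .entries
            .iter()
            .map(|e| (cosine(&query, &e.1), e))
            .filter(|(score, _)| *score >= self.l2_min)
            .max_by(|a, b| a.0.total_cmp(&b.0));
        // L3: serve the candidate only if the stricter check also passes.
        match best {
            Some((_, (cached_prompt, _, resp)))
                if verify_score(prompt, cached_prompt) >= self.l3_min =>
            {
                Outcome::Semantic(resp.clone())
            }
            _ => Outcome::Miss,
        }
    }
}
```

With thresholds `TieredCache::new(0.8, 0.9)` and one stored entry for "What is the capital of France?", the same prompt is an L1 hit, and "what is the capital of france" misses L1 but is served through L2 and L3. "What is the capital of Germany?" clears L2 (cosine 5/6) but is rejected at L3 (Jaccard 5/7), which is the false positive the verification tier exists to catch.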

---

## Development

```bash
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt -- --check
```

---


Reflex: Stop paying for the same token twice.


Built with Rust, Qdrant, and a healthy disdain for redundant API calls.