https://github.com/rawcontext/reflex

Episodic memory and semantic cache proxy for LLM APIs with ~40% token savings
https://github.com/rawcontext/reflex

agent-orchestration ai-agents context-graph developer-tools knowledge-graph llm-proxy semantic-cache token-optimization

Last synced: 4 months ago
JSON representation

Episodic memory and semantic cache proxy for LLM APIs with ~40% token savings

Host: GitHub
URL: https://github.com/rawcontext/reflex
Owner: rawcontext
License: agpl-3.0
Created: 2025-12-29T00:43:20.000Z (4 months ago)
Default Branch: master
Last Pushed: 2025-12-29T01:47:20.000Z (4 months ago)
Last Synced: 2026-01-04T23:17:34.030Z (4 months ago)
Topics: agent-orchestration, ai-agents, context-graph, developer-tools, knowledge-graph, llm-proxy, semantic-cache, token-optimization
Language: Rust
Homepage: https://rawcontext.com/projects/reflex/
Size: 236 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

Awesome Lists containing this project

README

> **Under Construction**: This project is actively being developed and is not yet ready for production use. APIs and features may change without notice.

Episodic Memory & Semantic Cache for LLM Responses

Because nobody likes paying for the same token twice.


██████╗ ███████╗███████╗██╗     ███████╗██╗  ██╗

██╔══██╗██╔════╝██╔════╝██║     ██╔════╝╚██╗██╔╝

██████╔╝█████╗  █████╗  ██║     █████╗   ╚███╔╝

██╔══██╗██╔══╝  ██╔══╝  ██║     ██╔══╝   ██╔██╗

██║  ██║███████╗██║     ███████╗███████╗██╔╝ ██╗

╚═╝  ╚═╝╚══════╝╚═╝     ╚══════╝╚══════╝╚═╝  ╚═╝

---

## What It Is

Reflex is an **OpenAI-compatible HTTP cache** for LLM responses: it sits between your agent/app and the provider, returning cached answers instantly and storing misses for later reuse. Cached responses are returned in [Tauq](https://github.com/epistates/tauq) format to reduce token overhead.

---

## Quick Start (Server)

```bash
# 1. Start Qdrant (vector database)
docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant

# 2. Run Reflex (HTTP server)
cargo run -p reflex-server --release

# 3. Point your agent to localhost:8080
export OPENAI_BASE_URL=http://localhost:8080/v1
```

---

## Quick Start (Library)

```bash
# Run the library example (no HTTP server)
cargo run -p reflex-cache --example basic_lookup --features mock
```

Embed in your own app:

```toml
[dependencies]
reflex = { package = "reflex-cache", version = "x.x.x" }
```

---

## Crates In This Repo

- **Server + binary (`reflex`)**: [crates/reflex-server](crates/reflex-server/README.md)
- **Core library (embedded use)**: [crates/reflex-cache](crates/reflex-cache/README.md) (docs.rs: https://docs.rs/reflex-cache)

---

## How It Works (High Level)

```
Request → L1 (exact) → L2 (semantic) → L3 (rerank/verify) → Provider
```

- **L1**: exact match (fast, in-memory)
- **L2**: semantic retrieval (Qdrant vector search)
- **L3**: verification (cross-encoder rerank to avoid false positives)

---

## Development

```bash
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt -- --check
```

---

Reflex: Stop paying for the same token twice.

_{Built with Rust, Qdrant, and a healthy disdain for redundant API calls.}

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/rawcontext/reflex

Awesome Lists containing this project

README