https://github.com/pitimon/ai-memory-patterns

Production patterns for building AI agent memory systems — from the team behind MemForge

# AI Memory Patterns

**Production patterns for building AI agent memory systems.**

This guide shares architectural patterns, design decisions, and lessons learned from building a production AI memory system (154K+ observations, multi-tenant, 3-node cluster) that competes with commercial solutions like Mem0 and Zep.

**Not** a tutorial for using existing frameworks. This is a **builder's guide** — for engineers who want to understand _why_ certain patterns work and _what breaks_ when you skip them.

## Who This Is For

- Engineers building custom AI memory / RAG systems
- Teams evaluating build-vs-buy for persistent agent memory
- Architects designing multi-tenant AI platforms
- Anyone frustrated by "just use vector search" advice

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│ QUERY LAYER                                                         │
│                                                                     │
│ User Query ──► Intent Detection ──► Complexity Analysis             │
│                       │                      │                      │
│               ┌───────┴───────┐      ┌───────┴───────┐              │
│               │ Weight Map    │      │ Retrieval     │              │
│               │ factual: 0.7F │      │ Params        │              │
│               │ relat: 0.7V   │      │ limit/rerank  │              │
│               └───────┬───────┘      └───────┬───────┘              │
│                       └──────────┬───────────┘                      │
│                                  ▼                                  │
│  ┌──── Signal 1 ─────┐ ┌──── Signal 2 ─────┐ ┌──── Signal 3 ─────┐  │
│  │ FTS (keywords)    │ │ Vector (768d)     │ │ Graph (concepts)  │  │
│  │ LIKE %query%      │ │ HNSW cosine       │ │ 1-2 hop traverse  │  │
│  └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘  │
│            └─────────────────────┼─────────────────────┘            │
│                                  ▼                                  │
│                          RRF Fusion (k=60)                          │
│               combined = Wv·vec + Wf·fts + 0.15·graph               │
│                                  │                                  │
│                                  ▼                                  │
│                     Importance-weighted ranking                     │
│               score = similarity·0.7 + importance·0.3               │
│                                  │                                  │
│                                  ▼                                  │
│                       Ranked Results + Hints                        │
└─────────────────────────────────────────────────────────────────────┘

            ┌──────────────────────┼──────────────────────┐
            ▼                      ▼                      ▼
  ┌───── STORAGE ─────┐  ┌────── GRAPH ──────┐  ┌────── CACHE ──────┐
  │ PostgreSQL        │  │ Memgraph          │  │ Redis             │
  │ + pgvector HNSW   │  │ Cypher            │  │ TTL + Pub/Sub     │
  │ halfvec(768)      │  │ In-memory         │  │ Bloom filter      │
  │ Schema-per-user   │  │                   │  │ LRU embedding     │
  └───────────────────┘  └───────────────────┘  └───────────────────┘

            ┌──────────────────────┼──────────────────────┐
            ▼                      ▼                      ▼
  ┌─── AI WORKERS ────┐  ┌───── SCORING ─────┐  ┌──── CURATION ─────┐
  │ Embedding (5s)    │  │ 5-Factor          │  │ Pin / Unpin       │
  │ Observer (bg)     │  │ Importance        │  │ Contradict        │
  │ Compression (1m)  │  │ type: 30%         │  │ Drift check       │
  │ Graph sync (5m)   │  │ recency: 25%      │  │ Set importance    │
  │ LLM: 4-provider   │  │ access: 20%       │  │ Event date        │
  │ fallback chain    │  │ ref: 15%          │  │                   │
  │ + circuit break   │  │ content: 10%      │  │                   │
  └───────────────────┘  └───────────────────┘  └───────────────────┘
```
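The fusion and ranking steps in the query layer can be sketched in TypeScript, the reference implementation's language. This is a minimal sketch, not MemForge's code. It assumes each signal returns observation IDs best-first, reads `vec`, `fts`, and `graph` in the diagram as reciprocal-rank terms 1/(k + rank), and min-max normalizes the fused score before applying the 0.7/0.3 split with importance. The function names and the 0.5 default importance are hypothetical.

```typescript
// Each signal yields observation IDs, best match first (assumed shape).
type RankedList = string[];

interface SignalWeights {
  vector: number; // Wv, chosen by intent detection
  fts: number; // Wf, chosen by intent detection
  graph: number; // fixed 0.15 in the diagram
}

const RRF_K = 60;

// Weighted Reciprocal Rank Fusion: every list adds weight / (k + rank),
// with rank starting at 1. IDs absent from a list get nothing from it.
function fuseRrf(
  lists: { vector: RankedList; fts: RankedList; graph: RankedList },
  w: SignalWeights,
  k: number = RRF_K,
): Map<string, number> {
  const scores = new Map<string, number>();
  const add = (list: RankedList, weight: number) => {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + i + 1));
    });
  };
  add(lists.vector, w.vector);
  add(lists.fts, w.fts);
  add(lists.graph, w.graph);
  return scores;
}

// Final ranking: score = similarity * 0.7 + importance * 0.3.
// Assumption: "similarity" is the fused score min-max normalized to 0..1.
function rankWithImportance(
  fused: Map<string, number>,
  importance: Map<string, number>,
): { id: string; score: number }[] {
  const values = [...fused.values()];
  const min = Math.min(...values);
  const span = Math.max(...values) - min || 1;
  return [...fused.entries()]
    .map(([id, s]) => {
      const similarity = (s - min) / span;
      const imp = importance.get(id) ?? 0.5; // hypothetical neutral default
      return { id, score: similarity * 0.7 + imp * 0.3 };
    })
    .sort((a, b) => b.score - a.score);
}

// Usage: a factual query (FTS-heavy weights), with "d" marked important.
const fused = fuseRrf(
  { vector: ["a", "b", "c"], fts: ["b", "d"], graph: ["c"] },
  { vector: 0.3, fts: 0.7, graph: 0.15 },
);
const ranked = rankWithImportance(fused, new Map([["d", 0.9]]));
console.log(ranked.map((r) => r.id)); // order: b, d, c, a
```

Reciprocal-rank terms only use list positions, so FTS and vector scores never have to be put on a common scale; the intent weights decide how much each list's ordering counts.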

## Patterns

### 1. Search Architecture

| Pattern | File | Key Insight |
| --------------------------------------------------------------- | ---- | -------------------------------------------------------------------------------- |
| [3-Signal Hybrid Search](patterns/01-hybrid-search.md) | `01` | Vector-only misses exact keywords. FTS-only misses semantics. Combine with RRF. |
| [Intent-Based Weight Adaptation](patterns/02-intent-weights.md) | `02` | Factual queries need more FTS. Relational queries need more vector. Auto-detect. |
| [Query Complexity Analyzer](patterns/03-query-complexity.md) | `03` | Simple queries need 10 results. Complex queries need 50 + reranking. |
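Patterns 2 and 3 both turn the raw query into retrieval settings before any search runs. A minimal sketch, assuming simple regex heuristics: the cue lists, the 15-word and two-connective thresholds, and all function names are hypothetical. The 0.7/0.3 and 50/50 weight splits, the 0.15 graph weight, and the 10 vs 50 result limits are the values given elsewhere in this guide.

```typescript
type Intent = "factual" | "relational" | "general";

interface Weights { vector: number; fts: number; graph: number }
interface RetrievalParams { limit: number; rerank: boolean }

// Hypothetical cue lists; a production system might use a classifier instead.
const FACTUAL_CUES = /\b(what is|when did|which|who|how many|name of|version)\b/i;
const RELATIONAL_CUES = /\b(how does|related|relationship|why|similar|compare)\b/i;

function detectIntent(query: string): Intent {
  if (FACTUAL_CUES.test(query)) return "factual";
  if (RELATIONAL_CUES.test(query)) return "relational";
  return "general";
}

// Weight map: factual -> 0.7 FTS, relational -> 0.7 vector,
// general -> balanced. The graph signal keeps a fixed 0.15 weight.
function weightsFor(intent: Intent): Weights {
  switch (intent) {
    case "factual":
      return { vector: 0.3, fts: 0.7, graph: 0.15 };
    case "relational":
      return { vector: 0.7, fts: 0.3, graph: 0.15 };
    default:
      return { vector: 0.5, fts: 0.5, graph: 0.15 };
  }
}

// Assumed complexity heuristic: long queries, or queries chaining several
// clauses, get a wider candidate set plus reranking.
function analyzeComplexity(query: string): RetrievalParams {
  const words = query.trim().split(/\s+/).filter(Boolean).length;
  const connectives = (query.match(/\b(and|then|after|before|because)\b/gi) ?? []).length;
  const complex = words > 15 || connectives >= 2;
  return complex ? { limit: 50, rerank: true } : { limit: 10, rerank: false };
}

// Usage
const q = "When did we switch the cache to Redis 7?";
const intent = detectIntent(q);
console.log(intent, weightsFor(intent), analyzeComplexity(q));
```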

### 2. Data Architecture

| Pattern | File | Key Insight |
| --------------------------------------------------------------- | ---- | ----------------------------------------------------------------------------- |
| [Schema-Per-User Multi-Tenancy](patterns/04-schema-per-user.md) | `04` | Row-level filtering leaks data when a query forgets its WHERE clause. Schema isolation does not depend on every query getting it right. |
| [Importance Scoring](patterns/05-importance-scoring.md) | `05` | Not all memories matter equally. 5-factor model beats recency-only. |
| [Knowledge Graph Integration](patterns/06-knowledge-graph.md) | `06` | Graph adds value for multi-hop questions. Not worth it for simple recall. |
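The SCORING box in the architecture diagram gives the five factors and their weights (type 30%, recency 25%, access 20%, references 15%, content 10%). A minimal sketch, assuming every factor is normalized to 0..1 before weighting; the observation types and their base scores, the 30-day recency half-life, the log saturation caps, and the 2,000-character content cap are hypothetical choices.

```typescript
interface Observation {
  type: "decision" | "bugfix" | "discovery" | "note"; // hypothetical types
  createdAt: Date;
  accessCount: number;
  referenceCount: number; // e.g. links from other observations
  content: string;
}

// Hypothetical base importance per observation type.
const TYPE_SCORE: Record<Observation["type"], number> = {
  decision: 1.0,
  bugfix: 0.8,
  discovery: 0.6,
  note: 0.3,
};

const HALF_LIFE_DAYS = 30; // assumed recency half-life
const DAY_MS = 24 * 60 * 60 * 1000;

// Map an unbounded count onto 0..1 with diminishing returns, reaching 1 at `cap`.
const saturate = (n: number, cap: number): number =>
  Math.min(1, Math.log1p(Math.max(0, n)) / Math.log1p(cap));

function importance(obs: Observation, now: Date = new Date()): number {
  const ageDays = Math.max(0, (now.getTime() - obs.createdAt.getTime()) / DAY_MS);
  const type = TYPE_SCORE[obs.type];
  const recency = Math.pow(0.5, ageDays / HALF_LIFE_DAYS); // exponential decay
  const access = saturate(obs.accessCount, 50);
  const refs = saturate(obs.referenceCount, 20);
  const content = Math.min(1, obs.content.length / 2000); // richer content, capped
  return 0.3 * type + 0.25 * recency + 0.2 * access + 0.15 * refs + 0.1 * content;
}

// Usage
const score = importance(
  {
    type: "decision",
    createdAt: new Date("2026-02-20T00:00:00Z"),
    accessCount: 4,
    referenceCount: 1,
    content: "Chose schema-per-user over row filtering.",
  },
  new Date("2026-03-22T00:00:00Z"),
);
console.log(score.toFixed(3));
```

Recency is one factor out of five here, so an old but frequently referenced decision can still outrank yesterday's throwaway note, which a recency-only model cannot express.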

### 3. AI Infrastructure

| Pattern | File | Key Insight |
| ---------------------------------------------------------- | ---- | --------------------------------------------------------------------------------------- |
| [LLM Provider Fallback Chain](patterns/07-llm-fallback.md) | `07` | Free providers first, paid fallback. Circuit breaker per provider. |
| [Embedding Strategy](patterns/08-embedding-strategy.md) | `08` | 768d is enough. Float16 quantization saves 50% storage. Don't over-engineer dimensions. |
| [Background Worker Architecture](patterns/09-workers.md) | `09` | 15 workers, self-healing, batch processing. Don't embed synchronously. |
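Pattern 7 combines an ordered provider list (the reference stack uses vLLM → OpenRouter → Google AI → Ollama) with one circuit breaker per provider. A minimal sketch, assuming each provider is wrapped as an async `complete(prompt)` function; no real SDK is called, and the failure threshold (3) and cooldown (60 s) are hypothetical defaults.

```typescript
interface Provider {
  name: string;
  complete: (prompt: string) => Promise<string>;
}

// Per-provider circuit breaker: after `threshold` consecutive failures the
// circuit opens and the provider is skipped until `cooldownMs` has passed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly threshold = 3,
    private readonly cooldownMs = 60_000,
    private readonly now: () => number = () => Date.now(),
  ) {}

  canAttempt(): boolean {
    if (this.openedAt === null) return true;
    // Half-open: once the cooldown elapses, calls go through again and
    // the next failure re-opens the circuit.
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}

class FallbackChain {
  private breakers = new Map<string, CircuitBreaker>();

  constructor(
    private readonly providers: Provider[],
    makeBreaker: () => CircuitBreaker = () => new CircuitBreaker(),
  ) {
    for (const p of providers) this.breakers.set(p.name, makeBreaker());
  }

  // Try providers in priority order (the guide puts free providers first),
  // skipping open circuits. Throws only if every provider fails or is open.
  async complete(prompt: string): Promise<{ provider: string; text: string }> {
    const errors: string[] = [];
    for (const p of this.providers) {
      const breaker = this.breakers.get(p.name)!;
      if (!breaker.canAttempt()) {
        errors.push(`${p.name}: circuit open`);
        continue;
      }
      try {
        const text = await p.complete(prompt);
        breaker.recordSuccess();
        return { provider: p.name, text };
      } catch (err) {
        breaker.recordFailure();
        errors.push(`${p.name}: ${String(err)}`);
      }
    }
    throw new Error(`All providers failed: ${errors.join("; ")}`);
  }
}
```

Keeping a breaker per provider means a provider that keeps failing stops adding its latency to every request once its circuit opens, while the rest of the chain keeps serving.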

### 4. Quality & Operations

| Pattern | File | Key Insight |
| ---------------------------------------------------- | ---- | --------------------------------------------------------------------------- |
| [Memory Curation Tools](patterns/10-curation.md) | `10` | Pin, contradict, drift-check. Memory rots without active curation. |
| [Benchmarking (LoCoMo)](patterns/11-benchmarking.md) | `11` | How to evaluate memory systems fairly. Methodology matters more than score. |
| [MCP Plugin Design](patterns/12-mcp-plugin.md) | `12` | Workflow hints > more tools. Descriptions guide LLM tool selection. |

## Anti-Patterns

| Anti-Pattern | Why It Fails | Better Alternative |
| ----------------------- | ----------------------------------------------- | --------------------------------------- |
| Vector-only search | Misses exact keywords, no project isolation | 3-signal RRF (Pattern 1) |
| Synchronous embedding | Blocks write path, slow ingestion | Background worker queue (Pattern 9) |
| No importance scoring | All memories treated equal, noise drowns signal | 5-factor model (Pattern 5) |
| No project filter | Cross-project contamination in search | Server-side SQL filter |
| Huge embeddings (3072d) | 4x storage, marginal quality gain | 768d + Float16 (Pattern 8) |
| No curation tools | Memory quality degrades over time | Pin/contradict/drift-check (Pattern 10) |
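Pattern 4 and the "No project filter" row both push isolation into the SQL itself instead of filtering results afterwards. A minimal sketch, assuming one PostgreSQL schema per user named `user_<id>` and an `observations` table with `id`, `content`, `project`, and a `halfvec(768)` `embedding` column (all hypothetical names). It uses pgvector's cosine-distance operator `<=>`; because identifiers cannot be bound as parameters, the schema name is validated and quoted, and everything else is passed as `$n` placeholders.

```typescript
interface ScopedQuery { text: string; values: unknown[] }

// Restrict user IDs to a safe alphabet so they cannot smuggle SQL into the schema name.
const USER_ID = /^[a-z0-9_]{1,48}$/;

// Quote a PostgreSQL identifier: wrap in double quotes, double any embedded quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

function userSchema(userId: string): string {
  if (!USER_ID.test(userId)) throw new Error(`invalid user id: ${userId}`);
  return quoteIdent(`user_${userId}`);
}

// Vector search scoped twice: by schema (tenant) and by project (server-side WHERE).
function vectorSearchQuery(
  userId: string,
  project: string,
  embedding: number[],
  limit: number,
): ScopedQuery {
  const schema = userSchema(userId);
  const text = [
    `SELECT id, content, 1 - (embedding <=> $1::halfvec) AS similarity`,
    `FROM ${schema}.observations`,
    `WHERE project = $2`,
    `ORDER BY embedding <=> $1::halfvec`,
    `LIMIT $3`,
  ].join("\n");
  // pgvector accepts the text form "[x,y,...]" as vector/halfvec input.
  return { text, values: [`[${embedding.join(",")}]`, project, limit] };
}

// Usage
const scoped = vectorSearchQuery("alice", "memforge", [0.12, -0.03, 0.5], 10);
console.log(scoped.text, scoped.values);
```

The `{ text, values }` shape matches the query config object that node-postgres's `client.query` accepts; other drivers may need a small adapter.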

## Benchmark Results

Evaluated on [LoCoMo](https://github.com/snap-research/locomo), a widely used benchmark for long-term conversational memory:

| System | Score | Approach |
| -------------- | ----------- | ------------------------ |
| Backboard | 90.0% | Gemini 2.5 Pro + custom |
| Zep | 75.1% | Temporal knowledge graph |
| Mem0 | 62.5% | Vector + optional graph |
| **Our system** | **57.3%\*** | 3-signal RRF + FTS |

\*Pilot run on a single conversation, so it is not directly comparable with scores from full benchmark runs. The optimization roadmap targets 66-72%.

## Decision Matrices

### Build vs Buy

| Factor | Build (like this guide) | Buy (Mem0/Zep) |
| ---------------- | -------------------------------------------------- | ------------------------------------- |
| Data sovereignty | Full control | Vendor-dependent |
| Customization | Unlimited | API constraints |
| Cost at scale | Infrastructure only | $19-$249/mo + usage |
| Time to market | Weeks-months | Hours-days |
| Maintenance | Your team | Vendor handles |
| **Best for** | **Production platforms, compliance, custom needs** | **MVPs, startups, rapid prototyping** |

### Search Strategy

| Query Type | Best Approach | Why |
| ------------------- | ------------------------------------- | ------------------------------------ |
| Exact fact recall | FTS-heavy (70% FTS, 30% vector) | Keywords matter for facts |
| Semantic similarity | Vector-heavy (70% vector, 30% FTS) | Meaning matters for concepts |
| Time-based          | Temporal query with date parsing      | Relative phrases ("yesterday", "last week") must resolve to date ranges |
| Multi-hop reasoning | Graph traversal + vector | Connect entities across observations |
| General question | Balanced hybrid (50/50) + graph (15%) | Best default |
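The time-based row depends on turning relative phrases into a concrete range that the SQL layer can filter on. A minimal sketch, assuming English phrases only, UTC day boundaries, and weeks that start on Monday; `parseRelativeRange` and its phrase list are hypothetical, and a production system would likely use a dedicated date-parsing library.

```typescript
interface DateRange { from: Date; to: Date } // half-open: from <= t < to

const DAY_MS = 24 * 60 * 60 * 1000;

// Midnight UTC of the day containing `d`.
function startOfUtcDay(d: Date): Date {
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
}

// Midnight UTC of the Monday that starts the week containing `d`.
function startOfUtcWeek(d: Date): Date {
  const day = startOfUtcDay(d);
  const offset = (day.getUTCDay() + 6) % 7; // Monday -> 0, Sunday -> 6
  return new Date(day.getTime() - offset * DAY_MS);
}

function parseRelativeRange(query: string, now: Date = new Date()): DateRange | null {
  const q = query.toLowerCase();
  const today = startOfUtcDay(now);
  if (/\btoday\b/.test(q)) {
    return { from: today, to: new Date(today.getTime() + DAY_MS) };
  }
  if (/\byesterday\b/.test(q)) {
    return { from: new Date(today.getTime() - DAY_MS), to: today };
  }
  if (/\blast week\b/.test(q)) {
    const thisWeek = startOfUtcWeek(now);
    return { from: new Date(thisWeek.getTime() - 7 * DAY_MS), to: thisWeek };
  }
  return null; // no temporal phrase found
}

// Usage
const range = parseRelativeRange("what did we decide last week?", new Date("2026-03-22T15:00:00Z"));
console.log(range?.from.toISOString(), range?.to.toISOString());
```

The half-open range binds naturally as two parameters, for example `WHERE created_at >= $1 AND created_at < $2` (column name hypothetical), and a `null` result falls through to the normal hybrid search.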

## Technology Stack (Reference Implementation)

```
Runtime: Bun (TypeScript)
Database: PostgreSQL 17 + pgvector (HNSW, halfvec)
Graph: Memgraph (Cypher, in-memory)
Cache: Redis 7 (TTL, Pub/Sub, Bloom filter)
Embedding: Gemini embedding-001 (768d) → OpenRouter fallback
LLM: vLLM → OpenRouter → Google AI → Ollama
Deploy: Docker Swarm (3-node HA) with NFS
Monitoring: Prometheus + Grafana + Alertmanager
Client: MCP plugin for Claude Code (27 tools)
```

## Contributing

Found a pattern that worked (or broke) in your memory system? PRs welcome.

## License

MIT

## Acknowledgements

Patterns extracted from [MemForge](https://github.com/pitimon/memforge) — a production AI memory system serving 154K+ observations with multi-tenant architecture.

Inspired by the work of [Mem0](https://mem0.ai), [Zep/Graphiti](https://getzep.com), and the [LoCoMo benchmark](https://github.com/snap-research/locomo) research.

---

_Last verified: 2026-03-22 | Version: 1.0_