https://github.com/pitimon/ai-memory-patterns

Production patterns for building AI agent memory systems — from the team behind MemForge
Host: GitHub
URL: https://github.com/pitimon/ai-memory-patterns
Owner: pitimon
Created: 2026-03-21T22:09:32.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-03-21T22:48:17.000Z (3 months ago)
Last Synced: 2026-03-22T10:56:23.033Z (3 months ago)
Size: 36.1 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# AI Memory Patterns

**Production patterns for building AI agent memory systems.**

This guide shares architectural patterns, design decisions, and lessons learned from building a production AI memory system (154K+ observations, multi-tenant, 3-node cluster) that competes with commercial solutions like Mem0 and Zep.

**Not** a tutorial for using existing frameworks. This is a **builder's guide** — for engineers who want to understand _why_ certain patterns work and _what breaks_ when you skip them.

## Who This Is For

- Engineers building custom AI memory / RAG systems

- Teams evaluating build-vs-buy for persistent agent memory

- Architects designing multi-tenant AI platforms

- Anyone frustrated by "just use vector search" advice

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                        QUERY LAYER                                  │
│                                                                     │
│  User Query ──► Intent Detection ──► Complexity Analysis            │
│                      │                      │                       │
│              ┌───────┴───────┐      ┌───────┴───────┐              │
│              │ Weight Map    │      │ Retrieval     │              │
│              │ factual: 0.7F │      │ Params        │              │
│              │ relat:   0.7V │      │ limit/rerank  │              │
│              └───────┬───────┘      └───────┬───────┘              │
│                      └──────────┬───────────┘                       │
│                                 ▼                                   │
│  ┌──── Signal 1 ────┐  ┌── Signal 2 ──┐  ┌──── Signal 3 ────┐     │
│  │ FTS (keywords)   │  │ Vector (768d) │  │ Graph (concepts) │     │
│  │ LIKE %query%     │  │ HNSW cosine   │  │ 1-2 hop traverse │     │
│  └────────┬─────────┘  └──────┬────────┘  └────────┬─────────┘     │
│           └──────────────┬────┴─────────────────────┘               │
│                          ▼                                          │
│                   RRF Fusion (k=60)                                 │
│              combined = Wv·vec + Wf·fts + 0.15·graph               │
│                          │                                          │
│                          ▼                                          │
│                 Importance-weighted ranking                         │
│              score = similarity·0.7 + importance·0.3               │
│                          │                                          │
│                          ▼                                          │
│                   Ranked Results + Hints                            │
└─────────────────────────────────────────────────────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
┌──── STORAGE ─────┐ ┌── GRAPH ──┐ ┌──── CACHE ──────┐
│ PostgreSQL       │ │ Memgraph  │ │ Redis           │
│ + pgvector HNSW  │ │ Cypher    │ │ TTL + Pub/Sub   │
│ halfvec(768)     │ │ In-memory │ │ Bloom filter    │
│ Schema-per-user  │ │           │ │ LRU embedding   │
└──────────────────┘ └───────────┘ └─────────────────┘
                           │
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
┌── AI WORKERS ────┐ ┌── SCORING ──┐ ┌── CURATION ───┐
│ Embedding (5s)   │ │ 5-Factor    │ │ Pin / Unpin   │
│ Observer (bg)    │ │ Importance  │ │ Contradict    │
│ Compression (1m) │ │ type: 30%   │ │ Drift check   │
│ Graph sync (5m)  │ │ recency:25% │ │ Set importance│
│ LLM: 4-provider  │ │ access: 20% │ │ Event date    │
│  fallback chain  │ │ ref:    15% │ │               │
│  + circuit break │ │ content:10% │ │               │
└──────────────────┘ └─────────────┘ └───────────────┘
```
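
The fusion and ranking steps in the diagram can be sketched roughly as follows. This is a minimal illustration, not MemForge's actual code: only `k=60`, the `0.15` graph weight, and the `0.7`/`0.3` final blend come from the diagram. The type and function names (`SignalWeights`, `fuseRanks`, `topIds`, `finalScore`) are hypothetical, and the sketch assumes each signal returns document IDs ordered best first and that `similarity` and `importance` are already normalized to the range 0 to 1.

```typescript
// Hypothetical names throughout; constants are taken from the diagram.
type RankedIds = string[]; // document IDs, best match first

interface SignalWeights {
  vector: number;
  fts: number;
  graph: number;
}

const RRF_K = 60;

// Weighted reciprocal rank fusion: each signal contributes weight / (k + rank),
// where rank is the 1-based position of the document in that signal's list.
function fuseRanks(
  vec: RankedIds,
  fts: RankedIds,
  graph: RankedIds,
  w: SignalWeights
): Record<string, number> {
  const combined: Record<string, number> = Object.create(null);
  const signals = [
    { ranked: vec, weight: w.vector },
    { ranked: fts, weight: w.fts },
    { ranked: graph, weight: w.graph },
  ];
  signals.forEach(({ ranked, weight }) => {
    const seen: Record<string, boolean> = Object.create(null);
    ranked.forEach((id, index) => {
      if (seen[id]) return; // count only the best rank per signal
      seen[id] = true;
      combined[id] = (combined[id] || 0) + weight / (RRF_K + index + 1);
    });
  });
  return combined;
}

// Return the highest-scoring IDs, best first.
function topIds(scores: Record<string, number>, limit: number): string[] {
  return Object.keys(scores)
    .sort((a, b) => scores[b] - scores[a])
    .slice(0, limit);
}

// Final blend from the diagram; both inputs are assumed to be in [0, 1].
function finalScore(similarity: number, importance: number): number {
  return similarity * 0.7 + importance * 0.3;
}

// Usage: a document found by two signals outranks one found by a single signal.
const fused = fuseRanks(["a", "b"], ["b", "c"], ["c"], {
  vector: 0.5,
  fts: 0.5,
  graph: 0.15,
});
console.log(topIds(fused, 3));
```

Because RRF works on ranks rather than raw scores, the three signals do not need comparable score scales, which is one reason it is a common choice for combining keyword, vector, and graph results.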

## Patterns

### 1. Search Architecture

| Pattern                                                         | File | Key Insight                                                                      |
| --------------------------------------------------------------- | ---- | -------------------------------------------------------------------------------- |
| [3-Signal Hybrid Search](patterns/01-hybrid-search.md)          | `01` | Vector-only misses exact keywords. FTS-only misses semantics. Combine with RRF.  |
| [Intent-Based Weight Adaptation](patterns/02-intent-weights.md) | `02` | Factual queries need more FTS. Relational queries need more vector. Auto-detect. |
| [Query Complexity Analyzer](patterns/03-query-complexity.md)    | `03` | Simple queries need 10 results. Complex queries need 50 + reranking.             |
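
As a rough illustration of the query complexity analyzer, the sketch below maps a query to retrieval parameters. The `10` and `50 + reranking` values come from the table above. The classification heuristic itself (a word-count threshold and a short list of cue words) is an assumption made for this example, as are the names `RetrievalParams`, `COMPLEX_CUES`, and `analyzeComplexity`.

```typescript
interface RetrievalParams {
  limit: number;
  rerank: boolean;
}

// Hypothetical cue words that suggest comparison or multi-step reasoning.
const COMPLEX_CUES = ["compare", "why", "relationship", "between", "before", "after"];

// Hypothetical threshold: longer queries are treated as complex.
const MAX_SIMPLE_WORDS = 12;

function analyzeComplexity(query: string): RetrievalParams {
  const words = query
    .toLowerCase()
    .split(/\s+/)
    .map((w) => w.replace(/[^a-z]/g, "")) // strip punctuation and digits
    .filter((w) => w.length > 0);
  const hasCue = words.some((w) => COMPLEX_CUES.indexOf(w) !== -1);
  const isComplex = words.length > MAX_SIMPLE_WORDS || hasCue;
  // Values from the table: simple -> 10 results, complex -> 50 + reranking.
  return isComplex ? { limit: 50, rerank: true } : { limit: 10, rerank: false };
}

// Usage
console.log(analyzeComplexity("redis ttl setting"));
console.log(analyzeComplexity("Why did we switch from Redis to Memgraph?"));
```

A production analyzer would likely use richer signals than keywords, but the shape stays the same: classify once, then let the result drive `limit` and `rerank` for all three search signals.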

### 2. Data Architecture

| Pattern                                                         | File | Key Insight                                                                   |
| --------------------------------------------------------------- | ---- | ----------------------------------------------------------------------------- |
| [Schema-Per-User Multi-Tenancy](patterns/04-schema-per-user.md) | `04` | Row-level filtering leaks data when you forget WHERE. Schema isolation can't. |
| [Importance Scoring](patterns/05-importance-scoring.md)         | `05` | Not all memories matter equally. 5-factor model beats recency-only.           |
| [Knowledge Graph Integration](patterns/06-knowledge-graph.md)   | `06` | Graph adds value for multi-hop questions. Not worth it for simple recall.     |
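
The 5-factor importance model can be sketched as a weighted sum. The weights (type 30%, recency 25%, access 20%, ref 15%, content 10%) come from the architecture diagram. How each factor is computed is not specified in this README, so the sketch assumes every factor arrives already normalized to the range 0 to 1. The `recencyFactor` helper, with its exponential decay and 30-day half-life, is purely hypothetical, as are all identifier names.

```typescript
// Each factor is assumed to be normalized to [0, 1] by the caller.
interface ImportanceFactors {
  type: number;      // e.g. a decision might score higher than a casual note
  recency: number;
  access: number;    // how often the memory is retrieved
  reference: number; // "ref" in the diagram
  content: number;
}

// Weights from the architecture diagram; they sum to 1.
const IMPORTANCE_WEIGHTS: ImportanceFactors = {
  type: 0.3,
  recency: 0.25,
  access: 0.2,
  reference: 0.15,
  content: 0.1,
};

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

function importanceScore(f: ImportanceFactors): number {
  return (
    IMPORTANCE_WEIGHTS.type * clamp01(f.type) +
    IMPORTANCE_WEIGHTS.recency * clamp01(f.recency) +
    IMPORTANCE_WEIGHTS.access * clamp01(f.access) +
    IMPORTANCE_WEIGHTS.reference * clamp01(f.reference) +
    IMPORTANCE_WEIGHTS.content * clamp01(f.content)
  );
}

// Hypothetical recency factor: exponential decay with a configurable half-life.
function recencyFactor(ageDays: number, halfLifeDays: number = 30): number {
  return Math.pow(0.5, Math.max(0, ageDays) / halfLifeDays);
}

// Usage: an old but frequently accessed memory still scores reasonably well.
console.log(
  importanceScore({
    type: 0.8,
    recency: recencyFactor(90),
    access: 0.9,
    reference: 0.5,
    content: 0.4,
  })
);
```

Compared with recency-only ranking, the other four factors keep an old but heavily used memory from being buried by newer, less relevant ones.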

### 3. AI Infrastructure

| Pattern                                                    | File | Key Insight                                                                             |
| ---------------------------------------------------------- | ---- | --------------------------------------------------------------------------------------- |
| [LLM Provider Fallback Chain](patterns/07-llm-fallback.md) | `07` | Free providers first, paid fallback. Circuit breaker per provider.                      |
| [Embedding Strategy](patterns/08-embedding-strategy.md)    | `08` | 768d is enough. Float16 quantization saves 50% storage. Don't over-engineer dimensions. |
| [Background Worker Architecture](patterns/09-workers.md)   | `09` | 15 workers, self-healing, batch processing. Don't embed synchronously.                  |
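
A provider fallback chain with one circuit breaker per provider might look like the sketch below. It is a simplified illustration under stated assumptions: the failure threshold (3) and cooldown (30 seconds) are made-up defaults, the `LlmProvider` interface and class names are hypothetical, and real providers would be HTTP clients rather than the stub functions used in the usage example.

```typescript
interface LlmProvider {
  name: string;
  call: (prompt: string) => Promise<string>;
}

// Opens after `threshold` consecutive failures and stays open for `cooldownMs`.
// After the cooldown, one trial call is allowed through; another failure
// re-opens the breaker.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number = 3, cooldownMs: number = 30000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  isOpen(now: number): boolean {
    return this.failures >= this.threshold && now - this.openedAt < this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}

// Tries providers in order (for example, free providers first, paid last),
// skipping any provider whose breaker is currently open.
class FallbackChain {
  private readonly providers: LlmProvider[];
  private readonly breakers: CircuitBreaker[];

  constructor(providers: LlmProvider[]) {
    this.providers = providers;
    this.breakers = providers.map(() => new CircuitBreaker());
  }

  async complete(prompt: string): Promise<{ provider: string; text: string }> {
    for (let i = 0; i < this.providers.length; i++) {
      const breaker = this.breakers[i];
      if (breaker.isOpen(Date.now())) continue;
      try {
        const text = await this.providers[i].call(prompt);
        breaker.recordSuccess();
        return { provider: this.providers[i].name, text };
      } catch {
        breaker.recordFailure(Date.now());
      }
    }
    throw new Error("all providers failed or are circuit-open");
  }
}

// Usage with stub providers: the first always fails, the second answers.
const demoChain = new FallbackChain([
  { name: "primary", call: () => Promise.reject(new Error("unavailable")) },
  { name: "backup", call: (p) => Promise.resolve("echo: " + p) },
]);
demoChain.complete("hello").then((r) => console.log(r.provider, r.text));
```

The per-provider breaker matters because a provider that is down would otherwise add its full timeout to every request before the chain moves on.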

### 4. Quality & Operations

| Pattern                                              | File | Key Insight                                                                 |
| ---------------------------------------------------- | ---- | --------------------------------------------------------------------------- |
| [Memory Curation Tools](patterns/10-curation.md)     | `10` | Pin, contradict, drift-check. Memory rots without active curation.          |
| [Benchmarking (LoCoMo)](patterns/11-benchmarking.md) | `11` | How to evaluate memory systems fairly. Methodology matters more than score. |
| [MCP Plugin Design](patterns/12-mcp-plugin.md)       | `12` | Workflow hints > more tools. Descriptions guide LLM tool selection.         |

## Anti-Patterns

| Anti-Pattern            | Why It Fails                                    | Better Alternative                      |
| ----------------------- | ----------------------------------------------- | --------------------------------------- |
| Vector-only search      | Misses exact keywords, no project isolation     | 3-signal RRF (Pattern 1)                |
| Synchronous embedding   | Blocks write path, slow ingestion               | Background worker queue (Pattern 9)     |
| No importance scoring   | All memories treated equal, noise drowns signal | 5-factor model (Pattern 5)              |
| No project filter       | Cross-project contamination in search           | Server-side SQL filter                  |
| Huge embeddings (3072d) | 4x storage, marginal quality gain               | 768d + Float16 (Pattern 8)              |
| No curation tools       | Memory quality degrades over time               | Pin/contradict/drift-check (Pattern 10) |
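
The storage claim in the embedding row can be checked with simple arithmetic. The sketch below counts raw vector payload only and ignores per-row and index overhead, so real table sizes will be larger. The row count of 154,000 is taken from the "154K+ observations" figure in this README. The function and constant names are hypothetical.

```typescript
const BYTES_FLOAT32 = 4;
const BYTES_FLOAT16 = 2;

// Raw payload size of `rowCount` embeddings, excluding any storage overhead.
function embeddingBytes(dimensions: number, bytesPerValue: number, rowCount: number): number {
  return dimensions * bytesPerValue * rowCount;
}

const ROWS = 154000;

const large32 = embeddingBytes(3072, BYTES_FLOAT32, ROWS); // 1892352000 bytes
const small32 = embeddingBytes(768, BYTES_FLOAT32, ROWS);  // 473088000 bytes
const small16 = embeddingBytes(768, BYTES_FLOAT16, ROWS);  // 236544000 bytes

console.log(large32 / small32); // 4: the dimension reduction alone
console.log(small32 / small16); // 2: float16 halves it again
console.log(large32 / small16); // 8: both savings combined
```

The "4x storage" figure in the table therefore compares dimensions at equal precision. Combining 768 dimensions with Float16 brings the raw payload to one eighth of a 3072-dimension float32 embedding.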

## Benchmark Results

Evaluated on the [LoCoMo](https://github.com/snap-research/locomo) benchmark, which is widely used to compare long-term conversational memory systems:

| System         | Score       | Approach                 |
| -------------- | ----------- | ------------------------ |
| Backboard      | 90.0%       | Gemini 2.5 Pro + custom  |
| Zep            | 75.1%       | Temporal knowledge graph |
| Mem0           | 62.5%       | Vector + optional graph  |
| **Our system** | **57.3%\*** | 3-signal RRF + FTS       |

\*Pilot run on a single LoCoMo conversation. The optimization roadmap targets 66-72%.

## Decision Matrices

### Build vs Buy

| Factor           | Build (like this guide)                            | Buy (Mem0/Zep)                        |
| ---------------- | -------------------------------------------------- | ------------------------------------- |
| Data sovereignty | Full control                                       | Vendor-dependent                      |
| Customization    | Unlimited                                          | API constraints                       |
| Cost at scale    | Infrastructure only                                | $19-$249/mo + usage                   |
| Time to market   | Weeks-months                                       | Hours-days                            |
| Maintenance      | Your team                                          | Vendor handles                        |
| **Best for**     | **Production platforms, compliance, custom needs** | **MVPs, startups, rapid prototyping** |

### Search Strategy

| Query Type          | Best Approach                         | Why                                  |
| ------------------- | ------------------------------------- | ------------------------------------ |
| Exact fact recall   | FTS-heavy (70% FTS, 30% vector)       | Keywords matter for facts            |
| Semantic similarity | Vector-heavy (70% vector, 30% FTS)    | Meaning matters for concepts         |
| Time-based          | Temporal query with date parsing      | "yesterday", "last week"             |
| Multi-hop reasoning | Graph traversal + vector              | Connect entities across observations |
| General question    | Balanced hybrid (50/50) + graph (15%) | Best default                         |
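
The weight splits in this table can be expressed as a lookup keyed by detected intent. The sketch covers only the factual, semantic, and general rows. The 70/30 and 50/50 splits come from the table. Applying the 0.15 graph weight to every intent is an assumption based on the fusion formula in the architecture diagram. The regex-based `detectIntent` is a deliberately naive, hypothetical stand-in for whatever detection the real system uses, and all identifier names are invented for this example.

```typescript
type Intent = "factual" | "semantic" | "general";

interface SearchWeights {
  vector: number;
  fts: number;
  graph: number;
}

// Splits from the Search Strategy table; graph weight assumed constant.
const WEIGHTS_BY_INTENT: Record<Intent, SearchWeights> = {
  factual: { vector: 0.3, fts: 0.7, graph: 0.15 },
  semantic: { vector: 0.7, fts: 0.3, graph: 0.15 },
  general: { vector: 0.5, fts: 0.5, graph: 0.15 },
};

// Hypothetical heuristic: wh-questions, quoted phrases, and digits suggest
// fact lookup; similarity cue words suggest a semantic query.
function detectIntent(query: string): Intent {
  const q = query.trim().toLowerCase();
  if (/^(what|when|where|who|which)\b/.test(q) || /"[^"]+"/.test(q) || /\d/.test(q)) {
    return "factual";
  }
  if (/\b(similar|related|like|about|concept)\b/.test(q)) {
    return "semantic";
  }
  return "general";
}

function weightsFor(query: string): SearchWeights {
  return WEIGHTS_BY_INTENT[detectIntent(query)];
}

// Usage
console.log(weightsFor("What port does Redis use?"));
console.log(weightsFor("notes related to caching strategy"));
console.log(weightsFor("summarize the deployment setup"));
```

Time-based and multi-hop queries are left out because the table routes them to different retrieval paths (date parsing and graph traversal) rather than to a different weight split.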

## Technology Stack (Reference Implementation)

```
Runtime:     Bun (TypeScript)
Database:    PostgreSQL 17 + pgvector (HNSW, halfvec)
Graph:       Memgraph (Cypher, in-memory)
Cache:       Redis 7 (TTL, Pub/Sub, Bloom filter)
Embedding:   Gemini embedding-001 (768d) → OpenRouter fallback
LLM:         vLLM → OpenRouter → Google AI → Ollama
Deploy:      Docker Swarm (3-node HA) with NFS
Monitoring:  Prometheus + Grafana + Alertmanager
Client:      MCP plugin for Claude Code (27 tools)
```

## Contributing

Found a pattern that worked (or broke) in your memory system? PRs welcome.

## License

MIT

## Acknowledgements

Patterns extracted from [MemForge](https://github.com/pitimon/memforge) — a production AI memory system serving 154K+ observations with multi-tenant architecture.

Inspired by the work of [Mem0](https://mem0.ai), [Zep/Graphiti](https://getzep.com), and the [LoCoMo benchmark](https://github.com/snap-research/locomo) research.

---

_Last verified: 2026-03-22 | Version: 1.0_