https://github.com/basicmachines-co/basic-memory-benchmarks
Reproducible benchmark suite for Basic Memory and competitor memory systems
- Host: GitHub
- URL: https://github.com/basicmachines-co/basic-memory-benchmarks
- Owner: basicmachines-co
- Created: 2026-02-26T03:15:22.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-03-15T02:46:46.000Z (3 months ago)
- Last Synced: 2026-03-15T13:52:00.701Z (3 months ago)
- Language: Python
- Size: 360 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 11
Metadata Files:
- Readme: README.md
README
# basic-memory-benchmarks
Standalone, reproducible benchmark suite for comparing Basic Memory against competitor memory systems.
## Goals
- Deterministic retrieval benchmarks (Recall@5/10, MRR, Precision@5, content-hit, latency)
- Optional LLM-as-judge scoring (Pydantic Evals)
- Public artifacts with provenance and reproducibility metadata
- Clean dependency isolation from the core `basic-memory` repository
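The deterministic retrieval metrics named above have standard definitions. The following is a minimal sketch of Recall@k, Precision@k, and reciprocal rank (averaged into MRR), not the suite's actual implementation. The function names and the use of string document IDs are assumptions made for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots occupied by relevant documents."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    queries: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in queries]
    return sum(scores) / len(scores) if scores else 0.0
```

For a query whose ranked results are `["d3", "d1", "d7", "d2", "d9"]` and whose relevant set is `{"d1", "d2", "d4"}`, this sketch gives Recall@5 = 2/3, Precision@5 = 2/5, and a reciprocal rank of 1/2.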
## Current v1 Scope
- Providers:
  - `bm-local` (warm `bm mcp` stdio session)
  - `bm-cloud` (optional, credential-gated)
  - `mem0-local`
  - `zep-reference` (reference-only in v1)
- Datasets:
  - LoCoMo (primary)
  - LongMemEval scaffold (placeholder)
  - Built-in synthetic smoke corpus
## Installation
```bash
uv sync --group dev
```
Optional judge dependencies:
```bash
uv sync --group dev --extra judge
```
## Quickstart
### 1) Fetch LoCoMo dataset
```bash
uv run bm-bench datasets fetch --dataset locomo
```
### 2) Convert LoCoMo into benchmark corpus
```bash
uv run bm-bench convert locomo
```
### 3) Run retrieval benchmark
```bash
uv run bm-bench run retrieval \
--providers bm-local,mem0-local \
--corpus-dir benchmarks/generated/locomo/docs \
--queries-path benchmarks/generated/locomo/queries.json
```
### 4) Optional judge benchmark
```bash
uv run bm-bench run judge --run-dir benchmarks/runs/
```
### 5) Publish run artifacts
```bash
uv run bm-bench publish --run-dir benchmarks/runs/
```
## Basic Memory source policy
By default, this project tracks the `main` branch of Basic Memory.
Each run manifest stores:
- BM source (`github main` or local path override)
- resolved BM commit SHA
Local override:
```bash
uv run bm-bench run retrieval \
--bm-local-path /path/to/basic-memory
```
## Mem0 local requirements
`mem0-local` requires model credentials to be available in the environment.
At minimum, set:
```bash
export OPENAI_API_KEY=...
```
If the credentials are unavailable, the provider status is recorded as `SKIPPED(reason)`.
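The credential gate can be pictured as a simple environment check. This is a hedged sketch, not the project's code: the function name `mem0_provider_status`, the `READY` label, and the exact reason text are hypothetical. Only the `OPENAI_API_KEY` variable and the `SKIPPED(reason)` shape come from this README.

```python
import os
from typing import Mapping, Optional


def mem0_provider_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a status string for the mem0-local provider.

    Hypothetical helper: the real suite may structure this differently.
    """
    env = os.environ if env is None else env
    if not env.get("OPENAI_API_KEY"):
        # Missing or empty key: record a skip instead of failing the run.
        return "SKIPPED(OPENAI_API_KEY is not set)"
    return "READY"  # assumed label for a usable provider
```

Passing the environment as a parameter keeps the check testable without mutating `os.environ`.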
## BM indexing readiness
`bm-local` verifies index readiness before querying.
- If the installed `bm` supports `bm status --json`, readiness is polled from that output.
- If `--json` is not available in the installed `bm`, the benchmark proceeds after reindex.
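The two readiness cases above amount to a poll loop with a fallback. The sketch below is illustrative only: `wait_until_ready` and its `check` callback are hypothetical, and the shape of the `bm status --json` output is deliberately not assumed. The callback is expected to return `True` when the index is ready, `False` while indexing continues, and `None` when no machine-readable status is available.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    check: Callable[[], Optional[bool]],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll `check` until it reports readiness or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = check()
        if state is None:
            # No JSON status support: proceed after reindex, as described above.
            return True
        if state:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

In practice, `check` would wrap an invocation of `bm status --json` and interpret its output. The timeout and interval defaults here are arbitrary placeholders.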
## Run Artifacts
Per run (`benchmarks/runs//`):
- `manifest.json`
- `provider-status.json`
- `per-query-retrieval.jsonl`
- `retrieval-summary.json`
- `per-query-judge.jsonl` (optional)
- `judge-summary.json` (optional)
- `summary.md`
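A quick way to sanity-check a run directory is to confirm that the listed files exist. This is a minimal sketch: the file names come from the list above, but treating every file not marked optional as required is an assumption, and `missing_artifacts` is a hypothetical helper rather than part of `bm-bench`.

```python
from pathlib import Path
from typing import List, Union

# Files listed above without the "(optional)" marker; assumed to be required.
REQUIRED_ARTIFACTS = (
    "manifest.json",
    "provider-status.json",
    "per-query-retrieval.jsonl",
    "retrieval-summary.json",
    "summary.md",
)

# Files written only when the judge benchmark has been run.
OPTIONAL_ARTIFACTS = (
    "per-query-judge.jsonl",
    "judge-summary.json",
)


def missing_artifacts(run_dir: Union[str, Path]) -> List[str]:
    """Return the required artifact names that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (run_dir / name).is_file()]
```

An empty return value means all of the assumed-required artifacts are present.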
## Just commands
```bash
just bench-smoke
just bench-fetch-locomo
just bench-convert-locomo
just bench-run-bm-local
just bench-run-mem0-local
just bench-run-full
just bench-judge
just bench-publish RUN_DIR=benchmarks/runs/
```
## Notes on dataset publication
Dataset publication follows licensing constraints:
- If redistribution is permitted: a snapshot and its checksum may be published.
- If not: canonical source links, a downloader, and checksum verification are published instead.
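Checksum verification of a downloaded dataset can be sketched as follows. The README does not say which hash algorithm the suite uses, so SHA-256 here is an assumption, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's digest with an expected value, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat for large dataset archives.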