{"id":33922541,"url":"https://github.com/memvid/memvid","last_synced_at":"2026-02-15T21:06:12.092Z","repository":{"id":296039685,"uuid":"991431142","full_name":"memvid/memvid","owner":"memvid","description":"Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.","archived":false,"fork":false,"pushed_at":"2026-02-03T20:59:46.000Z","size":30538,"stargazers_count":12928,"open_issues_count":21,"forks_count":1088,"subscribers_count":89,"default_branch":"main","last_synced_at":"2026-02-04T09:28:14.123Z","etag":null,"topics":["ai","context","embedded","faiss","knowledge-base","knowledge-graph","llm","machine-learning","memory","memvid","mv2","nlp","offline-first","opencv","python","rag","retrieval-augmented-generation","semantic-search","vector-database","video-processing"],"latest_commit_sha":null,"homepage":"https://www.memvid.com","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/memvid.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["memvid"]}},"created_at":"2025-05-27T16:01:08.000Z","updated_at":"2026-02-04T09:20:49.000Z","dependencies_parsed_at":"2025-12-30T10:05:21.926Z","dependency_job_id":null,"html_url":"https://github.com/memvid/memvid","commit_stats":null,"previous_names":["olow304/memvid","memvid/memvid"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/memvid/memvid","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/memvid%2Fmemvid","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/memvid%2Fmemvid/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/memvid%2Fmemvid/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/memvid%2Fmemvid/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/memvid","download_url":"https://codeload.github.com/memvid/memvid/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/memvid%2Fmemvid/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29489427,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-15T19:29:10.908Z","status":"ssl_error","status_checked_at":"2026-02-15T19:29:10.419Z","response_time":118,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","context","embedded","faiss","knowledge-base","knowledge-graph","llm","machine-learning","memory","memvid","mv2","nlp","offline-first","opencv","python","rag","retrieval-augmented-generation","semantic-search","vector-database","video-processing"],"created_at":"2025-12-12T09:04:33.425Z","updated_at":"2026-02-15T21:06:12.077Z","avatar_url":"https://github.com/memvid.png","language":"Rust","readme":"\u003c!-- HEADER:START --\u003e\n\u003cimg width=\"2000\" height=\"524\" alt=\"Social Cover (9)\"\n     src=\"https://github.com/user-attachments/assets/cf66f045-c8be-494b-b696-b8d7e4fb709c\" /\u003e\n\u003c!-- HEADER:END --\u003e\n\n\u003cdiv style=\"height: 16px;\"\u003e\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://trendshift.io/repositories/17293\" target=\"_blank\"\u003e\u003cimg src=\"https://trendshift.io/api/badge/repositories/17293\" alt=\"memvid%2Fmemvid | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"/\u003e\u003c/a\u003e\n\u003c/p\u003e\n\u003c!-- BADGES:END --\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eMemvid is a single-file memory layer for AI agents with instant retrieval and long-term memory.\u003c/strong\u003e\u003cbr/\u003e\n  Persistent, versioned, and portable memory, without databases.\n\u003c/p\u003e\n\n\u003c!-- NAV:START --\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://www.memvid.com\"\u003eWebsite\u003c/a\u003e\n  ·\n  \u003ca href=\"https://sandbox.memvid.com\"\u003eTry Sandbox\u003c/a\u003e\n  ·\n  \u003ca href=\"https://docs.memvid.com\"\u003eDocs\u003c/a\u003e\n  ·\n  \u003ca href=\"https://github.com/memvid/memvid/discussions\"\u003eDiscussions\u003c/a\u003e\n\u003c/p\u003e\n\u003c!-- NAV:END --\u003e\n\n\u003c!-- BADGES:START --\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://crates.io/crates/memvid-core\"\u003e\u003cimg src=\"https://img.shields.io/crates/v/memvid-core?style=flat-square\u0026logo=rust\" alt=\"Crates.io\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://docs.rs/memvid-core\"\u003e\u003cimg src=\"https://img.shields.io/docsrs/memvid-core?style=flat-square\u0026logo=docs.rs\" alt=\"docs.rs\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/memvid/memvid/blob/main/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square\" alt=\"License\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/memvid/memvid/stargazers\"\u003e\u003cimg src=\"https://img.shields.io/github/stars/memvid/memvid?style=flat-square\u0026logo=github\" alt=\"Stars\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/memvid/memvid/network/members\"\u003e\u003cimg src=\"https://img.shields.io/github/forks/memvid/memvid?style=flat-square\u0026logo=github\" alt=\"Forks\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/memvid/memvid/issues\"\u003e\u003cimg src=\"https://img.shields.io/github/issues/memvid/memvid?style=flat-square\u0026logo=github\" alt=\"Issues\" 
## Core Concepts

-   **Living Memory Engine**
    Continuously append, branch, and evolve memory across sessions.

-   **Capsule Context (`.mv2`)**
    Self-contained, shareable memory capsules with rules and expiry.

-   **Time-Travel Debugging**
    Rewind, replay, or branch any memory state (see the toy model below).

-   **Smart Recall**
    Sub-5ms local memory access with predictive caching.

-   **Codec Intelligence**
    Auto-selects and upgrades compression over time.
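This README does not show the concrete time-travel API, so the following is a self-contained toy model — deliberately *not* memvid code — of the idea that makes rewind and replay cheap: with an append-only frame log, any past memory state is simply the set of frames committed up to a timestamp.

```rust
// Toy model of an append-only memory timeline. Not the memvid API.
struct Frame {
    timestamp: u64,
    text: String,
}

struct Timeline {
    frames: Vec<Frame>, // append-only: existing frames are never modified
}

impl Timeline {
    fn append(&mut self, timestamp: u64, text: &str) {
        self.frames.push(Frame { timestamp, text: text.to_string() });
    }

    /// "Rewind": reconstruct the memory state as of `at` by replaying
    /// only the frames committed up to that point.
    fn state_at(&self, at: u64) -> Vec<&str> {
        self.frames
            .iter()
            .filter(|f| f.timestamp <= at)
            .map(|f| f.text.as_str())
            .collect()
    }
}

fn main() {
    let mut tl = Timeline { frames: Vec::new() };
    tl.append(1, "fact A");
    tl.append(2, "fact B");
    // Everything known at t = 1 is just the frames up to t = 1.
    assert_eq!(tl.state_at(1), vec!["fact A"]);
}
```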
## Use Cases

Memvid is a portable, serverless memory layer that gives AI agents persistent memory and fast recall. Because it's model-agnostic, multi-modal, and works fully offline, developers are using Memvid across a wide range of real-world applications:

-   Long-Running AI Agents
-   Enterprise Knowledge Bases
-   Offline-First AI Systems
-   Codebase Understanding
-   Customer Support Agents
-   Workflow Automation
-   Sales and Marketing Copilots
-   Personal Knowledge Assistants
-   Medical, Legal, and Financial Agents
-   Auditable and Debuggable AI Workflows
-   Custom Applications

## SDKs & CLI

Use Memvid in your preferred language:

| Package         | Install                     | Links |
| --------------- | --------------------------- | ----- |
| **CLI**         | `npm install -g memvid-cli` | [![npm](https://img.shields.io/npm/v/memvid-cli?style=flat-square)](https://www.npmjs.com/package/memvid-cli) |
| **Node.js SDK** | `npm install @memvid/sdk`   | [![npm](https://img.shields.io/npm/v/@memvid/sdk?style=flat-square)](https://www.npmjs.com/package/@memvid/sdk) |
| **Python SDK**  | `pip install memvid-sdk`    | [![PyPI](https://img.shields.io/pypi/v/memvid-sdk?style=flat-square)](https://pypi.org/project/memvid-sdk/) |
| **Rust**        | `cargo add memvid-core`     | [![Crates.io](https://img.shields.io/crates/v/memvid-core?style=flat-square)](https://crates.io/crates/memvid-core) |

---

## Installation (Rust)

### Requirements

-   **Rust 1.85.0+** — Install from [rustup.rs](https://rustup.rs)

### Add to Your Project

```toml
[dependencies]
memvid-core = "2.0"
```

### Feature Flags

| Feature             | Description |
| ------------------- | ----------- |
| `lex`               | Full-text search with BM25 ranking (Tantivy) |
| `pdf_extract`       | Pure Rust PDF text extraction |
| `vec`               | Vector similarity search (HNSW + local text embeddings via ONNX) |
| `clip`              | CLIP visual embeddings for image search |
| `whisper`           | Audio transcription with Whisper |
| `api_embed`         | Cloud API embeddings (OpenAI) |
| `temporal_track`    | Natural language date parsing ("last Tuesday") |
| `parallel_segments` | Multi-threaded ingestion |
| `encryption`        | Password-based encrypted capsules (`.mv2e`) |
| `symspell_cleanup`  | Robust PDF text repair (fixes "emp lo yee" -> "employee") |

Enable features as needed:

```toml
[dependencies]
memvid-core = { version = "2.0", features = ["lex", "vec", "temporal_track"] }
```

## Quick Start

```rust
use memvid_core::{Memvid, PutOptions, SearchRequest};

fn main() -> memvid_core::Result<()> {
    // Create a new memory file
    let mut mem = Memvid::create("knowledge.mv2")?;

    // Add documents with metadata
    let opts = PutOptions::builder()
        .title("Meeting Notes")
        .uri("mv2://meetings/2024-01-15")
        .tag("project", "alpha")
        .build();
    mem.put_bytes_with_options(b"Q4 planning discussion...", opts)?;
    mem.commit()?;

    // Search
    let response = mem.search(SearchRequest {
        query: "planning".into(),
        top_k: 10,
        snippet_chars: 200,
        ..Default::default()
    })?;

    for hit in response.hits {
        println!("{}: {}", hit.title.unwrap_or_default(), hit.text);
    }

    Ok(())
}
```
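The Quick Start creates a fresh capsule; long-term memory comes from reopening the same `.mv2` file in a later session. The constructor name `Memvid::open` below is an assumption made for illustration — verify the actual call on [docs.rs/memvid-core](https://docs.rs/memvid-core):

```rust
use memvid_core::{Memvid, SearchRequest};

fn main() -> memvid_core::Result<()> {
    // ASSUMPTION: an `open`-style constructor for existing capsules;
    // check docs.rs/memvid-core for the real name and signature.
    let mut mem = Memvid::open("knowledge.mv2")?;

    // Frames committed in earlier sessions are immediately searchable:
    // the indexes live inside the file, so nothing is rebuilt on open.
    let response = mem.search(SearchRequest {
        query: "planning".into(),
        top_k: 5,
        ..Default::default()
    })?;
    println!("{} hits carried over from earlier sessions", response.hits.len());

    Ok(())
}
```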
---

## Build

Clone the repository:

```bash
git clone https://github.com/memvid/memvid.git
cd memvid
```

Build in debug mode:

```bash
cargo build
```

Build in release mode (optimized):

```bash
cargo build --release
```

Build with specific features:

```bash
cargo build --release --features "lex,vec,temporal_track"
```

---

## Run Tests

Run all tests:

```bash
cargo test
```

Run tests with output:

```bash
cargo test -- --nocapture
```

Run a specific test:

```bash
cargo test test_name
```

Run integration tests only:

```bash
cargo test --test lifecycle
cargo test --test search
cargo test --test mutation
```

---

## Examples

The `examples/` directory contains working examples:

### Basic Usage

Demonstrates create, put, search, and timeline operations:

```bash
cargo run --example basic_usage
```

### PDF Ingestion

Ingest and search PDF documents (uses the "Attention Is All You Need" paper):

```bash
cargo run --example pdf_ingestion
```

### CLIP Visual Search

Image search using CLIP embeddings (requires the `clip` feature):

```bash
cargo run --example clip_visual_search --features clip
```

### Whisper Transcription

Audio transcription (requires the `whisper` feature):

```bash
cargo run --example test_whisper --features whisper -- /path/to/audio.mp3
```

**Available Models:**

| Model                 | Size   | Speed   | Use Case |
| --------------------- | ------ | ------- | -------- |
| `whisper-small-en`    | 244 MB | Slowest | Best accuracy (default) |
| `whisper-tiny-en`     | 75 MB  | Fast    | Balanced |
| `whisper-tiny-en-q8k` | 19 MB  | Fastest | Quick testing, resource-constrained |

**Model Selection:**

```bash
# Default (FP32 small, highest accuracy)
cargo run --example test_whisper --features whisper -- audio.mp3

# Quantized tiny (75% smaller than FP32 tiny, faster)
MEMVID_WHISPER_MODEL=whisper-tiny-en-q8k cargo run --example test_whisper --features whisper -- audio.mp3
```

**Programmatic Configuration:**

```rust
use memvid_core::{WhisperConfig, WhisperTranscriber};

// Default FP32 small model
let config = WhisperConfig::default();

// Quantized tiny model (faster, smaller)
let config = WhisperConfig::with_quantization();

// Specific model
let config = WhisperConfig::with_model("whisper-tiny-en-q8k");

let transcriber = WhisperTranscriber::new(&config)?;
let result = transcriber.transcribe_file("audio.mp3")?;
println!("{}", result.text);
```

## Text Embedding Models

The `vec` feature includes local text embedding support using ONNX models. Before using local text embeddings, you need to download the model files manually.

### Quick Start: BGE-small (Recommended)

Download the default BGE-small model (384 dimensions, fast and efficient):

```bash
mkdir -p ~/.cache/memvid/text-models

# Download ONNX model
curl -L 'https://huggingface.co/BAAI/bge-small-en-v1.5/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/bge-small-en-v1.5.onnx

# Download tokenizer
curl -L 'https://huggingface.co/BAAI/bge-small-en-v1.5/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/bge-small-en-v1.5_tokenizer.json
```

### Available Models

| Model                   | Dimensions | Size   | Best For |
| ----------------------- | ---------- | ------ | -------- |
| `bge-small-en-v1.5`     | 384        | ~120MB | Default, fast |
| `bge-base-en-v1.5`      | 768        | ~420MB | Better quality |
| `nomic-embed-text-v1.5` | 768        | ~530MB | Versatile tasks |
| `gte-large`             | 1024       | ~1.3GB | Highest quality |

### Other Models

**BGE-base** (768 dimensions):

```bash
curl -L 'https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/bge-base-en-v1.5.onnx
curl -L 'https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/bge-base-en-v1.5_tokenizer.json
```

**Nomic** (768 dimensions):

```bash
curl -L 'https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/nomic-embed-text-v1.5.onnx
curl -L 'https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/nomic-embed-text-v1.5_tokenizer.json
```

**GTE-large** (1024 dimensions):

```bash
curl -L 'https://huggingface.co/thenlper/gte-large/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/gte-large.onnx
curl -L 'https://huggingface.co/thenlper/gte-large/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/gte-large_tokenizer.json
```

### Usage in Code

```rust
use memvid_core::text_embed::{LocalTextEmbedder, TextEmbedConfig};
use memvid_core::types::embedding::EmbeddingProvider;

// Use default model (BGE-small)
let config = TextEmbedConfig::default();
let embedder = LocalTextEmbedder::new(config)?;

let embedding = embedder.embed_text("hello world")?;
assert_eq!(embedding.len(), 384);

// Use different model
let config = TextEmbedConfig::bge_base();
let embedder = LocalTextEmbedder::new(config)?;
```

See `examples/text_embedding.rs` for a complete example with similarity computation and search ranking.
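The embedders return plain `Vec<f32>`, so ranking by similarity needs nothing beyond a dot product and two norms. A minimal helper, assuming the `embedder` from the snippet above (note: BGE models are typically used with normalized vectors, in which case the dot product alone ranks identically):

```rust
/// Cosine similarity between two embedding vectors of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

// Usage with the embedder from the snippet above:
// let a = embedder.embed_text("quarterly planning")?;
// let b = embedder.embed_text("Q4 roadmap discussion")?;
// println!("similarity: {:.3}", cosine_similarity(&a, &b));
```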
### Model Consistency

To prevent accidental model mixing (e.g., querying a BGE-small index with OpenAI embeddings), you can explicitly bind your Memvid instance to a specific model name:

```rust
// Bind the index to a specific model.
// If the index was previously created with a different model, this returns an error.
mem.set_vec_model("bge-small-en-v1.5")?;
```

This binding is persistent. Once set, future attempts to use a different model name will fail fast with a `ModelMismatch` error.

## API Embeddings (OpenAI)

The `api_embed` feature enables cloud-based embedding generation using OpenAI's API.

### Setup

Set your OpenAI API key:

```bash
export OPENAI_API_KEY="sk-..."
```

### Usage

```rust
use memvid_core::api_embed::{OpenAIConfig, OpenAIEmbedder};
use memvid_core::types::embedding::EmbeddingProvider;

// Use default model (text-embedding-3-small)
let config = OpenAIConfig::default();
let embedder = OpenAIEmbedder::new(config)?;

let embedding = embedder.embed_text("hello world")?;
assert_eq!(embedding.len(), 1536);

// Use higher quality model
let config = OpenAIConfig::large();  // text-embedding-3-large (3072 dims)
let embedder = OpenAIEmbedder::new(config)?;
```

### Available Models

| Model                    | Dimensions | Best For |
| ------------------------ | ---------- | -------- |
| `text-embedding-3-small` | 1536       | Default, fastest, cheapest |
| `text-embedding-3-large` | 3072       | Highest quality |
| `text-embedding-ada-002` | 1536       | Legacy model |

See `examples/openai_embedding.rs` for a complete example.

## File Format

Everything lives in a single `.mv2` file:

```
┌────────────────────────────┐
│ Header (4KB)               │  Magic, version, capacity
├────────────────────────────┤
│ Embedded WAL (1-64MB)      │  Crash recovery
├────────────────────────────┤
│ Data Segments              │  Compressed frames
├────────────────────────────┤
│ Lex Index                  │  Tantivy full-text
├────────────────────────────┤
│ Vec Index                  │  HNSW vectors
├────────────────────────────┤
│ Time Index                 │  Chronological ordering
├────────────────────────────┤
│ TOC (Footer)               │  Segment offsets
└────────────────────────────┘
```

No `.wal`, `.lock`, `.shm`, or sidecar files. Ever.

See [MV2_SPEC.md](MV2_SPEC.md) for the complete file format specification.
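Because the header occupies the first 4 KB of the file (per the layout above), a quick sanity check on a capsule needs nothing but the standard library. This sketch only dumps raw bytes; the actual field offsets for magic, version, and capacity are defined in MV2_SPEC.md and are not guessed here:

```rust
use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    // The header is the first 4 KB of the capsule (see layout above).
    let mut header = [0u8; 4096];
    let mut f = File::open("knowledge.mv2")?;
    f.read_exact(&mut header)?;

    // Dump the leading bytes; interpret them against MV2_SPEC.md.
    println!("first 16 header bytes: {:02x?}", &header[..16]);
    Ok(())
}
```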
## Support

Have questions or feedback?
Email: contact@memvid.com

**Drop a ⭐ to show support**

---

> **Memvid v1 (QR-based memory) is deprecated**
>
> If you are referencing QR codes, you are using outdated information.
>
> See: https://docs.memvid.com/memvid-v1-deprecation

---

## License

Apache License 2.0 — see the [LICENSE](LICENSE) file for details.