{"id":18599511,"url":"https://github.com/hicder/muopdb","last_synced_at":"2025-04-10T18:31:17.539Z","repository":{"id":257929460,"uuid":"869784587","full_name":"hicder/muopdb","owner":"hicder","description":"MuopDB - A Vector Database","archived":false,"fork":false,"pushed_at":"2025-04-10T04:02:10.000Z","size":15199,"stargazers_count":62,"open_issues_count":19,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-10T04:36:32.060Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/hicder/muopdb/wiki","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hicder.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-08T22:10:04.000Z","updated_at":"2025-04-10T04:02:14.000Z","dependencies_parsed_at":"2024-10-20T16:36:37.366Z","dependency_job_id":"9caf8b45-8eb3-47bd-80fe-d77720acbcff","html_url":"https://github.com/hicder/muopdb","commit_stats":null,"previous_names":["hicder/muopdb"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hicder%2Fmuopdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hicder%2Fmuopdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hicder%2Fmuopdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hicder%2Fmuopdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hicder","download_url":"https://codeload.github.com/hicder/muopdb/tar.gz/refs/
heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248271615,"owners_count":21075800,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T02:00:26.787Z","updated_at":"2025-04-10T18:31:17.531Z","avatar_url":"https://github.com/hicder.png","language":"Rust","funding_links":[],"categories":["Open Sources","Multidimensional data / Vectors"],"sub_categories":[],"readme":"MuopDB - A vector database for AI memories\n---\n\n## Introduction\nMuopDB is a vector database for machine learning. Currently, it supports:\n* Index type: HNSW, IVF, SPANN, Multi-user SPANN. All on-disk with mmap.\n* Quantization: product quantization\n\n## Why MuopDB?\nMuopDB supports multiple users by default: each user has their own vector index within the same collection. The intended use case is building memory for LLMs.\nThink of it as:\n* Each user has their own memory.\n* Each user can still search a shared knowledge base.\n\nAll users' indices are stored in a few files, reducing operational complexity.\n\n## Quick Start\n\n* Build MuopDB. See the [build instructions](https://github.com/hicder/muopdb?tab=readme-ov-file#building).\n* Prepare the necessary `data` and `indices` directories. On Mac, the root directory is read-only, so you may want to use a different location, e.g. `~/mnt/muopdb/`.\n```\nmkdir -p /mnt/muopdb/indices\nmkdir -p /mnt/muopdb/data\n```\n* Start MuopDB `index_server` with the directories we just prepared using one of these methods:\n```bash\n# Start server locally. 
This is recommended for Mac.\ncd target/release\nRUST_LOG=info ./index_server --node-id 0 --index-config-path /mnt/muopdb/indices --index-data-path /mnt/muopdb/data --port 9002\n\n# Start server with Docker. Only use this option on Linux.\ndocker-compose up --build\n```\n* Now you have an up and running MuopDB `index_server`.\n  * You can send gRPC requests to this server (possibly with [Postman](https://www.postman.com/)).\n  * You can use Server Reflection in Postman - it will automatically detect the RPCs for MuopDB.\n### Examples using Postman\n1. Create collection\n\u003cimg width=\"802\" alt=\"Screenshot 2025-03-26 at 8 32 05 PM\" src=\"https://github.com/user-attachments/assets/52af33b0-3698-4770-90af-ff679c42ffd6\" /\u003e\n\n\n```\n{\n    \"collection_name\": \"test-collection-2\",\n    \"num_features\": 10,\n    \"wal_file_size\": 1024000000,\n    \"max_time_to_flush_ms\": 5000,\n    \"max_pending_ops\": 10\n}\n```\n\n2. Insert some data\n\n\u003cimg width=\"782\" alt=\"Screenshot 2025-03-26 at 8 24 52 PM\" src=\"https://github.com/user-attachments/assets/6d6bed7d-637d-48c6-96b2-6a512c2f848a\" /\u003e\n\n```\n{\n    \"collection_name\": \"test-collection-2\",\n    \"doc_ids\": [\n        {\n            \"high_id\": 0,\n            \"low_id\": 100\n        }\n    ],\n    \"user_ids\": [\n        {\n            \"high_id\": 0,\n            \"low_id\": 0\n        }\n    ],\n    \"vectors\": [\n        100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0\n    ]\n}\n```\n\n3. 
Search\n\u003cimg width=\"603\" alt=\"Screenshot 2025-03-26 at 8 25 40 PM\" src=\"https://github.com/user-attachments/assets/e01cfa34-ade0-467c-b4b5-5d9dbec65e88\" /\u003e\n\n```\n{\n    \"collection_name\": \"test-collection-2\",\n    \"ef_construction\": 200,\n    \"record_metrics\": false,\n    \"top_k\": 1,\n    \"user_ids\": [\n        {\n            \"high_id\": 0,\n            \"low_id\": 0\n        }\n    ],\n    \"vector\": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]\n}\n```\n\n4. Remove\n\n\u003cimg width=\"603\" alt=\"Screenshot 2025-03-26 at 8 25 57 PM\" src=\"https://github.com/user-attachments/assets/7007eb6a-ca96-423d-b866-6ead2c5cbb22\" /\u003e\n\n\n```\n{\n    \"collection_name\": \"test-collection-2\",\n    \"doc_ids\": [\n        {\n            \"low_id\": 100,\n            \"high_id\": 0\n        }\n    ],\n    \"user_ids\": [\n        {\n            \"low_id\": 0,\n            \"high_id\": 0\n        }\n    ]\n}\n```\n\n5. Search again\nThe document removed in step 4 should no longer be returned, so the same query should give a different result.\n\u003cimg width=\"603\" alt=\"Screenshot 2025-03-26 at 8 26 15 PM\" src=\"https://github.com/user-attachments/assets/33ab4e14-785c-4bd9-a9a0-668cc4c554c0\" /\u003e\n\n```\n{\n    \"collection_name\": \"test-collection-2\",\n    \"ef_construction\": 200,\n    \"record_metrics\": false,\n    \"top_k\": 1,\n    \"user_ids\": [\n        {\n            \"high_id\": 0,\n            \"low_id\": 0\n        }\n    ],\n    \"vector\": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]\n}\n```\n\n## Plans\n### Phase 0 (Done)\n- [x] Query path\n  - [x] Vector similarity search\n  - [x] Hierarchical Navigable Small Worlds (HNSW)\n  - [x] Product Quantization (PQ)\n- [x] Indexing path\n  - [x] Support periodic offline indexing\n- [x] Database Management\n  - [x] Doc-sharding \u0026 query fan-out with aggregator-leaf architecture\n  - [x] In-memory \u0026 disk-based storage with mmap\n### Phase 1 
(Done)\n- [x] Query \u0026 Indexing\n  - [x] Inverted File (IVF)\n  - [x] Improve locality for HNSW\n  - [x] SPANN\n### Phase 2 (Done)\n- [x] Query\n  - [x] Multiple index segments\n  - [x] L2 distance\n- [x] Index\n  - [x] Optimizing index build time\n  - [x] Elias-Fano encoding for IVF\n  - [x] Multi-user SPANN index\n### Phase 3 (Done)\n- [x] Features\n  - [x] Delete vector from collection\n- [x] Database Management\n  - [x] Segment optimizer framework\n  - [x] Write-ahead-log\n  - [x] Segments merger\n  - [x] Segments vacuum\n### Phase 4 (Ongoing)\n- [ ] Features\n  - [ ] Hybrid search\n- [ ] Database Management\n  - [ ] Optimizing deletion with bloom filter\n  - [ ] Automatic segment optimizer\n  - [ ] Cloud-native MuopDB (Kafka + S3)\n\n### Building\n\n- Install prerequisites:\n  - Rust: https://www.rust-lang.org/tools/install\n  - Make sure you're on nightly: `rustup toolchain install nightly`\n  - Libraries\n```bash\n# MacOS (using Homebrew)\nbrew install hdf5 protobuf openblas\n\n# Linux (Arch-based)\n# On Arch Linux (and its derivatives, such as EndeavourOS, CachyOS):\nsudo pacman -Syu hdf5 protobuf openblas\n\n# Linux (Debian-based)\nsudo apt-get install libhdf5-dev libprotobuf-dev libopenblas-dev\n```\n\n- Build from Source:\n```bash\ngit clone https://github.com/hicder/muopdb.git\ncd muopdb\n\n# Build\ncargo build --release\n\n# Run tests\ncargo test --release\n```\n\n## Contributions\nThis project is done with [TechCare Coaching](https://techcarecoaching.com/). I am mentoring mentees who made contributions to this project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhicder%2Fmuopdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhicder%2Fmuopdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhicder%2Fmuopdb/lists"}