MuopDB - A vector database for AI memories
---

## Introduction
MuopDB is a vector database for machine learning. Currently, it supports:
* Index types: HNSW, IVF, SPANN, and multi-user SPANN, all on-disk with mmap
* Quantization: product quantization
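
To give a feel for what product quantization does, here is a toy sketch (not MuopDB's actual implementation): a vector is split into sub-vectors, and each sub-vector is stored as the index of its nearest centroid in a small per-subspace codebook. The codebooks below are made-up values for illustration.

```python
# Toy product quantization sketch. MuopDB's real implementation differs;
# this only illustrates the encode/decode idea.

def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vector, codebooks):
    """Split `vector` into len(codebooks) sub-vectors and encode each
    as the index of its nearest centroid in the matching codebook."""
    m = len(codebooks)               # number of subspaces
    sub_len = len(vector) // m
    codes = []
    for i, codebook in enumerate(codebooks):
        sub = vector[i * sub_len:(i + 1) * sub_len]
        codes.append(min(range(len(codebook)),
                         key=lambda c: squared_dist(sub, codebook[c])))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector from centroid indices."""
    out = []
    for code, codebook in zip(codes, codebooks):
        out.extend(codebook[code])
    return out

# Two subspaces, two centroids each (hypothetical codebooks).
codebooks = [
    [[0.0, 0.0], [10.0, 10.0]],
    [[5.0, 5.0], [-5.0, -5.0]],
]
codes = pq_encode([9.0, 11.0, 4.0, 6.0], codebooks)
print(codes)                         # -> [1, 0]
print(pq_decode(codes, codebooks))   # -> [10.0, 10.0, 5.0, 5.0]
```

The compression win comes from storing a few small centroid indices per vector instead of the full float array.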

## Why MuopDB?
MuopDB supports multiple users by default: each user has their own vector index within the same collection. The intended use-case is building memory for LLMs.
Think of it as:
* Each user has their own memory.
* Each user can still search a shared knowledge base.

All users' indices are stored in a few files, reducing operational complexity.
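
The per-user isolation can be pictured with this hypothetical sketch (it is not MuopDB's data model or on-disk layout, just the search semantics):

```python
# Hypothetical sketch of per-user search within one collection.
# MuopDB's real layout differs; this only shows that one user's
# vectors are invisible to another user's queries.

collection = {}  # user_id -> list of (doc_id, vector)

def insert(user_id, doc_id, vector):
    collection.setdefault(user_id, []).append((doc_id, vector))

def search(user_id, query, top_k=1):
    """Brute-force search restricted to a single user's vectors."""
    candidates = collection.get(user_id, [])
    scored = sorted(
        candidates,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[1], query)),
    )
    return [doc_id for doc_id, _ in scored[:top_k]]

insert(user_id=1, doc_id=100, vector=[1.0, 2.0])
insert(user_id=2, doc_id=200, vector=[1.0, 2.0])
print(search(user_id=1, query=[1.0, 2.0]))  # -> [100]; user 2's doc is invisible
```

A shared knowledge base would simply be one more user ID that every query is allowed to fan out to.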

## Quick Start

* Build MuopDB. Refer to these [instructions](https://github.com/hicder/muopdb?tab=readme-ov-file#building).
* Prepare the `data` and `indices` directories. On Mac, you may want to place these somewhere writable, e.g. `~/mnt/muopdb/`, since the root directory is read-only.
```bash
mkdir -p /mnt/muopdb/indices
mkdir -p /mnt/muopdb/data
```
* Start MuopDB `index_server` with the directories we just prepared using one of these methods:
```bash
# Start server locally. This is recommended for Mac.
cd target/release
RUST_LOG=info ./index_server --node-id 0 --index-config-path /mnt/muopdb/indices --index-data-path /mnt/muopdb/data --port 9002

# Start server with Docker. Only use this option on Linux.
docker-compose up --build
```
* You now have a running MuopDB `index_server`.
* You can send gRPC requests to it, for example with [Postman](https://www.postman.com/).
* Postman's Server Reflection will automatically discover MuopDB's RPCs.
### Examples using Postman
1. Create collection

```json
{
  "collection_name": "test-collection-2",
  "num_features": 10,
  "wal_file_size": 1024000000,
  "max_time_to_flush_ms": 5000,
  "max_pending_ops": 10
}
```

2. Insert some data


```json
{
  "collection_name": "test-collection-2",
  "doc_ids": [
    { "high_id": 0, "low_id": 100 }
  ],
  "user_ids": [
    { "high_id": 0, "low_id": 0 }
  ],
  "vectors": [
    100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0
  ]
}
```
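
The `doc_ids` and `user_ids` fields appear to encode a single 128-bit identifier as two unsigned 64-bit halves. If your IDs start out as one integer (or a UUID converted to an integer), the split is plain bit manipulation, sketched here in Python (general-purpose code, not MuopDB-specific):

```python
def split_u128(value):
    """Split a 128-bit integer into (high_id, low_id) 64-bit halves."""
    assert 0 <= value < 2 ** 128
    return value >> 64, value & 0xFFFFFFFFFFFFFFFF

def join_u128(high_id, low_id):
    """Reassemble the original 128-bit integer."""
    return (high_id << 64) | low_id

high, low = split_u128(100)
print(high, low)                       # -> 0 100  (matches the request above)
assert join_u128(high, low) == 100
```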

3. Search

```json
{
  "collection_name": "test-collection-2",
  "ef_construction": 200,
  "record_metrics": false,
  "top_k": 1,
  "user_ids": [
    { "high_id": 0, "low_id": 0 }
  ],
  "vector": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]
}
```
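
Conceptually, a `top_k` search with L2 distance (which MuopDB supports) behaves like the brute-force sketch below; the point of the HNSW/SPANN indexes is to get this answer without scanning every vector. This is an illustration, not MuopDB code:

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, docs, k):
    """docs: list of (doc_id, vector). Returns the k closest doc_ids."""
    return [doc_id for doc_id, vec in
            sorted(docs, key=lambda d: l2(d[1], query))[:k]]

docs = [(100, [100.0, 101.0]), (101, [0.0, 0.0])]
print(top_k([100.0, 101.0], docs, k=1))  # -> [100]
```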

4. Remove


```json
{
  "collection_name": "test-collection-2",
  "doc_ids": [
    { "high_id": 0, "low_id": 100 }
  ],
  "user_ids": [
    { "high_id": 0, "low_id": 0 }
  ]
}
```

5. Search again

Repeat the same search request. Since the vector was removed, the response should now be different.

```json
{
  "collection_name": "test-collection-2",
  "ef_construction": 200,
  "record_metrics": false,
  "top_k": 1,
  "user_ids": [
    { "high_id": 0, "low_id": 0 }
  ],
  "vector": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]
}
```


## Plans
### Phase 0 (Done)
- [x] Query path
  - [x] Vector similarity search
  - [x] Hierarchical Navigable Small Worlds (HNSW)
  - [x] Product Quantization (PQ)
- [x] Indexing path
  - [x] Support periodic offline indexing
- [x] Database Management
  - [x] Doc-sharding & query fan-out with aggregator-leaf architecture
  - [x] In-memory & disk-based storage with mmap
### Phase 1 (Done)
- [x] Query & Indexing
  - [x] Inverted File (IVF)
  - [x] Improve locality for HNSW
  - [x] SPANN
### Phase 2 (Done)
- [x] Query
  - [x] Multiple index segments
  - [x] L2 distance
- [x] Index
  - [x] Optimizing index build time
  - [x] Elias-Fano encoding for IVF
  - [x] Multi-user SPANN index
### Phase 3 (Done)
- [x] Features
  - [x] Delete vector from collection
- [x] Database Management
  - [x] Segment optimizer framework
  - [x] Write-ahead-log
  - [x] Segments merger
  - [x] Segments vacuum
### Phase 4 (Ongoing)
- [ ] Features
  - [ ] Hybrid search
- [ ] Database Management
  - [ ] Optimizing deletion with bloom filter
  - [ ] Automatic segment optimizer
  - [ ] Cloud-native MuopDB (Kafka + S3)
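
As background on the planned deletion optimization (this is the general technique, not MuopDB's design): a bloom filter over deleted IDs can answer "definitely not deleted" cheaply, so the expensive tombstone check only runs when the filter says "maybe". False positives are possible, false negatives are not.

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: set membership with possible false
    positives but never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into one big int

    def _positions(self, item):
        # Derive num_hashes positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

deleted = BloomFilter()
deleted.add(100)                    # mark doc_id 100 as deleted
print(deleted.might_contain(100))   # -> True
print(deleted.might_contain(999))   # almost certainly False: skip the lookup
```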

## Building

- Install prerequisites:
  - Rust: https://www.rust-lang.org/tools/install
  - Make sure you're on nightly: `rustup toolchain install nightly`
  - Libraries:
```bash
# MacOS (using Homebrew)
brew install hdf5 protobuf openblas

# Linux (Arch-based, e.g. EndeavourOS, CachyOS)
sudo pacman -Syu hdf5 protobuf openblas

# Linux (Debian-based)
sudo apt-get install libhdf5-dev libprotobuf-dev libopenblas-dev
```

- Build from Source:
```bash
git clone https://github.com/hicder/muopdb.git
cd muopdb

# Build
cargo build --release

# Run tests
cargo test --release
```

## Contributions
This project is developed with [TechCare Coaching](https://techcarecoaching.com/). I am mentoring mentees who have made contributions to this project.