https://github.com/unum-cloud/udisk

The fastest ACID-transactional persisted Key-Value store designed as modified LSM-Tree for NVMe block-devices with GPU-acceleration and SPDK to bypass the Linux kernel
https://github.com/unum-cloud/udisk

cuda database io-uring iouring key-value key-value-store linux linux-kernel lsm-tree spdk

Last synced: about 1 month ago
JSON representation

The fastest ACID-transactional persisted Key-Value store designed as modified LSM-Tree for NVMe block-devices with GPU-acceleration and SPDK to bypass the Linux kernel

Host: GitHub
URL: https://github.com/unum-cloud/udisk
Owner: unum-cloud
Created: 2023-01-29T14:26:14.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-03-31T15:46:46.000Z (about 3 years ago)
Last Synced: 2025-06-26T12:51:49.168Z (9 months ago)
Topics: cuda, database, io-uring, iouring, key-value, key-value-store, linux, linux-kernel, lsm-tree, spdk
Homepage: https://unum.cloud/ustore
Size: 51.8 KB
Stars: 74
Watchers: 6
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# UDisk

The fastest ACID-transactional persisted Key-Value store designed for NVMe block-devices with GPU-acceleration and SPDK to bypass the Linux kernel.

## Configuration Parameters

Most important parameters:

- `directory`: Where to keep the configs, schemas, data, and logs? The boot drive?
- `memory_limit`: How much RAM to use? 80% of it?
- `value_max_size`: How large the values can get?

Tuning performance?

- `directories_data`: Want to separate large data and put it on some RAID array?
- `gpu_devices`: Do you a have a GPU we can borrow for acceleration?

### Versioning

```json
"version": 1
```

### Disk IO

Path where there database would store metadata:

```json
"directory": "/etc/udisk/"
```

Paths where the database would store data, with tuples containing the path and a limit:

```json
"data_directories": [
{
"path": "/mnt/md0/ukv/",
"max_size": 1e12,
},
{
"path": "/mnt/md1/ukv/",
"max_size": 1e12,
}
]
```

Choose the underlying IO mechanism:

```json
"io_mechanism": "posix"
"io_mechanism": "pulling"
"io_mechanism": "polling"
```

For asynchronous IO mechanisms you can set the queue depth:

```json
"io_queue_depth": 4096
```

### Random Access Memory

Limit the total amount of RAM available to the database:

```json
"memory_limit": 1000000000
```

Limit the amount of memory to be used for caching.
Greatly affects the performance of single reads.
Doesn't affect the performance of batch reads.

```json
"cache_limit": 100000000
```

Limit the maximum number of entries to be collected in-memory (and in the Write Ahead Log), before flushing them into files.

```json
"write_buffer_max_elements": 100000
```

Limit the maximum combined size of entries to be collected in-memory (and in the Write Ahead Log), before flushing them into files.

```json
"write_buffer_max_bytes": 100000000
```

### Hardware Acceleration

Threads to be used for compactions, non-blocking garbage collection, and polling mechanisms.
For optimal performance with high-end NVMe SSDs we suggest reserving 2 or 3 threads per SSD.

```json
"threads": 1
```

Define, how many threads may be accessing the database concurrently:

```json
"concurrency_limit": 64
```

Optional array of integer identifiers for the GPUs to be used for background compactions:

```json
"gpu_devices": "all"
"gpu_devices": [-1]
"gpu_devices": [0, 1, 2, 3]
```

The amount of Video RAM available to UDisk on the GPU:

```json
"gpu_memory_limit": 1000000000
```

### Structure

Define, how large the biggest value can be:

```json
"value_max_size": 4096
```

In some rare cases, you may have all of your values of the same size:

```json
"value_size": 0
```

Define, how much larger every layer of the Log-Structured-Merge Tree can get:

```json
"level_enlarge_factor": 4
```

How large the top-level of the Log-Structured-Merge Tree should be?

```json
"first_level_max_bytes": 1000000000
```

### Transactions

The maximum number of entries to be updated with a single transaction:

```json
"transaction_max_elements": 100000
```

The maximum capacity of a single transaction, limiting the size of all updates in it:

```json
"transaction_max_bytes": 100000000
```

---

### Full Example

```json
{
"version": 1,
"threads": 1,
"memory_limit": 100000000000,
"gpu_devices": [-1],
"gpu_memory_limit": 1000000000,
"directory": "./tmp/",
"data_directories": [
{
"path": "/mnt/disk1/",
"max_size": 100000000000
},
{
"path": "/mnt/disk2/",
"max_size": 100000000000
}
],
"io_mechanism": "pulling",
"io_queue_depth": 4096,
"transaction_max_elements": 1000,
"transaction_max_bytes": 13421773,
"concurrency_limit": 128,
"cache_limit": 134217728,
"write_buffer_max_elements": 100000,
"write_buffer_max_bytes": 134217728,
"first_level_max_bytes": 4831838208,
"level_enlarge_factor": 4,
"value_size": 0,
"value_max_size": 4194304
}
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/unum-cloud/udisk

Awesome Lists containing this project

README