https://github.com/openlake-project/openlake

High performance storage system for LLM Inference and GPU Training. Feed your GPUs at blazing fast speeds

blackwell gpt gpu high-performance llm llm-training model-serving rdma rust storage throughput

Last synced: 26 days ago

Host: GitHub
URL: https://github.com/openlake-project/openlake
Owner: openlake-project
License: apache-2.0
Created: 2026-04-27T04:07:33.000Z (2 months ago)
Default Branch: main
Last Pushed: 2026-06-13T13:29:15.000Z (27 days ago)
Last Synced: 2026-06-13T15:21:39.962Z (27 days ago)
Topics: blackwell, gpt, gpu, high-performance, llm, llm-training, model-serving, rdma, rust, storage, throughput
Language: Rust
Homepage: https://theopenlake.com
Size: 1.36 MB
Stars: 1,345
Watchers: 2
Forks: 201
Open Issues: 64
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Agents: AGENTS.md

README

OpenLake

The shortest path from NVMe to GPU memory.

Distributed object storage for GPU workloads. Built in Rust on `io_uring`, OpenLake is a state-of-the-art storage engine delivering 6x throughput and a million+ IOPS within 1 ms.

[Discord](https://discord.gg/TNXqVSnP6x) · [Website](https://theopenlake.com) · [Comparison](https://theopenlake.com/compare.html) · [Architecture](https://github.com/openlake-project/openlake/tree/main/docs) · [Quickstart](#quickstart)

[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)
[![Rust](https://img.shields.io/badge/rust-1.91%2B-orange.svg)](rust-toolchain.toml)
[![Discord](https://img.shields.io/badge/community-discord-5865F2?logo=discord&logoColor=white)](https://discord.gg/TNXqVSnP6x)
[![Web](https://img.shields.io/badge/web-theopenlake.com-1d4ed8.svg)](https://theopenlake.com)

---

## What is OpenLake?

OpenLake is an object store for AI infrastructure. Training and inference clusters spend a large fraction of their wall-clock time moving bytes from storage into GPU memory, and most object stores put the host CPU, the page cache, and several userspace copies directly in that path. OpenLake is a high-throughput, low-latency storage engine built to shorten that path for GPU workloads.

- **`io_uring`, thread per core.** Built on the [`compio`](https://github.com/compio-rs/compio) completion-based runtime. One pinned runtime per core, with no work stealing. The HTTP frontend and the storage engine run on the *same* thread, so a request never crosses a core boundary on the hot path.
- **No kernel involvement.** With GPUDirect Storage and RDMA, data moves from the peer NIC into GPU VRAM with zero copies, bypassing host memory and the page cache. See [Architecture](https://github.com/openlake-project/openlake/tree/main/docs).
- **Erasure coded.** SIMD Reed-Solomon across erasure-coded stripes. Lower storage cost than replication, and high throughput without the CPU cost of conventional replication.
- **PacedRDMA.** A novel congestion control algorithm for high-throughput RDMA. Credit-based memory management absorbs request bursts and minimizes tail latencies. Supports S3 over RDMA.
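
To make the storage cost point in the erasure coding bullet concrete, here is a minimal sketch of the arithmetic. The stripe geometry (8 data + 4 parity shards) and the 3-copy replication baseline are illustrative assumptions; this README does not state OpenLake's actual stripe layout, and the function names are hypothetical.

```rust
/// Raw bytes stored per logical byte for a stripe of `data_shards`
/// data shards plus `parity_shards` Reed-Solomon parity shards.
fn ec_overhead(data_shards: u32, parity_shards: u32) -> f64 {
    (data_shards + parity_shards) as f64 / data_shards as f64
}

/// Raw bytes stored per logical byte when keeping `copies` full replicas.
fn replication_overhead(copies: u32) -> f64 {
    copies as f64
}

fn main() {
    // Hypothetical geometry, chosen only for illustration.
    let (k, m, copies) = (8, 4, 3);

    // A Reed-Solomon stripe with m parity shards can lose any m shards.
    println!(
        "EC {}+{}: {:.2}x raw storage, survives {} lost shards",
        k, m, ec_overhead(k, m), m
    );
    // n-way replication can lose n - 1 copies.
    println!(
        "{}x replication: {:.2}x raw storage, survives {} lost copies",
        copies, replication_overhead(copies), copies - 1
    );
}
```

Under these assumed numbers, the erasure-coded layout stores half the raw bytes of 3-way replication while tolerating more simultaneous failures. The trade-off is the encode and decode work, which is the cost that SIMD acceleration targets.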

_OpenLake sustains 225 MiB/s GET at sub-10 ms p50, 3x MinIO and 9x RustFS at c=512._

## Quickstart

### Prerequisites

Stable Rust 1.91 or newer (pinned via `rust-toolchain.toml`). Linux gives you the `io_uring` driver; macOS builds and runs against `kqueue` for development.

```sh
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default stable
```

### Build

Clone the repo and build the workspace in release mode.

```sh
git clone https://github.com/openlake-project/openlake && cd openlake
cargo build --release --workspace
```

### Benchmark

The `openlake` CLI drives a `LocalFsBackend` directly for diagnostics and microbenchmarks. It is not an S3 client, but it is the quickest way to confirm the build works and see local throughput.

```sh
./target/release/openlake bench --n 100000 --size 4096 --concurrency 64
```
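
As a rough sanity check on the numbers this run reports, the sketch below works out how much payload the flags above imply. It assumes that `--n` is the operation count and `--size` is the object size in bytes, which this README does not spell out. The helper names and the 2-second elapsed time are hypothetical and are not part of the `openlake` CLI or its output.

```rust
/// Total payload bytes for `n` operations of `size` bytes each.
fn total_bytes(n: u64, size: u64) -> u64 {
    n * size
}

/// Throughput in MiB/s for `bytes` moved in `elapsed_secs` seconds.
fn mib_per_sec(bytes: u64, elapsed_secs: f64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / elapsed_secs
}

fn main() {
    // Values taken from the command above: --n 100000 --size 4096.
    let bytes = total_bytes(100_000, 4096);
    println!("payload: {} bytes", bytes);

    // Hypothetical wall-clock time, for illustration only.
    let elapsed_secs = 2.0;
    println!("throughput: {:.1} MiB/s", mib_per_sec(bytes, elapsed_secs));
}
```

With 4 KiB objects the run moves only about 390 MiB in total, so it mostly exercises per-operation cost. Raising `--size` shifts the measurement toward raw bandwidth.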

### Start Cluster

Write one TOML file per node. The full schema lives at the top of [`crates/openlake_server/src/config.rs`](crates/openlake_server/src/config.rs).

Start `openlaked` on each host with its own config, then talk to the cluster with any S3 client.

```sh
./target/release/openlaked --config node0.toml

aws --endpoint-url http://10.0.0.10:9000 s3 mb s3://demo
aws --endpoint-url http://10.0.0.10:9000 s3 cp ./checkpoint.safetensors s3://demo/
aws --endpoint-url http://10.0.0.10:9000 s3 ls s3://demo/
```
## Contributing

We welcome and value any contributions and collaborations.
Please check out [Contributing to OpenLake](https://github.com/openlake-project/openlake/blob/main/CONTRIBUTING.md) for how to get involved.

## Contact Us

- For technical support, please reach out on [Discord](https://discord.gg/TNXqVSnP6x).
- For technical issues, bugs, and feature requests, please open an issue on [GitHub](https://github.com/openlake-project/openlake/issues).
- For everything else, visit the [website](https://theopenlake.com) or reach out to the maintainers on Discord.

## License

[Apache License 2.0](LICENSE).
