https://github.com/hpcaitech/tensornvme
A Python library that transfers PyTorch tensors between CPU and NVMe
- Host: GitHub
- URL: https://github.com/hpcaitech/tensornvme
- Owner: hpcaitech
- Created: 2022-07-01T08:28:52.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-11-27T10:19:54.000Z (10 months ago)
- Last Synced: 2025-05-24T04:07:29.174Z (4 months ago)
- Topics: colossal-ai, deep-learning, nvme, pytorch
- Language: C++
- Size: 298 KB
- Stars: 115
- Watchers: 5
- Forks: 25
- Open Issues: 6
Metadata Files:
- Readme: README.md
# TensorNVMe
A Python library that provides APIs to move PyTorch tensors between CPU and NVMe.
## Dependencies
- [liburing](https://github.com/axboe/liburing)
- [libaio](https://pagure.io/libaio)

## Install
This package is only supported on Linux. `liburing` and `libaio` can be installed automatically. `liburing` requires Linux >= `5.10` and will not be installed if your kernel version is lower.
The installer searches for `libaio` and `liburing` in `/usr/lib`, `/usr/lib64` and `$LD_LIBRARY_PATH`. If they are not found, the backends will be installed in `~/.tensornvme`, and `~/.bashrc` will be modified to set `$LD_LIBRARY_PATH` correctly. **Please `source ~/.bashrc` after installation.** If you use another shell, make sure `$LD_LIBRARY_PATH` is set correctly.
> You must install PyTorch and CMake before installing tensornvme. Whenever you upgrade PyTorch, remember to reinstall tensornvme.
### From source
```shell
git clone https://github.com/hpcaitech/TensorNVMe.git && cd TensorNVMe
```
First, install requirements:
```shell
pip install -r requirements.txt
```
To install `tensornvme` with `liburing` and `libaio`:
```shell
pip install -v --no-cache-dir .
```
To install `tensornvme` with only `liburing`:
```shell
DISABLE_AIO=1 pip install -v --no-cache-dir .
```
To install `tensornvme` with only `libaio`:
```shell
DISABLE_URING=1 pip install -v --no-cache-dir .
```
If you want to install `libaio` or `liburing` system-wide:
```shell
WITH_ROOT=1 sudo pip install -v --no-cache-dir .
```
They will then be installed in `/usr`, and `~/.bashrc` will not be modified. Make sure you have root access.
### From PIP
```shell
pip install packaging
pip install tensornvme
```
All environment variables accepted when installing from source also apply here.
## Use docker
```shell
git clone https://github.com/hpcaitech/TensorNVMe.git && cd TensorNVMe/docker && docker build -t tensornvme .
```

## CLI
We provide a CLI to test whether backends work well.
```shell
tensornvme check
```

## Usage
It provides both synchronous and asynchronous I/O APIs.
> Only CPU and contiguous tensors can be offloaded.
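Tensors that live on a GPU or are non-contiguous (e.g. a transposed view) must therefore be converted before offloading. A minimal sketch using standard PyTorch calls (the checks below are illustrative, not part of the tensornvme API):
```python
import torch
from tensornvme import DiskOffloader

offloader = DiskOffloader('./offload')

t = torch.rand(4, 4).t()    # a transposed view: on CPU, but non-contiguous
if t.is_cuda:
    t = t.cpu()             # offloading requires a CPU tensor...
if not t.is_contiguous():
    t = t.contiguous()      # ...with a contiguous memory layout
offloader.sync_write(t)     # now safe to offload
```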
Synchronous API:
```python
import torch
from tensornvme import DiskOffloader

x = torch.rand(2, 2)
y = torch.rand(4, 4, 4)
offloader = DiskOffloader('./offload')
offloader.sync_write(x)
# x is saved to a file on disk (in ./offload folder) and the memory of x is freed
offloader.sync_read(x)
# x is restored
offloader.sync_writev([x, y])
# x and y are offloaded
offloader.sync_readv([x, y])
# x and y are restored.
# sync_writev() and sync_readv() are order sensitive
# E.g. sync_writev([x, y]) and sync_writev([y, x]) are different
```
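The order sensitivity noted above means a vectorized read must use the same tensor order as the corresponding write. A small illustration (the `assert` is only a sanity check; restoring in a different order is assumed to be incorrect, per the note above):
```python
a = torch.rand(3)
b = torch.rand(3)
a0, b0 = a.clone(), b.clone()  # keep copies for comparison

offloader.sync_writev([a, b])  # offloaded in the order a, b
offloader.sync_readv([a, b])   # restored in the same order
assert torch.equal(a, a0) and torch.equal(b, b0)
# sync_readv([b, a]) would not be guaranteed to restore the tensors
# correctly, since readv must match the order used by writev.
```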
Asynchronous API:
```python
import torch
from tensornvme import DiskOffloader

x = torch.rand(2, 2)
y = torch.rand(4, 4, 4)
offloader = DiskOffloader('./offload')
offloader.async_write(x)
# x is being offloaded in the background
offloader.sync_write_events()
# x is offloaded and the memory of x is freed
offloader.async_read(x)
# x is being restored in the background
offloader.sync_read_events()
# x is restored
offloader.async_writev([x, y])
# x and y are being offloaded in the background
offloader.synchronize()
# synchronize() will synchronize both write and read events.
offloader.async_readv([x, y])
offloader.synchronize()
# x and y are restored.
# async_writev() and async_readv() are also order sensitive
```
You can use the asynchronous API to overlap computation and data movement.
```python
tensors = []

for _ in range(10):
    tensor = torch.rand(2, 2)
    tensors.append(tensor)
    offloader.sync_write(tensor)

offloader.sync_read(tensors[0])
# prefetch=1, writing tensor[i] and reading tensor[i+1]
for i, tensor in enumerate(tensors):
    offloader.sync_read_events()
    if i + 1 < len(tensors):
        offloader.async_read(tensors[i+1])
    tensor.mul_(2.0)  # compute
    offloader.sync_write_events()
    offloader.async_write(tensor)
offloader.synchronize()
```
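The benchmarks below vary the prefetch depth, and the loop above generalizes to deeper prefetch. A sketch with a user-chosen depth (`prefetch` is an ordinary variable here, not a library parameter), assuming the tensors start out offloaded as in the example above:
```python
prefetch = 2  # illustrative depth chosen by the user

# warm up the pipeline: issue the first `prefetch` reads
for i in range(min(prefetch, len(tensors))):
    offloader.async_read(tensors[i])

for i, tensor in enumerate(tensors):
    offloader.sync_read_events()  # tensors[i] is restored after this
    if i + prefetch < len(tensors):
        offloader.async_read(tensors[i + prefetch])  # keep reads in flight
    tensor.mul_(2.0)  # compute while later reads run in the background
    offloader.sync_write_events()
    offloader.async_write(tensor)
offloader.synchronize()
```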
## How to test
We have C++ test scripts for the `AsyncIO` and `SpaceManager` classes. Make sure you have installed `liburing` and `libaio` and set the environment variables correctly before testing. To run the tests:
```shell
mkdir build
cd build
cmake ..
make
./test_asyncio
./test_space_mgr
```
We also have Python unit tests. Make sure you have installed `pytest`. To run them:
```shell
pytest ./tests
```

## How to benchmark
We have benchmarks for `Adam` and `CpuAdam` with different backends and prefetch depths to validate TensorNVMe's speed. To run the benchmarks:
```shell
cd benchmark
python benchmark_adam.py
python benchmark_cpuadam.py
```