https://github.com/zhuzilin/pytorch-malloc
An external memory allocator example for PyTorch.
memory-allocator pytorch
- Host: GitHub
- URL: https://github.com/zhuzilin/pytorch-malloc
- Owner: zhuzilin
- License: mit
- Created: 2021-10-30T17:01:43.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-11-02T07:59:47.000Z (over 3 years ago)
- Last Synced: 2025-04-03T03:22:52.015Z (2 months ago)
- Topics: memory-allocator, pytorch
- Language: C++
- Homepage:
- Size: 21.5 KB
- Stars: 14
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Custom PyTorch Memory Management
This is an external memory allocator example for [PyTorch](https://github.com/pytorch/pytorch). The underlying memory allocator is [CNMeM](https://github.com/NVIDIA/cnmem).
## Usage
Compile with `nvcc`:
```bash
cd pytorch_malloc
make
```
Note that `--cudart=none` is required so that `nvcc` does not link the static CUDA runtime, which is its default.
For more information about the `nvcc` flags, see https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html
To make PyTorch allocate without its built-in caching mechanism, run with `PYTORCH_NO_CUDA_MEMORY_CACHING` set:
```bash
LD_PRELOAD=./libcudart.so PYTORCH_NO_CUDA_MEMORY_CACHING=1 python3 your_model.py
```
## Profile
Use the `profiler` branch to profile the memory usage of your model:
```bash
git checkout profiler
make
```
Run the example script with:
```bash
$ LD_PRELOAD=./libcudart.so PYTORCH_NO_CUDA_MEMORY_CACHING=1 python3 torch_example.py
start allocate 0
[Allocator] create allocator
[Allocator] free mem: 33094893568 B, total mem: 34089730048 B.
[Allocator] malloc(139996541485056): 64 B, time: 357 us.
end allocate 0
start allocate 1
[Allocator] malloc(139996541485568): 64 B, time: 639 us.
[Allocator] free(139996541485056): 64 B, time: 699 us.
end allocate 1
start allocate 2
[Allocator] malloc(139996541485056): 64 B, time: 754 us.
[Allocator] free(139996541485568): 64 B, time: 781 us.
end allocate 2
[Allocator] free(139996541485056): 64 B, time: 14273 us
```
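The profiler output can be summarized with a short script. The following is a minimal sketch and is not part of this repository: the function name `peak_live_bytes` and the regular expression are assumptions derived only from the sample log above, so they may need adjusting if the log format differs in your checkout.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed line format, taken from the sample log above, e.g.:
#   [Allocator] malloc(139996541485056): 64 B, time: 357 us.
#   [Allocator] free(139996541485056): 64 B, time: 699 us.
EVENT_RE = re.compile(r"\[Allocator\] (malloc|free)\((\d+)\): (\d+) B, time: (\d+) us")


def peak_live_bytes(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (peak live bytes, bytes still allocated at the end of the log)."""
    live: Dict[int, int] = {}  # pointer -> size of the live allocation
    current = 0
    peak = 0
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # script output or other allocator messages
        op = match.group(1)
        ptr = int(match.group(2))
        size = int(match.group(3))
        if op == "malloc":
            live[ptr] = size
            current += size
            peak = max(peak, current)
        else:
            # Use the size recorded at malloc time; ignore unknown pointers.
            current -= live.pop(ptr, 0)
    return peak, current


# The allocator lines from the sample run above.
SAMPLE_LOG = """\
start allocate 0
[Allocator] create allocator
[Allocator] free mem: 33094893568 B, total mem: 34089730048 B.
[Allocator] malloc(139996541485056): 64 B, time: 357 us.
end allocate 0
start allocate 1
[Allocator] malloc(139996541485568): 64 B, time: 639 us.
[Allocator] free(139996541485056): 64 B, time: 699 us.
end allocate 1
start allocate 2
[Allocator] malloc(139996541485056): 64 B, time: 754 us.
[Allocator] free(139996541485568): 64 B, time: 781 us.
end allocate 2
[Allocator] free(139996541485056): 64 B, time: 14273 us
"""

peak, remaining = peak_live_bytes(SAMPLE_LOG.splitlines())
print(f"peak live: {peak} B, still allocated at end: {remaining} B")
# prints: peak live: 128 B, still allocated at end: 0 B
```

For a real run, redirect the program output to a file and pass the open file object to `peak_live_bytes` instead of `SAMPLE_LOG.splitlines()`.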