https://github.com/TristanBilot/mlx-benchmark
Benchmark of Apple MLX operations on all Apple Silicon chips (GPU, CPU) + MPS and CUDA.
- Host: GitHub
- URL: https://github.com/TristanBilot/mlx-benchmark
- Owner: TristanBilot
- License: mit
- Created: 2023-12-21T09:46:01.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-12T14:05:59.000Z (2 months ago)
- Last Synced: 2024-11-12T15:18:30.196Z (2 months ago)
- Topics: apple-silicon, benchmark, deep-learning, machine-learning, mlx, pytorch
- Language: Python
- Homepage:
- Size: 1.27 MB
- Stars: 123
- Watchers: 7
- Forks: 23
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# ⚡️ mlx-benchmark ⚡️
### A comprehensive benchmark of MLX ops.

This repo aims to benchmark Apple's MLX operations and layers on all Apple Silicon chips, along with some GPUs.
**Contributions:** Everyone can contribute to the benchmark! If your device is missing or you want to add a missing layer/operation, please read the [contribution guidelines](CONTRIBUTING.md).
Current M chips: `M1`, `M1 Pro`, `M1 Max`, `M2`, `M2 Pro`, `M2 Max`, `M2 Ultra`, `M3`, `M3 Pro`, `M3 Max`.
Current CUDA GPUs: `RTX4090`, `Tesla V100`.
Missing devices: `M1 Ultra`, and `other CUDA GPUs`.
> [!NOTE]
> You can submit a benchmark even for a device that is already listed, provided you use a newer version of MLX. Simply submit a PR that overwrites the old benchmark table. Also, most of the existing benchmarks do not include the `mx.compile` feature, which was recently added to mlx-benchmark.

## Benchmarks 🧪
Benchmarks are generated by measuring the runtime of every `mlx` operation on GPU and CPU, along with its equivalent in PyTorch with the `mps`, `cpu` and `cuda` backends. On MLX with GPU, operations compiled with `mx.compile` are included in the benchmark by default. To skip benchmarking the compiled functions, set `--compile=False`.
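Because MLX evaluates lazily, timing an MLX call only measures real work if the result is forced with `mx.eval` before the clock stops. Likewise, the first call to an `mx.compile`-d function includes a one-time tracing and compilation cost, so warm-up runs matter. The following is a minimal sketch under those assumptions, not the repo's actual harness: `measure` and its `warmup`, `runs` and `sync` parameters are hypothetical names, and the MLX demo only runs when `mlx` is installed.

```python
import statistics
import time
from typing import Any, Callable, Optional, Sequence


def measure(
    fn: Callable[..., Any],
    args: Sequence[Any],
    sync: Optional[Callable[[Any], Any]] = None,
    warmup: int = 3,
    runs: int = 10,
) -> list[float]:
    """Hypothetical helper: runtime of each run of fn(*args), in milliseconds.

    `sync` forces pending work to finish (e.g. mx.eval for lazy MLX arrays);
    without it, a lazy or asynchronous backend would only be timed on graph
    construction or kernel launch.
    """
    # Warm-up calls absorb one-time costs such as mx.compile tracing.
    for _ in range(warmup):
        out = fn(*args)
        if sync is not None:
            sync(out)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        out = fn(*args)
        if sync is not None:
            sync(out)
        times.append((time.perf_counter() - start) * 1000.0)
    return times


if __name__ == "__main__":
    try:
        import mlx.core as mx
    except ImportError:
        mx = None  # MLX not installed (e.g. not on Apple Silicon)

    if mx is not None:
        def gelu_approx(x):
            # A few element-wise ops that mx.compile can fuse.
            return x * mx.sigmoid(1.702 * x)

        # Runs on MLX's default device (the GPU on Apple Silicon).
        x = mx.random.normal((4096, 4096))
        mx.eval(x)
        for name, f in [("eager", gelu_approx), ("compiled", mx.compile(gelu_approx))]:
            t = measure(f, (x,), sync=mx.eval)
            print(f"{name}: mean {statistics.mean(t):.3f} ms over {len(t)} runs")
```

For PyTorch backends, the same helper could take `sync=lambda _: torch.mps.synchronize()` or `sync=lambda _: torch.cuda.synchronize()` so that asynchronous kernels finish before timing stops.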
For each operation, we measure the runtime of multiple experiments. We propose 2 benchmarks based on these experiments:
* [Detailed benchmark](benchmarks/detailed_benchmark.md): provides the runtime of each experiment.
* [Average runtime benchmark](benchmarks/average_benchmark.md): computes the mean of the experiments. Easier to navigate, with fewer details.

## Installation 💻
### Installation on Mac devices
Running the benchmark locally is straightforward. Create a new env with the `osx-arm64` architecture and install the dependencies.
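The `osx-arm64` subdir makes conda resolve native Apple Silicon packages; an env built from x86_64 packages runs under Rosetta 2, where installing the arm64-only `mlx` package typically fails. After creating the env with the command below and activating it, a quick sanity check is to inspect Python's `platform` module. This is a minimal sketch; the helper name `is_native_apple_silicon` is hypothetical and not part of the repo.

```python
import platform


def is_native_apple_silicon(system: str, machine: str) -> bool:
    """Hypothetical helper: True for a native arm64 interpreter on macOS.

    Under Rosetta 2, an x86_64 Python reports machine == "x86_64"
    even on an M-series chip.
    """
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    native = is_native_apple_silicon(platform.system(), platform.machine())
    print("native arm64 Python" if native else "not a native arm64 macOS Python")
```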
```shell
CONDA_SUBDIR=osx-arm64 conda create -n mlx_benchmark python=3.10 numpy pytorch torchvision scipy requests -c conda-forge
# Activate the new env so that pip installs into it
conda activate mlx_benchmark
pip install -r requirements.txt
```

### Installation on other devices
Operating systems other than macOS can only run the torch experiments, on CPU or with a CUDA device. Create a new env without the `CONDA_SUBDIR=osx-arm64` prefix and install the torch package that matches your CUDA version. Then install all the requirements in `requirements.txt` except `mlx`.

Finally, open the `config.py` file and set:
```python
USE_MLX = False
```
to avoid importing the `mlx` package, which cannot be installed on non-Mac devices.

## Run the benchmark 🧑‍💻
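Which `--include_*` flags make sense depends on the backends available on the machine. The sketch below is a hypothetical helper, not part of the repo: it detects backends with `importlib.util.find_spec`, `torch.backends.mps.is_available()` and `torch.cuda.is_available()`, then prints a `run_benchmark.py` command line built only from the flags shown in this README. It also hints when `USE_MLX` should be set to `False`.

```python
import importlib.util


def detect_backends() -> dict[str, bool]:
    """Report which backends look usable here (CPU is always assumed)."""
    has_mlx = importlib.util.find_spec("mlx") is not None
    has_mps = has_cuda = False
    if importlib.util.find_spec("torch") is not None:
        import torch

        has_mps = torch.backends.mps.is_available()
        has_cuda = torch.cuda.is_available()
    return {"mlx": has_mlx, "mps": has_mps, "cuda": has_cuda}


def suggest_command(backends: dict[str, bool]) -> str:
    """Build a run_benchmark.py invocation from detected backends (hypothetical)."""
    flags = {
        "include_mps": backends["mps"],
        "include_mlx_gpu": backends["mlx"],
        "include_mlx_cpu": backends["mlx"],
        "include_cuda": backends["cuda"],
        "include_cpu": True,
    }
    args = " ".join(f"--{name}={value}" for name, value in flags.items())
    return f"python run_benchmark.py {args}"


if __name__ == "__main__":
    found = detect_backends()
    if not found["mlx"]:
        print("mlx not importable: set USE_MLX = False in config.py")
    print(suggest_command(found))
```

On a machine with a CUDA build of PyTorch and no `mlx`, this yields the same command as the one given for running on other devices.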
### Run on Mac
To run the benchmark with PyTorch on MPS and CPU, and with MLX on GPU and CPU:
```shell
python run_benchmark.py --include_mps=True --include_mlx_gpu=True --include_mlx_cpu=True --include_cpu=True
```

### Run on other devices
To run the torch benchmark on CUDA and CPU:
```shell
python run_benchmark.py --include_mps=False --include_mlx_gpu=False --include_mlx_cpu=False --include_cuda=True --include_cpu=True
```

### Run only compiled functions
If you only want to benchmark MLX GPU operations against their versions compiled with `mx.compile`, you can run:
```shell
python run_benchmark.py --include_mps=False --include_cpu=False --include_mlx_cpu=False
```

## Contributing
If you have a device not yet featured in the benchmark, especially the ones listed below, your PR is welcome to broaden the scope and accuracy of this project.