# โšก๏ธ mlx-benchmark โšก๏ธ
### A comprehensive benchmark of MLX ops.

This repo aims to benchmark Apple's MLX operations and layers on all Apple Silicon chips, along with some CUDA GPUs.

**Contributions:** Everyone can contribute to the benchmark! If you have a missing device or if you want to add a missing layer/operation, please read the [contribution guidelines](CONTRIBUTING.md).

Current M chips: `M1`, `M1 Pro`, `M1 Max`, `M2`, `M2 Pro`, `M2 Max`, `M2 Ultra`, `M3`, `M3 Pro`, `M3 Max`.

Current CUDA GPUs: `RTX4090`, `Tesla V100`.

Missing devices: `M1 Ultra` and other CUDA GPUs.

> [!NOTE]
> You can submit your benchmark even for a device that is already listed, provided you use a newer version of MLX. Simply submit a PR that overwrites the old benchmark table. Also note that most of the existing benchmarks do not include the `mx.compile` feature, which was recently added to mlx-benchmark.

## Benchmarks 🧪

Benchmarks are generated by measuring the runtime of every `mlx` operation on GPU and CPU, along with its equivalent in PyTorch with the `mps`, `cpu`, and `cuda` backends. On MLX with GPU, operations compiled with `mx.compile` are included in the benchmark by default. To exclude the compiled functions from the benchmark, set `--compile=False`.
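
For illustration, the snippet below shows the general idea behind such a measurement. It is only a minimal sketch, not the repository's actual harness in `run_benchmark.py`: the operation (`matmul`), the matrix shapes, the iteration count, and the `time_fn` helper are placeholder choices.

```python
# Minimal sketch of timing one op (matmul) on MLX GPU, MLX GPU + mx.compile,
# and torch MPS. Shapes and iteration counts are illustrative only.
import time

import mlx.core as mx
import torch


def time_fn(fn, iters=10):
    # Warm up once, then average the wall-clock time of `iters` runs.
    fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1000  # ms


# --- MLX (the GPU is the default device on Apple Silicon) ---
a = mx.random.uniform(shape=(1024, 1024))
b = mx.random.uniform(shape=(1024, 1024))

def mlx_matmul():
    # mx.eval forces MLX's lazy computation graph to actually execute.
    mx.eval(mx.matmul(a, b))

compiled_matmul = mx.compile(lambda x, y: mx.matmul(x, y))

def mlx_matmul_compiled():
    mx.eval(compiled_matmul(a, b))

# --- torch MPS ---
ta = torch.randn(1024, 1024, device="mps")
tb = torch.randn(1024, 1024, device="mps")

def torch_matmul_mps():
    torch.matmul(ta, tb)
    torch.mps.synchronize()  # wait for the GPU to finish before stopping the clock

print("mlx gpu          :", time_fn(mlx_matmul), "ms")
print("mlx gpu compiled :", time_fn(mlx_matmul_compiled), "ms")
print("torch mps        :", time_fn(torch_matmul_mps), "ms")
```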

For each operation, we measure the runtime of multiple experiments. We provide two benchmarks based on these experiments:

* [Detailed benchmark](benchmarks/detailed_benchmark.md): provides the runtime of each experiment.
* [Average runtime benchmark](benchmarks/average_benchmark.md): reports the mean runtime across experiments. Easier to navigate, with fewer details (a small sketch of this aggregation follows the list).
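
As a rough illustration of how the average table relates to the detailed one, assuming per-experiment timings collected in a plain dict (a hypothetical structure, not the repo's internal format):

```python
# Hypothetical per-experiment timings (ms) for one operation on each backend.
detailed = {
    "matmul / dim=1024x1024": {"mlx_gpu": [3.1, 3.0, 3.2], "mps": [4.0, 4.1, 3.9]},
    "matmul / dim=4096x4096": {"mlx_gpu": [45.2, 44.8, 45.5], "mps": [60.1, 59.7, 60.4]},
}

# The average benchmark simply reports the mean runtime per operation and backend.
average = {
    op: {backend: sum(times) / len(times) for backend, times in backends.items()}
    for op, backends in detailed.items()
}
print(average)
```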

## Installation 💻

### Installation on Mac devices

Running the benchmark locally is straightforward. Create a new env with `osx-arm64` architecture and install the dependencies.

```shell
CONDA_SUBDIR=osx-arm64 conda create -n mlx_benchmark python=3.10 numpy pytorch torchvision scipy requests -c conda-forge

pip install -r requirements.txt
```

### Installation on other devices
Operating systems other than macOS can only run the torch experiments, on CPU or on a CUDA device. Create a new env without the `CONDA_SUBDIR=osx-arm64` prefix, install the torch package that matches your CUDA version, and then install all the requirements within `requirements.txt` except `mlx`.

Finally, open the `config.py` file and set:
```python
USE_MLX = False
```
to avoid importing the mlx package, which cannot be installed on non-Mac devices.
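
One common way such a flag is consumed is to guard the import; this is a sketch of the pattern, not necessarily the exact code in this repo:

```python
# Sketch: guard the mlx import behind the config flag so non-Mac devices
# never try to import a package they cannot install.
from config import USE_MLX

if USE_MLX:
    import mlx.core as mx
else:
    mx = None  # MLX experiments are skipped on this device
```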

## Run the benchmark ๐Ÿง‘โ€๐Ÿ’ป

### Run on Mac

To run the benchmark on MPS, MLX (GPU and CPU), and the PyTorch CPU backend:

```shell
python run_benchmark.py --include_mps=True --include_mlx_gpu=True --include_mlx_cpu=True --include_cpu=True
```

### Run on other devices

To run the torch benchmark on CUDA and CPU:

```shell
python run_benchmark.py --include_mps=False --include_mlx_gpu=False --include_mlx_cpu=False --include_cuda=True --include_cpu=True
```

### Run only compiled functions

If you are only interested in comparing plain MLX operations against their `mx.compile`-compiled counterparts, you can run:

```shell
python run_benchmark.py --include_mps=False --include_cpu=False --include_mlx_cpu=False
```

## Contributing 🚀

If you have a device not yet featured in the benchmark, especially one of those listed as missing above, your PR is welcome to broaden the scope and accuracy of this project.