{"id":31026116,"url":"https://github.com/tristanbilot/mlx-benchmark","last_synced_at":"2025-09-13T17:58:58.473Z","repository":{"id":214859172,"uuid":"734280040","full_name":"TristanBilot/mlx-benchmark","owner":"TristanBilot","description":"Benchmark of Apple MLX operations on all Apple Silicon chips (GPU, CPU) + MPS and CUDA.","archived":false,"fork":false,"pushed_at":"2025-06-06T07:59:38.000Z","size":1465,"stargazers_count":189,"open_issues_count":1,"forks_count":27,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-07-18T08:30:58.929Z","etag":null,"topics":["apple-silicon","benchmark","deep-learning","machine-learning","mlx","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TristanBilot.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-12-21T09:46:01.000Z","updated_at":"2025-07-06T14:56:54.000Z","dependencies_parsed_at":"2024-11-12T15:18:19.939Z","dependency_job_id":"ad56c685-1224-45d0-b30a-a7f14cd23a50","html_url":"https://github.com/TristanBilot/mlx-benchmark","commit_stats":null,"previous_names":["tristanbilot/mlx-benchmark"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/TristanBilot/mlx-benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TristanBilot%2Fmlx-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TristanBilot%2Fmlx-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/T
ristanBilot%2Fmlx-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TristanBilot%2Fmlx-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TristanBilot","download_url":"https://codeload.github.com/TristanBilot/mlx-benchmark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TristanBilot%2Fmlx-benchmark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275004531,"owners_count":25389192,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-13T02:00:10.085Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple-silicon","benchmark","deep-learning","machine-learning","mlx","pytorch"],"created_at":"2025-09-13T17:58:32.817Z","updated_at":"2025-09-13T17:58:58.465Z","avatar_url":"https://github.com/TristanBilot.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ⚡️ mlx-benchmark ⚡️\n### A comprehensive benchmark of MLX ops.\n\nThis repo aims to benchmark Apple's MLX operations and layers, on all Apple Silicon chips, along with some GPUs.\n\n**Contributions:** Everyone can contribute to the benchmark! 
If you own a device that is missing from the benchmark, or if you want to add a missing layer/operation, please read the [contribution guidelines](CONTRIBUTING.md).\n\nCurrent M chips: `M1`, `M1 Pro`, `M1 Max`, `M2`, `M2 Pro`, `M2 Max`, `M2 Ultra`, `M3`, `M3 Pro`, `M3 Max`, `M3 Ultra`, `M4`, `M4 Pro`, `M4 Max`.\n\nCurrent CUDA GPUs: `RTX4090`, `Tesla V100`, `A100`.\n\nMissing devices: `M1 Ultra` and other CUDA GPUs.\n\n\u003e [!NOTE]\n\u003e You can submit your benchmark even for a device that is already listed, provided you use a newer version of MLX. Simply submit a PR that overwrites the old benchmark table. Also, most of the existing benchmarks do not include the `mx.compile` feature, which was recently added to mlx-benchmark.\n\n## Benchmarks 🧪\n\nBenchmarks are generated by measuring the runtime of every `mlx` operation on GPU and CPU, along with its PyTorch equivalent on the `mps`, `cpu` and `cuda` backends. On MLX with GPU, the operations compiled with `mx.compile` are included in the benchmark by default. To skip the compiled functions, set `--compile=False`.\n\nFor each operation, we measure the runtime of multiple experiments. We provide two benchmarks based on these experiments:\n\n* [Detailed benchmark](benchmarks/detailed_benchmark.md): provides the runtime of each experiment.\n* [Average runtime benchmark](benchmarks/average_benchmark.md): reports the mean runtime across experiments. Easier to navigate, with fewer details.\n\n\n## Installation 💻\n\n\n### Installation on Mac devices\n\nRunning the benchmark locally is straightforward. Create a new env with `osx-arm64` architecture and install the dependencies.\n\n```shell\nCONDA_SUBDIR=osx-arm64 conda create -n mlx_benchmark python=3.10 numpy pytorch torchvision scipy requests -c conda-forge\n\npip install -r requirements.txt\n```\n\n\n### Installation on other devices\nOperating systems other than macOS can only run the torch experiments, on CPU or with a CUDA device.\n
Create a new env without the `CONDA_SUBDIR=osx-arm64` prefix and install the torch package that matches your CUDA version. Then install all the requirements listed in `requirements.txt`, except `mlx`.\n\nFinally, open the `config.py` file and set:\n```python\nUSE_MLX = False\n```\nto avoid importing the mlx package, which cannot be installed on non-Mac devices.\n\n## Run the benchmark 🧑‍💻\n\n### Run on Mac\n\nTo run the benchmark on mps, mlx and CPU:\n\n```shell\npython run_benchmark.py --include_mps=True --include_mlx_gpu=True --include_mlx_cpu=True --include_cpu=True\n```\n\n### Run on other devices\n\nTo run the torch benchmark on CUDA and CPU:\n\n```shell\npython run_benchmark.py --include_mps=False --include_mlx_gpu=False --include_mlx_cpu=False --include_cuda=True --include_cpu=True\n```\n\n### Run only compiled functions\n\nTo compare only MLX operations with their counterparts compiled with `mx.compile`, run:\n\n```shell\npython run_benchmark.py --include_mps=False --include_cpu=False --include_mlx_cpu=False\n```\n\n## Contributing 🚀\n\nIf you have a device not yet featured in the benchmark, especially one of the missing devices listed above, your PR is welcome and will broaden the scope and accuracy of this project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftristanbilot%2Fmlx-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftristanbilot%2Fmlx-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftristanbilot%2Fmlx-benchmark/lists"}