{"id":23887094,"url":"https://github.com/saifhaq/alma","last_synced_at":"2025-04-10T03:25:53.582Z","repository":{"id":269062896,"uuid":"830223922","full_name":"saifhaq/alma","owner":"saifhaq","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-08T01:02:27.000Z","size":22735,"stargazers_count":19,"open_issues_count":17,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-08T02:21:38.769Z","etag":null,"topics":["benchmarking","computer-vision","gpu","inference","ml","python","pytorch","quantization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saifhaq.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-17T21:07:38.000Z","updated_at":"2025-03-16T13:42:07.000Z","dependencies_parsed_at":"2024-12-20T17:07:11.874Z","dependency_job_id":"4f27571a-0075-4925-ba2d-587ddf71c42b","html_url":"https://github.com/saifhaq/alma","commit_stats":null,"previous_names":["saifhaq/alma"],"tags_count":56,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saifhaq%2Falma","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saifhaq%2Falma/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saifhaq%2Falma/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saifhaq%2Falma/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saifhaq","download_url":"https://cod
eload.github.com/saifhaq/alma/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247767230,"owners_count":20992539,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmarking","computer-vision","gpu","inference","ml","python","pytorch","quantization"],"created_at":"2025-01-04T07:29:33.116Z","updated_at":"2025-04-10T03:25:53.570Z","avatar_url":"https://github.com/saifhaq.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# alma\n\u003cp align=\"center\"\u003e\u003cimg width=\"900\" src=\"assets/alma-logo-banner.jpg\" alt=\"Alma Logo\"\u003e\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  A Python library for benchmarking PyTorch model speed for different conversion options 🚀\n\u003c/p\u003e\n\u003ch2 align=\"center\"\u003e\n  \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/License-MIT-yellow.svg\" alt=\"license\" style=\"height: 20px;\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://discord.gg/RASFKzqgfZ\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/discord-7289da.svg?style=flat-square\u0026logo=discord\" alt=\"discord\" style=\"height: 20px;\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pepy.tech/project/alma-torch\"\u003e\n    \u003cimg src=\"https://pepy.tech/badge/alma-torch?style=flat\" alt=\"Downloads\" style=\"height: 20px;\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pepy.tech/project/alma-torch\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/dm/alma-torch\" 
alt=\"monthly downloads\" style=\"height: 20px;\"\u003e\n  \u003c/a\u003e\n\u003c/h2\u003e\n\nWith just one function call, you can get a full report on how fast your PyTorch model runs for inference across over 90 conversion options, such as\nJIT tracing, torch.compile, torch.export, torchao, ONNX, OpenVINO, TensorRT, and many more!\n\nThis allows one to find the best option for one's model, data, and hardware. See \n[here](#conversion-options) for all supported options.\n\n## Table of Contents\n\n- [Getting Started](#getting-started)\n  - [Installation](#installation)\n  - [Docker](#docker)\n- [Basic Usage](#basic-usage)\n- [Documentation](#documentation)\n- [Conversion Options](#conversion-options)\n- [Future Work](#future-work)\n- [How to Contribute](#how-to-contribute)\n\n\n## Getting Started\n\n### Installation\n`alma` is available as a Python package.\n\nOne can install the package from the Python Package Index (PyPI) by running:\n```bash\npip install alma-torch\n```\n\nAlternatively, it can be installed from the root of this \n[repository](https://github.com/saifhaq/alma) by running:\n\n```bash\npip install -e .\n```\n\n### Docker\nWe recommend that you build the provided Dockerfile to ensure an easy installation of all of the \nsystem dependencies and the alma pip packages. \n\n\u003cdetails\u003e\n\u003csummary\u003eWorking with the Docker image\u003c/summary\u003e\n\u003cbr\u003e\n\n1. **Build the Docker Image**  \n   ```bash\n   bash scripts/build_docker.sh\n   ```\n\n2. **Run the Docker Container**  \n   Create and start a container named `alma`:  \n   ```bash\n   bash scripts/run_docker.sh\n   ```\n\n3. **Access the Running Container**  \n   Enter the container's shell:  \n   ```bash\n   docker exec -it alma bash\n   ```\n\n4. **Mount Your Repository**  \n   By default, the `run_docker.sh` script mounts your `/home` directory to `/home` inside the container.  
\n   If your `alma` repository is in a different location, update the bind mount, for example:  \n   ```bash\n   -v /Users/myuser/alma:/home/alma\n   ```\n\u003c/details\u003e\n\n\n## Basic usage\nThe core API is `benchmark_model`, which is used to benchmark the speed of a model for different\nconversion options. The usage is as follows:\n\n```python\nimport torch\n\nfrom alma import benchmark_model\nfrom alma.benchmark import BenchmarkConfig\nfrom alma.benchmark.log import display_all_results\n\ndevice = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n\n# Load the model\nmodel = ...\n\n# Load the dataloader used in benchmarking\ndata_loader = ...\n\n# Set the configuration (this can also be passed in as a dict)\nconfig = BenchmarkConfig(\n    n_samples=2048,\n    batch_size=64,\n    device=device,  # The device to run the model on\n)\n\n# Choose which conversions to benchmark\nconversions = [\"EAGER\", \"TORCH_SCRIPT\", \"COMPILE_INDUCTOR_MAX_AUTOTUNE\", \"COMPILE_OPENXLA\"]\n\n# Benchmark the model\nresults = benchmark_model(model, config, conversions, data_loader=data_loader)\n\n# Print all results\ndisplay_all_results(results)\n```\n\nThe results will look like this, depending on one's model, dataloader, and hardware.\n\n```bash\nEAGER results:\nDevice: cuda\nTotal elapsed time: 0.0206 seconds\nTotal inference time (model only): 0.0074 seconds\nTotal samples: 2048 - Batch size: 64\nThroughput: 275643.45 samples/second\n\nTORCH_SCRIPT results:\nDevice: cuda\nTotal elapsed time: 0.0203 seconds\nTotal inference time (model only): 0.0043 seconds\nTotal samples: 2048 - Batch size: 64\nThroughput: 477575.34 samples/second\n\nCOMPILE_INDUCTOR_MAX_AUTOTUNE results:\nDevice: cuda\nTotal elapsed time: 0.0159 seconds\nTotal inference time (model only): 0.0035 seconds\nTotal samples: 2048 - Batch size: 64\nThroughput: 592801.70 samples/second\n\nCOMPILE_OPENXLA results:\nDevice: xla:0\nTotal elapsed time: 0.0146 seconds\nTotal inference time (model only): 
0.0033 seconds\nTotal samples: 2048 - Batch size: 64\nThroughput: 611865.07 samples/second\n```\n\n## Documentation\nSee the [examples](./examples) for discussion of design choices and for examples of more advanced usage, e.g. controlling the \nmultiprocessing setup, controlling graceful failures, setting default device fallbacks if a conversion\noption is incompatible with your specified device, memory-efficient usage of `alma`, etc.\n\n### Video\nSee below for a YouTube video going over `alma` as well as a discussion of the documentation for \nadvanced usage.\n\n[YouTube link.](https://www.youtube.com/watch?v=SV2LaqFv9HA)\n\n## Conversion Options\n\n### Naming conventions\n\nThe naming convention for conversion options is as follows:\n- Short but descriptive names for each technique, e.g. `EAGER`, `EXPORT`, etc.\n- Underscores `_` are used within each technique name to separate the words for readability, \ne.g. `AOT_INDUCTOR`, `COMPILE_CUDAGRAPHS`, etc.\n- If multiple \"techniques\" are used in a conversion option, then the names are separated by a `+` sign in chronological order of operation. \n    For example, `EXPORT+EAGER`, `EXPORT+COMPILE_INDUCTOR_MAX_AUTOTUNE`. 
In both cases, \n    `EXPORT` is the first operation, followed by `EAGER` or `COMPILE_INDUCTOR_MAX_AUTOTUNE`.\n\n### Conversion Options Summary\nBelow is a table summarizing the currently supported conversion options and their identifiers:\n\n  | ID  | Conversion Option                                 | Device Support | Project |\n  |-----|---------------------------------------------------|----------------|---------|\n  | 0   |  EAGER                                            | CPU, MPS, GPU  | PyTorch   |\n  | 1   |  EXPORT+EAGER                                     | CPU, MPS, GPU  | torch.export |\n  | 2   |  ONNX_CPU                                         | CPU            | ONNXRT      |\n  | 3   |  ONNX_GPU                                         | GPU            | ONNXRT      |\n  | 4   |  ONNX+DYNAMO_EXPORT                               | CPU            | ONNXRT      |\n  | 5   |  COMPILE_CUDAGRAPHS                               | GPU (CUDA)     | torch.compile |\n  | 6   |  COMPILE_INDUCTOR_DEFAULT                         | CPU, MPS, GPU  | torch.compile |\n  | 7   |  COMPILE_INDUCTOR_REDUCE_OVERHEAD                 | CPU, MPS, GPU  | torch.compile |\n  | 8   |  COMPILE_INDUCTOR_MAX_AUTOTUNE                    | CPU, MPS, GPU  | torch.compile |\n  | 9   |  COMPILE_INDUCTOR_EAGER_FALLBACK                  | CPU, MPS, GPU  | torch.compile |\n  | 10  |  COMPILE_ONNXRT                                   | CPU, MPS, GPU  | torch.compile + ONNXRT |\n  | 11  |  COMPILE_OPENXLA                                  | XLA_GPU        | torch.compile + OpenXLA |\n  | 12  |  COMPILE_TVM                                      | CPU, MPS, GPU  | torch.compile + Apache TVM |\n  | 13  |  EXPORT+AI8WI8_FLOAT_QUANTIZED                    | CPU, MPS, GPU  | torch.export |\n  | 14  |  EXPORT+AI8WI8_FLOAT_QUANTIZED+RUN_DECOMPOSITION  | CPU, MPS, GPU  | torch.export |\n  | 15  |  EXPORT+AI8WI8_STATIC_QUANTIZED                   | CPU, MPS, GPU  | torch.export |\n  | 16  |  
EXPORT+AI8WI8_STATIC_QUANTIZED+RUN_DECOMPOSITION | CPU, MPS, GPU  | torch.export |\n  | 17  |  EXPORT+AOT_INDUCTOR                              | CPU, MPS, GPU  | torch.export + aot_inductor |\n  | 18  |  EXPORT+COMPILE_CUDAGRAPHS                        | GPU (CUDA)     | torch.export + torch.compile |\n  | 19  |  EXPORT+COMPILE_INDUCTOR_DEFAULT                  | CPU, MPS, GPU  | torch.export + torch.compile |\n  | 20  |  EXPORT+COMPILE_INDUCTOR_REDUCE_OVERHEAD          | CPU, MPS, GPU  | torch.export + torch.compile |\n  | 21  |  EXPORT+COMPILE_INDUCTOR_MAX_AUTOTUNE             | CPU, MPS, GPU  | torch.export + torch.compile |\n  | 22  |  EXPORT+COMPILE_INDUCTOR_DEFAULT_EAGER_FALLBACK   | CPU, MPS, GPU  | torch.export + torch.compile |\n  | 23  |  EXPORT+COMPILE_ONNXRT                            | CPU, MPS, GPU  | torch.export + torch.compile + ONNXRT |\n  | 24  |  EXPORT+COMPILE_OPENXLA                           | XLA_GPU        | torch.export + torch.compile + OpenXLA |\n  | 25  |  EXPORT+COMPILE_TVM                               | CPU, MPS, GPU  | torch.export + torch.compile + Apache TVM |\n  | 26  |  NATIVE_CONVERT_AI8WI8_STATIC_QUANTIZED           | CPU            | CPU (PyTorch) |\n  | 27  |  NATIVE_FAKE_QUANTIZED_AI8WI8_STATIC              | CPU, GPU       | CPU (PyTorch) |\n  | 28  |  COMPILE_TENSORRT                                 | GPU (CUDA)     | torch.compile + NVIDIA TensorRT |\n  | 29  |  EXPORT+COMPILE_TENSORRT                          | GPU (CUDA)     | torch.export + torch.compile + NVIDIA TensorRT |\n  | 30  |  COMPILE_OPENVINO                                 | CPU (Intel)    | torch.compile + OpenVINO  |\n  | 31  |  JIT_TRACE                                        | CPU, MPS, GPU  | PyTorch   |\n  | 32  |  TORCH_SCRIPT                                     | CPU, MPS, GPU  | PyTorch   |\n  | 33  |  OPTIMUM_QUANTO_AI8WI8                            | CPU, MPS, GPU  | optimum quanto |\n  | 34  |  OPTIMUM_QUANTO_AI8WI4                            | 
CPU, MPS, GPU (not all GPUs supported) | optimum quanto |\n  | 35  |  OPTIMUM_QUANTO_AI8WI2                            | CPU, MPS, GPU (not all GPUs supported) | optimum quanto |\n  | 36  |  OPTIMUM_QUANTO_WI8                               | CPU, MPS, GPU  | optimum quanto |\n  | 37  |  OPTIMUM_QUANTO_WI4                               | CPU, MPS, GPU (not all GPUs supported) | optimum quanto |\n  | 38  |  OPTIMUM_QUANTO_WI2                               | CPU, MPS, GPU (not all GPUs supported) | optimum quanto |\n  | 39  |  OPTIMUM_QUANTO_Wf8E4M3N                          | CPU, MPS, GPU  | optimum quanto |\n  | 40  |  OPTIMUM_QUANTO_Wf8E4M3NUZ                        | CPU, MPS, GPU  | optimum quanto |\n  | 41  |  OPTIMUM_QUANTO_Wf8E5M2                           | CPU, MPS, GPU  | optimum quanto |\n  | 42  |  OPTIMUM_QUANTO_Wf8E5M2+COMPILE_CUDAGRAPHS        | GPU (CUDA)     | optimum quanto + torch.compile |\n  | 43  |  FP16+EAGER                                       | CPU, MPS, GPU  | PyTorch   |\n  | 44  |  BF16+EAGER                                       | CPU, MPS, GPU (not all GPUs natively supported)  | PyTorch   |\n  | 45  |  COMPILE_INDUCTOR_MAX_AUTOTUNE+\u003cbr\u003eTORCHAO_AUTOQUANT_DEFAULT    | GPU  | torch.compile + torchao |\n  | 46  |  COMPILE_INDUCTOR_MAX_AUTOTUNE+\u003cbr\u003eTORCHAO_AUTOQUANT_NONDEFAULT | GPU  | torch.compile + torchao |\n  | 47  |  COMPILE_CUDAGRAPHS+\u003cbr\u003eTORCHAO_AUTOQUANT_DEFAULT               | GPU (CUDA) | torch.compile + torchao |\n  | 48  |  COMPILE_INDUCTOR_MAX_AUTOTUNE+\u003cbr\u003eTORCHAO_QUANT_I4_WEIGHT_ONLY | GPU (requires bf16 support)  | torch.compile + torchao |\n  | 49  |  TORCHAO_QUANT_I4_WEIGHT_ONLY                                  | GPU (requires bf16 support) | torchao |\n  | 50  |  FP16+COMPILE_CUDAGRAPHS                                       | GPU (CUDA) | PyTorch + torch.compile |\n  | 51  |  FP16+COMPILE_INDUCTOR_DEFAULT                                 | CPU, MPS, GPU | PyTorch + torch.compile 
|\n  | 52  |  FP16+COMPILE_INDUCTOR_REDUCE_OVERHEAD                         | CPU, MPS, GPU | PyTorch + torch.compile |\n  | 53  |  FP16+COMPILE_INDUCTOR_MAX_AUTOTUNE                            | CPU, MPS, GPU | PyTorch + torch.compile |\n  | 54  |  FP16+COMPILE_INDUCTOR_EAGER_FALLBACK                          | CPU, MPS, GPU | PyTorch + torch.compile |\n  | 55  |  FP16+COMPILE_ONNXRT                                           | CPU, MPS, GPU | PyTorch + torch.compile + ONNXRT |\n  | 56  |  FP16+COMPILE_OPENXLA                                          | XLA_GPU       | PyTorch + torch.compile + OpenXLA |\n  | 57  |  FP16+COMPILE_TVM                                              | CPU, MPS, GPU | PyTorch + torch.compile + Apache TVM |\n  | 58  |  FP16+COMPILE_TENSORRT                                         | GPU (CUDA)    | PyTorch + torch.compile + NVIDIA TensorRT |\n  | 59  |  FP16+COMPILE_OPENVINO                                         | CPU (Intel)   | PyTorch + torch.compile + OpenVINO |\n  | 60  |  FP16+EXPORT+COMPILE_CUDAGRAPHS                                | GPU (CUDA)    | torch.export + torch.compile |\n  | 61  |  FP16+EXPORT+COMPILE_INDUCTOR_DEFAULT                          | CPU, MPS, GPU | torch.export + torch.compile |\n  | 62  |  FP16+EXPORT+COMPILE_INDUCTOR_REDUCE_OVERHEAD                  | CPU, MPS, GPU | torch.export + torch.compile |\n  | 63  |  FP16+EXPORT+COMPILE_INDUCTOR_MAX_AUTOTUNE                     | CPU, MPS, GPU | torch.export + torch.compile |\n  | 64  |  FP16+EXPORT+COMPILE_INDUCTOR_DEFAULT_EAGER_FALLBACK           | CPU, MPS, GPU | torch.export + torch.compile |\n  | 65  |  FP16+EXPORT+COMPILE_ONNXRT                                    | CPU, MPS, GPU | torch.export + torch.compile + ONNXRT |\n  | 66  |  FP16+EXPORT+COMPILE_OPENXLA                                   | XLA_GPU       | torch.export + torch.compile + OpenXLA |\n  | 67  |  FP16+EXPORT+COMPILE_TVM                                       | CPU, MPS, GPU | torch.export + 
torch.compile + Apache TVM |    \n  | 68  |  FP16+EXPORT+COMPILE_TENSORRT                                  | GPU (CUDA)    | torch.export + torch.compile + NVIDIA TensorRT |\n  | 69  |  FP16+EXPORT+COMPILE_OPENVINO                                  | CPU (Intel)   | torch.export + torch.compile + OpenVINO |\n  | 70  |  FP16+JIT_TRACE                                                | CPU, MPS, GPU | PyTorch   |\n  | 71  |  FP16+TORCH_SCRIPT                                             | CPU, MPS, GPU | PyTorch   |\n  | 72  |  BF16+COMPILE_CUDAGRAPHS                                       | GPU (CUDA)    | PyTorch + torch.compile |\n  | 73  |  BF16+COMPILE_INDUCTOR_DEFAULT                                 | CPU, MPS, GPU | PyTorch + torch.compile |\n  | 74  |  BF16+COMPILE_INDUCTOR_REDUCE_OVERHEAD                         | CPU, MPS, GPU | PyTorch + torch.compile |\n  | 75  |  BF16+COMPILE_INDUCTOR_MAX_AUTOTUNE                            | CPU, MPS, GPU | PyTorch + torch.compile |\n  | 76  |  BF16+COMPILE_INDUCTOR_EAGER_FALLBACK                          | CPU, MPS, GPU | PyTorch + torch.compile |\n  | 77  |  BF16+COMPILE_ONNXRT                                           | CPU, MPS, GPU | PyTorch + torch.compile + ONNXRT |\n  | 78  |  BF16+COMPILE_OPENXLA                                          | XLA_GPU       | PyTorch + torch.compile + OpenXLA |\n  | 79  |  BF16+COMPILE_TVM                                              | CPU, MPS, GPU | PyTorch + torch.compile + Apache TVM |\n  | 80  |  BF16+COMPILE_TENSORRT                                         | GPU (CUDA)    | PyTorch + torch.compile + NVIDIA TensorRT |\n  | 81  |  BF16+COMPILE_OPENVINO                                         | CPU (Intel)   | PyTorch + torch.compile + OpenVINO |\n  | 82  |  BF16+EXPORT+COMPILE_CUDAGRAPHS                                | GPU (CUDA)    | torch.export + torch.compile |\n  | 83  |  BF16+EXPORT+COMPILE_INDUCTOR_DEFAULT                          | CPU, MPS, GPU | torch.export + 
torch.compile |\n  | 84  |  BF16+EXPORT+COMPILE_INDUCTOR_REDUCE_OVERHEAD                  | CPU, MPS, GPU | torch.export + torch.compile |\n  | 85  |  BF16+EXPORT+COMPILE_INDUCTOR_MAX_AUTOTUNE                     | CPU, MPS, GPU | torch.export + torch.compile |\n  | 86  |  BF16+EXPORT+COMPILE_INDUCTOR_DEFAULT_EAGER_FALLBACK           | CPU, MPS, GPU | torch.export + torch.compile |\n  | 87  |  BF16+EXPORT+COMPILE_ONNXRT                                    | CPU, MPS, GPU | torch.export + torch.compile + ONNXRT |\n  | 88  |  BF16+EXPORT+COMPILE_OPENXLA                                   | XLA_GPU       | torch.export + torch.compile + OpenXLA |\n  | 89  |  BF16+EXPORT+COMPILE_TVM                                       | CPU, MPS, GPU | torch.export + torch.compile + Apache TVM |    \n  | 90  |  BF16+EXPORT+COMPILE_TENSORRT                                  | GPU (CUDA)    | torch.export + torch.compile + NVIDIA TensorRT |\n  | 91  |  BF16+EXPORT+COMPILE_OPENVINO                                  | CPU (Intel)   | torch.export + torch.compile + OpenVINO |\n  | 92  |  BF16+JIT_TRACE                                                | CPU, MPS, GPU | PyTorch   |\n  | 93  |  BF16+TORCH_SCRIPT                                             | CPU, MPS, GPU | PyTorch   |\n\nThese conversion options are also all hard-coded in the [conversion options](src/alma/conversions/conversion_options.py)\nfile, which is the source of truth.\n\n\n## Testing:\n\nWe use pytest for testing. Simply run:\n```bash\npytest\n```\n\nWe currently don't have comprehensive tests, but we are working on adding more tests to ensure that\nthe conversion options are working as expected in known environments (e.g. the Docker container).\n\n## Future work:\n\n- Add more conversion options. This is a work in progress, and we are always looking for more conversion options.\n- Multi-device benchmarking. 
Currently `alma` only supports single-device benchmarking, but ideally a model\n  could be split across multiple devices.\n- Integrating conversion options beyond PyTorch, e.g. HuggingFace, JAX, llama.cpp, etc.\n\n## How to contribute:\n\nContributions are welcome! If you have a new conversion option, feature, or other improvement you would like to add, \nso that the whole community can benefit, please open a pull request! We are always looking for new \nconversion options, and we are happy to help you get started with adding a new conversion \noption/feature!\n\nSee the [CONTRIBUTING.md](./CONTRIBUTING.md) file for more detailed information on how to contribute.\n\n\n## Citation\n```bibtex\n@Misc{alma,\n  title =        {Alma: PyTorch model speed benchmarking across all conversion types},\n  author =       {Oscar Savolainen and Saif Haq},\n  howpublished = {\\url{https://github.com/saifhaq/alma}},\n  year =         {2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaifhaq%2Falma","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaifhaq%2Falma","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaifhaq%2Falma/lists"}