[![ONNX Runtime](https://github.com/huggingface/optimum/actions/workflows/test_onnxruntime.yml/badge.svg)](https://github.com/huggingface/optimum/actions/workflows/test_onnxruntime.yml)

# Hugging Face Optimum

🤗 Optimum is an extension of 🤗 Transformers and Diffusers, providing a set of optimization tools enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.

## Installation

🤗 Optimum can be installed using `pip` as follows:

```bash
python -m pip install optimum
```

If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:

| Accelerator | Installation |
|:-----------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------|
| [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/overview) | `pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]` |
| [Intel Neural Compressor](https://huggingface.co/docs/optimum/intel/index) | `pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]`|
| [OpenVINO](https://huggingface.co/docs/optimum/intel/index) | `pip install --upgrade --upgrade-strategy eager optimum[openvino]` |
| [NVIDIA TensorRT-LLM](https://huggingface.co/docs/optimum/main/en/nvidia_overview) | `docker run -it --gpus all --ipc host huggingface/optimum-nvidia` |
| [AMD Instinct GPUs and Ryzen AI NPU](https://huggingface.co/docs/optimum/amd/index) | `pip install --upgrade --upgrade-strategy eager optimum[amd]` |
| [AWS Trainium & Inferentia](https://huggingface.co/docs/optimum-neuron/index) | `pip install --upgrade --upgrade-strategy eager optimum[neuronx]` |
| [Habana Gaudi Processor (HPU)](https://huggingface.co/docs/optimum/habana/index) | `pip install --upgrade --upgrade-strategy eager optimum[habana]` |
| [FuriosaAI](https://huggingface.co/docs/optimum/furiosa/index) | `pip install --upgrade --upgrade-strategy eager optimum[furiosa]` |

The `--upgrade --upgrade-strategy eager` option is needed to ensure the different packages are upgraded to the latest possible version.

To install from source:

```bash
python -m pip install git+https://github.com/huggingface/optimum.git
```

For the accelerator-specific features, specify the corresponding extra when installing from source:

```bash
python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git
```

## Accelerated Inference

🤗 Optimum provides multiple tools to export and run optimized models on various ecosystems:

- [ONNX](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model) / [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models)
- TensorFlow Lite
- [OpenVINO](https://huggingface.co/docs/optimum/intel/inference)
- Habana first-gen Gaudi / Gaudi2, more details [here](https://huggingface.co/docs/optimum/main/en/habana/usage_guides/accelerate_inference)
- AWS Inferentia 2 / Inferentia 1, more details [here](https://huggingface.co/docs/optimum-neuron/en/guides/models)
- NVIDIA TensorRT-LLM, more details [here](https://huggingface.co/blog/optimum-nvidia)

The [export](https://huggingface.co/docs/optimum/exporters/overview) and optimizations can be done both programmatically and from the command line.
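
For example, a minimal programmatic ONNX export might look like the sketch below (it assumes the `main_export` helper from `optimum.exporters.onnx`; the model id and output directory are illustrative):

```python
from optimum.exporters.onnx import main_export

# Export a Transformers checkpoint to ONNX programmatically
# (model id and output directory are illustrative).
main_export(
    "distilbert-base-uncased-finetuned-sst-2-english",
    output="distilbert_sst2_onnx",
    task="text-classification",
)
```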

### Features summary

| Features | [ONNX Runtime](https://huggingface.co/docs/optimum/main/en/onnxruntime/overview)| [Neural Compressor](https://huggingface.co/docs/optimum/main/en/intel/optimization_inc)| [OpenVINO](https://huggingface.co/docs/optimum/main/en/intel/inference)| [TensorFlow Lite](https://huggingface.co/docs/optimum/main/en/exporters/tflite/overview)|
|:----------------------------------:|:------------------:|:------------------:|:------------------:|:------------------:|
| Graph optimization | :heavy_check_mark: | N/A | :heavy_check_mark: | N/A |
| Post-training dynamic quantization | :heavy_check_mark: | :heavy_check_mark: | N/A | :heavy_check_mark: |
| Post-training static quantization | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Quantization Aware Training (QAT) | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
| FP16 (half precision) | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: |
| Pruning | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
| Knowledge Distillation | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |

### OpenVINO

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade --upgrade-strategy eager optimum[openvino]
```

It is possible to export 🤗 Transformers and Diffusers models to the OpenVINO format easily:

```bash
optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english distilbert_sst2_ov
```

If you add `--weight-format int8`, the weights will be quantized to `int8`; check out our [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/export) for more details. To apply quantization on both weights and activations, you can find more information [here](https://huggingface.co/docs/optimum/main/intel/openvino/optimization#static-quantization).
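
For instance, the export command above with `int8` weight quantization would look like this (the output directory name is illustrative):

```bash
optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english --weight-format int8 distilbert_sst2_ov_int8
```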

To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set `export=True` when loading your model.

```diff
- from transformers import AutoModelForSequenceClassification
+ from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForSequenceClassification.from_pretrained(model_id)
+ model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
results = classifier("He's a dreadful magician.")
```

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/inference) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino).

### Neural Compressor

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]
```

Dynamic quantization can be applied to your model:

```bash
optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output ./quantized_distilbert
```

To load a model quantized with Intel Neural Compressor, hosted locally or on the 🤗 Hub, you can do it as follows:
```python
from optimum.intel import INCModelForSequenceClassification

model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
model = INCModelForSequenceClassification.from_pretrained(model_id)
```
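
As a rough sketch, the loaded model can then be used like any other 🤗 Transformers model, for example with the `pipeline` API (loading the tokenizer from the same checkpoint is an assumption for illustration):

```python
from transformers import AutoTokenizer, pipeline
from optimum.intel import INCModelForSequenceClassification

model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
model = INCModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The quantized model plugs into the standard pipeline API
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("He's a dreadful magician."))
```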

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/optimization_inc) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor).

### ONNX + ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install optimum[exporters,onnxruntime]
```

It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily:

```bash
optimum-cli export onnx -m deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx
```

The model can then be quantized using `onnxruntime`:

```bash
optimum-cli onnxruntime quantize \
--avx512 \
--onnx_model roberta_base_qa_onnx \
-o quantized_roberta_base_qa_onnx
```

These commands will export `deepset/roberta-base-squad2` and perform [O2 graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization#optimization-configuration) on the exported model, and finally quantize it with the [avx512 configuration](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.AutoQuantizationConfig.avx512).
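
The quantization step can also be performed from Python; here is a minimal sketch using `ORTQuantizer` with the avx512 configuration (directory names are illustrative):

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic quantization targeting AVX-512 instructions
# (directory names are illustrative).
qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=False)

# Load the exported ONNX model; if the directory contains several .onnx files,
# pass file_name explicitly.
quantizer = ORTQuantizer.from_pretrained("roberta_base_qa_onnx")
quantizer.quantize(save_dir="quantized_roberta_base_qa_onnx", quantization_config=qconfig)
```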

For more information on the ONNX export, please check the [documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).

#### Run the exported model using ONNX Runtime

Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using [ONNX Runtime](https://onnxruntime.ai/) as the backend:

```diff
- from transformers import AutoModelForQuestionAnswering
+ from optimum.onnxruntime import ORTModelForQuestionAnswering
from transformers import AutoTokenizer, pipeline

model_id = "deepset/roberta-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForQuestionAnswering.from_pretrained(model_id)
+ model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")
qa_pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
question = "What's Optimum?"
context = "Optimum is an awesome library everyone should use!"
results = qa_pipe(question=question, context=context)
```

More details on how to run ONNX models with `ORTModelForXXX` classes [here](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models).

### TensorFlow Lite

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install optimum[exporters-tf]
```

Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them:

```bash
optimum-cli export tflite \
-m deepset/roberta-base-squad2 \
--sequence_length 384 \
--quantize int8-dynamic roberta_tflite_model
```

## Accelerated training

🤗 Optimum provides wrappers around the original 🤗 Transformers [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) to enable training on powerful hardware easily.
We support the following hardware and runtimes:

- Habana's Gaudi processors
- AWS Trainium instances, check [here](https://huggingface.co/docs/optimum-neuron/en/guides/distributed_training)
- ONNX Runtime (optimized for GPUs)

### Habana

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade --upgrade-strategy eager optimum[habana]
```

```diff
- from transformers import Trainer, TrainingArguments
+ from optimum.habana import GaudiTrainer, GaudiTrainingArguments

# Download a pretrained model from the Hub
model = AutoModelForXxx.from_pretrained("bert-base-uncased")

# Define the training arguments
- training_args = TrainingArguments(
+ training_args = GaudiTrainingArguments(
output_dir="path/to/save/folder/",
+ use_habana=True,
+ use_lazy_mode=True,
+ gaudi_config_name="Habana/bert-base-uncased",
...
)

# Initialize the trainer
- trainer = Trainer(
+ trainer = GaudiTrainer(
model=model,
args=training_args,
train_dataset=train_dataset,
...
)

# Use Habana Gaudi processor for training!
trainer.train()
```

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).

### ONNX Runtime

```diff
- from transformers import Trainer, TrainingArguments
+ from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments

# Download a pretrained model from the Hub
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Define the training arguments
- training_args = TrainingArguments(
+ training_args = ORTTrainingArguments(
output_dir="path/to/save/folder/",
optim="adamw_ort_fused",
...
)

# Create an ONNX Runtime Trainer
- trainer = Trainer(
+ trainer = ORTTrainer(
model=model,
args=training_args,
train_dataset=train_dataset,
...
)

# Use ONNX Runtime for training!
trainer.train()
```

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/trainer) and in the [examples](https://github.com/huggingface/optimum/tree/main/examples/onnxruntime/training).

### Quanto

[Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend.

You can quantize a model either using the Python API or `optimum-cli`.

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4

model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3.1-8B')
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4, exclude='lm_head')
```

The quantized model can be saved using `save_pretrained`:

```python
qmodel.save_pretrained('./Llama-3.1-8B-quantized')
```

It can later be reloaded using `from_pretrained`:

```python
from optimum.quanto import QuantizedModelForCausalLM

qmodel = QuantizedModelForCausalLM.from_pretrained('Llama-3.1-8B-quantized')
```

You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.