{"id":31959375,"url":"https://github.com/huggingface/exporters","last_synced_at":"2025-10-14T15:28:59.964Z","repository":{"id":58711855,"uuid":"495371447","full_name":"huggingface/exporters","owner":"huggingface","description":"Export Hugging Face models to Core ML and TensorFlow Lite","archived":false,"fork":false,"pushed_at":"2024-07-23T15:48:39.000Z","size":292,"stargazers_count":675,"open_issues_count":40,"forks_count":52,"subscribers_count":20,"default_branch":"main","last_synced_at":"2025-10-13T17:44:27.704Z","etag":null,"topics":["coreml","coremltools","deep-learning","machine-learning","model-converter","pytorch","tensorflow","tflite","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/huggingface.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-05-23T10:57:55.000Z","updated_at":"2025-10-04T15:42:47.000Z","dependencies_parsed_at":"2023-10-02T08:45:03.160Z","dependency_job_id":null,"html_url":"https://github.com/huggingface/exporters","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/huggingface/exporters","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fexporters","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fexporters/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fexporters/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fexporters/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/huggingface","download_url":"https://codeload.github.com/huggingface/exporters/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fexporters/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279019314,"owners_count":26086711,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coreml","coremltools","deep-learning","machine-learning","model-converter","pytorch","tensorflow","tflite","transformer"],"created_at":"2025-10-14T15:28:56.679Z","updated_at":"2025-10-14T15:28:59.956Z","avatar_url":"https://github.com/huggingface.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!---\nCopyright 2022 The HuggingFace Team. 
All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n--\u003e\n\n# 🤗 Exporters\n\n👷 **WORK IN PROGRESS** 👷\n\nThis package lets you export 🤗 Transformers models to Core ML.\n\n\u003e For converting models to TFLite, we recommend using [Optimum](https://huggingface.co/docs/optimum/exporters/tflite/usage_guides/export_a_model).\n\n## When to use 🤗 Exporters\n\n🤗 Transformers models are implemented in PyTorch, TensorFlow, or JAX. However, for deployment you might want to use a different framework such as Core ML. This library makes it easy to convert Transformers models to this format.\n\nThe aim of the Exporters package is to be more convenient than writing your own conversion script with *coremltools* and to be tightly integrated with the 🤗 Transformers library and the Hugging Face Hub.\n\nFor an even more convenient approach, `Exporters` powers a [no-code transformers to Core ML conversion Space](https://huggingface.co/spaces/huggingface-projects/transformers-to-coreml). You can try it out without installing anything to check whether the model you are interested in can be converted. If conversion succeeds, the converted Core ML weights will be pushed to the Hub. For additional flexibility and details about the conversion process, please read on.\n\nNote: Keep in mind that Transformer models are usually quite large and are not always suitable for use on mobile devices. 
It might be a good idea to [optimize the model for inference](https://github.com/huggingface/optimum) first using 🤗 Optimum.\n\n## Installation\n\nClone this repo:\n\n```bash\n$ git clone https://github.com/huggingface/exporters.git\n```\n\nInstall it as a Python package:\n\n```bash\n$ cd exporters\n$ pip install -e .\n```\n\nAll done!\n\nNote: The Core ML exporter can be used from Linux but macOS is recommended.\n\n## Core ML\n\n[Core ML](https://developer.apple.com/machine-learning/core-ml/) is Apple's software library for fast on-device model inference with neural networks and other types of machine learning models. It can be used on macOS, iOS, tvOS, and watchOS, and is optimized for using the CPU, GPU, and Apple Neural Engine. Although the Core ML framework is proprietary, the Core ML file format is an open format.\n\nThe Core ML exporter uses [coremltools](https://coremltools.readme.io/docs) to perform the conversion from PyTorch or TensorFlow to Core ML.\n\nThe `exporters.coreml` package enables you to convert model checkpoints to a Core ML model by leveraging configuration objects. These configuration objects come ready-made for a number of model architectures, and are designed to be easily extendable to other architectures.\n\nReady-made configurations include the following architectures:\n\n- BEiT\n- BERT\n- ConvNeXT\n- CTRL\n- CvT\n- DistilBERT\n- DistilGPT2\n- GPT2\n- LeViT\n- MobileBERT\n- MobileViT\n- SegFormer\n- SqueezeBERT\n- Vision Transformer (ViT)\n- YOLOS\n\n\u003c!-- TODO: automatically generate this list --\u003e\n\n[See here](MODELS.md) for a complete list of supported models.\n\n### Exporting a model to Core ML\n\n\u003c!--\nTo export a 🤗 Transformers model to Core ML, you'll first need to install some extra dependencies:\n\n``bash\npip install transformers[coreml]\n``\n\nThe `transformers.coreml` package can then be used as a Python module:\n--\u003e\n\nThe `exporters.coreml` package can be used as a Python module from the command line. 
To export a checkpoint using a ready-made configuration, do the following:\n\n```bash\npython -m exporters.coreml --model=distilbert-base-uncased exported/\n```\n\nThis exports a Core ML version of the checkpoint defined by the `--model` argument. In this example it is `distilbert-base-uncased`, but it can be any checkpoint on the Hugging Face Hub or one that's stored locally.\n\nThe resulting Core ML file will be saved to the `exported` directory as `Model.mlpackage`. Instead of a directory you can specify a filename, such as `DistilBERT.mlpackage`.\n\nIt's normal for the conversion process to output many warning messages and other logging information. You can safely ignore these. If all went well, the export should conclude with the following logs:\n\n```bash\nValidating Core ML model...\n\t-[✓] Core ML model output names match reference model ({'last_hidden_state'})\n\t- Validating Core ML model output \"last_hidden_state\":\n\t\t-[✓] (1, 128, 768) matches (1, 128, 768)\n\t\t-[✓] all values close (atol: 0.0001)\nAll good, model saved at: exported/Model.mlpackage\n```\n\nNote: While it is possible to export models to Core ML on Linux, the validation step will only be performed on Mac, as it requires the Core ML framework to run the model.\n\nThe resulting file is `Model.mlpackage`. This file can be added to an Xcode project and be loaded into a macOS or iOS app.\n\nThe exported Core ML models use the **mlpackage** format with the **ML Program** model type. This format was introduced in 2021 and requires at least iOS 15, macOS 12.0, and Xcode 13. We prefer to use this format as it is the future of Core ML. The Core ML exporter can also make models in the older `.mlmodel` format, but this is not recommended.\n\nThe process is identical for TensorFlow checkpoints on the Hub. 
For example, you can export a pure TensorFlow checkpoint from the [Keras organization](https://huggingface.co/keras-io) as follows:\n\n```bash\npython -m exporters.coreml --model=keras-io/transformers-qa exported/\n```\n\nTo export a model that's stored locally, you'll need to have the model's weights and tokenizer files stored in a directory. For example, we can load and save a checkpoint as follows:\n\n```python\n\u003e\u003e\u003e from transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\u003e\u003e\u003e # Load tokenizer and PyTorch weights from the Hub\n\u003e\u003e\u003e tokenizer = AutoTokenizer.from_pretrained(\"distilbert-base-uncased\")\n\u003e\u003e\u003e pt_model = AutoModelForSequenceClassification.from_pretrained(\"distilbert-base-uncased\")\n\u003e\u003e\u003e # Save to disk\n\u003e\u003e\u003e tokenizer.save_pretrained(\"local-pt-checkpoint\")\n\u003e\u003e\u003e pt_model.save_pretrained(\"local-pt-checkpoint\")\n```\n\nOnce the checkpoint is saved, you can export it to Core ML by pointing the `--model` argument to the directory holding the checkpoint files:\n\n```bash\npython -m exporters.coreml --model=local-pt-checkpoint exported/\n```\n\n\u003c!--\nTODO: also TFAutoModel example\n--\u003e\n\n### Selecting features for different model topologies\n\nEach ready-made configuration comes with a set of _features_ that enable you to export models for different types of topologies or tasks. 
As shown in the table below, each feature is associated with a different auto class:\n\n| Feature                                      | Auto Class                           |\n| -------------------------------------------- | ------------------------------------ |\n| `default`, `default-with-past`               | `AutoModel`                          |\n| `causal-lm`, `causal-lm-with-past`           | `AutoModelForCausalLM`               |\n| `ctc`                                        | `AutoModelForCTC`                    |\n| `image-classification`                       | `AutoModelForImageClassification`    |\n| `masked-im`                                  | `AutoModelForMaskedImageModeling`    |\n| `masked-lm`                                  | `AutoModelForMaskedLM`               |\n| `multiple-choice`                            | `AutoModelForMultipleChoice`         |\n| `next-sentence-prediction`                   | `AutoModelForNextSentencePrediction` |\n| `object-detection`                           | `AutoModelForObjectDetection`        |\n| `question-answering`                         | `AutoModelForQuestionAnswering`      |\n| `semantic-segmentation`                      | `AutoModelForSemanticSegmentation`   |\n| `seq2seq-lm`, `seq2seq-lm-with-past`         | `AutoModelForSeq2SeqLM`              |\n| `sequence-classification`                    | `AutoModelForSequenceClassification` |\n| `speech-seq2seq`, `speech-seq2seq-with-past` | `AutoModelForSpeechSeq2Seq`          |\n| `token-classification`                       | `AutoModelForTokenClassification`    |\n\nFor each configuration, you can find the list of supported features via the `FeaturesManager`. 
For example, for DistilBERT we have:\n\n```python\n\u003e\u003e\u003e from exporters.coreml.features import FeaturesManager\n\n\u003e\u003e\u003e distilbert_features = list(FeaturesManager.get_supported_features_for_model_type(\"distilbert\").keys())\n\u003e\u003e\u003e print(distilbert_features)\n['default', 'masked-lm', 'multiple-choice', 'question-answering', 'sequence-classification', 'token-classification']\n```\n\nYou can then pass one of these features to the `--feature` argument in the `exporters.coreml` package. For example, to export a text-classification model we can pick a fine-tuned model from the Hub and run:\n\n```bash\npython -m exporters.coreml --model=distilbert-base-uncased-finetuned-sst-2-english \\\n                           --feature=sequence-classification exported/\n```\n\nwhich will display the following logs:\n\n```bash\nValidating Core ML model...\n\t- Core ML model is classifier, validating output\n\t\t-[✓] predicted class NEGATIVE matches NEGATIVE\n\t\t-[✓] number of classes 2 matches 2\n\t\t-[✓] all values close (atol: 0.0001)\nAll good, model saved at: exported/Model.mlpackage\n```\n\nNotice that in this case, the exported model is a Core ML classifier, which predicts the highest scoring class name in addition to a dictionary of probabilities, instead of the `last_hidden_state` we saw with the `distilbert-base-uncased` checkpoint earlier. This is expected since the fine-tuned model has a sequence classification head.\n\n\u003cTip\u003e\n\nThe features that have a `with-past` suffix (e.g. 
`causal-lm-with-past`) correspond to model topologies with precomputed hidden states (key and values in the attention blocks) that can be used for fast autoregressive decoding.\n\n\u003c/Tip\u003e\n\n### Configuring the export options\n\nTo see the full list of possible options, run the following from the command line:\n\n```bash\npython -m exporters.coreml --help\n```\n\nExporting a model requires at least these arguments:\n\n- `-m \u003cmodel\u003e`: The model ID from the Hugging Face Hub, or a local path to load the model from.\n- `--feature \u003ctask\u003e`: The task the model should perform, for example `\"image-classification\"`. See the table above for possible task names.\n- `\u003coutput\u003e`: The path at which to store the generated Core ML model.\n\nThe output path can be a folder, in which case the file will be named `Model.mlpackage`, or you can specify the filename directly.\n\nAdditional arguments that can be provided:\n\n- `--preprocessor \u003cvalue\u003e`: Which type of preprocessor to use. `auto` tries to automatically detect it. Possible values are: `auto` (the default), `tokenizer`, `feature_extractor`, `processor`.\n- `--atol \u003cnumber\u003e`: The absolute difference tolerance used when validating the model. The default value is 1e-4.\n- `--quantize \u003cvalue\u003e`: Whether to quantize the model weights. The possible quantization options are: `float32` for no quantization (the default) or `float16` for 16-bit floating point.\n- `--compute_units \u003cvalue\u003e`: Whether to optimize the model for CPU, GPU, and/or Neural Engine. Possible values are: `all` (the default), `cpu_and_gpu`, `cpu_only`, `cpu_and_ne`.\n\n### Using the exported model\n\nUsing the exported model in an app is just like using any other Core ML model. 
After adding the model to Xcode, it will auto-generate a Swift class that lets you make predictions from within the app.\n\nDepending on the chosen export options, you may still need to preprocess or postprocess the input and output tensors.\n\nFor image inputs, there is no need to perform any preprocessing as the Core ML model will already normalize the pixels. For classifier models, the Core ML model will output the predictions as a dictionary of probabilities. For other models, you might need to do more work.\n\nCore ML does not have the concept of a tokenizer and so text models will still require manual tokenization of the input data. [Here is an example](https://github.com/huggingface/swift-coreml-transformers) of how to perform tokenization in Swift.\n\n### Overriding default choices in the configuration object\n\nAn important goal of Core ML is to make it easy to use the models inside apps. Where possible, the Core ML exporter will add extra operations to the model, so that you do not have to do your own pre- and postprocessing.\n\nIn particular,\n\n- Image models will automatically perform pixel normalization as part of the model. You do not need to preprocess the image yourself, except potentially resizing or cropping it.\n\n- For classification models, a softmax layer is added and the labels are included in the model file. Core ML makes a distinction between classifier models and other types of neural networks. For a model that outputs a single classification prediction per input example, Core ML makes it so that the model predicts the winning class label and a dictionary of probabilities instead of a raw logits tensor. Where possible, the exporter uses this special classifier model type.\n\n- Other models predict logits but do not fit into Core ML's definition of a classifier, such as the `token-classification` task that outputs a prediction for each token in the sequence. Here, the exporter also adds a softmax to convert the logits into probabilities. 
The label names are added to the model's metadata. Core ML ignores these label names but they can be retrieved by writing a few lines of Swift code.\n\n- A `semantic-segmentation` model will upsample the output image to the original spatial dimensions and apply an argmax to obtain the predicted class label indices. It does not automatically apply a softmax.\n\nThe Core ML exporter makes these choices because they are the settings you're most likely to need. To override any of the above defaults, you must create a subclass of the configuration object, and then export the model to Core ML by writing a short Python program.\n\nExample: To prevent the MobileViT semantic segmentation model from upsampling the output image, you would create a subclass of `MobileViTCoreMLConfig` and override the `outputs` property to set `do_upsample` to False. Other options you can set for this output are `do_argmax` and `do_softmax`.\n\n```python\nfrom collections import OrderedDict\nfrom exporters.coreml.models import MobileViTCoreMLConfig\nfrom exporters.coreml.config import OutputDescription\n\nclass MyCoreMLConfig(MobileViTCoreMLConfig):\n    @property\n    def outputs(self) -\u003e OrderedDict[str, OutputDescription]:\n        return OrderedDict(\n            [\n                (\n                    \"logits\",\n                    OutputDescription(\n                        \"classLabels\",\n                        \"Classification scores for each pixel\",\n                        do_softmax=True,\n                        do_upsample=False,\n                        do_argmax=False,\n                    )\n                ),\n            ]\n        )\n\nconfig = MyCoreMLConfig(model.config, \"semantic-segmentation\")\n```\n\nHere you can also change the name of the output from `classLabels` to something else, or fill in the output description (\"Classification scores for each pixel\").\n\nIt is also possible to change the properties of the model inputs. 
For example, for text models the default sequence length is between 1 and 128 tokens. To set the input sequence length on a DistilBERT model to a fixed length of 32 tokens, you could override the config object as follows:\n\n```python\nfrom collections import OrderedDict\nfrom exporters.coreml.models import DistilBertCoreMLConfig\nfrom exporters.coreml.config import InputDescription\n\nclass MyCoreMLConfig(DistilBertCoreMLConfig):\n    @property\n    def inputs(self) -\u003e OrderedDict[str, InputDescription]:\n        input_descs = super().inputs\n        input_descs[\"input_ids\"].sequence_length = 32\n        return input_descs\n\nconfig = MyCoreMLConfig(model.config, \"text-classification\")\n```\n\nUsing a fixed sequence length generally outputs a simpler, and possibly faster, Core ML model. However, for many models the input needs to have a flexible length. In that case, specify a tuple for `sequence_length` to set the (min, max) lengths. Use (1, -1) to have no upper limit on the sequence length. (Note: if `sequence_length` is set to a fixed value, then the batch size is fixed to 1.)\n\nTo find out what input and output options are available for the model you're interested in, create its `CoreMLConfig` object and examine the `config.inputs` and `config.outputs` properties.\n\nNot all inputs or outputs are always required: For text models, you may remove the `attention_mask` input. Without this input, the attention mask is always assumed to be filled with ones (no padding). However, if the task requires a `token_type_ids` input, there must also be an `attention_mask` input.\n\nRemoving inputs and/or outputs is accomplished by making a subclass of `CoreMLConfig` and overriding the `inputs` and `outputs` properties.\n\nBy default, a model is generated in the ML Program format. By overriding the `use_legacy_format` property to return `True`, the older NeuralNetwork format will be used. 
This is not recommended and only exists as a workaround for models that fail to convert to the ML Program format.\n\nOnce you have the modified `config` instance, you can use it to export the model following the instructions from the section \"Exporting the model\" below.\n\nNot everything is described by the configuration objects. The behavior of the converted model is also determined by the model's tokenizer or feature extractor. For example, to use a different input image size, you'd create the feature extractor with different resizing or cropping settings and use that during the conversion instead of the default feature extractor.\n\n### Exporting a model for an unsupported architecture\n\nIf you wish to export a model whose architecture is not natively supported by the library, there are three main steps to follow:\n\n1. Implement a custom Core ML configuration.\n2. Export the model to Core ML.\n3. Validate the outputs of the PyTorch and exported models.\n\nIn this section, we'll look at how DistilBERT was implemented to show what's involved with each step.\n\n#### Implementing a custom Core ML configuration\n\nTODO: didn't write this section yet because the implementation is not done yet\n\nLet’s start with the configuration object. We provide an abstract class that you should inherit from, `CoreMLConfig`.\n\n```python\nfrom exporters.coreml import CoreMLConfig\n```\n\nTODO: stuff to cover here:\n\n- `modality` property\n- how to implement custom ops + link to coremltools documentation on this topic\n- decoder models (`use_past`) and encoder-decoder models (`seq2seq`)\n\n#### Exporting the model\n\nOnce you have implemented the Core ML configuration, the next step is to export the model. Here we can use the `export()` function provided by the `exporters.coreml` package. 
This function expects the Core ML configuration, along with the base model and tokenizer (for text models) or feature extractor (for vision models):\n\n```python\nfrom transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer\nfrom exporters.coreml import export\nfrom exporters.coreml.models import DistilBertCoreMLConfig\n\nmodel_ckpt = \"distilbert-base-uncased\"\nbase_model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, torchscript=True)\npreprocessor = AutoTokenizer.from_pretrained(model_ckpt)\n\ncoreml_config = DistilBertCoreMLConfig(base_model.config, task=\"text-classification\")\nmlmodel = export(preprocessor, base_model, coreml_config)\n```\n\nNote: For the best results, pass the argument `torchscript=True` to `from_pretrained` when loading the model. This allows the model to configure itself for PyTorch tracing, which is needed for the Core ML conversion.\n\nAdditional options that can be passed into `export()`:\n\n- `quantize`: Use `\"float32\"` for no quantization (the default), `\"float16\"` to quantize the weights to 16-bit floats.\n- `compute_units`: Whether to optimize the model for CPU, GPU, and/or Neural Engine. Defaults to `coremltools.ComputeUnit.ALL`.\n\nTo export the model with precomputed hidden states (key and values in the attention blocks) for fast autoregressive decoding, pass the argument `use_past=True` when creating the `CoreMLConfig` object.\n\nIt is normal for the Core ML exporter to print out a lot of warning and information messages. In particular, you might see messages such as these:\n\n\u003e TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!\n\nThose messages are to be expected and are a normal part of the conversion process. 
If there is a real problem, the converter will throw an error.\n\nIf the export succeeded, the return value from `export()` is a `coremltools.models.MLModel` object. Write `print(mlmodel)` to examine the Core ML model's inputs, outputs, and metadata.\n\nOptionally fill in the model's metadata:\n\n```python\nmlmodel.short_description = \"Your awesome model\"\nmlmodel.author = \"Your name\"\nmlmodel.license = \"Fill in the copyright information here\"\nmlmodel.version = \"1.0\"\n```\n\nFinally, save the model. You can open the resulting **mlpackage** file in Xcode and examine it there.\n\n```python\nmlmodel.save(\"DistilBert.mlpackage\")\n```\n\nNote: If the configuration object used returns `True` from `use_legacy_format`, the model can be saved as `ModelName.mlmodel` instead of `.mlpackage`.\n\n#### Exporting a decoder model\n\nDecoder-based models can use a `past_key_values` input that contains pre-computed hidden-states (key and values in the self-attention blocks), which allows for much faster sequential decoding. This feature is enabled by passing `use_cache=True` to the Transformer model.\n\nTo enable this feature with the Core ML exporter, set the `use_past=True` argument when creating the `CoreMLConfig` object:\n\n```python\ncoreml_config = CTRLCoreMLConfig(base_model.config, task=\"text-generation\", use_past=True)\n\n# or:\ncoreml_config = CTRLCoreMLConfig.with_past(base_model.config, task=\"text-generation\")\n```\n\nThis adds multiple new inputs and outputs to the model with names such as `past_key_values_0_key`, `past_key_values_0_value`, ... (inputs) and `present_key_values_0_key`, `present_key_values_0_value`, ... 
(outputs).\n\nEnabling this option makes the model less convenient to use, since you will have to keep track of many additional tensors, but it does make inference much faster on sequences.\n\nThe Transformers model must be loaded with `is_decoder=True`, for example:\n\n```python\nbase_model = BigBirdForCausalLM.from_pretrained(\"google/bigbird-roberta-base\", torchscript=True, is_decoder=True)\n```\n\nTODO: Example of how to use this in Core ML. The `past_key_values` tensors will grow larger over time. The `attention_mask` tensor must have the size of `past_key_values` plus new `input_ids`.\n\n#### Exporting an encoder-decoder model\n\nTODO: properly write this section\n\nYou'll need to export the model as two separate Core ML models: the encoder and the decoder.\n\nExport the model like so:\n\n```python\ncoreml_config = TODOCoreMLConfig(base_model.config, task=\"text2text-generation\", seq2seq=\"encoder\")\nencoder_mlmodel = export(preprocessor, base_model.get_encoder(), coreml_config)\n\ncoreml_config = TODOCoreMLConfig(base_model.config, task=\"text2text-generation\", seq2seq=\"decoder\")\ndecoder_mlmodel = export(preprocessor, base_model, coreml_config)\n```\n\nWhen the `seq2seq` option is used, the sequence length in the Core ML model is always unbounded. The `sequence_length` specified in the configuration object is ignored.\n\nThis can also be combined with `use_past=True`. TODO: explain how to use this.\n\n#### Validating the model outputs\n\nThe final step is to validate that the outputs from the base and exported model agree within some absolute tolerance. 
You can use the `validate_model_outputs()` function provided by the `exporters.coreml` package as follows.\n\nFirst enable logging:\n\n```python\nfrom exporters.utils import logging\nlogger = logging.get_logger(\"exporters.coreml\")\nlogger.setLevel(logging.INFO)\n```\n\nThen validate the model:\n\n```python\nfrom exporters.coreml import validate_model_outputs\n\nvalidate_model_outputs(\n    coreml_config, preprocessor, base_model, mlmodel, coreml_config.atol_for_validation\n)\n```\n\nNote: `validate_model_outputs` only works on Mac computers, as it depends on the Core ML framework to make predictions with the model.\n\nThis function uses the `CoreMLConfig.generate_dummy_inputs()` method to generate inputs for the base and exported model, and the absolute tolerance can be defined in the configuration. We generally find numerical agreement in the 1e-6 to 1e-4 range, although anything smaller than 1e-3 is likely to be OK.\n\nIf validation fails with an error such as the following, it doesn't necessarily mean the model is broken:\n\n\u003e ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: 0.12345\n\nThe comparison is done using an absolute difference value, which in this example is 0.12345. That is much larger than the default tolerance value of 1e-4, hence the reported error. However, the magnitude of the activations also matters. For a model whose activations are on the order of 1e+3, a maximum absolute difference of 0.12345 would usually be acceptable.\n\nIf validation fails with this error and you're not entirely sure if this is a true problem, call `mlmodel.predict()` on a dummy input tensor and look at the largest absolute magnitude in the output tensor.\n\n### Contributing a new configuration to 🤗 Transformers\n\nWe are looking to expand the set of ready-made configurations and welcome contributions from the community! 
If you would like to contribute your addition to the library, you will need to:\n\n* Implement the Core ML configuration in the `models.py` file\n* Include the model architecture and corresponding features in [`~coreml.features.FeaturesManager`]\n* Add your model architecture to the tests in `test_coreml.py`\n\n### Troubleshooting: What if Core ML Exporters doesn't work for your model?\n\nIt's possible that the model you wish to export fails to convert using Core ML Exporters or even when you try to use `coremltools` directly. When running these automated conversion tools, it's quite possible the conversion bails out with an inscrutable error message. Or, the conversion may appear to succeed but the model does not work or produces incorrect outputs.\n\nThe most common reasons for conversion errors are:\n\n- You provided incorrect arguments to the converter. The `task` argument should match the chosen model architecture. For example, the `\"feature-extraction\"` task should only be used with models of type `AutoModel`, not `AutoModelForXYZ`. Additionally, the `seq2seq` argument is required to tell apart encoder-decoder type models from encoder-only or decoder-only models. Passing invalid choices for these arguments may give an error during the conversion process or it may create a model that works but does the wrong thing.\n\n- The model performs an operation that is not supported by Core ML or coremltools. It's also possible coremltools has a bug or can't handle particularly complex models.\n\nIf the Core ML export fails due to the latter, you have several options:\n\n1. Implement the missing operator in the `CoreMLConfig`'s `patch_pytorch_ops()` function.\n\n2. Fix the original model. This requires a deep understanding of how the model works and is not trivial. However, sometimes the fix is to hardcode certain values rather than letting PyTorch or TensorFlow calculate them from the shapes of tensors.\n\n3. Fix coremltools. 
It is sometimes possible to hack coremltools so that it ignores the issue.\n\n4. Forget about automated conversion and [build the model from scratch using MIL](https://coremltools.readme.io/docs/model-intermediate-language). This is the intermediate language that coremltools uses internally to represent models. It's similar in many ways to PyTorch.\n\n5. Submit an issue and we'll see what we can do. 😀\n\n### Known issues\n\nThe Core ML exporter writes models in the **mlpackage** format. Unfortunately, for some models the generated ML Program is incorrect, in which case it's recommended to convert the model to the older NeuralNetwork format by setting the configuration object's `use_legacy_format` property to `True`. On certain hardware, the older format may also run more efficiently. If you're not sure which one to use, export the model twice and compare the two versions.\n\nKnown models that need to be exported with `use_legacy_format=True` are: GPT2, DistilGPT2.\n\nUsing flexible input sequence length with GPT2 or GPT-Neo causes the converter to be extremely slow and allocate over 200 GB of RAM. This is clearly a bug in coremltools or the Core ML framework, as the allocated memory is never used (the computer won't start swapping). After many minutes, the conversion does succeed, but the model may not be 100% correct. Loading the model afterwards takes a very long time and makes similar memory allocations. Likewise for making predictions. While theoretically the conversion succeeds (if you have enough patience), the model is not really usable like this.\n\n## Pushing the model to the Hugging Face Hub\n\nThe [Hugging Face Hub](https://huggingface.co) can also host your Core ML models. 
You can use the [`huggingface_hub` package](https://huggingface.co/docs/huggingface_hub/main/en/index) to upload the converted model to the Hub from Python.\n\nFirst log in to your Hugging Face account with the following command:\n\n```bash\nhuggingface-cli login\n```\n\nOnce you are logged in, save the **mlpackage** to the Hub as follows:\n\n```python\nfrom huggingface_hub import Repository\n\nwith Repository(\n        \"\u003cmodel name\u003e\", clone_from=\"https://huggingface.co/\u003cuser\u003e/\u003cmodel name\u003e\",\n        use_auth_token=True).commit(commit_message=\"add Core ML model\"):\n    mlmodel.save(\"\u003cmodel name\u003e.mlpackage\")\n```\n\nMake sure to replace `\u003cmodel name\u003e` with the name of the model and `\u003cuser\u003e` with your Hugging Face username.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Fexporters","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhuggingface%2Fexporters","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Fexporters/lists"}