{"id":13441757,"url":"https://github.com/apple/ml-fastvit","last_synced_at":"2025-04-08T09:08:52.732Z","repository":{"id":188569600,"uuid":"678630946","full_name":"apple/ml-fastvit","owner":"apple","description":"This repository contains the official implementation of the research paper, \"FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization\" ICCV 2023","archived":false,"fork":false,"pushed_at":"2023-11-30T19:22:00.000Z","size":3185,"stargazers_count":1886,"open_issues_count":4,"forks_count":110,"subscribers_count":29,"default_branch":"main","last_synced_at":"2025-04-01T07:52:24.911Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apple.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-08-15T02:03:14.000Z","updated_at":"2025-03-31T18:45:26.000Z","dependencies_parsed_at":"2024-01-06T14:57:33.224Z","dependency_job_id":"685aeca2-de6b-4567-9caf-3a0903b97016","html_url":"https://github.com/apple/ml-fastvit","commit_stats":null,"previous_names":["apple/ml-fastvit"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-fastvit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-fastvit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-fastvit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-fastvit/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/apple","download_url":"https://codeload.github.com/apple/ml-fastvit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247809962,"owners_count":20999816,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:01:37.729Z","updated_at":"2025-04-08T09:08:52.697Z","avatar_url":"https://github.com/apple.png","language":"Python","funding_links":[],"categories":["Python","⚡ Efficient Mobile Models"],"sub_categories":["🚀 Backbone Networks"],"readme":"# FastViT:  A Fast Hybrid Vision Transformer using Structural Reparameterization\n\nThis is the official repository of \n\n**FastViT:  A Fast Hybrid Vision Transformer using Structural Reparameterization.** \n*Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan.* ICCV 2023\n\n[![arxiv](https://shields.io/badge/paper-green?logo=arxiv\u0026style=for-the-badge)](https://arxiv.org/abs/2303.14189)\n[![webpage](https://shields.io/badge/Webpage-green?logo=safari\u0026style=for-the-badge)](https://machinelearning.apple.com/research/)\n\n![FastViT Performance](docs/intro/acc_vs_latency.png)\n\nAll models are trained on ImageNet-1K and benchmarked on iPhone 12 Pro using [ModelBench app](https://github.com/apple/ml-mobileone/tree/main/ModelBench).\n\n\n## Setup\n```bash\nconda create -n fastvit python=3.9\nconda activate fastvit\nconda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch\npip install -r requirements.txt\n```\n\n## Usage\nTo use our model, 
follow the code snippet below,\n\n```python\nimport torch\nimport models\nfrom timm.models import create_model\nfrom models.modules.mobileone import reparameterize_model\n\n# To Train from scratch/fine-tuning\nmodel = create_model(\"fastvit_t8\")\n# ... train ...\n\n# Load unfused pre-trained checkpoint for fine-tuning\n# or for downstream task training like detection/segmentation\ncheckpoint = torch.load('/path/to/unfused_checkpoint.pth.tar')\nmodel.load_state_dict(checkpoint['state_dict'])\n# ... train ...\n\n# For inference\nmodel.eval()      \nmodel_inf = reparameterize_model(model)\n# Use model_inf at test-time\n```\n\n## FastViT Model Zoo\n### Image Classification\n\nModels trained on ImageNet-1K\n\n| Model        | Top-1 Acc. |           Latency           | Pytorch Checkpoint (url) |       CoreML Model       |\n|:-------------|:----------:|:----------------------------:|:------------------------:|:------------------------:|\n| FastViT-T8   |    76.2    |  [0.8](docs/latency/t8.PNG)  |   [T8](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_t8_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_t8.pth.tar))    |  [fastvit_t8.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_models/fastvit_t8_reparam.pth.mlpackage.zip)  |\n| FastViT-T12  |    79.3    | [1.2](docs/latency/t12.PNG)  |   [T12](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_t12_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_t12.pth.tar))   | [fastvit_t12.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_models/fastvit_t12_reparam.pth.mlpackage.zip)  |\n| FastViT-S12  |    79.9    | [1.4](docs/latency/s12.PNG)  |   
[S12](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_s12_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_s12.pth.tar))   | [fastvit_s12.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_models/fastvit_s12_reparam.pth.mlpackage.zip)  |\n| FastViT-SA12 |    80.9    | [1.6](docs/latency/sa12.PNG) |  [SA12](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_sa12_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_sa12.pth.tar))   | [fastvit_sa12.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_models/fastvit_sa12_reparam.pth.mlpackage.zip) |\n| FastViT-SA24 |    82.7    | [2.6](docs/latency/sa24.PNG) |  [SA24](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_sa24_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_sa24.pth.tar))   | [fastvit_sa24.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_models/fastvit_sa24_reparam.pth.mlpackage.zip) |\n| FastViT-SA36 |    83.6    | [3.5](docs/latency/sa36.PNG) |  [SA36](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_sa36_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_sa36.pth.tar))   | [fastvit_sa36.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_models/fastvit_sa36_reparam.pth.mlpackage.zip) |\n| FastViT-MA36 |    83.9    | [4.6](docs/latency/ma36.PNG) |  
[MA36](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_ma36_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_models/fastvit_ma36.pth.tar))   | [fastvit_ma36.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_models/fastvit_ma36_reparam.pth.mlpackage.zip) |\n\n\nModels trained on ImageNet-1K with knowledge distillation.\n\n| Model        | Top-1 Acc. |           Latency           | Pytorch Checkpoint (url) |                                                                           CoreML Model                                                                            |\n|:-------------|:----------:|:----------------------------:|:------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------:|\n| FastViT-T8   |    77.2    |  [0.8](docs/latency/t8.PNG)  |   [T8](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_t8_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_t8.pth.tar))    |    [fastvit_t8.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_distilled_models/fastvit_t8_reparam.pth.mlpackage.zip)    |\n| FastViT-T12  |    80.3    | [1.2](docs/latency/t12.PNG)  |   [T12](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_t12_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_t12.pth.tar))   |   [fastvit_t12.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_distilled_models/fastvit_t12_reparam.pth.mlpackage.zip)  
 |\n| FastViT-S12  |    81.1    | [1.4](docs/latency/s12.PNG)  |   [S12](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_s12_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_s12.pth.tar))   |      [fastvit_s12.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_distilled_models/fastvit_s12_reparam.pth.mlpackage.zip)      |\n| FastViT-SA12 |    81.9    | [1.6](docs/latency/sa12.PNG) |  [SA12](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_sa12_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_sa12.pth.tar))   |     [fastvit_sa12.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_distilled_models/fastvit_sa12_reparam.pth.mlpackage.zip)     |\n| FastViT-SA24 |    83.4    | [2.6](docs/latency/sa24.PNG) |  [SA24](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_sa24_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_sa24.pth.tar))   |     [fastvit_sa24.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_distilled_models/fastvit_sa24_reparam.pth.mlpackage.zip)     |\n| FastViT-SA36 |    84.2    | [3.5](docs/latency/sa36.PNG) |  [SA36](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_sa36_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_sa36.pth.tar))   |     
[fastvit_sa36.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_distilled_models/fastvit_sa36_reparam.pth.mlpackage.zip)     |\n| FastViT-MA36 |    84.6    | [4.6](docs/latency/ma36.PNG) |  [MA36](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_ma36_reparam.pth.tar)([unfused](https://docs-assets.developer.apple.com/ml-research/models/fastvit/image_classification_distilled_models/fastvit_ma36.pth.tar))   |     [fastvit_ma36.mlpackage.zip](https://docs-assets.developer.apple.com/ml-research/models/fastvit/coreml_distilled_models/fastvit_ma36_reparam.pth.mlpackage.zip)     |\n\n#### Latency Benchmarking\nLatency of all models is measured on an iPhone 12 Pro using the [ModelBench app](https://github.com/apple/ml-mobileone/tree/main/ModelBench).\nFor further details, please contact [James Gabriel](mailto:james_gabriel@apple.com) and [Jeff Zhu](mailto:jeff.zhu@apple.com).\nAll reported numbers are rounded to one decimal place.\n\n## Training\n### Image Classification\n#### Dataset Preparation\n\nDownload the [ImageNet-1K](http://image-net.org/) dataset and structure the data as follows:\n```\n/path/to/imagenet-1k/\n  train/\n    class1/\n      img1.jpeg\n    class2/\n      img2.jpeg\n  validation/\n    class1/\n      img3.jpeg\n    class2/\n      img4.jpeg\n```\n\nTo train a FastViT variant, use the corresponding command below:\n\u003cdetails\u003e\n\u003csummary\u003e\nFastViT-T8\n\u003c/summary\u003e\n\n```\n# Without Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_t8 -b 128 --lr 1e-3 \\\n--native-amp --mixup 0.2 --output /path/to/save/results \\\n--input-size 3 256 256\n\n# With Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_t8 -b 128 --lr 1e-3 \\\n--native-amp --mixup 0.2 --output /path/to/save/results 
\\\n--input-size 3 256 256 \\\n--distillation-type \"hard\"\n```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\nFastViT-T12\n\u003c/summary\u003e\n\n```\n# Without Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_t12 -b 128 --lr 1e-3 \\\n--native-amp --mixup 0.2 --output /path/to/save/results \\\n--input-size 3 256 256\n\n# With Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_t12 -b 128 --lr 1e-3 \\\n--native-amp --mixup 0.2 --output /path/to/save/results \\\n--input-size 3 256 256 \\\n--distillation-type \"hard\"\n```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\nFastViT-S12\n\u003c/summary\u003e\n\n```\n# Without Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_s12 -b 128 --lr 1e-3 \\\n--native-amp --mixup 0.2 --output /path/to/save/results \\\n--input-size 3 256 256\n\n# With Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_s12 -b 128 --lr 1e-3 \\\n--native-amp --mixup 0.2 --output /path/to/save/results \\\n--input-size 3 256 256 \\\n--distillation-type \"hard\"\n```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\nFastViT-SA12\n\u003c/summary\u003e\n\n```\n# Without Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_sa12 -b 128 --lr 1e-3 \\\n--native-amp --mixup 0.2 --output /path/to/save/results \\\n--input-size 3 256 256 --drop-path 0.1\n\n# With Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_sa12 -b 128 --lr 1e-3 \\\n--native-amp --output /path/to/save/results \\\n--input-size 3 256 256 \\\n--distillation-type 
\"hard\"\n```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\nFastViT-SA24\n\u003c/summary\u003e\n\n```\n# Without Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_sa24 -b 128 --lr 1e-3 \\\n--native-amp --mixup 0.2 --output /path/to/save/results \\\n--input-size 3 256 256 --drop-path 0.1\n\n# With Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_sa24 -b 128 --lr 1e-3 \\\n--native-amp --output /path/to/save/results \\\n--input-size 3 256 256 --drop-path 0.05 \\\n--distillation-type \"hard\"\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nFastViT-SA36\n\u003c/summary\u003e\n\n```\n# Without Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_sa36 -b 128 --lr 1e-3 \\\n--native-amp --mixup 0.2 --output /path/to/save/results \\\n--input-size 3 256 256 --drop-path 0.2\n\n# With Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_sa36 -b 128 --lr 1e-3 \\\n--native-amp --output /path/to/save/results \\\n--input-size 3 256 256 --drop-path 0.1 \\\n--distillation-type \"hard\"\n```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\nFastViT-MA36\n\u003c/summary\u003e\n\n```\n# Without Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_ma36 -b 128 --lr 1e-3 \\\n--native-amp --output /path/to/save/results \\\n--input-size 3 256 256 --drop-path 0.35\n\n# With Distillation\npython -m torch.distributed.launch --nproc_per_node=8 train.py \\\n/path/to/ImageNet/dataset --model fastvit_ma36 -b 128 --lr 1e-3 \\\n--native-amp --output /path/to/save/results \\\n--input-size 3 256 256 --drop-path 0.2 \\\n--distillation-type \"hard\"\n```\n\u003c/details\u003e\n\n## 
Evaluation\nTo run evaluation on ImageNet, follow the example command below:\n\u003cdetails\u003e\n\u003csummary\u003e\nFastViT-T8\n\u003c/summary\u003e\n\n```\n# Evaluate unfused checkpoint\npython validate.py /path/to/ImageNet/dataset --model fastvit_t8 \\\n--checkpoint /path/to/pretrained_checkpoints/fastvit_t8.pth.tar\n\n# Evaluate fused checkpoint\npython validate.py /path/to/ImageNet/dataset --model fastvit_t8 \\\n--checkpoint /path/to/pretrained_checkpoints/fastvit_t8_reparam.pth.tar \\\n--use-inference-mode\n```\n\u003c/details\u003e\n\n## Model Export\nTo export a Core ML package file from a PyTorch checkpoint, follow the example command below:\n\u003cdetails\u003e\n\u003csummary\u003e\nFastViT-T8\n\u003c/summary\u003e\n\n```\npython export_model.py --variant fastvit_t8 --output-dir /path/to/save/exported_model \\\n--checkpoint /path/to/pretrained_checkpoints/fastvit_t8_reparam.pth.tar\n```\n\u003c/details\u003e\n\n## Citation\n\n```\n@inproceedings{vasufastvit2023,\n  author = {Pavan Kumar Anasosalu Vasu and James Gabriel and Jeff Zhu and Oncel Tuzel and Anurag Ranjan},\n  title = {FastViT:  A Fast Hybrid Vision Transformer using Structural Reparameterization},\n  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},\n  year = {2023}\n}\n```\n\n## Acknowledgements\nOur codebase is built using multiple open-source contributions; please see [ACKNOWLEDGEMENTS](ACKNOWLEDGEMENTS) for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-fastvit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapple%2Fml-fastvit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-fastvit/lists"}