{"id":13442123,"url":"https://github.com/NVlabs/FasterViT","last_synced_at":"2025-03-20T13:32:29.119Z","repository":{"id":174070707,"uuid":"643019926","full_name":"NVlabs/FasterViT","owner":"NVlabs","description":"[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention","archived":false,"fork":false,"pushed_at":"2024-06-02T19:30:17.000Z","size":1345,"stargazers_count":763,"open_issues_count":4,"forks_count":62,"subscribers_count":18,"default_branch":"main","last_synced_at":"2024-09-16T14:19:17.416Z","etag":null,"topics":["ade20k","backbone","coco","deep-learning","foundation-models","image-classification","image-net","object-detection","pre-trained-model","self-attention","semantic-segmentation","vision-transformer","visual-recognition"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2306.06189","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-19T22:19:34.000Z","updated_at":"2024-09-12T21:25:08.000Z","dependencies_parsed_at":"2024-01-15T19:45:44.768Z","dependency_job_id":"e777bfff-3b33-4eab-8454-93f727b99f49","html_url":"https://github.com/NVlabs/FasterViT","commit_stats":{"total_commits":176,"total_committers":7,"mean_commits":"25.142857142857142","dds":"0.045454545454545414","last_synced_commit":"f3c0211d2b096ac2ea4e75cd25f92f5bc6c809b0"},"previous_names":["nvlabs/fastervit"],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/NVlabs%2FFasterViT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FFasterViT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FFasterViT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FFasterViT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVlabs","download_url":"https://codeload.github.com/NVlabs/FasterViT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221768462,"owners_count":16877642,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ade20k","backbone","coco","deep-learning","foundation-models","image-classification","image-net","object-detection","pre-trained-model","self-attention","semantic-segmentation","vision-transformer","visual-recognition"],"created_at":"2024-07-31T03:01:41.953Z","updated_at":"2025-03-20T13:32:29.113Z","avatar_url":"https://github.com/NVlabs.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# FasterViT: Fast Vision Transformers with Hierarchical Attention\n\nOfficial PyTorch implementation of [**FasterViT: Fast Vision Transformers with Hierarchical Attention**](https://arxiv.org/abs/2306.06189).\n\n[![Star on GitHub](https://img.shields.io/github/stars/NVlabs/FasterViT.svg?style=social)](https://github.com/NVlabs/FasterViT/stargazers)\n\n[Ali Hatamizadeh](https://research.nvidia.com/person/ali-hatamizadeh),\n[Greg 
Heinrich](https://developer.nvidia.com/blog/author/gheinrich/),\n[Hongxu (Danny) Yin](https://hongxu-yin.github.io/),\n[Andrew Tao](https://developer.nvidia.com/blog/author/atao/),\n[Jose M. Alvarez](https://alvarezlopezjosem.github.io/),\n[Jan Kautz](https://jankautz.com/), \n[Pavlo Molchanov](https://www.pmolchanov.com/).\n\nFor business inquiries, please visit our website and submit the form: [NVIDIA Research Licensing](https://www.nvidia.com/en-us/research/inquiries/)\n\n--- \n\nFasterViT achieves a new SOTA Pareto-front in\nterms of Top-1 accuracy and throughput without extra training data !\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/NVlabs/FasterViT/assets/26806394/6357de9e-5d7f-4e03-8009-2bad1373096c\" width=62% height=62% \nclass=\"center\"\u003e\n\u003c/p\u003e\n\nWe introduce a new self-attention mechanism, denoted as Hierarchical\nAttention (HAT), that captures both short and long-range information by learning\ncross-window carrier tokens.\n\n![teaser](./fastervit/assets/hierarchial_attn.png)\n\nNote: Please use the [**latest NVIDIA TensorRT release**](https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/index.html) to enjoy the benefits of optimized FasterViT ops. \n\n## 💥 News 💥\n- **[04.02.2024]** 🔥 Updated [manuscript](https://arxiv.org/abs/2306.06189) now available on arXiv !\n- **[01.24.2024]** 🔥🔥🔥 **Object Tracking with MOTRv2 + FasterViT** is now open-sourced ([link](./downstream/object_tracking/motrv2/README.md)) ! 
\n- **[01.17.2024]** 🔥🔥🔥 FasterViT paper has been accepted to [ICLR 2024](https://openreview.net/group?id=ICLR.cc/2024/Conference#tab-your-consoles) !\n- **[10.14.2023]** 🔥🔥 We have added the FasterViT [object detection repository](./downstream/object_detection/dino/README.md) with [DINO](https://arxiv.org/abs/2203.03605) !\n- **[08.24.2023]** 🔥 FasterViT Keras models with pre-trained weights published in [keras_cv_attention_models](https://github.com/leondgarse/keras_cv_attention_models/tree/main/keras_cv_attention_models/fastervit) !  \n- **[08.20.2023]** 🔥🔥 We have added ImageNet-21K SOTA pre-trained models for various resolutions !   \n- **[07.20.2023]** We have created the official NVIDIA FasterViT [HuggingFace](https://huggingface.co/nvidia/FasterViT) page.\n- **[07.06.2023]** FasterViT checkpoints are now also accessible on HuggingFace!\n- **[07.04.2023]** ImageNet pretrained FasterViT models can now be imported with **1 line of code**. Please install the latest FasterViT pip package to use this functionality (also supports Any-resolution FasterViT models).\n- **[06.30.2023]** We have further improved the [TensorRT](https://developer.nvidia.com/tensorrt-getting-started) throughput of FasterViT models by 10-15% on average across different models. Please use the [**latest NVIDIA TensorRT release**](https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/index.html) to use these throughput performance gains. \n- **[06.29.2023]** Any-resolution FasterViT model can now be initialized from pre-trained ImageNet resolution (224 x 224) models.\n- **[06.18.2023]** We have released the FasterViT [pip package](https://pypi.org/project/fastervit/) !\n- **[06.17.2023]** [Any-resolution FasterViT](./fastervit/models/faster_vit_any_res.py)  model is now available ! 
The model can be used for a variety of applications such as detection and segmentation or high-resolution fine-tuning with arbitrary input image resolutions.\n- **[06.09.2023]** 🔥🔥 We have released source code and ImageNet-1K FasterViT-models !\n\n## Quick Start\n\n### Object Detection\n\nPlease see the FasterViT [object detection repository](./object_detection/README.md) with [DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection](https://arxiv.org/abs/2203.03605) for more details. \n\n### Classification\n\nWe can import pre-trained FasterViT models with **1 line of code**. First, install FasterViT:\n\n```bash\npip install fastervit\n```\nNote: If you have already installed the package, please upgrade it to ```fastervit\u003e=0.9.8``` to use the pretrained weights. \n\nA pretrained FasterViT model with default hyper-parameters can be created as follows:\n\n```python\n\u003e\u003e\u003e from fastervit import create_model\n\n# Define fastervit-0 model with 224 x 224 resolution\n\n\u003e\u003e\u003e model = create_model('faster_vit_0_224', \n                          pretrained=True,\n                          model_path=\"/tmp/faster_vit_0.pth.tar\")\n```\n\n`model_path` sets the path where the downloaded model is saved.\n\nWe can also test the model by passing a dummy input image. The output is the logits:\n\n```python\n\u003e\u003e\u003e import torch\n\n\u003e\u003e\u003e image = torch.rand(1, 3, 224, 224)\n\u003e\u003e\u003e output = model(image) # torch.Size([1, 1000])\n```\n\nWe can also use the any-resolution FasterViT model to accommodate arbitrary image resolutions. 
In the following, we define an any-resolution FasterViT-0\nmodel with an input resolution of 576 x 960, window sizes of 12 and 6 in the 3rd and 4th stages, a carrier token size of 2 and an embedding dimension of\n64:\n\n```python\n\u003e\u003e\u003e from fastervit import create_model\n\n# Define any-resolution FasterViT-0 model with 576 x 960 resolution\n\u003e\u003e\u003e model = create_model('faster_vit_0_any_res', \n                          resolution=[576, 960],\n                          window_size=[7, 7, 12, 6],\n                          ct_size=2,\n                          dim=64,\n                          pretrained=True)\n```\nNote that the above model is initialized from the original ImageNet pre-trained FasterViT with the original resolution of 224 x 224. As a result, missing keys and mismatches are to be expected, since we are adding new layers (e.g. new carrier tokens). \n\nWe can test the model by passing a dummy input image. The output is the logits:\n\n```python\n\u003e\u003e\u003e import torch\n\n\u003e\u003e\u003e image = torch.rand(1, 3, 576, 960)\n\u003e\u003e\u003e output = model(image) # torch.Size([1, 1000])\n```\n\n\n\n## Catalog\n- [x] ImageNet-1K training code\n- [x] ImageNet-1K pre-trained models\n- [x] Any-resolution FasterViT\n- [x] FasterViT pip-package release\n- [x] Add capability to initialize any-resolution FasterViT from ImageNet-pretrained weights. 
\n- [x] ImageNet-21K pre-trained models\n- [x] Detection code + models\n\n--- \n\n## Results + Pretrained Models\n\n### ImageNet-1K\n**FasterViT ImageNet-1K Pretrained Models**\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth\u003eName\u003c/th\u003e\n    \u003cth\u003eAcc@1(%)\u003c/th\u003e\n    \u003cth\u003eAcc@5(%)\u003c/th\u003e\n    \u003cth\u003eThroughput(Img/Sec)\u003c/th\u003e\n    \u003cth\u003eResolution\u003c/th\u003e\n    \u003cth\u003e#Params(M)\u003c/th\u003e\n    \u003cth\u003eFLOPs(G)\u003c/th\u003e\n    \u003cth\u003eDownload\u003c/th\u003e\n  \u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-0\u003c/td\u003e\n    \u003ctd\u003e82.1\u003c/td\u003e\n    \u003ctd\u003e95.9\u003c/td\u003e\n    \u003ctd\u003e5802\u003c/td\u003e\n    \u003ctd\u003e224x224\u003c/td\u003e\n    \u003ctd\u003e31.4\u003c/td\u003e\n    \u003ctd\u003e3.3\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1twI2LFJs391Yrj8MR4Ui9PfrvWqjE1iB\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-1\u003c/td\u003e\n    \u003ctd\u003e83.2\u003c/td\u003e\n    \u003ctd\u003e96.5\u003c/td\u003e\n    \u003ctd\u003e4188\u003c/td\u003e\n    \u003ctd\u003e224x224\u003c/td\u003e\n    \u003ctd\u003e53.4\u003c/td\u003e\n    \u003ctd\u003e5.3\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1r7W10n5-bFtM3sz4bmaLrowN2gYPkLGT\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-2\u003c/td\u003e\n    \u003ctd\u003e84.2\u003c/td\u003e\n    \u003ctd\u003e96.8\u003c/td\u003e\n    \u003ctd\u003e3161\u003c/td\u003e\n    \u003ctd\u003e224x224\u003c/td\u003e\n    \u003ctd\u003e75.9\u003c/td\u003e\n    \u003ctd\u003e8.7\u003c/td\u003e\n    \u003ctd\u003e\u003ca 
href=\"https://drive.google.com/uc?export=download\u0026id=1n_a6s0pgi0jVZOGmDei2vXHU5E6RH5wU\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-3\u003c/td\u003e\n    \u003ctd\u003e84.9\u003c/td\u003e\n    \u003ctd\u003e97.2\u003c/td\u003e\n    \u003ctd\u003e1780\u003c/td\u003e\n    \u003ctd\u003e224x224\u003c/td\u003e\n    \u003ctd\u003e159.5\u003c/td\u003e\n    \u003ctd\u003e18.2\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1tvWElZ91Sia2SsXYXFMNYQwfipCxtI7X\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-4\u003c/td\u003e\n    \u003ctd\u003e85.4\u003c/td\u003e\n    \u003ctd\u003e97.3\u003c/td\u003e\n    \u003ctd\u003e849\u003c/td\u003e\n    \u003ctd\u003e224x224\u003c/td\u003e\n    \u003ctd\u003e424.6\u003c/td\u003e\n    \u003ctd\u003e36.6\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1gYhXA32Q-_9C5DXel17avV_ZLoaHwdgz\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-5\u003c/td\u003e\n    \u003ctd\u003e85.6\u003c/td\u003e\n    \u003ctd\u003e97.4\u003c/td\u003e\n    \u003ctd\u003e449\u003c/td\u003e\n    \u003ctd\u003e224x224\u003c/td\u003e\n    \u003ctd\u003e975.5\u003c/td\u003e\n    \u003ctd\u003e113.0\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1mqpai7XiHLr_n1tjxjzT8q369xTCq_z-\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-6\u003c/td\u003e\n    \u003ctd\u003e85.8\u003c/td\u003e\n    \u003ctd\u003e97.4\u003c/td\u003e\n    \u003ctd\u003e352\u003c/td\u003e\n    \u003ctd\u003e224x224\u003c/td\u003e\n    \u003ctd\u003e1360.0\u003c/td\u003e\n    \u003ctd\u003e142.0\u003c/td\u003e\n    \u003ctd\u003e\u003ca 
href=\"https://drive.google.com/uc?export=download\u0026id=12jtavR2QxmMzcKwPzWe7kw-oy34IYi59\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003c/table\u003e\n\n### ImageNet-21K\n**FasterViT ImageNet-21K Pretrained Models (ImageNet-1K Fine-tuned)**\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth\u003eName\u003c/th\u003e\n    \u003cth\u003eAcc@1(%)\u003c/th\u003e\n    \u003cth\u003eAcc@5(%)\u003c/th\u003e\n    \u003cth\u003eResolution\u003c/th\u003e\n    \u003cth\u003e#Params(M)\u003c/th\u003e\n    \u003cth\u003eFLOPs(G)\u003c/th\u003e\n    \u003cth\u003eDownload\u003c/th\u003e\n  \u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-4-21K-224\u003c/td\u003e\n    \u003ctd\u003e86.6\u003c/td\u003e\n    \u003ctd\u003e97.8\u003c/td\u003e\n    \u003ctd\u003e224x224\u003c/td\u003e\n    \u003ctd\u003e271.9\u003c/td\u003e\n    \u003ctd\u003e40.8\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/ahatamiz/FasterViT/resolve/main/fastervit_4_21k_224_w14.pth.tar\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-4-21K-384\u003c/td\u003e\n    \u003ctd\u003e87.6\u003c/td\u003e\n    \u003ctd\u003e98.3\u003c/td\u003e\n    \u003ctd\u003e384x384\u003c/td\u003e\n    \u003ctd\u003e271.9\u003c/td\u003e\n    \u003ctd\u003e120.1\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/ahatamiz/FasterViT/resolve/main/fastervit_4_21k_384_w24.pth.tar\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-4-21K-512\u003c/td\u003e\n    \u003ctd\u003e87.8\u003c/td\u003e\n    \u003ctd\u003e98.4\u003c/td\u003e\n    \u003ctd\u003e512x512\u003c/td\u003e\n    \u003ctd\u003e271.9\u003c/td\u003e\n    \u003ctd\u003e213.5\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/ahatamiz/FasterViT/resolve/main/fastervit_4_21k_512_w32.pth.tar\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n  
  \u003ctd\u003eFasterViT-4-21K-768\u003c/td\u003e\n    \u003ctd\u003e87.9\u003c/td\u003e\n    \u003ctd\u003e98.5\u003c/td\u003e\n    \u003ctd\u003e768x768\u003c/td\u003e\n    \u003ctd\u003e271.9\u003c/td\u003e\n    \u003ctd\u003e480.4\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/ahatamiz/FasterViT/resolve/main/fastervit_4_21k_768_w48.pth.tar\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003c/table\u003e\n\nRaw pre-trained ImageNet-21K model weights for FasterViT-4 are also available for download at this [link](https://drive.google.com/file/d/1T3jDrzlTmTcZVS1Dh01Fl3J2LXZHWKdL/view?usp=sharing).\n\n### Robustness (ImageNet-A - ImageNet-R - ImageNet-V2)\n\n\nAll models use `crop_pct=0.875`. Results are obtained by running inference on ImageNet-1K pretrained models without finetuning.\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth\u003eName\u003c/th\u003e\n    \u003cth\u003eA-Acc@1(%)\u003c/th\u003e\n    \u003cth\u003eA-Acc@5(%)\u003c/th\u003e\n    \u003cth\u003eR-Acc@1(%)\u003c/th\u003e\n    \u003cth\u003eR-Acc@5(%)\u003c/th\u003e\n    \u003cth\u003eV2-Acc@1(%)\u003c/th\u003e\n    \u003cth\u003eV2-Acc@5(%)\u003c/th\u003e\n  \u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-0\u003c/td\u003e\n    \u003ctd\u003e23.9\u003c/td\u003e\n    \u003ctd\u003e57.6\u003c/td\u003e\n    \u003ctd\u003e45.9\u003c/td\u003e\n    \u003ctd\u003e60.4\u003c/td\u003e\n    \u003ctd\u003e70.9\u003c/td\u003e\n    \u003ctd\u003e90.0\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-1\u003c/td\u003e\n    \u003ctd\u003e31.2\u003c/td\u003e\n    \u003ctd\u003e63.3\u003c/td\u003e\n    \u003ctd\u003e47.5\u003c/td\u003e\n    \u003ctd\u003e61.9\u003c/td\u003e\n    \u003ctd\u003e72.6\u003c/td\u003e\n    \u003ctd\u003e91.0\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-2\u003c/td\u003e\n    \u003ctd\u003e38.2\u003c/td\u003e\n    \u003ctd\u003e68.9\u003c/td\u003e\n    
\u003ctd\u003e49.6\u003c/td\u003e\n    \u003ctd\u003e63.4\u003c/td\u003e\n    \u003ctd\u003e73.7\u003c/td\u003e\n    \u003ctd\u003e91.6\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-3\u003c/td\u003e\n    \u003ctd\u003e44.2\u003c/td\u003e\n    \u003ctd\u003e73.0\u003c/td\u003e\n    \u003ctd\u003e51.9\u003c/td\u003e\n    \u003ctd\u003e65.6\u003c/td\u003e\n    \u003ctd\u003e75.0\u003c/td\u003e\n    \u003ctd\u003e92.2\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-4\u003c/td\u003e\n    \u003ctd\u003e49.0\u003c/td\u003e\n    \u003ctd\u003e75.4\u003c/td\u003e\n    \u003ctd\u003e56.0\u003c/td\u003e\n    \u003ctd\u003e69.6\u003c/td\u003e\n    \u003ctd\u003e75.7\u003c/td\u003e\n    \u003ctd\u003e92.7\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-5\u003c/td\u003e\n    \u003ctd\u003e52.7\u003c/td\u003e\n    \u003ctd\u003e77.6\u003c/td\u003e\n    \u003ctd\u003e56.9\u003c/td\u003e\n    \u003ctd\u003e70.0\u003c/td\u003e\n    \u003ctd\u003e76.0\u003c/td\u003e\n    \u003ctd\u003e93.0\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eFasterViT-6\u003c/td\u003e\n    \u003ctd\u003e53.7\u003c/td\u003e\n    \u003ctd\u003e78.4\u003c/td\u003e\n    \u003ctd\u003e57.1\u003c/td\u003e\n    \u003ctd\u003e70.1\u003c/td\u003e\n    \u003ctd\u003e76.1\u003c/td\u003e\n    \u003ctd\u003e93.0\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003c/table\u003e\n\nA, R and V2 denote ImageNet-A, ImageNet-R and ImageNet-V2 respectively. \n\n## Installation\n\nWe provide a [docker file](./Dockerfile). In addition, assuming that a recent [PyTorch](https://pytorch.org/get-started/locally/) package is installed, the dependencies can be installed by running:\n\n```bash\npip install -r requirements.txt\n```\n\n## Training\n\nPlease see [TRAINING.md](TRAINING.md) for detailed training instructions of all models. 
\n\n## Evaluation\n\nThe FasterViT models can be evaluated on the ImageNet-1K validation set using the following: \n\n```\npython validate.py \\\n--model \u003cmodel-name\u003e \\\n--checkpoint \u003ccheckpoint-path\u003e \\\n--data_dir \u003cimagenet-path\u003e \\\n--batch-size \u003cbatch-size-per-gpu\u003e\n```\n\nHere `--model` is the FasterViT variant (e.g. `faster_vit_0_224_1k`), `--checkpoint` is the path to the pretrained model weights, `--data_dir` is the path to the ImageNet-1K validation set and `--batch-size` is the batch size per GPU. We also provide a sample script [here](./fastervit/validate.sh). \n\n## ONNX Conversion\n\nWe provide an ONNX conversion script to enable dynamic batch size inference. For instance, to generate an ONNX model for `faster_vit_0_any_res` with resolution 576 x 960 and ONNX opset number 17, the following can be used. \n\n```bash\npython onnx_convert --model-name faster_vit_0_any_res --simplify --resolution-h 576 --resolution-w 960 --onnx-opset 17\n\n```\n\n## CoreML Conversion\n\nTo generate FasterViT CoreML models, please install `coremltools==5.2.0` and use our provided [script](./coreml_convert.py). \n\nIt is recommended to benchmark the performance by using [Xcode14](https://developer.apple.com/documentation/xcode-release-notes/xcode-14-release-notes) or newer releases. \n\n\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=NVlabs/FasterViT\u0026type=Date)](https://star-history.com/#NVlabs/FasterViT\u0026Date)\n\n\n## Third-party Extensions\nWe always welcome third-party extensions/implementations and usage for other purposes. 
The following represent third-party contributions by other users.\n\n| Name | Link | Contributor | Framework |\n|:---:|:---:|:---:|:---------:|\n| keras_cv_attention_models | [Link](https://github.com/leondgarse/keras_cv_attention_models/tree/main/keras_cv_attention_models/fastervit) | @leondgarse | Keras |\n\nIf you would like your work to be listed in this repository, please raise an issue and provide us with detailed information.  \n\n## Citation\n\nPlease consider citing FasterViT if this repository is useful for your work. \n\n```\n@article{hatamizadeh2023fastervit,\n  title={FasterViT: Fast Vision Transformers with Hierarchical Attention},\n  author={Hatamizadeh, Ali and Heinrich, Greg and Yin, Hongxu and Tao, Andrew and Alvarez, Jose M and Kautz, Jan and Molchanov, Pavlo},\n  journal={arXiv preprint arXiv:2306.06189},\n  year={2023}\n}\n```\n\n\n## Licenses\n\nCopyright © 2023, NVIDIA Corporation. All rights reserved.\n\nThis work is made available under the NVIDIA Source Code License-NC. Click [here](LICENSE) to view a copy of this license.\n\nFor license information regarding the timm repository, please refer to its [repository](https://github.com/rwightman/pytorch-image-models).\n\nFor license information regarding the ImageNet dataset, please see the [ImageNet official website](https://www.image-net.org/). \n\n## Acknowledgement\nThis repository is built on top of the [timm](https://github.com/huggingface/pytorch-image-models) repository. We thank [Ross Wightman](https://rwightman.com/) for creating and maintaining this high-quality library.  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVlabs%2FFasterViT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNVlabs%2FFasterViT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVlabs%2FFasterViT/lists"}