{"id":13737879,"url":"https://github.com/NVlabs/GCVit","last_synced_at":"2025-05-08T15:31:58.343Z","repository":{"id":37367861,"uuid":"504708536","full_name":"NVlabs/GCVit","owner":"NVlabs","description":"[ICML 2023] Official PyTorch implementation of Global Context Vision Transformers","archived":false,"fork":false,"pushed_at":"2023-12-22T13:04:04.000Z","size":879,"stargazers_count":425,"open_issues_count":1,"forks_count":49,"subscribers_count":10,"default_branch":"main","last_synced_at":"2024-11-15T06:32:43.074Z","etag":null,"topics":["ade20k","backbone","coco","deep-learning","imagenet","imagenet-classification","object-detection","pre-train","pre-trained-model","self-attention","semantic-segmentation","vision-transformer","visual-recognition"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2206.09959","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-06-18T00:44:21.000Z","updated_at":"2024-11-05T12:34:08.000Z","dependencies_parsed_at":"2023-12-22T14:31:28.185Z","dependency_job_id":"3f95ed5d-9c09-43ef-a8f7-dd7da9ab36f9","html_url":"https://github.com/NVlabs/GCVit","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FGCVit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FGCVit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FGCVit/releases","manifests_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/NVlabs%2FGCVit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVlabs","download_url":"https://codeload.github.com/NVlabs/GCVit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253096212,"owners_count":21853559,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ade20k","backbone","coco","deep-learning","imagenet","imagenet-classification","object-detection","pre-train","pre-trained-model","self-attention","semantic-segmentation","vision-transformer","visual-recognition"],"created_at":"2024-08-03T03:02:04.287Z","updated_at":"2025-05-08T15:31:58.073Z","avatar_url":"https://github.com/NVlabs.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Global Context Vision Transformer (GC ViT)\n\nThis repository presents the official PyTorch implementation of **Global Context Vision Transformers** (ICML2023) \\\n \\\n[Global Context Vision\nTransformers](https://arxiv.org/pdf/2206.09959.pdf) \\\n[Ali Hatamizadeh](https://research.nvidia.com/person/ali-hatamizadeh),\n[Hongxu (Danny) Yin](https://scholar.princeton.edu/hongxu),\n[Greg Heinrich](https://developer.nvidia.com/blog/author/gheinrich/),\n[Jan Kautz](https://jankautz.com/), \nand [Pavlo Molchanov](https://www.pmolchanov.com/).\n\nGC ViT  achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks. 
On the ImageNet-1K dataset for classification, GC ViT variants with `51M`, `90M` and `201M` parameters achieve `84.3`, `85.0` and `85.7` Top-1 accuracy, respectively, surpassing comparably-sized prior art such as CNN-based ConvNeXt and ViT-based Swin Transformer.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/NVlabs/GCVit/assets/26806394/d1820d6d-3aef-470e-a1d3-af370f1c1f77\" width=63% height=63% \nclass=\"center\"\u003e\n\u003c/p\u003e\n\n\nThe architecture of GC ViT is shown below:\n\n![gc_vit](https://github.com/NVlabs/GCVit/assets/26806394/86ca853e-56bc-4907-b3e3-0c4611ef9073)\n\n\n## 💥 News 💥\n- **[10.14.2023]** 🔥 We have released the [object detection code](https://github.com/NVlabs/GCVit/tree/main/detection) !\n- **[07.27.2023]**  We will present GC ViT in the (1:30-3:30 HDT) ICML23 session in exhibit hall#1, poster #516.   \n- **[07.22.2023]** 🔥🔥 We have released pretrained 21K GC ViT-L checkpoint for 512 x 512 resolution ! \n- **[07.22.2023]** Pretrained checkpoints are now available in official [NVIDIA GCViT HuggingFace](https://huggingface.co/nvidia/GCViT) page !\n- **[07.21.2023]** 🔥 We have released the object detection/instance segmentation [code](./detection/README.md) ! 
\n- **[05.21.2023]** 🔥 We have released ImageNet-21K fine-tuned GC ViT model weights for 224x224 and 384x384.\n- **[05.21.2023]** 🔥🔥 We have released new ImageNet-1K GC ViT model weights with **better performance** !\n- **[04.24.2023]** 🔥🔥🔥 GC ViT has been accepted to **ICML 2023** !\n\n\n## Introduction\n\n**GC ViT** leverages global context self-attention modules, joint with local self-attention, to effectively yet efficiently model both long and short-range spatial interactions, without the need for expensive \noperations such as computing attention masks or shifting local windows.\n\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/NVlabs/GCVit/assets/26806394/da64f22a-e7af-4577-8884-b08ba4e24e49\" width=72% height=72% \nclass=\"center\"\u003e\n\u003c/p\u003e\n\n\n## ImageNet Benchmarks\n\n\n**ImageNet-1K Pretrained Models**\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth\u003eModel Variant\u003c/th\u003e\n    \u003cth\u003eAcc@1\u003c/th\u003e\n    \u003cth\u003e#Params(M)\u003c/th\u003e\n    \u003cth\u003eFLOPs(G)\u003c/th\u003e\n    \u003cth\u003eDownload\u003c/th\u003e\n  \u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eGC ViT-XXT\u003c/td\u003e\n    \u003cth\u003e79.9\u003c/th\u003e\n    \u003ctd\u003e12\u003c/td\u003e\n    \u003ctd\u003e2.1\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1apSIWQCa5VhWLJws8ugMTuyKzyayw4Eh\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eGC ViT-XT\u003c/td\u003e\n    \u003cth\u003e82.0\u003c/th\u003e\n    \u003ctd\u003e20\u003c/td\u003e\n    \u003ctd\u003e2.6\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1OgSbX73AXmE0beStoJf2Jtda1yin9t9m\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eGC ViT-T\u003c/td\u003e\n    \u003cth\u003e83.5\u003c/th\u003e\n    \u003ctd\u003e28\u003c/td\u003e\n    
\u003ctd\u003e4.7\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=11M6AsxKLhfOpD12Nm_c7lOvIIAn9cljy\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eGC ViT-T2\u003c/td\u003e\n    \u003cth\u003e83.7\u003c/th\u003e\n    \u003ctd\u003e34\u003c/td\u003e\n    \u003ctd\u003e5.5\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1cTD8VemWFiwAx0FB9cRMT-P4vRuylvmQ\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eGC ViT-S\u003c/td\u003e\n    \u003cth\u003e84.3\u003c/th\u003e\n    \u003ctd\u003e51\u003c/td\u003e\n    \u003ctd\u003e8.5\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1Nn6ABKmYjylyWC0I41Q3oExrn4fTzO9Y\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eGC ViT-S2\u003c/td\u003e\n    \u003cth\u003e84.8\u003c/th\u003e\n    \u003ctd\u003e68\u003c/td\u003e\n    \u003ctd\u003e10.7\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1E5TtYpTqILznjBLLBTlO5CGq343RbEan\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eGC ViT-B\u003c/td\u003e\n    \u003cth\u003e85.0\u003c/th\u003e\n    \u003ctd\u003e90\u003c/td\u003e\n    \u003ctd\u003e14.8\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1PF7qfxKLcv_ASOMetDP75n8lC50gaqyH\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd\u003eGC ViT-L\u003c/td\u003e\n    \u003cth\u003e85.7\u003c/th\u003e\n    \u003ctd\u003e201\u003c/td\u003e\n    \u003ctd\u003e32.6\u003c/td\u003e\n    \u003ctd\u003e\u003ca 
href=\"https://drive.google.com/uc?export=download\u0026id=1Lkz1nWKTwCCUR7yQJM6zu_xwN1TR0mxS\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003c/table\u003e\n\n\n**ImageNet-21K Pretrained Models**\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth\u003eModel Variant\u003c/th\u003e\n    \u003cth\u003eResolution\u003c/th\u003e\n    \u003cth\u003eAcc@1\u003c/th\u003e\n    \u003cth\u003e#Params(M)\u003c/th\u003e\n    \u003cth\u003eFLOPs(G)\u003c/th\u003e\n    \u003cth\u003eDownload\u003c/th\u003e\n  \u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eGC ViT-L\u003c/td\u003e\n    \u003ctd\u003e224 x 224\u003c/td\u003e\n    \u003cth\u003e86.6\u003c/th\u003e\n    \u003ctd\u003e201\u003c/td\u003e\n    \u003ctd\u003e32.6\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1maGDr6mJkLyRTUkspMzCgSlhDzNRFGEf\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eGC ViT-L\u003c/td\u003e\n    \u003ctd\u003e384 x 384\u003c/td\u003e\n    \u003cth\u003e87.4\u003c/th\u003e\n    \u003ctd\u003e201\u003c/td\u003e\n    \u003ctd\u003e120.4\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://drive.google.com/uc?export=download\u0026id=1P-IEhvQbJ3FjnunVkM1Z9dEpKw-tsuWv\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd\u003eGC ViT-L\u003c/td\u003e\n    \u003ctd\u003e512 x 512\u003c/td\u003e\n    \u003cth\u003e87.6\u003c/th\u003e\n    \u003ctd\u003e201\u003c/td\u003e\n    \u003ctd\u003e245.0\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://huggingface.co/nvidia/GCViT/resolve/main/gcvit_21k_large_512.pth.tar\"\u003emodel\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003c/table\u003e\n\n\n## Installation\n\nThe dependencies can be installed by running:\n\n```bash\npip install -r requirements.txt\n```\n\n## Data Preparation\n\nPlease download the ImageNet dataset from its official website. 
The training and validation images need to have\nsub-folders for each class with the following structure:\n\n```bash\n  imagenet\n  ├── train\n  │   ├── class1\n  │   │   ├── img1.jpeg\n  │   │   ├── img2.jpeg\n  │   │   └── ...\n  │   ├── class2\n  │   │   ├── img3.jpeg\n  │   │   └── ...\n  │   └── ...\n  └── val\n      ├── class1\n      │   ├── img4.jpeg\n      │   ├── img5.jpeg\n      │   └── ...\n      ├── class2\n      │   ├── img6.jpeg\n      │   └── ...\n      └── ...\n \n  ```\n\n## Commands\n\n### Training on ImageNet-1K From Scratch (Multi-GPU)\n\nThe `GC ViT` model can be trained on the ImageNet-1K dataset by running:\n\n```bash\npython -m torch.distributed.launch --nproc_per_node \u003cnum-of-gpus\u003e --master_port 11223  train.py \\\n--config \u003cconfig-file\u003e --data_dir \u003cimagenet-path\u003e --batch-size \u003cbatch-size-per-gpu\u003e --amp --tag \u003crun-tag\u003e --model-ema\n```\n\nTo resume training from a pre-trained checkpoint:\n\n```bash\npython -m torch.distributed.launch --nproc_per_node \u003cnum-of-gpus\u003e --master_port 11223  train.py \\\n--resume \u003ccheckpoint-path\u003e --config \u003cconfig-file\u003e --amp --data_dir \u003cimagenet-path\u003e --batch-size \u003cbatch-size-per-gpu\u003e --tag \u003crun-tag\u003e --model-ema\n```\n\n### Evaluation\n\nTo evaluate a pre-trained checkpoint on the ImageNet-1K validation set using a single GPU:\n\n```bash\npython validate.py --model \u003cmodel-name\u003e --checkpoint \u003ccheckpoint-path\u003e --data_dir \u003cimagenet-path\u003e --batch-size \u003cbatch-size-per-gpu\u003e\n```\n\n## Citation\n\nPlease consider citing the GC ViT paper if it is useful for your work:\n\n```\n@inproceedings{hatamizadeh2023global,\n  title={Global context vision transformers},\n  author={Hatamizadeh, Ali and Yin, Hongxu and Heinrich, Greg and Kautz, Jan and Molchanov, Pavlo},\n  booktitle={International Conference on Machine Learning},\n  pages={12633--12646},\n  year={2023},\n  
organization={PMLR}\n}\n```\n\n## Third-party Implementations and Resources\n\nIn this section, we list third-party contributions by other users. If you would like to have your work included here, please\nraise an issue in this repository.\n\n| Name | Link | Contributor | Framework\n|:---:|:---:|:---:|:---------:|\n|timm|[Link](https://github.com/rwightman/pytorch-image-models)| @rwightman | PyTorch\n|tfgcvit|[Link](https://github.com/shkarupa-alex/tfgcvit)| @shkarupa-alex | Tensorflow 2.0 (Keras)\n|gcvit-tf|[Link](https://github.com/awsaf49/gcvit-tf)| @awsaf49 | Tensorflow 2.0 (Keras)\n|GCViT-TensorFlow|[Link](https://github.com/EMalagoli92/GCViT-TensorFlow)| @EMalagoli92 | Tensorflow 2.0 (Keras)\n|keras_cv_attention_models|[Link](https://github.com/leondgarse/keras_cv_attention_models/tree/main/keras_cv_attention_models/gcvit)| @leondgarse | Keras\n|flaim|[Link](https://github.com/BobMcDear/flaim)| @BobMcDear | JAX/Flax\n\n## Additional Resources\n\nWe list additional GC ViT resources such as notebooks, demos, paper explanations in this section. If you have created similar items and would like to be included, please raise an issue in this repository.\n\n| Name | Link | Contributor | Note\n|:---:|:---:|:---:|:---------:|\n|Paper Explanation|[Link](https://www.kaggle.com/code/awsaf49/guie-global-context-vit-gcvit)| @awsaf49 | Annotated GC ViT\n|Colab Notebook|[Link](https://colab.research.google.com/github/awsaf49/gcvit-tf/blob/main/notebooks/GCViT_Flower_Classification.ipynb)| @awsaf49 | Flower classification\n|Kaggle Notebook|[Link](https://www.kaggle.com/code/awsaf49/flower-classification-gcvit-global-context-vit/notebook)| @awsaf49 | Flower classification\n|Live Demo|[Link](https://huggingface.co/spaces/awsaf49/gcvit-tf)| @awsaf49 | Hugging Face demo\n\n\n## Licenses\n\nCopyright © 2023, NVIDIA Corporation. All rights reserved.\n\nThis work is made available under the Nvidia Source Code License-NC. 
Click [here](LICENSE) to view a copy of this license.\n\nThe pre-trained models are shared under [CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.\n\nFor license information regarding timm, please refer to its [repository](https://github.com/rwightman/pytorch-image-models).\n\nFor license information regarding the ImageNet dataset, please refer to the ImageNet [official website](https://www.image-net.org/). \n\n\n\n## Acknowledgement\n\n- This repository is built upon the [timm](https://github.com/rwightman/pytorch-image-models) library. \n\n- We would like to sincerely thank the community, especially GitHub users @rwightman, @shkarupa-alex, @awsaf49 and @leondgarse, whose insightful feedback has helped us further improve GC ViT and achieve even better benchmarks.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVlabs%2FGCVit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNVlabs%2FGCVit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVlabs%2FGCVit/lists"}