{"id":20423992,"url":"https://github.com/emalagoli92/gcvit-tensorflow","last_synced_at":"2025-04-12T18:43:29.766Z","repository":{"id":58090775,"uuid":"529889215","full_name":"EMalagoli92/GCViT-TensorFlow","owner":"EMalagoli92","description":"TensorFlow 2.X reimplementation of Global Context Vision Transformers, Ali Hatamizadeh, Hongxu (Danny) Yin, Jan Kautz Pavlo Molchanov.","archived":false,"fork":false,"pushed_at":"2023-01-25T17:19:13.000Z","size":346,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-06T18:51:10.727Z","etag":null,"topics":["computer-vision","deep-learning","image-classification","python","pytorch","tensorflow","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EMalagoli92.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-08-28T14:40:22.000Z","updated_at":"2023-11-05T15:35:38.000Z","dependencies_parsed_at":"2023-02-14T09:46:59.963Z","dependency_job_id":null,"html_url":"https://github.com/EMalagoli92/GCViT-TensorFlow","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EMalagoli92%2FGCViT-TensorFlow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EMalagoli92%2FGCViT-TensorFlow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EMalagoli92%2FGCViT-TensorFlow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EMalagoli92%2FGCViT-TensorFlow/manifes
ts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EMalagoli92","download_url":"https://codeload.github.com/EMalagoli92/GCViT-TensorFlow/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248617141,"owners_count":21134190,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","image-classification","python","pytorch","tensorflow","transformers"],"created_at":"2024-11-15T07:08:24.576Z","updated_at":"2025-04-12T18:43:29.732Z","avatar_url":"https://github.com/EMalagoli92.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n  \u003ca href=\"https://www.tensorflow.org\"\u003e![TensorFlow](https://img.shields.io/badge/TensorFlow-2.X-orange?style=for-the-badge)\u003c/a\u003e \n  \u003ca href=\"https://github.com/EMalagoli92/GCViT-TensorFlow/blob/main/LICENSE\"\u003e![License](https://img.shields.io/github/license/EMalagoli92/GCViT-TensorFlow?style=for-the-badge)\u003c/a\u003e \n  \u003ca href=\"https://www.python.org\"\u003e![Python](https://img.shields.io/badge/python-%3E%3D%203.9-blue?style=for-the-badge)\u003c/a\u003e  \n  \n\u003c/div\u003e\n\n# GCViT-TensorFlow\nTensorFlow 2.X reimplementation of [Global Context Vision Transformers](https://arxiv.org/abs/2206.09959) by [Ali Hatamizadeh](http://web.cs.ucla.edu/~ahatamiz),\n[Hongxu (Danny) Yin](https://scholar.princeton.edu/hongxu), [Jan Kautz](https://jankautz.com/), and [Pavlo Molchanov](https://www.pmolchanov.com/).\n\n- Exact TensorFlow reimplementation of the official PyTorch repo, 
including the `timm` modules used by the authors, preserving the model and layer structure.\n- ImageNet pretrained weights ported from the official PyTorch implementation.\n\n## Table of contents\n- [Abstract](#abstract)\n- [Results](#results)\n- [Installation](#installation)\n- [Usage](#usage)\n- [Acknowledgement](#acknowledgement)\n- [Citations](#citations)\n- [License](#license)\n\n\u003cdiv id=\"abstract\"/\u003e\n\n## Abstract\n*GC ViT achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks. On ImageNet-1K dataset for classification, the tiny, small and base variants of GC ViT with `28M`, `51M` and `90M` parameters surpass comparably-sized prior art such as CNN-based ConvNeXt and ViT-based Swin Transformer by a large margin. Pre-trained GC ViT backbones in downstream tasks of object detection, instance segmentation,\nand semantic segmentation using MS COCO and ADE20K datasets outperform prior work consistently, sometimes by large margins.*\n\n![Alt text](https://raw.githubusercontent.com/EMalagoli92/GCViT-TensorFlow/main/assets/images/comp_plots.png?raw=true)\n\u003cp align = \"center\"\u003e \u003csub\u003eTop-1 accuracy vs. model FLOPs/parameter size on ImageNet-1K dataset. GC ViT achieves\nnew SOTA benchmarks for different model sizes as well as FLOPs, outperforming competing approaches by a\nsignificant margin.\u003c/sub\u003e \u003c/p\u003e\n\n![Alt text](https://github.com/EMalagoli92/GCViT-TensorFlow/raw/main/assets/images/arch.png?raw=true)\n\u003cp align = \"center\"\u003e\u003csub\u003eArchitecture of the Global Context ViT. 
The authors use alternating blocks of local and global\ncontext self attention layers in each stage of the architecture.\u003c/sub\u003e\u003c/p\u003e\n\n\u003cdiv id=\"results\"/\u003e\n\n## Results\nThe TensorFlow implementation and the ported ImageNet weights have been compared with the official PyTorch implementation on the [ImageNet-V2](https://www.tensorflow.org/datasets/catalog/imagenet_v2) test set.\n\n| Configuration  | Top-1 (Original) | Top-1 (Ported) | Top-5 (Original) | Top-5 (Ported) | #Params |\n| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |\n| GCViT-XXTiny  | 68.79 | 68.73 | 88.52 | 88.47 | 12M |\n| GCViT-XTiny  | 70.97 | 71 | 89.8 | 89.79 | 20M |\n| GCViT-Tiny  | 72.93 | 72.9 | 90.7 | 90.7 | 28M |\n| GCViT-Small  | 73.46 | 73.5 | 91.14 | 91.08 | 51M |\n| GCViT-Base  | 74.13 | 74.16 | 91.66 | 91.69 | 90M |\n\nMean metrics difference: `3e-4`.\n\n\u003cdiv id=\"installation\"/\u003e\n\n## Installation\n- Install from PyPI\n```\npip install gcvit-tensorflow\n```\n- Install from GitHub\n```\npip install git+https://github.com/EMalagoli92/GCViT-TensorFlow\n```\n- Clone the repo and install the necessary packages\n```\ngit clone https://github.com/EMalagoli92/GCViT-TensorFlow.git\ncd GCViT-TensorFlow\npip install -r requirements.txt\n```\n\nTested on *Ubuntu 20.04.4 LTS x86_64*, *python 3.9.7*.\n\n\u003cdiv id=\"usage\"/\u003e\n\n## Usage\n- Define a custom GCViT configuration.\n```python\nfrom gcvit_tensorflow import GCViT\n\n# Define a custom GCViT configuration\nmodel = GCViT(\n    depths=[2, 2, 6, 2],\n    num_heads=[2, 4, 8, 16],\n    window_size=[7, 7, 14, 7],\n    dim=64,\n    resolution=224,\n    in_chans=3,\n    mlp_ratio=3,\n    drop_path_rate=0.2,\n    data_format=\"channels_last\",\n    num_classes=100,\n    classifier_activation=\"softmax\",\n)\n```\n- Use a predefined GCViT configuration.\n```python\nfrom gcvit_tensorflow import GCViT\n\nmodel = GCViT(configuration=\"xxtiny\")\nmodel.build((None, 224, 224, 
3))\nmodel.summary()\n```\n```\nModel: \"xxtiny\"\n_________________________________________________________________\n Layer (type)                Output Shape              Param #   \n=================================================================\n patch_embed (PatchEmbed)    (None, 56, 56, 64)        45632     \n                                                                 \n pos_drop (Dropout)          (None, 56, 56, 64)        0         \n                                                                 \n levels/0 (GCViTLayer)       (None, 28, 28, 128)       185766    \n                                                                 \n levels/1 (GCViTLayer)       (None, 14, 14, 256)       693258    \n                                                                 \n levels/2 (GCViTLayer)       (None, 7, 7, 512)         5401104   \n                                                                 \n levels/3 (GCViTLayer)       (None, 7, 7, 512)         5400546   \n                                                                 \n norm (LayerNorm_)           (None, 7, 7, 512)         1024      \n                                                                 \n avgpool (AdaptiveAveragePoo  (None, 512, 1, 1)        0         \n ling2D)                                                         \n                                                                 \n head (Linear_)              (None, 1000)              513000    \n                                                                 \n=================================================================\nTotal params: 12,240,330\nTrainable params: 11,995,428\nNon-trainable params: 244,902\n_________________________________________________________________\n```\n- Train the model from scratch.\n```python\n# Example\nmodel.compile(\n    optimizer=\"sgd\",\n    loss=\"sparse_categorical_crossentropy\",\n    metrics=[\"accuracy\", \"sparse_top_k_categorical_accuracy\"],\n)\n# x: training images, y: integer class labels\nmodel.fit(x, y)\n```\n- 
Use ported ImageNet pretrained weights.\n```python\n# Example\nfrom gcvit_tensorflow import GCViT\n\nmodel = GCViT(configuration=\"base\", pretrained=True, classifier_activation=\"softmax\")\n# image: a batch of input images\ny_pred = model(image)\n```\n\n\u003cdiv id=\"acknowledgement\"/\u003e\n\n## Acknowledgement\n- [GCViT](https://github.com/nvlabs/gcvit) (Official PyTorch implementation)\n- [gcvit_tf](https://github.com/awsaf49/gcvit-tf)\n- [tfgcvit](https://github.com/shkarupa-alex/tfgcvit)\n\n\u003cdiv id=\"citations\"/\u003e\n\n## Citations\n```bibtex\n@article{hatamizadeh2022global,\n  title={Global Context Vision Transformers},\n  author={Hatamizadeh, Ali and Yin, Hongxu and Kautz, Jan and Molchanov, Pavlo},\n  journal={arXiv preprint arXiv:2206.09959},\n  year={2022}\n}\n```\n\n\u003cdiv id=\"license\"/\u003e\n\n## License\nThis work is made available under the [MIT License](https://github.com/EMalagoli92/GCViT-TensorFlow/blob/main/LICENSE).\n\nThe pre-trained weights are shared under [CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femalagoli92%2Fgcvit-tensorflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Femalagoli92%2Fgcvit-tensorflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femalagoli92%2Fgcvit-tensorflow/lists"}