{"id":37065647,"url":"https://github.com/awsaf49/gcvit-tf","last_synced_at":"2026-01-14T07:41:16.071Z","repository":{"id":47053597,"uuid":"514655609","full_name":"awsaf49/gcvit-tf","owner":"awsaf49","description":"Tensorflow 2.0 Implementation of GCViT: Global Context Vision Transformer","archived":false,"fork":false,"pushed_at":"2023-12-24T18:55:04.000Z","size":28961,"stargazers_count":27,"open_issues_count":1,"forks_count":6,"subscribers_count":5,"default_branch":"main","last_synced_at":"2026-01-13T06:20:53.803Z","etag":null,"topics":["attention","cnn","computer-vision","image-classification","image-recognition","imagenet","self-attention","transformer"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/awsaf49.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-07-16T18:26:22.000Z","updated_at":"2025-11-19T21:57:09.000Z","dependencies_parsed_at":"2023-02-09T22:45:42.364Z","dependency_job_id":"2f98e93d-2e6c-4e3b-bbc6-0b318d73c7d5","html_url":"https://github.com/awsaf49/gcvit-tf","commit_stats":{"total_commits":212,"total_committers":4,"mean_commits":53.0,"dds":"0.17452830188679247","last_synced_commit":"174ea5855020a4fbcbacdf8e08eba638039df95b"},"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/awsaf49/gcvit-tf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awsaf49%2Fgcvit-tf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awsaf49%2Fgc
vit-tf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awsaf49%2Fgcvit-tf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awsaf49%2Fgcvit-tf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/awsaf49","download_url":"https://codeload.github.com/awsaf49/gcvit-tf/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awsaf49%2Fgcvit-tf/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413470,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T05:26:33.345Z","status":"ssl_error","status_checked_at":"2026-01-14T05:21:57.251Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","cnn","computer-vision","image-classification","image-recognition","imagenet","self-attention","transformer"],"created_at":"2026-01-14T07:41:15.515Z","updated_at":"2026-01-14T07:41:16.064Z","avatar_url":"https://github.com/awsaf49.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n\u003cp\u003e\u003ca href='https://arxiv.org/pdf/2206.09959v1.pdf'\u003eGCViT: Global Context Vision Transformer\u003c/a\u003e\u003c/p\u003e\n\u003c/h1\u003e\n\u003cdiv 
align=center\u003e\u003cimg src=\"https://raw.githubusercontent.com/awsaf49/gcvit-tf/main/image/lvg_arch.PNG\" width=800\u003e\u003c/div\u003e\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://github.com/awsaf49/gcvit-tf/blob/main/LICENSE.md\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/License-MIT-yellow.svg\"\u003e\n\u003c/a\u003e\n\u003cimg alt=\"python\" src=\"https://img.shields.io/badge/python-%3E%3D3.6-blue?logo=python\"\u003e\n\u003cimg alt=\"tensorflow\" src=\"https://img.shields.io/badge/tensorflow-%3E%3D2.4.1-orange?logo=tensorflow\"\u003e\n\u003cdiv align=center\u003e\u003cp\u003e\n\u003ca target=\"_blank\" href=\"https://huggingface.co/spaces/awsaf49/gcvit-tf\"\u003e\u003cimg src=\"https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-yellow.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://colab.research.google.com/github/awsaf49/gcvit-tf/blob/main/notebooks/GCViT_Flower_Classification.ipynb\" target=\"_parent\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://www.kaggle.com/awsaf49/flower-classification-gcvit-global-context-vit\"\u003e\u003cimg src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" alt=\"Open In Kaggle\"\u003e\u003c/a\u003e\n\u003c/p\u003e\u003c/div\u003e\n\u003ch2 align=\"center\"\u003e\n\u003cp\u003eTensorflow 2.0 Implementation of GCViT\u003c/p\u003e\n\u003c/h2\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\nThis library implements \u003cb\u003eGCViT\u003c/b\u003e in Tensorflow 2.0, built as a \u003ccode\u003etf.keras.Model\u003c/code\u003e to give it a PyTorch-like flavor.\n\u003c/p\u003e\n\n## Update\n* **15 Jan 2023** : `GCViTLarge` model added with checkpoint.\n* **3 Sept 2022** : Annotated [kaggle-notebook](https://www.kaggle.com/code/awsaf49/gcvit-global-context-vision-transformer) based on this project won [Kaggle ML Research Spotlight: August 
2022](https://www.kaggle.com/discussions/general/349817).\n* **19 Aug 2022** : This project was acknowledged by the [official](https://github.com/NVlabs/GCVit) repo [here](https://github.com/NVlabs/GCVit#third-party-implementations-and-resources).\n\n## Paper Implementation \u0026 Explanation\nI have explained the GCViT paper in a Kaggle notebook, **[GCViT: Global Context Vision Transformer](https://www.kaggle.com/code/awsaf49/gcvit-global-context-vision-transformer)**, which also includes a detailed implementation of the model from scratch. The notebook explains each part of the model and the intuition behind it.\n\nDo check it out, especially if you are interested in learning more about GCViT or implementing it yourself. Note that this notebook won the **Kaggle ML Research Award 2022.**\n\n## Model\n* Architecture:\n\n\u003cimg src=\"https://raw.githubusercontent.com/awsaf49/gcvit-tf/main/image/arch.PNG\"\u003e\n\n* Local vs. Global Attention:\n\n\u003cimg src=\"https://raw.githubusercontent.com/awsaf49/gcvit-tf/main/image/lvg_msa.PNG\"\u003e\n\n## Result\n\u003cimg src=\"https://raw.githubusercontent.com/awsaf49/gcvit-tf/main/image/result.PNG\" width=900\u003e\n\nThe official codebase had an issue, which has since been fixed (12 August 2022). 
Here are the results of the ported weights on **ImageNetV2-Test** data:\n\n| Model        | Acc@1 | Acc@5 | #Params |\n|--------------|-------|-------|---------|\n| GCViT-XXTiny | 0.663 | 0.873 | 12M     |\n| GCViT-XTiny  | 0.685 | 0.885 | 20M     |\n| GCViT-Tiny   | 0.708 | 0.899 | 28M     |\n| GCViT-Small  | 0.720 | 0.901 | 51M     |\n| GCViT-Base   | 0.731 | 0.907 | 90M     |\n| GCViT-Large  | 0.734 | 0.913 | 202M    |\n\n## Installation\n```bash\npip install -U gcvit\n# or\n# pip install -U git+https://github.com/awsaf49/gcvit-tf\n```\n\n## Usage\nLoad a model with the following code:\n```py\nfrom gcvit import GCViTTiny\nmodel = GCViTTiny(pretrain=True)\n```\n\nFor any input size other than **224x224**:\n```py\nfrom gcvit import GCViTTiny\nmodel = GCViTTiny(input_shape=(512,512,3), pretrain=True, resize_query=True)\n```\nSimple code to check the model's prediction:\n```py\nimport tensorflow as tf\nfrom skimage.data import chelsea\nimg = tf.keras.applications.imagenet_utils.preprocess_input(chelsea(), mode='torch') # Chelsea the cat\nimg = tf.image.resize(img, (224, 224))[None,] # resize \u0026 create batch\npred = model(img).numpy()\nprint(tf.keras.applications.imagenet_utils.decode_predictions(pred)[0])\n```\nPrediction:\n```py\n[('n02124075', 'Egyptian_cat', 0.9194835),\n('n02123045', 'tabby', 0.009686623), \n('n02123159', 'tiger_cat', 0.0061576385),\n('n02127052', 'lynx', 0.0011503297), \n('n02883205', 'bow_tie', 0.00042479983)]\n```\nFor feature extraction:\n```py\nmodel = GCViTTiny(pretrain=True)  # when pretrain=True, num_classes must be 1000\nmodel.reset_classifier(num_classes=0, head_act=None)\nfeature = model(img)\nprint(feature.shape)\n```\nFeature:\n```py\n(None, 512)\n```\nFor feature map:\n```py\nmodel = GCViTTiny(pretrain=True)  # when pretrain=True, num_classes must be 1000\nfeature = model.forward_features(img)\nprint(feature.shape)\n```\nFeature map:\n```py\n(None, 7, 7, 512)\n```\n\n## Kaggle Models\nThese pre-trained models can also be loaded using 
[Kaggle Models](https://www.kaggle.com/models/awsaf49/gcvit-tf). Setting `from_kaggle=True` forces the model to load its weights from Kaggle Models without downloading them, so it can be used without internet access in Kaggle.\n```py\nfrom gcvit import GCViTTiny\nmodel = GCViTTiny(pretrain=True, from_kaggle=True)\n```\n\n## Live-Demo\n* For a live demo of image classification \u0026 Grad-CAM with **ImageNet** weights, click \u003ca target=\"_blank\" href=\"https://huggingface.co/spaces/awsaf49/gcvit-tf\"\u003e\u003cimg src=\"https://img.shields.io/badge/Try%20on-Gradio-orange\"\u003e\u003c/a\u003e powered by 🤗 Spaces and Gradio. Here's an example:\n\n\u003ca href=\"https://huggingface.co/spaces/awsaf49/gcvit-tf\"\u003e\u003cimg src=\"image/gradio_demo.JPG\" height=500\u003e\u003c/a\u003e\n\n## Example\nFor a working training example, check out these notebooks on **Google Colab** \u003ca href=\"https://colab.research.google.com/github/awsaf49/gcvit-tf/blob/main/notebooks/GCViT_Flower_Classification.ipynb\" target=\"_parent\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e \u0026 **Kaggle** \u003ca href=\"https://www.kaggle.com/awsaf49/flower-classification-gcvit-global-context-vit\"\u003e\u003cimg src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" alt=\"Open In Kaggle\"\u003e\u003c/a\u003e.\n\nHere is the Grad-CAM result after training on the Flower Classification dataset:\n\n\u003cimg src=\"https://raw.githubusercontent.com/awsaf49/gcvit-tf/main/image/flower_gradcam.PNG\" height=500\u003e\n\n\n\n## To Do\n- [ ] Convert it to multi-backend `Keras 3.0`\n- [ ] Segmentation Pipeline\n- [x] Support for `Kaggle Models`\n- [x] Remove `tensorflow_addons`\n- [x] New updated weights have been added.\n- [x] Working training example in Colab \u0026 Kaggle.\n- [x] GradCAM showcase.\n- [x] Gradio Demo.\n- [x] Build model with `tf.keras.Model`.\n- [x] Port weights from official repo.\n- [x] Support for `TPU`.\n\n## 
Acknowledgement\n* [GCVit](https://github.com/NVlabs/GCVit) (Official)\n* [Swin-Transformer-TF](https://github.com/rishigami/Swin-Transformer-TF)\n* [tfgcvit](https://github.com/shkarupa-alex/tfgcvit/tree/develop/tfgcvit)\n* [keras_cv_attention_models](https://github.com/leondgarse/keras_cv_attention_model)\n\n\n## Citation\n```bibtex\n@article{hatamizadeh2022global,\n  title={Global Context Vision Transformers},\n  author={Hatamizadeh, Ali and Yin, Hongxu and Kautz, Jan and Molchanov, Pavlo},\n  journal={arXiv preprint arXiv:2206.09959},\n  year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawsaf49%2Fgcvit-tf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fawsaf49%2Fgcvit-tf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawsaf49%2Fgcvit-tf/lists"}