{"id":13935220,"url":"https://github.com/NVlabs/FAN","last_synced_at":"2025-07-19T20:31:16.152Z","repository":{"id":37614286,"uuid":"483551063","full_name":"NVlabs/FAN","owner":"NVlabs","description":"Official PyTorch implementation of Fully Attentional Networks","archived":false,"fork":false,"pushed_at":"2023-03-31T18:48:41.000Z","size":9020,"stargazers_count":463,"open_issues_count":15,"forks_count":28,"subscribers_count":21,"default_branch":"master","last_synced_at":"2024-08-08T23:20:38.579Z","etag":null,"topics":["backbone","cityscapes","coco","corruption","deep-learning","image-classification","imagenet","information-bottleneck","object-detection","out-of-distribution","pre-train","self-attention","semantic-segmentation","vision-transformers","visual-grouping","visual-recognition"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2204.12451","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-04-20T07:27:02.000Z","updated_at":"2024-07-28T08:55:52.000Z","dependencies_parsed_at":"2024-04-22T23:59:00.571Z","dependency_job_id":null,"html_url":"https://github.com/NVlabs/FAN","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FFAN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FFAN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FFAN/releases","manifests_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2FFAN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVlabs","download_url":"https://codeload.github.com/NVlabs/FAN/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226666565,"owners_count":17665043,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backbone","cityscapes","coco","corruption","deep-learning","image-classification","imagenet","information-bottleneck","object-detection","out-of-distribution","pre-train","self-attention","semantic-segmentation","vision-transformers","visual-grouping","visual-recognition"],"created_at":"2024-08-07T23:01:29.902Z","updated_at":"2025-07-19T20:31:16.137Z","avatar_url":"https://github.com/NVlabs.png","language":"Python","funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# Fully Attentional Networks\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/understanding-the-robustness-in-vision/domain-generalization-on-imagenet-c)](https://paperswithcode.com/sota/domain-generalization-on-imagenet-c?p=understanding-the-robustness-in-vision) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/understanding-the-robustness-in-vision/domain-generalization-on-imagenet-r)](https://paperswithcode.com/sota/domain-generalization-on-imagenet-r?p=understanding-the-robustness-in-vision) 
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/understanding-the-robustness-in-vision/domain-generalization-on-imagenet-a)](https://paperswithcode.com/sota/domain-generalization-on-imagenet-a?p=understanding-the-robustness-in-vision)\n### [Project Page](https://github.com/NVlabs/FAN) | [Paper](https://arxiv.org/abs/2204.12451) | [Slides](https://docs.google.com/presentation/d/10PCDvHYeb3bvLTOZZR9puxGfoAIdwc8i/edit?usp=sharing\u0026ouid=103738831029004572557\u0026rtpof=true\u0026sd=true) | [Poster](https://drive.google.com/file/d/1wQMRdUI7YqVMBJBxSAI_efHUddDDQOQc/view?usp=sharing)\n\nUnderstanding The Robustness in Vision Transformers. \\\n[Daquan Zhou](https://scholar.google.com/citations?user=DdCAbWwAAAAJ\u0026hl=en), [Zhiding Yu](https://chrisding.github.io/), [Enze Xie](https://xieenze.github.io/), [Chaowei Xiao](https://xiaocw11.github.io/), [Anima Anandkumar](https://research.nvidia.com/person/anima-anandkumar), [Jiashi Feng](https://sites.google.com/site/jshfeng/home) and [Jose M. Alvarez](https://alvarezlopezjosem.github.io/). 
\\\nInternational Conference on Machine Learning, 2022.\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"demo/Teaser.png\" width=60% height=60% \nclass=\"center\"\u003e\n\u003c/p\u003e\n\nThis repository contains the official PyTorch implementation of the training/evaluation code and the pretrained models of [Fully Attentional Network](https://arxiv.org/abs/2204.12451) (**FAN**).\n\n**FAN** is a family of general-purpose Vision Transformer backbones that are highly robust to unseen natural corruptions in various visual recognition tasks.\n\n## Catalog\n- [ ] ImageNet-22K Fine-tuning Code Release\n- [ ] Cityscapes-C and COCO-C Dataset Release\n- [x] Pre-trained Model Release\n- [x] Cityscapes-C and COCO-C Dataset Generation Script\n- [x] Downstream Transfer (Detection, Segmentation) Code Release\n- [x] ImageNet-1K Training \u0026 Fine-tuning Code Release\n- [x] Init Repo\n\n\n\n\u003c!-- ✅ ⬜️  --\u003e\n\n## Dependencies\nThe repo is built on the timm library, which can be installed via:\npip3 install timm==0.5.4\npip3 install torchvision==0.9.0\n\n## Dataset Preparation\nDownload the clean [ImageNet](http://image-net.org/) dataset and the [ImageNet-C](https://zenodo.org/record/2235448) dataset, and structure them as follows:\n\n```\n/path/to/imagenet-C/\n  clean/\n    class1/\n      img3.jpeg\n    class2/\n      img4.jpeg\n  corruption1/\n    severity1/\n      class1/\n        img3.jpeg\n      class2/\n        img4.jpeg\n    severity2/\n      class1/\n        img3.jpeg\n      class2/\n        img4.jpeg\n```\n\nFor other out-of-distribution shift benchmarks, we use [ImageNet-A](https://github.com/hendrycks/natural-adv-examples) or [ImageNet-R](https://github.com/hendrycks/imagenet-r/) for evaluation.\n\n## Results and Pre-trained Models\n### FAN-ViT ImageNet-1K trained models\n\n| Model | Resolution |IN-1K | IN-C| IN-A| IN-R | #Params | Download |\n|:---:|:---:|:---:|:---:| :---:|:---:|:---:|:---:|\n| FAN-T-ViT | 224x224 | 79.2 | 57.5| 15.6 | 42.5 | 7.3M | 
[model](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_vit_tiny.pth.tar) |\n| FAN-S-ViT | 224x224 | 82.5 | 64.5| 29.1 | 50.4 | 28.0M  | [model](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_vit_small.pth.tar) |\n| FAN-B-ViT | 224x224 | 83.6 | 67.0| 35.4 | 51.8 | 54.0M  | [model](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_vit_base.pth.tar) |\n| FAN-L-ViT | 224x224 | 83.9 | 67.7| 37.2 | 53.1 | 80.5M | [model]() |\n\n### FAN-Hybrid ImageNet-1K trained models\n| Model | Resolution |IN-1K / IN-C| City / City-C| COCO / COCO-C | #Params | Download |\n|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n| FAN-T-Hybrid | 224x224 | 80.1/57.4 | 81.2/57.1 | 50.2/33.1 | 7.4M | [model](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_hybrid_tiny.pth.tar) |\n| FAN-S-Hybrid | 224x224 | 83.5/64.7 | 81.5/66.4 | 53.3/38.7 |26.3M | [model](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_hybrid_small.pth.tar) |\n| FAN-B-Hybrid | 224x224 | 83.9/66.4| 82.2/66.9 | 54.2/40.6 |50.4M | [model](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_hybrid_base.pth.tar) |\n| FAN-L-Hybrid | 224x224 | 84.3/68.3| 82.3/68.7| 55.1/42.0 |76.8M | [model]() |\n\n### FAN-Hybrid ImageNet-22K trained models\n| Model | Resolution |IN-1K/IN-C | #Params | Download |\n|:---:|:---:|:---:|:---:|:---:|\n| FAN-B-Hybrid | 224x224 | 85.3/70.5 | 50.4M  | [model](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_hybrid_base_in22k_1k.pth.tar) |\n| FAN-B-Hybrid | 384x384 | 85.6/- | 50.4M  | [model](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_hybrid_base_in22k_1k_384.pth.tar) |\n| FAN-L-Hybrid | 224x224 | 86.5/73.6 | 76.8M | 
[model](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_hybrid_large_in22k_1k.pth.tar) |\n| FAN-L-Hybrid | 384x384 | 87.1/- | 76.8M | [model](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_hybrid_large_in22k_1k_384.pth.tar) |\n\nThe ImageNet-22K pre-trained weights for [FAN-B-Hybrid](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_hybrid_base_in22k.pth.tar) and [FAN-L-Hybrid](https://github.com/zhoudaquan/fully_attentional_network_ckpt/releases/download/v1.0.0/fan_hybrid_large_in22k.pth.tar), without fine-tuning on ImageNet-1K, are also available. Checkpoints can be downloaded by clicking on the model name.\n\n## Demos\n### Semantic Segmentation on Cityscapes-C\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"demo/Demo_CityC.gif\" alt=\"animated\"\u003e\n\u003c/p\u003e\n\n\n## ImageNet-1K Training\nFAN-T training on ImageNet-1K with four 8-GPU nodes:\n```\npython3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=$rank_num \\\n\t--node_rank=$rank_index --master_addr=\"ip.addr\" --master_port=$MASTER_PORT \\\n\t main.py  /PATH/TO/IMAGENET/ --model fan_tiny_8_p4_hybrid -b 32 --sched cosine --epochs 300 \\\n\t--opt adamw -j 16 --warmup-epochs 5  \\\n\t--lr 10e-4 --drop-path .1 --img-size 224 \\\n\t--output ../fan_tiny_8_p4_hybrid/ \\\n\t--amp --model-ema\n```\n\n## Robustness on ImageNet-C\n```\nbash scripts/imagenet_c_val.sh $model_name $ckpt\n```\n\n## Measurement on ImageNet-A\n```\nbash scripts/imagenet_a_val.sh $model_name $ckpt\n```\n\n## Measurement on ImageNet-R\n```\nbash scripts/imagenet_r_val.sh $model_name $ckpt\n```\n\n## Acknowledgement\nThis repository is built using the [timm](https://github.com/rwightman/pytorch-image-models) library and the [DeiT](https://github.com/facebookresearch/deit), [PVT](https://github.com/whai362/PVT), and [SegFormer](https://github.com/NVlabs/SegFormer) repositories.\n\n## Citation\nIf you find 
this repository helpful, please consider citing:\n```\n@inproceedings{zhou2022understanding,\n  title   = {Understanding The Robustness in Vision Transformers},\n  author  = {Daquan Zhou and Zhiding Yu and Enze Xie and Chaowei Xiao and Anima Anandkumar and Jiashi Feng and Jose M. Alvarez},\n  booktitle = {International Conference on Machine Learning (ICML)},\n  year    = {2022},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVlabs%2FFAN","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNVlabs%2FFAN","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVlabs%2FFAN/lists"}