# Global Filter Networks for Image Classification

Created by [Yongming Rao](https://raoyongming.github.io/), [Wenliang Zhao](https://wl-zhao.github.io/), [Zheng Zhu](http://www.zhengzhu.net/), [Jiwen Lu](https://scholar.google.com/citations?user=TN8uDQoAAAAJ&hl=en&authuser=1), [Jie Zhou](https://scholar.google.com/citations?user=6a79aPwAAAAJ&hl=en&authuser=1)

This repository contains the PyTorch implementation of GFNet (NeurIPS 2021 & T-PAMI).

GFNet is a transformer-style architecture that learns long-range spatial dependencies in the frequency domain with log-linear complexity.
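The claim of global mixing at log-linear cost follows from the convolution theorem: multiplying the 2D FFT of a feature map element-wise by a filter and transforming back is equivalent to a circular convolution whose kernel spans the entire grid, yet the FFT route costs only O(N log N). A minimal NumPy sketch (hypothetical 8×8 grid, not code from this repository) illustrating the equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 8, 8  # hypothetical small token grid
x = rng.standard_normal((H, W))  # feature map
k = rng.standard_normal((H, W))  # spatial filter the same size as the grid

# Frequency path (the global-filter view): element-wise product of rFFTs,
# then the inverse transform. Costs O(N log N) in the grid size N = H * W.
y_freq = np.fft.irfft2(np.fft.rfft2(x) * np.fft.rfft2(k), s=(H, W))

# Spatial path: an explicit circular convolution over the whole grid, O(N^2).
y_spatial = np.zeros((H, W))
for i in range(H):
    for j in range(W):
        for a in range(H):
            for b in range(W):
                y_spatial[i, j] += x[a, b] * k[(i - a) % H, (j - b) % W]

print(np.allclose(y_freq, y_spatial))  # True: the two paths agree
```

So a learned frequency-domain filter acts as a convolution kernel as large as the feature map itself, which is why a single layer can mix all tokens at once.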
Our architecture replaces the self-attention layer in vision transformers with three key operations: a 2D discrete Fourier transform, an element-wise multiplication between the frequency-domain features and learnable global filters, and a 2D inverse Fourier transform.

![intro](figs/intro.gif)

Our code is based on [pytorch-image-models](https://github.com/rwightman/pytorch-image-models) and [DeiT](https://github.com/facebookresearch/deit).

[[Project Page]](https://gfnet.ivg-research.xyz/) [[arXiv]](https://arxiv.org/abs/2107.00645)

## Global Filter Layer

GFNet is a conceptually simple yet computationally efficient architecture that consists of a stack of Global Filter Layers and feed-forward networks (FFNs). The Global Filter Layer mixes tokens with log-linear complexity thanks to the highly efficient Fast Fourier Transform (FFT) algorithm. The layer is easy to implement:

```python
import torch
import torch.nn as nn
import torch.fft

class GlobalFilter(nn.Module):
    def __init__(self, dim, h=14, w=8):
        super().__init__()
        # Learnable complex filter stored as (real, imag) pairs.
        # For an h x h token grid, rfft2 keeps only h // 2 + 1 frequency
        # columns, so the defaults h=14, w=8 match a 14 x 14 grid.
        self.complex_weight = nn.Parameter(torch.randn(h, w, dim, 2, dtype=torch.float32) * 0.02)

    def forward(self, x):
        B, H, W, C = x.shape
        x = torch.fft.rfft2(x, dim=(1, 2), norm='ortho')  # to the frequency domain
        weight = torch.view_as_complex(self.complex_weight)
        x = x * weight  # element-wise global filtering
        x = torch.fft.irfft2(x, s=(H, W), dim=(1, 2), norm='ortho')  # back to the spatial domain
        return x
```

Compared to self-attention and spatial MLPs, the Global Filter Layer is much more efficient at processing high-resolution feature maps:

![efficiency](figs/efficiency.png)

## Model Zoo

We provide GFNet models pretrained on ImageNet:

| name | arch | Params | FLOPs | acc@1 | acc@5 | url |
| --- | --- | --- | --- | --- | --- | --- |
| GFNet-Ti | `gfnet-ti` | 7M | 1.3G | 74.6 | 92.2 | [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/3d0c1579aa524a0a99dd/?dl=1) / [Google Drive](https://drive.google.com/file/d/1_xrfC7c_ccZnVicYDnrViOA_T1N-xoHI/view?usp=sharing) |
| GFNet-XS | `gfnet-xs` | 16M | 2.8G | 78.6 | 94.2 | [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/e0ab5b1583954a1fa9b2/?dl=1) / [Google Drive](https://drive.google.com/file/d/1paf9gQWdsLXrG58R77yJ3U0FiNINg9xN/view?usp=sharing) |
| GFNet-S | `gfnet-s` | 25M | 4.5G | 80.0 | 94.9 | [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/e5561fa070c44d9399bf/?dl=1) / [Google Drive](https://drive.google.com/file/d/18aRey_1abWNMmSL7TZQ4WxpplLRCDGEl/view?usp=sharing) |
| GFNet-B | `gfnet-b` | 43M | 7.9G | 80.7 | 95.1 | [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/2fbf264597af4d72afb3/?dl=1) / [Google Drive](https://drive.google.com/file/d/1OncnXYAXpdjZBq4JK5Y3xacIHOIMePQo/view?usp=sharing) |
| GFNet-H-Ti | `gfnet-h-ti` | 15M | 2.0G | 80.1 | 95.1 | [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/b22dd45eccbe462cbbfb/?dl=1) / [Google Drive](https://drive.google.com/file/d/1Nrq5sfHD9RklCMl6WkcVrAWI5vSVzwSm/view?usp=sharing) |
| GFNet-H-S | `gfnet-h-s` | 32M | 4.5G | 81.5 | 95.6 | [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/5229cb4d1daf48e69675/?dl=1) / [Google Drive](https://drive.google.com/file/d/1w4d7o1LTBjmSkb5NKzgXBBiwdBOlwiie/view?usp=sharing) |
| GFNet-H-B | `gfnet-h-b` | 54M | 8.4G | 82.9 | 96.2 | [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/954c5af21e824ba6b40c/?dl=1) / [Google Drive](https://drive.google.com/file/d/1F900_-yPH7GFYfTt60xn4tu5a926DYL0/view?usp=sharing) |

## Usage

### Requirements

- torch>=1.8.0
- torchvision
- timm

*Note*: The `rfft2` and `irfft2` functions used above require PyTorch>=1.8.0. Complex tensors are supported since PyTorch 1.6.0, but the `fft` API there differs slightly from the current version.

**Data preparation**: download and extract ImageNet images from http://image-net.org/.
The directory structure should be:

```
│ILSVRC2012/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
```

### Evaluation

To evaluate a pre-trained GFNet model on the ImageNet validation set with a single GPU, run:

```
python infer.py --data-path /path/to/ILSVRC2012/ --arch arch_name --model-path /path/to/model
```

### Training

#### ImageNet

To train GFNet models on ImageNet from scratch, run:

```
python -m torch.distributed.launch --nproc_per_node=8 --use_env main_gfnet.py --output_dir logs/gfnet-xs --arch gfnet-xs --batch-size 128 --data-path /path/to/ILSVRC2012/
```

To finetune a pre-trained model at a higher resolution, run:

```
python -m torch.distributed.launch --nproc_per_node=8 --use_env main_gfnet.py --output_dir logs/gfnet-xs-img384 --arch gfnet-xs --input-size 384 --batch-size 64 --data-path /path/to/ILSVRC2012/ --lr 5e-6 --weight-decay 1e-8 --min-lr 5e-6 --epochs 30 --finetune /path/to/model
```

#### Transfer Learning Datasets

To finetune a pre-trained model on a transfer learning dataset, run:

```
python -m torch.distributed.launch --nproc_per_node=8 --use_env main_gfnet_transfer.py --output_dir logs/gfnet-xs-cars --arch gfnet-xs --batch-size 64 --data-set CARS --data-path /path/to/stanford_cars --epochs 1000 --lr 0.0001 --weight-decay 1e-4 --clip-grad 1 --warmup-epochs 5 --finetune /path/to/model
```

## Visualization

To get an intuitive understanding of the Global Filter operation, we visualize the learned filters from different layers of GFNet-XS.
![vis](figs/filters.png)

## License

MIT License

## Citation

If you find our work useful in your research, please consider citing:

```
@inproceedings{rao2021global,
  title={Global Filter Networks for Image Classification},
  author={Rao, Yongming and Zhao, Wenliang and Zhu, Zheng and Lu, Jiwen and Zhou, Jie},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2021}
}
```