{"id":13577987,"url":"https://github.com/dqshuai/MetaFormer","last_synced_at":"2025-04-05T15:32:03.065Z","repository":{"id":37357333,"uuid":"466428937","full_name":"dqshuai/MetaFormer","owner":"dqshuai","description":"A PyTorch implementation of \"MetaFormer: A Unified Meta Framework for Fine-Grained Recognition\". A reference PyTorch implementation of “CoAtNet: Marrying Convolution and Attention for All Data Sizes”","archived":false,"fork":false,"pushed_at":"2022-04-24T13:25:03.000Z","size":592,"stargazers_count":234,"open_issues_count":18,"forks_count":42,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-04T01:27:43.970Z","etag":null,"topics":["fine-grained-classification","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dqshuai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-05T11:05:42.000Z","updated_at":"2025-04-03T17:07:10.000Z","dependencies_parsed_at":"2022-09-15T04:41:05.678Z","dependency_job_id":null,"html_url":"https://github.com/dqshuai/MetaFormer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dqshuai%2FMetaFormer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dqshuai%2FMetaFormer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dqshuai%2FMetaFormer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dqshuai%2FMetaFormer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners
/dqshuai","download_url":"https://codeload.github.com/dqshuai/MetaFormer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247359084,"owners_count":20926358,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fine-grained-classification","pytorch"],"created_at":"2024-08-01T15:01:26.073Z","updated_at":"2025-04-05T15:31:58.049Z","avatar_url":"https://github.com/dqshuai.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/metaformer-a-unified-meta-framework-for-fine/fine-grained-image-classification-on-cub-200)](https://paperswithcode.com/sota/fine-grained-image-classification-on-cub-200?p=metaformer-a-unified-meta-framework-for-fine)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/metaformer-a-unified-meta-framework-for-fine/fine-grained-image-classification-on-nabirds)](https://paperswithcode.com/sota/fine-grained-image-classification-on-nabirds?p=metaformer-a-unified-meta-framework-for-fine)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/metaformer-a-unified-meta-framework-for-fine/image-classification-on-inaturalist)](https://paperswithcode.com/sota/image-classification-on-inaturalist?p=metaformer-a-unified-meta-framework-for-fine)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/metaformer-a-unified-meta-framework-for-fine/image-classification-on-inaturalist-2018)](https://paperswithcode.c
om/sota/image-classification-on-inaturalist-2018?p=metaformer-a-unified-meta-framework-for-fine)\n# MetaFormer\nA repository for the code used to create and train the model defined in “MetaFormer: A Unified Meta Framework for Fine-Grained Recognition” [arxiv:2203.02751](http://arxiv.org/abs/2203.02751)\n![Image text](figs/overview.png)\nMoreover, MetaFormer is similar to CoAtNet. Therefore, this repo can also be seen as a reference PyTorch implementation of “CoAtNet: Marrying Convolution and Attention for All Data Sizes” [arxiv:2106.04803](https://arxiv.org/abs/2106.04803)\n![Image text](figs/stucture_of_metafg.png)\n## Model zoo\n| name       | resolution   | 1k model   |  21k model   | iNat21 model   |\n| :--------: | :----------: | :--------: | :----------: | :------------: |\n| MetaFormer-0   | 224x224 | [metafg_0_1k_224](https://drive.google.com/file/d/1BYbe3mrKioN-Ara6hhJiaiEgJLl_thSH/view?usp=sharing)|[metafg_0_21k_224](https://drive.google.com/file/d/1834jQ9OPHOBZDgv7jD6Qu5mNLsD9aeZv/view?usp=sharing)|-|\n| MetaFormer-1   | 224x224 | [metafg_1_1k_224](https://drive.google.com/file/d/1p-nIZgnrDatqmSzzDknTFYw-yEEUD_Rz/view?usp=sharing)|[metafg_1_21k_224](https://drive.google.com/file/d/1AcybDVEY-kXFT0D79w1G7I0h4r1IxLlG/view?usp=sharing)|-|\n| MetaFormer-2   | 224x224 | [metafg_2_1k_224](https://drive.google.com/file/d/1K6EEyFKbMUBpPqaEJMvo93YHTXCsgH2V/view?usp=sharing)|[metafg_2_21k_224](https://drive.google.com/file/d/1VygaD_IwYq25KwoupWfttKRZUm2_SPeK/view?usp=sharing)|-|\n| MetaFormer-0   |     384x384      |  [metafg_0_1k_384](https://drive.google.com/file/d/1r62S3CJFRWV_qA5udC9MOFOJYwRf8mE2/view?usp=sharing)  |  [metafg_0_21k_384](https://drive.google.com/file/d/1wVmlPjNTA6JKHcF3ROGorEVPxKVO83Ss/view?usp=sharing)  |  [metafg_0_inat21_384](https://drive.google.com/file/d/11gCk_IuSN7krdkOUSWSM4xlf8GGknmxc/view?usp=sharing)  |\n| MetaFormer-1   |     384x384      |  
[metafg_1_1k_384](https://drive.google.com/file/d/12OTmZg4J6fMGvs-colOTDfmhdA5EMMvo/view?usp=sharing)  |  [metafg_1_21k_384](https://drive.google.com/file/d/13dsarbtsNrkhpG5XpCRlN5ogXDGXO3Z_/view?usp=sharing)  |  [metafg_1_inat21_384](https://drive.google.com/file/d/1ATUIrDxaQaGqx4lJ8HE2IwX_evMhblPu/view?usp=sharing)  |\n| MetaFormer-2   |     384x384      |  [metafg_2_1k_384](https://drive.google.com/file/d/167oBaseORq32aFA3Ex6lpHuasvu2PMb8/view?usp=sharing)  |  [metafg_2_21k_384](https://drive.google.com/file/d/1PnpntloQaYduEokFGQ6y79G7DdyjD_u3/view?usp=sharing)  |  [metafg_2_inat21_384](https://drive.google.com/file/d/17sUNST7ivQhonBAfZEiTOLAgtaHa4F3e/view?usp=sharing)  |\n\nYou can also download the models from https://pan.baidu.com/s/1ZGEDoWWU7Z0vx0VCjEbe6g (password: 3uiq).\n## Usage\n#### Python module\n* install `PyTorch` and `torchvision`\n```\npip install torch==1.5.1 torchvision==0.6.1\n```\n* install `timm`\n```\npip install timm==0.4.5\n```\n* install `Apex`\n```\ngit clone https://github.com/NVIDIA/apex\ncd apex\npip install -v --disable-pip-version-check --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ./\n```\n* install other requirements\n```\npip install opencv-python==4.5.1.48 yacs==0.1.8\n```\n#### Data preparation\nDownload [inat21,18,17](https://github.com/visipedia/inat_comp), [CUB](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html), [NABirds](https://dl.allaboutbirds.org/nabirds), [stanfordcars](https://ai.stanford.edu/~jkrause/cars/car_dataset.html), and [aircraft](https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/), put them in their respective folders (\\\u003croot\\\u003e/datasets/\u003cdataset_name\u003e), and unzip the files. 
The folder structure is as follows:\n```\ndatasets\n  |————inraturelist2021\n  |       └——————train\n  |       └——————val\n  |       └——————train.json\n  |       └——————val.json\n  |————inraturelist2018\n  |       └——————train_val_images\n  |       └——————train2018.json\n  |       └——————val2018.json\n  |       └——————train2018_locations.json\n  |       └——————val2018_locations.json\n  |       └——————categories.json.json\n  |————inraturelist2017\n  |       └——————train_val_images\n  |       └——————train2017.json\n  |       └——————val2017.json\n  |       └——————train2017_locations.json\n  |       └——————val2017_locations.json\n  |————cub-200\n  |       └——————...\n  |————nabirds\n  |       └——————...\n  |————stanfordcars\n  |       └——————car_ims\n  |       └——————cars_annos.mat\n  |————aircraft\n  |       └——————...\n```\n#### Training\nYou can download pre-trained models from the model zoo and put them under \\\u003croot\\\u003e/pretrained.\nTo train MetaFG on a dataset, run:\n```\npython3 -m torch.distributed.launch --nproc_per_node \u003cnum-of-gpus-to-use\u003e --master_port 12345  main.py --cfg \u003cconfig-file\u003e --dataset \u003cdataset-name\u003e --pretrain \u003cpretainedmodel-path\u003e [--batch-size \u003cbatch-size-per-gpu\u003e --output \u003coutput-directory\u003e --tag \u003cjob-tag\u003e]\n```\n\\\u003cdataset-name\\\u003e is one of: inaturelist2021, inaturelist2018, inaturelist2017, cub-200, nabirds, stanfordcars, aircraft.\n\nFor CUB-200-2011, run:\n```\npython3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12345  main.py --cfg ./configs/MetaFG_1_224.yaml --batch-size 32 --tag cub-200_v1 --lr 5e-5 --min-lr 5e-7 --warmup-lr 5e-8 --epochs 300 --warmup-epochs 20 --dataset cub-200 --pretrain ./pretrained_model/\u003cxxxx\u003e.pth --accumulation-steps 2 --opts DATA.IMG_SIZE 384  \n```\nNote that the final learning rate is the specified `--lr` scaled by total_bs/512.\n#### Eval\nTo evaluate a model on a dataset, run:\n```\npython3 -m torch.distributed.launch --nproc_per_node 
\u003cnum-of-gpus-to-use\u003e --master_port 12345  main.py --eval --cfg \u003cconfig-file\u003e --dataset \u003cdataset-name\u003e --resume \u003ccheckpoint\u003e [--batch-size \u003cbatch-size-per-gpu\u003e]\n```\n## Main Result\n#### ImageNet-1k \n| Name       | Resolution   | #Param   |  #FLOPS   | Throughput   | Top-1 acc |\n| :--------: | :----------: | :--------: | :----------: | :------------: | :------------: |\n| MetaFormer-0   |     224x224      |  28M  |  4.6G  |  840.1  | 82.9 |\n| MetaFormer-1   |     224x224      |  45M  |  8.5G  |  444.8  | 83.9 |\n| MetaFormer-2   |     224x224      |  81M  |  16.9G  |  438.9  | 84.1 |\n| MetaFormer-0   |     384x384      |  28M  |  13.4G  |  349.4  | 84.2 |\n| MetaFormer-1   |     384x384      |  45M  |  24.7G  |  165.3  | 84.4 |\n| MetaFormer-2   |     384x384      |  81M  |  49.7G  |  132.7  | 84.6 |\n#### Fine-grained Datasets\nResults on fine-grained datasets with different pre-trained models.\n| Name       | Pretrain   | CUB | NABirds |  iNat2017   | iNat2018  | Cars | Aircraft |\n| :--------: | :----------: | :--------: | :----------: | :------------: | :------------: | :--------: |:--------: |\n| MetaFormer-0|ImageNet-1k|89.6|89.1|75.7|79.5|95.0|91.2|\n| MetaFormer-0|ImageNet-21k|89.7|89.5|75.8|79.9|94.6|91.2|\n| MetaFormer-0|iNaturalist 2021|91.8|91.5|78.3|82.9|95.1|87.4|\n| MetaFormer-1|ImageNet-1k|89.7|89.4|78.2|81.9|94.9|90.8|\n| MetaFormer-1|ImageNet-21k|91.3|91.6|79.4|83.2|95.0|92.6|\n| MetaFormer-1|iNaturalist 2021|92.3|92.7|82.0|87.5|95.0|92.5|\n| MetaFormer-2|ImageNet-1k|89.7|89.7|79.0|82.6|95.0|92.4|\n| MetaFormer-2|ImageNet-21k|91.8|92.2|80.4|84.3|95.1|92.9|\n| MetaFormer-2|iNaturalist 2021|92.9|93.0|82.8|87.7|95.4|92.8|\n\n\nResults on iNaturalist 2017, iNaturalist 2018, and iNaturalist 2021 with meta-information.\n| Name       | Pretrain   | Meta added| iNat2017   |  iNat2018   | iNat2021   |\n| :--------: | :----------: | :--------: | :---------- | :------------ |:------------ 
|\n|MetaFormer-0|ImageNet-1k|N|75.7|79.5|88.4|\n|MetaFormer-0|ImageNet-1k|Y|79.8(+4.1)|85.4(+5.9)|92.6(+4.2)|\n|MetaFormer-1|ImageNet-1k|N|78.2|81.9|90.2|\n|MetaFormer-1|ImageNet-1k|Y|81.3(+3.1)|86.5(+4.6)|93.4(+3.2)|\n|MetaFormer-2|ImageNet-1k|N|79.0|82.6|89.8|\n|MetaFormer-2|ImageNet-1k|Y|82.0(+3.0)|86.8(+4.2)|93.2(+3.4)|\n|MetaFormer-2|ImageNet-21k|N|80.4|84.3|90.3|\n|MetaFormer-2|ImageNet-21k|Y|83.4(+3.0)|88.7(+4.4)|93.6(+3.3)|\n## Citation\n\n```\n@article{MetaFormer,\n  title={MetaFormer: A Unified Meta Framework for Fine-Grained Recognition},\n  author={Diao, Qishuai and Jiang, Yi and Wen, Bin and Sun, Jia and Yuan, Zehuan},\n  journal={arXiv preprint arXiv:2203.02751},\n  year={2022},\n}\n```\n\n## Acknowledgement\nMany thanks to [swin-transformer](https://github.com/microsoft/Swin-Transformer). Part of the code is borrowed from it.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdqshuai%2FMetaFormer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdqshuai%2FMetaFormer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdqshuai%2FMetaFormer/lists"}