{"id":15027817,"url":"https://github.com/locuslab/convmixer","last_synced_at":"2025-05-16T18:10:23.187Z","repository":{"id":40485893,"uuid":"411419017","full_name":"locuslab/convmixer","owner":"locuslab","description":"Implementation of ConvMixer for \"Patches Are All You Need? 🤷\"","archived":false,"fork":false,"pushed_at":"2022-11-11T08:49:42.000Z","size":14315,"stargazers_count":1070,"open_issues_count":6,"forks_count":103,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-04-12T17:46:26.401Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/locuslab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-28T19:48:47.000Z","updated_at":"2025-04-09T13:30:06.000Z","dependencies_parsed_at":"2022-07-16T23:30:59.561Z","dependency_job_id":null,"html_url":"https://github.com/locuslab/convmixer","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fconvmixer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fconvmixer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fconvmixer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fconvmixer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/locuslab","download_url":"https://codeload.github.com/locuslab/convmixer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254582907,"owners_count":22095518,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-24T20:07:06.489Z","updated_at":"2025-05-16T18:10:23.171Z","avatar_url":"https://github.com/locuslab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Patches Are All You Need? 🤷\nThis repository contains an implementation of ConvMixer for the ICLR 2022 submission [\"Patches Are All You Need?\"](https://openreview.net/forum?id=TVHS5Y4dNvM) by Asher Trockman and Zico Kolter.\n\n🔎 New: Check out [this repository](https://github.com/locuslab/convmixer-cifar10) for training ConvMixers on CIFAR-10.\n\n### Code overview\nThe most important code is in `convmixer.py`. We trained ConvMixers using the `timm` framework, which we copied from [here](http://github.com/rwightman/pytorch-image-models).\n\n__**Update:**__ ConvMixer is now integrated into the [`timm` framework itself](https://github.com/rwightman/pytorch-image-models). You can see the PR [here](https://github.com/rwightman/pytorch-image-models/pull/910).\n\nInside `pytorch-image-models`, we have made the following modifications. (Though one could look at the diff, we think it is convenient to summarize them here.)\n\n- Added ConvMixers\n  - added `timm/models/convmixer.py`\n  - modified `timm/models/__init__.py`\n- Added \"OneCycle\" LR Schedule\n  - added `timm/scheduler/onecycle_lr.py`\n  - modified `timm/scheduler/scheduler.py`\n  - modified `timm/scheduler/scheduler_factory.py`\n  - modified `timm/scheduler/__init__.py`\n  - modified `train.py` (added two lines to support this LR schedule)\n\nWe are confident that the use of the OneCycle schedule here is not critical, and one could likely just as well\ntrain ConvMixers with the built-in cosine schedule.\n\n### Evaluation\nWe provide some model weights below:\n\n| Model Name | Kernel Size | Patch Size | File Size |\n|------------|:-----------:|:----------:|----------:|\n|[ConvMixer-1536/20](https://github.com/tmp-iclr/convmixer/releases/download/v1.0/convmixer_1536_20_ks9_p7.pth.tar)| 9 | 7 | 207MB |\n|[ConvMixer-768/32](https://github.com/tmp-iclr/convmixer/releases/download/v1.0/convmixer_768_32_ks7_p7_relu.pth.tar)\\*| 7 | 7 | 85MB |\n|[ConvMixer-1024/20](https://github.com/tmp-iclr/convmixer/releases/download/v1.0/convmixer_1024_20_ks9_p14.pth.tar)| 9 | 14 | 98MB |\n\n\\* **Important:** ConvMixer-768/32 here uses ReLU instead of GELU, so you would have to change `convmixer.py` accordingly (we will fix this later).\n\nYou can evaluate ConvMixer-1536/20 as follows:\n\n```\npython validate.py --model convmixer_1536_20 --b 64 --num-classes 1000 --checkpoint [/path/to/convmixer_1536_20_ks9_p7.pth.tar] [/path/to/ImageNet1k-val]\n```\n\nYou should get a `81.37%` accuracy.\n\n### Training\nIf you had a node with 10 GPUs, you could train a ConvMixer-1536/20 as follows (these are exactly the settings we used):\n\n```\nsh distributed_train.sh 10 [/path/to/ImageNet1k] \n    --train-split [your_train_dir] \n    --val-split [your_val_dir] \n    --model convmixer_1536_20 \n    -b 64 \n    -j 10 \n    --opt adamw \n    --epochs 150 \n    --sched onecycle \n    --amp \n    --input-size 3 224 224\n    --lr 0.01 \n    --aa rand-m9-mstd0.5-inc1 \n    --cutmix 0.5 \n    --mixup 0.5 \n    --reprob 0.25 \n    --remode pixel \n    --num-classes 1000 \n    --warmup-epochs 0 \n    --opt-eps=1e-3 \n    --clip-grad 1.0\n```\n\nWe also included a ConvMixer-768/32 in timm/models/convmixer.py (though it is simple to add more ConvMixers). We trained that one with the above settings but with 300 epochs instead of 150 epochs.\n\n__**Note:**__ If you are training on CIFAR-10 instead of ImageNet-1k, we recommend setting `--scale 0.75 1.0` as well, since the default value of 0.08 1.0 does not make sense for 32x32 inputs.\n\nThe tweetable version of ConvMixer, which requires `from torch.nn import *`:\n\n```\ndef ConvMixer(h,d,k,p,n):\n S,C,A=Sequential,Conv2d,lambda x:S(x,GELU(),BatchNorm2d(h))\n R=type('',(S,),{'forward':lambda s,x:s[0](x)+x})\n return S(A(C(3,h,p,p)),*[S(R(A(C(h,h,k,groups=h,padding=k//2))),A(C(h,h,1))) for i in range(d)],AdaptiveAvgPool2d(1),Flatten(),Linear(h,n))\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocuslab%2Fconvmixer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flocuslab%2Fconvmixer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocuslab%2Fconvmixer/lists"}