{"id":20771932,"url":"https://github.com/suous/repnext","last_synced_at":"2025-04-30T14:27:37.241Z","repository":{"id":245972607,"uuid":"819684114","full_name":"suous/RepNeXt","owner":"suous","description":"RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization","archived":false,"fork":false,"pushed_at":"2024-10-13T11:58:37.000Z","size":9602,"stargazers_count":39,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-30T14:27:28.520Z","etag":null,"topics":["computer-vision"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2406.16004","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/suous.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-25T02:18:17.000Z","updated_at":"2025-04-24T03:38:00.000Z","dependencies_parsed_at":"2024-09-16T17:17:23.709Z","dependency_job_id":"80d765fd-e4ef-4d3a-9cf8-ae9cf5dbca12","html_url":"https://github.com/suous/RepNeXt","commit_stats":null,"previous_names":["suous/repnext"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suous%2FRepNeXt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suous%2FRepNeXt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suous%2FRepNeXt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suous%2FRepNeXt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/suous","download
_url":"https://codeload.github.com/suous/RepNeXt/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251721073,"owners_count":21632767,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision"],"created_at":"2024-11-17T12:18:07.864Z","updated_at":"2025-04-30T14:27:37.207Z","avatar_url":"https://github.com/suous.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization](https://arxiv.org/abs/2406.16004)\n\n[![license](https://img.shields.io/github/license/suous/RepNeXt)](https://github.com/suous/RepNeXt/blob/main/LICENSE)\n[![arXiv](https://img.shields.io/badge/arXiv-2406.16004-red)](https://arxiv.org/abs/2406.16004)\n[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/suous/RepNeXt/blob/main/demo/feature_map_visualization.ipynb)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"figures/latency.png\" width=70%\u003e \u003cbr\u003e\n  The top-1 accuracy is tested on ImageNet-1K and the latency is measured on an iPhone 12 with iOS 16 across 20 experimental sets.\n\u003c/p\u003e\n\n[RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization](https://arxiv.org/abs/2406.16004).\\\nMingshu Zhao, Yi Luo, and Yong Ouyang\n[[`arXiv`](https://arxiv.org/abs/2406.16004)]\n\n![architecture](./figures/architecture.png)\n\n## Abstract\n\nWe introduce RepNeXt, a novel model series that integrates multi-scale feature representations and incorporates both 
serial and parallel structural reparameterization (SRP) to enhance network depth and width without compromising inference speed.\nExtensive experiments demonstrate RepNeXt's superiority over current leading lightweight CNNs and ViTs, providing advantageous latency across various vision benchmarks.\nRepNeXt-M4 matches RepViT-M1.5's 82.3% accuracy on ImageNet within 1.5ms on an iPhone 12, outperforms its AP$^{box}$ by 1.3 on MS-COCO, and reduces parameters by 0.7M.\n\n![transforms](./figures/transforms.png)\n\n\u003cdetails\u003e\n  \u003csummary\u003e\n  \u003cfont size=\"+1\"\u003eConclusion\u003c/font\u003e\n  \u003c/summary\u003e\n    In this paper, we introduced a multi-scale depthwise convolution integrated with both serial and parallel SRP mechanisms, enhancing feature diversity and expanding the network’s expressive capacity without compromising inference speed. \n    Specifically, we designed a reparameterized medium-kernel convolution to imitate the human foveal vision system. \n    Additionally, we proposed our light-weight, general-purpose RepNeXts that employed the distribute-transform-aggregate design philosophy across inner-stage blocks as well as downsampling layers, achieving comparable or superior accuracy-efficiency trade-off across various vision benchmarks, especially on downstream tasks. \n    Moreover, our flexible multi-branch design functions as a grouped-depthwise convolution with additional inductive bias and efficiency trade-offs. 
\n    It can also be reparameterized into a single-branch large-kernel depthwise convolution, enabling potential optimization towards different accelerators.\n\nFor example, the large-kernel depthwise convolution can be accelerated by the implicit GEMM algorithm: `DepthWiseConv2dImplicitGEMM` of [RepLKNet](https://github.com/DingXiaoH/RepLKNet-pytorch).\n\nMany token mixers can be generalized as a **distribute-transform-aggregate** process:\n\n| Token Mixer | Distribution |   Transforms   | Aggregation |\n|:-----------:|:------------:|:--------------:|:-----------:|\n|  ChunkConv  |    Split     | Conv, Identity |     Cat     |\n|  CopyConv   |    Clone     |      Conv      |     Cat     |\n|   MixConv   |    Split     |      Conv      |     Cat     |\n|    MHSA     |    Split     |      Attn      |     Cat     |\n|  RepBlock   |    Clone     |      Conv      |     Add     |\n\n`ChunkConv` and `CopyConv` can be viewed as grouped depthwise convolutions.\n\n- **Chunk Conv**\n\n```python\nclass ChunkConv(nn.Module):\n    def __init__(self, in_channels, bias=True):\n        super().__init__()\n        self.bias = bias\n        in_channels = in_channels // 4\n        kwargs = {\"in_channels\": in_channels, \"out_channels\": in_channels, \"groups\": in_channels, \"bias\": bias}\n        self.conv_i = nn.Identity()\n        self.conv_s = nn.Conv2d(kernel_size=3, padding=1, **kwargs)\n        self.conv_m = nn.Conv2d(kernel_size=7, padding=3, **kwargs)\n        self.conv_l = nn.Conv2d(kernel_size=11, padding=5, **kwargs)\n\n    def forward(self, x):\n        i, s, m, l = torch.chunk(x, chunks=4, dim=1)\n        return torch.cat((self.conv_i(i), self.conv_s(s), self.conv_m(m), self.conv_l(l)), dim=1)\n\n    @torch.no_grad()\n    def fuse(self):\n        conv_s_w, conv_s_b = self.conv_s.weight, self.conv_s.bias\n        conv_m_w, conv_m_b = self.conv_m.weight, self.conv_m.bias\n        conv_l_w, conv_l_b = self.conv_l.weight, self.conv_l.bias\n\n        conv_i_w = 
torch.nn.functional.pad(torch.ones(conv_l_w.shape[0], conv_l_w.shape[1], 1, 1), [5, 5, 5, 5])\n        conv_s_w = nn.functional.pad(conv_s_w, [4, 4, 4, 4])\n        conv_m_w = nn.functional.pad(conv_m_w, [2, 2, 2, 2])\n\n        in_channels = self.conv_l.in_channels*4\n        conv = nn.Conv2d(in_channels, in_channels, kernel_size=11, padding=5, bias=self.bias, groups=in_channels)\n        conv.weight.data.copy_(torch.cat((conv_i_w, conv_s_w, conv_m_w, conv_l_w), dim=0))\n\n        if self.bias:\n            conv_i_b = torch.zeros_like(conv_s_b)\n            conv.bias.data.copy_(torch.cat((conv_i_b, conv_s_b, conv_m_b, conv_l_b), dim=0))\n        return conv\n```\n\n- **Copy Conv**\n\n```python\nclass CopyConv(nn.Module):\n    def __init__(self, in_channels, bias=True):\n        super().__init__()\n        self.bias = bias\n        kwargs = {\"in_channels\": in_channels, \"out_channels\": in_channels, \"groups\": in_channels, \"bias\": bias, \"stride\": 2}\n        self.conv_s = nn.Conv2d(kernel_size=3, padding=1, **kwargs)\n        self.conv_l = nn.Conv2d(kernel_size=7, padding=3, **kwargs)\n         \n    def forward(self, x):\n        B, C, H, W = x.shape\n        s, l = self.conv_s(x), self.conv_l(x)\n        return torch.stack((s, l), dim=2).reshape(B, C*2, H//2, W//2)\n\n    @torch.no_grad()\n    def fuse(self):\n        conv_s_w, conv_s_b = self.conv_s.weight, self.conv_s.bias\n        conv_l_w, conv_l_b = self.conv_l.weight, self.conv_l.bias\n\n        conv_s_w = nn.functional.pad(conv_s_w, [2, 2, 2, 2])\n\n        in_channels = self.conv_l.in_channels\n        conv = nn.Conv2d(in_channels, in_channels*2, kernel_size=7, padding=3, bias=self.bias, stride=self.conv_l.stride, groups=in_channels)\n        conv.weight.data.copy_(torch.stack((conv_s_w, conv_l_w), dim=1).reshape(conv.weight.shape))\n\n        if self.bias:\n            conv.bias.data.copy_(torch.stack((conv_s_b, conv_l_b), dim=1).reshape(conv.bias.shape))\n        return conv\n```\n\nIn summary, 
by focusing solely on the simplicity of the model’s overall architecture and disregarding its efficiency and parameter count, we can ultimately consolidate it into the single-branch structure shown in the figure below:\n\n![equivalent](./figures/equivalent.png)\n\u003c/details\u003e\n\n\u003cbr/\u003e\n\n**UPDATES** 🔥\n- **2024/10/13**: Added ImageNet-1K results for the single-branch equivalent forms of M0-M2 using StarNet's training recipe.\n- **2024/09/19**: Added M0-M2 ImageNet-1K results using StarNet's training recipe (distilled). Hit 80.6% top-1 accuracy within 1ms on an iPhone 12.\n- **2024/09/08**: Added RepNeXt-M0 ImageNet-1K result using StarNet's training recipe, achieving 73.8% top-1 accuracy without distillation.\n- **2024/08/26**: RepNeXt-M0 (distilled) has been released, achieving 74.2% top-1 accuracy within 0.6ms on an iPhone 12.\n- **2024/08/23**: Finished compact model (M0) ImageNet-1K experiments.\n- **2024/07/23**: Updated the README with the further simplified model structure.\n- **2024/06/25**: Uploaded checkpoints and training logs of RepNeXt-M1 - M5.\n\n## Classification on ImageNet-1K\n\n### Models under the RepViT training strategy\n\nWe report the top-1 accuracy on ImageNet-1K with and without distillation using the same training strategy as [RepViT](https://github.com/THU-MIG/RepViT).\n\n| Model | Top-1(distill) / Top-1 | #params | MACs | Latency |                                                                                                 Ckpt                                                                                                 |                                               Core ML                                               |                                                   Log                                                   
|\n|:------|:----------------------:|:-------:|:----:|:-------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------:|\n| M0    |      74.2 \\| 72.6      |  2.3M   | 0.4G | 0.59ms  | [fused 300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m0_distill_300e_fused.pt) / [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m0_distill_300e.pth) | [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m0_distill_300e_224.mlmodel) | [distill 300e](./logs/repnext_m0_distill_300e.txt) / [300e](./logs/repnext_m0_without_distill_300e.txt) |\n| M1    |      78.8 \\| 77.5      |  4.8M   | 0.8G | 0.86ms  | [fused 300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m1_distill_300e_fused.pt) / [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m1_distill_300e.pth) | [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m1_distill_300e_224.mlmodel) | [distill 300e](./logs/repnext_m1_distill_300e.txt) / [300e](./logs/repnext_m1_without_distill_300e.txt) |\n| M2    |      80.1 \\| 78.9      |  6.5M   | 1.1G | 1.00ms  | [fused 300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m2_distill_300e_fused.pt) / [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m2_distill_300e.pth) | [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m2_distill_300e_224.mlmodel) | [distill 300e](./logs/repnext_m2_distill_300e.txt) / [300e](./logs/repnext_m2_without_distill_300e.txt) |\n| M3    |      80.7 \\| 79.4      |  7.8M   | 1.3G | 1.11ms  | [fused 
300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m3_distill_300e_fused.pt) / [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m3_distill_300e.pth) | [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m3_distill_300e_224.mlmodel) | [distill 300e](./logs/repnext_m3_distill_300e.txt) / [300e](./logs/repnext_m3_without_distill_300e.txt) |\n| M4    |      82.3 \\| 81.2      |  13.3M  | 2.3G | 1.48ms  | [fused 300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m4_distill_300e_fused.pt) / [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m4_distill_300e.pth) | [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m4_distill_300e_224.mlmodel) | [distill 300e](./logs/repnext_m4_distill_300e.txt) / [300e](./logs/repnext_m4_without_distill_300e.txt) |\n| M5    |      83.3 \\| 82.4      |  21.7M  | 4.5G | 2.20ms  | [fused 300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m5_distill_300e_fused.pt) / [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m5_distill_300e.pth) | [300e](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m5_distill_300e_224.mlmodel) | [distill 300e](./logs/repnext_m5_distill_300e.txt) / [300e](./logs/repnext_m5_without_distill_300e.txt) |\n\n### Models under the StarNet training strategy\n\nWe report the top-1 and top-5 accuracy on ImageNet-1K with and without distillation using the same training strategy as [StarNet](https://github.com/ma-xu/Rewrite-the-Stars).\n\n\u003e Models decorated with “E” denote the single-branch equivalent form of RepNeXts through channel-wise structural reparameterization.\n\n| Model | Params (M) | MACs | Latency | Top-1 / top-5 |                                                                                                                     Download                                                                                                          
            | Top-1 / top-5 (distill) |                                                                                                                                 Download                                                                                                                                  |\n|:------|:----------:|:----:|:-------:|:-------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|\n| M0    |    2.3     | 0.4G | 0.59ms  | 73.8 \\| 91.6  | [args](./logs/strategy/repnext_m0_sz224_4xbs512_ep300_args.yaml) \\| [log](./logs/strategy/repnext_m0_sz224_4xbs512_ep300_summary.csv) \\| [model]( https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m0_sz224_4xbs512_ep300.pth.tar) |      75.4 \\| 92.1       | [args](./logs/strategy/repnext_m0_sz224_4xbs256_ep300_distill_args.yaml) \\| [log](./logs/strategy/repnext_m0_sz224_4xbs256_ep300_distill_summary.csv) \\| [model]( https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m0_sz224_4xbs256_ep300_distill.pth.tar) |\n| M0E   |    2.5     | 0.5G | 0.62ms  | 73.9 \\| 91.9  |                                                      [args](./logs/strategy/repnext_m0_sz224_4xbs512_ep300_args.yaml) \\| [log](./logs/strategy/repnext_m0e_sz224_4xbs512_ep300_summary.csv)                                                       |      75.6 \\| 92.3       |                                                          [args](./logs/strategy/repnext_m0_sz224_4xbs256_ep300_distill_args.yaml) \\| 
[log](./logs/strategy/repnext_m0e_sz224_4xbs256_ep300_distill_summary.csv)                                                           |\n| M1    |    4.8     | 0.8G | 0.86ms  | 77.9 \\| 94.0  | [args](./logs/strategy/repnext_m1_sz224_4xbs512_ep300_args.yaml) \\| [log](./logs/strategy/repnext_m1_sz224_4xbs512_ep300_summary.csv) \\| [model]( https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m1_sz224_4xbs512_ep300.pth.tar) |      79.7 \\| 94.5       | [args](./logs/strategy/repnext_m1_sz224_4xbs256_ep300_distill_args.yaml) \\| [log](./logs/strategy/repnext_m1_sz224_4xbs256_ep300_distill_summary.csv) \\| [model]( https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m1_sz224_4xbs256_ep300_distill.pth.tar) |\n| M1E   |    5.3     | 1.0G | 0.89ms  | 77.7 \\| 93.9  |                                                      [args](./logs/strategy/repnext_m1_sz224_4xbs512_ep300_args.yaml) \\| [log](./logs/strategy/repnext_m1e_sz224_4xbs512_ep300_summary.csv)                                                       |      79.5 \\| 94.5       |                                                          [args](./logs/strategy/repnext_m1_sz224_4xbs256_ep300_distill_args.yaml) \\| [log](./logs/strategy/repnext_m1e_sz224_4xbs256_ep300_distill_summary.csv)                                                           |\n| M2    |    6.5     | 1.1G | 1.00ms  | 78.8 \\| 94.5  | [args](./logs/strategy/repnext_m2_sz224_4xbs512_ep300_args.yaml) \\| [log](./logs/strategy/repnext_m2_sz224_4xbs512_ep300_summary.csv) \\| [model]( https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m2_sz224_4xbs512_ep300.pth.tar) |      80.6 \\| 95.1       | [args](./logs/strategy/repnext_m2_sz224_4xbs256_ep300_distill_args.yaml) \\| [log](./logs/strategy/repnext_m2_sz224_4xbs256_ep300_distill_summary.csv) \\| [model]( https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m2_sz224_4xbs256_ep300_distill.pth.tar) |\n| M2E   |    7.0     | 1.3G | 1.04ms  | 78.8 \\| 94.3  |       
                                               [args](./logs/strategy/repnext_m2_sz224_4xbs512_ep300_args.yaml) \\| [log](./logs/strategy/repnext_m2e_sz224_4xbs512_ep300_summary.csv)                                                       |      80.6 \\| 95.2       |                                                          [args](./logs/strategy/repnext_m2_sz224_4xbs256_ep300_distill_args.yaml) \\| [log](./logs/strategy/repnext_m2e_sz224_4xbs256_ep300_distill_summary.csv)                                                           |\n\nModel evaluation:\n\n```bash\npython moganet_valid.py \\\n--model repnext_m0 \\\n--img_size 224 \\\n--crop_pct 0.9 \\\n--data_dir data/imagenet \\                  \n--checkpoint repnext_m0_sz224_4xbs512_ep300.pth.tar\n```\n\nTips: Convert a training-time RepNeXt into the inference-time structure\n```\nfrom timm.models import create_model\nimport utils\n\nmodel = create_model('repnext_m1')\nutils.replace_batchnorm(model)\n```\n\n## Latency Measurement \n\nThe latency reported in RepNeXt for iPhone 12 (iOS 16) uses the benchmark tool from [XCode 14](https://developer.apple.com/videos/play/wwdc2022/10027/).\n\n\u003cdetails\u003e\n\u003csummary\u003e\nRepNeXt-M1\n\u003c/summary\u003e\n\u003cimg src=\"./figures/latency/repnext_m1_latency.png\" width=70%\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nRepNeXt-M2\n\u003c/summary\u003e\n\u003cimg src=\"./figures/latency/repnext_m2_latency.png\" width=70%\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nRepNeXt-M3\n\u003c/summary\u003e\n\u003cimg src=\"./figures/latency/repnext_m3_latency.png\" width=70%\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nRepNeXt-M4\n\u003c/summary\u003e\n\u003cimg src=\"./figures/latency/repnext_m4_latency.png\" width=70%\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nRepNeXt-M5\n\u003c/summary\u003e\n\u003cimg src=\"./figures/latency/repnext_m5_latency.png\" 
width=70%\u003e\n\u003c/details\u003e\n\nTips: export the model to a Core ML model\n```\npython export_coreml.py --model repnext_m1 --ckpt pretrain/repnext_m1_distill_300e.pth\n```\nTips: measure the throughput on a GPU\n```\npython speed_gpu.py --model repnext_m1\n```\n\n## ImageNet  \n\n### Prerequisites\nA `conda` virtual environment is recommended. \n```\nconda create -n repnext python=3.8\nconda activate repnext\npip install -r requirements.txt\n```\n\n### Data preparation\n\nDownload and extract ImageNet train and val images from http://image-net.org/. The training and validation data are expected to be in the `train` folder and `val` folder respectively:\n\n```bash\n# script to extract ImageNet dataset: https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh\n# ILSVRC2012_img_train.tar (about 138 GB)\n# ILSVRC2012_img_val.tar (about 6.3 GB)\n```\n\n```\n# organize the ImageNet dataset as follows:\nimagenet\n├── train\n│   ├── n01440764\n│   │   ├── n01440764_10026.JPEG\n│   │   ├── n01440764_10027.JPEG\n│   │   ├── ......\n│   ├── ......\n├── val\n│   ├── n01440764\n│   │   ├── ILSVRC2012_val_00000293.JPEG\n│   │   ├── ILSVRC2012_val_00002138.JPEG\n│   │   ├── ......\n│   ├── ......\n```\n\n### Training\nTo train RepNeXt-M1 on an 8-GPU machine:\n\n```\npython -m torch.distributed.launch --nproc_per_node=8 --master_port 12346 --use_env main.py --model repnext_m1 --data-path ~/imagenet --dist-eval\n```\nTips: specify your data path and model name! 
\n\nTraining with the helper script:\n\n```bash\nsh dist_train_cifar.sh\n```\n\n### Testing \nFor example, to test RepNeXt-M1:\n```\npython main.py --eval --model repnext_m1 --resume pretrain/repnext_m1_distill_300e.pth --data-path ~/imagenet\n```\n\n### Fused model evaluation\nFor example, to evaluate RepNeXt-M1 with the fused model: [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/suous/RepNeXt/blob/main/demo/fused_model_evaluation.ipynb)\n```\npython fuse_eval.py --model repnext_m1 --resume pretrain/repnext_m1_distill_300e_fused.pt --data-path ~/imagenet\n```\n\n## Downstream Tasks\n[Object Detection and Instance Segmentation](detection/README.md)\u003cbr\u003e\n\n| Model      | $AP^b$ | $AP_{50}^b$ | $AP_{75}^b$ | $AP^m$ | $AP_{50}^m$ | $AP_{75}^m$ | Latency |                                       Ckpt                                        |                     Log                     |\n|:-----------|:------:|:---:|:--:|:------:|:--:|:--:|:-------:|:---------------------------------------------------------------------------------:|:-------------------------------------------:|\n| RepNeXt-M3 |  40.8  | 62.4   | 44.7  | 37.8   | 59.5  | 40.6 |  5.1ms  | [M3](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m3_coco.pth) | [M3](./detection/logs/repnext_m3_coco.json) |\n| RepNeXt-M4 |  42.9  | 64.4   | 47.2  |  39.1  | 61.7  | 41.7 |  6.6ms  | [M4](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m4_coco.pth) | [M4](./detection/logs/repnext_m4_coco.json) |\n| RepNeXt-M5 |  44.7  | 66.0   | 49.2  |  40.7  | 63.5  | 43.6 | 10.4ms  | [M5](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m5_coco.pth) | [M5](./detection/logs/repnext_m5_coco.json) |\n\n[Semantic Segmentation](segmentation/README.md)\n\n| Model      | mIoU | Latency |                                        Ckpt                                         |                       Log                        
|\n|:-----------|:----:|:-------:|:-----------------------------------------------------------------------------------:|:------------------------------------------------:|\n| RepNeXt-M3 |   40.6   |  5.1ms  | [M3](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m3_ade20k.pth) | [M3](./segmentation/logs/repnext_m3_ade20k.json) |\n| RepNeXt-M4 |   43.3   |  6.6ms  | [M4](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m4_ade20k.pth) | [M4](./segmentation/logs/repnext_m4_ade20k.json) |\n| RepNeXt-M5 |   45.0   | 10.4ms  | [M5](https://github.com/suous/RepNeXt/releases/download/v1.0/repnext_m5_ade20k.pth) | [M5](./segmentation/logs/repnext_m5_ade20k.json) |\n\n## Feature Map Visualization\nRun feature map visualization demo: [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/suous/RepNeXt/blob/main/demo/feature_map_visualization.ipynb)\n\n\u003ctable border=0 align=center\u003e\n\t\u003ctbody\u003e\n    \u003ctr\u003e\n\t\t\t\u003ctd align=\"center\"\u003e Original Image \u003c/td\u003e\n\t\t\t\u003ctd align=\"center\"\u003e Identity \u003c/td\u003e\n\t\t\t\u003ctd align=\"center\"\u003e RepDWConvS \u003c/td\u003e\n\t\t\t\u003ctd align=\"center\"\u003e RepDWConvM \u003c/td\u003e\n\t\t\t\u003ctd align=\"center\"\u003e DWConvL \u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd width=\"20%\"\u003e \u003cimg src=\"https://raw.githubusercontent.com/shicai/MobileNet-Caffe/master/cat.jpg\"\u003e \u003c/td\u003e\n\t\t\t\u003ctd width=\"20%\"\u003e \u003cimg src=\"./demo/figures/cat-stage0-block1-conv_i.png\"\u003e \u003c/td\u003e\n            \u003ctd width=\"20%\"\u003e \u003cimg src=\"./demo/figures/cat-stage0-block1-conv_s.png\"\u003e \u003c/td\u003e\n            \u003ctd width=\"20%\"\u003e \u003cimg src=\"./demo/figures/cat-stage0-block1-conv_m.png\"\u003e \u003c/td\u003e\n            \u003ctd width=\"20%\"\u003e \u003cimg 
src=\"./demo/figures/cat-stage0-block1-conv_l.png\"\u003e \u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd width=\"20%\"\u003e \u003cimg src=\"https://raw.githubusercontent.com/pytorch/vision/main/gallery/assets/dog2.jpg\"\u003e \u003c/td\u003e\n\t\t\t\u003ctd width=\"20%\"\u003e \u003cimg src=\"./demo/figures/dog2-stage0-block1-conv_i.png\"\u003e \u003c/td\u003e\n            \u003ctd width=\"20%\"\u003e \u003cimg src=\"./demo/figures/dog2-stage0-block1-conv_s.png\"\u003e \u003c/td\u003e\n            \u003ctd width=\"20%\"\u003e \u003cimg src=\"./demo/figures/dog2-stage0-block1-conv_m.png\"\u003e \u003c/td\u003e\n            \u003ctd width=\"20%\"\u003e \u003cimg src=\"./demo/figures/dog2-stage0-block1-conv_l.png\"\u003e \u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\u003c/tbody\u003e\n\u003c/table\u003e\n\n## Ablation Study\n\n### Downsampling Layer Design\n\nThe downsampling layer between stages is a modified version of the MetaNeXt block, where the shortcut connection bypasses the channel mixer.\n\n\u003cdetails\u003e\n  \u003csummary\u003e\n  When the downsampling layers of ConvNeXt-femto are replaced with our designs, the top-1 accuracy improves by 1.8%.\n  \u003c/summary\u003e\n\n|                          Model                           | Top-1(%) | Params(M) | GMACs | Throughput(im/s) |                          Log                          |                                                 Ckpt                                                 |\n|:--------------------------------------------------------:|:--------:|:---------:|:-----:|:----------------:|:-----------------------------------------------------:|:----------------------------------------------------------------------------------------------------:|\n| [ConvNeXt](https://github.com/facebookresearch/ConvNeXt) |  72.37   |   5.22    | 0.78  |       3636       | [femto](./logs/ablation/convnext_femto_120e_7237.txt) | 
[baseline](https://github.com/suous/RepNeXt/releases/download/v1.0/convnext_femto_120e_baseline_7237.pth) |\n|                         ModifiedA                        |  74.28   |   5.25    | 0.79  |       3544       | [femto](./logs/ablation/convnext_femto_120e_7428.txt) | [replaced](https://github.com/suous/RepNeXt/releases/download/v1.0/convnext_femto_120e_replaced_7428.pth) |\n|                         ModifiedB                        |  74.19   |   5.25    | 0.79  |       3544       | [femto](./logs/ablation/convnext_femto_120e_7419.txt) | [replaced](https://github.com/suous/RepNeXt/releases/download/v1.0/convnext_femto_120e_replaced_7419.pth) |\n\n![top1_acc](./figures/ablation_convnext.png)\n\n```python\n# ModifiedA\nclass Downsample(nn.Module):\n    def __init__(self, dim, mlp_ratio):\n        super().__init__()\n        out_dim = dim * 2\n        self.dwconv = nn.Conv2d(dim, out_dim, kernel_size=7, padding=3, groups=dim, stride=2)\n        self.norm = LayerNorm(out_dim, eps=1e-6)\n        self.pwconv1 = nn.Linear(out_dim, mlp_ratio * out_dim)\n        self.act = nn.GELU()\n        self.pwconv2 = nn.Linear(mlp_ratio * out_dim, out_dim)\n\n    def forward(self, x):\n        x = self.dwconv(x)        # token mixer: (N, C, H, W) -\u003e (N, 2C, H/2, W/2)\n        input = x                 # bypass the channel mixer and the normalization layer\n        x = x.permute(0, 2, 3, 1) # (N, C, H, W) -\u003e (N, H, W, C)\n        x = self.norm(x)\n        x = self.pwconv1(x)\n        x = self.act(x)\n        x = self.pwconv2(x)\n        x = x.permute(0, 3, 1, 2) # (N, H, W, C) -\u003e (N, C, H, W)\n        return input + x\n```\n\n\u003e The code below is our downsample layer design, where the shortcut connection only bypasses the channel mixer. 
\n\u003e This design enables the batch normalization layer to be fused into the previous convolution layer.\n\n```python\n# ModifiedB\nclass Downsample(nn.Module):\n    def __init__(self, dim, mlp_ratio=2):\n        super().__init__()\n        out_dim = dim * 2\n        self.dwconv = nn.Conv2d(dim, out_dim, kernel_size=7, padding=3, groups=dim, stride=2)\n        self.norm = LayerNorm(out_dim, eps=1e-6, data_format=\"channels_first\")\n        self.pwconv1 = nn.Linear(out_dim, mlp_ratio * out_dim)\n        self.act = nn.GELU()\n        self.pwconv2 = nn.Linear(mlp_ratio * out_dim, out_dim)\n\n    def forward(self, x):\n        x = self.dwconv(x)        # token mixer: (N, C, H, W) -\u003e (N, 2C, H/2, W/2)\n        x = self.norm(x)          # normalization layer\n        input = x                 # bypass the channel mixer\n        x = x.permute(0, 2, 3, 1) # (N, C, H, W) -\u003e (N, H, W, C)\n        x = self.pwconv1(x)\n        x = self.act(x)\n        x = self.pwconv2(x)\n        x = x.permute(0, 3, 1, 2) # (N, H, W, C) -\u003e (N, C, H, W)\n        return input + x\n```\n\n\u003c/details\u003e\n\n### Compact Model Experiments\n\n#### Cifar-100\n\n|    Model    | Latency (ms) | Params (M) | GMACs | Top-1 (100e) | Top-1 (200e) | Top-1 (300e) | Top-1 (400e) |                                                                                                                    Log                                                                                                                    |  \n|:-----------:|:------------:|:----------:|:-----:|:------------:|:------------:|:------------:|:------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|\n| RepViT-M0.6 |     0.6      |    2.2     | 0.39  |    71.73     |    78.45     |    80.16     |    80.60     |   
[100e](./logs/compact/cifar/repvit_m0_6_cifar_100e.txt) / [200e](./logs/compact/cifar/repvit_m0_6_cifar_200e.txt) / [300e](./logs/compact/cifar/repvit_m0_6_cifar_300e.txt) / [400e](./logs/compact/cifar/repvit_m0_6_cifar_400e.txt)   |\n| RepNeXt-M0E |     0.7      |    2.3     | 0.46  |    73.56     |    79.79     |  **81.89**   |  **82.07**   | [100e](./logs/compact/cifar/repnext_m0_e_cifar_100e.txt) / [200e](./logs/compact/cifar/repnext_m0_e_cifar_200e.txt) / [300e](./logs/compact/cifar/repnext_m0_e_cifar_300e.txt) / [400e](./logs/compact/cifar/repnext_m0_e_cifar_400e.txt) |\n| RepNeXt-M0  |     0.6      |    2.0     | 0.39  |  **74.53**   |  **80.14**   |    81.73     |    81.97     |     [100e](./logs/compact/cifar/repnext_m0_cifar_100e.txt) / [200e](./logs/compact/cifar/repnext_m0_cifar_200e.txt) / [300e](./logs/compact/cifar/repnext_m0_cifar_300e.txt) / [400e](./logs/compact/cifar/repnext_m0_cifar_400e.txt)     |\n\n![top1_acc_cifar_100](./figures/compact_models_cifar_100.png)\n\n`RepNeXt-M0E` is the equivalent form of `RepNeXt-M0` in which the multi-branch design is replaced by a single-branch large-kernel depthwise convolution.\nCompared with this single-branch form, our multi-branch reparameterization design helps the model converge faster, with fewer parameters and lower latency.\n\n#### ImageNet-1K\n\n|    Model    | Latency (ms) | Params (M) | GMACs | Top-1 (100e) | Top-1 (200e) | Top-1 (300e) | Top-1 (400e) |                                                                                                                        Log                                                                                                                        |  
\n|:-----------:|:------------:|:----------:|:-----:|:------------:|:------------:|:------------:|:------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|\n| RepViT-M0.6 |     0.6      |    2.5     | 0.39  |    69.29     |    71.63     |    72.34     |  **72.87**   |   [100e](./logs/compact/imgnet/repvit_m0_6_imgnet_100e.txt) / [200e](./logs/compact/imgnet/repvit_m0_6_imgnet_200e.txt) / [300e](./logs/compact/imgnet/repvit_m0_6_imgnet_300e.txt) / [400e](./logs/compact/imgnet/repvit_m0_6_imgnet_400e.txt)   |\n| RepNeXt-M0E |     0.7      |    2.5     | 0.46  |    69.69     |    71.57     |    72.18     |    72.53     | [100e](./logs/compact/imgnet/repnext_m0_e_imgnet_100e.txt) / [200e](./logs/compact/imgnet/repnext_m0_e_imgnet_200e.txt) / [300e](./logs/compact/imgnet/repnext_m0_e_imgnet_300e.txt) / [400e](./logs/compact/imgnet/repnext_m0_e_imgnet_400e.txt) |\n| RepNeXt-M0  |     0.6      |    2.3     | 0.39  |  **70.14**   |  **71.93**   |  **72.56**   |    72.78     |     [100e](./logs/compact/imgnet/repnext_m0_imgnet_100e.txt) / [200e](./logs/compact/imgnet/repnext_m0_imgnet_200e.txt) / [300e](./logs/compact/imgnet/repnext_m0_imgnet_300e.txt) / [400e](./logs/compact/imgnet/repnext_m0_imgnet_400e.txt)     |\n\n![top1_acc_imagenet_1k](./figures/compact_models_imagenet_1k.png)\n\n`RepNeXt-M0E` is the equivalent form of `RepNeXt-M0` in which the multi-branch design is replaced by a single-branch large-kernel depthwise convolution.\nCompared with this single-branch form, our multi-branch reparameterization design helps the model converge faster, with fewer parameters and lower latency.\n\n## Acknowledgement\n\nClassification (ImageNet) code base is partly built with [LeViT](https://github.com/facebookresearch/LeViT), [PoolFormer](https://github.com/sail-sg/poolformer), 
[EfficientFormer](https://github.com/snap-research/EfficientFormer),  [RepViT](https://github.com/THU-MIG/RepViT), and [MogaNet](https://github.com/Westlake-AI/MogaNet).\n\nThe detection and segmentation pipeline is from [MMCV](https://github.com/open-mmlab/mmcv) ([MMDetection](https://github.com/open-mmlab/mmdetection) and [MMSegmentation](https://github.com/open-mmlab/mmsegmentation)). \n\nThanks for the great implementations! \n\n## Citation\n\nIf our code or models help your work, please cite our paper:\n```BibTeX\n@misc{zhao2024repnext,\n      title={RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization},\n      author={Mingshu Zhao and Yi Luo and Yong Ouyang},\n      year={2024},\n      eprint={2406.16004},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsuous%2Frepnext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsuous%2Frepnext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsuous%2Frepnext/lists"}