{"id":20856276,"url":"https://github.com/yangyanli/do-conv","last_synced_at":"2025-05-12T06:31:28.526Z","repository":{"id":53118439,"uuid":"274043939","full_name":"yangyanli/DO-Conv","owner":"yangyanli","description":"Depthwise Over-parameterized Convolutional Layer","archived":false,"fork":false,"pushed_at":"2022-12-04T11:35:46.000Z","size":53,"stargazers_count":197,"open_issues_count":1,"forks_count":35,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-01T03:37:13.345Z","etag":null,"topics":["convolutional-neural-networks","deep-learning","gluoncv","pytorch","tensorflow"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yangyanli.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-06-22T05:01:23.000Z","updated_at":"2025-02-24T03:24:45.000Z","dependencies_parsed_at":"2023-01-24T03:16:19.425Z","dependency_job_id":null,"html_url":"https://github.com/yangyanli/DO-Conv","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangyanli%2FDO-Conv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangyanli%2FDO-Conv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangyanli%2FDO-Conv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yangyanli%2FDO-Conv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yangyanli","download_url":"https://codeload.github.com/yangyanli/DO-Conv/tar.gz/refs/heads/master","host":{"name":
"GitHub","url":"https://github.com","kind":"github","repositories_count":253687535,"owners_count":21947694,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["convolutional-neural-networks","deep-learning","gluoncv","pytorch","tensorflow"],"created_at":"2024-11-18T04:29:47.882Z","updated_at":"2025-05-12T06:31:28.266Z","avatar_url":"https://github.com/yangyanli.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DO-Conv: Depthwise Over-parameterized Convolutional Layer\n\nCreated by \u003ca href=\"https://jinming0912.github.io/\" target=\"_blank\"\u003eJinming Cao\u003c/a\u003e, \u003ca href=\"http://yangyan.li\" target=\"_blank\"\u003eYangyan Li\u003c/a\u003e, Mingchao Sun, \u003ca href=\"https://scholar.google.com/citations?user=NpTmcKEAAAAJ\u0026hl=en\" target=\"_blank\"\u003eYing Chen\u003c/a\u003e, \u003ca href=\"https://www.cs.huji.ac.il/~danix/\" target=\"_blank\"\u003eDani Lischinski\u003c/a\u003e, \u003ca href=\"https://danielcohenor.com/\" target=\"_blank\"\u003eDaniel Cohen-Or\u003c/a\u003e, \u003ca href=\"https://cfcs.pku.edu.cn/baoquan/\" target=\"_blank\"\u003eBaoquan Chen\u003c/a\u003e, and \u003ca href=\"http://irc.cs.sdu.edu.cn/~chtu/index.html\" target=\"_blank\"\u003eChanghe Tu\u003c/a\u003e. Transactions on Image Processing (TIP) 2022.\n\n## Introduction\n\nDO-Conv is a depthwise over-parameterized convolutional layer, which can be used as a replacement of conventional convolutional layer in CNNs in the training phase to achieve higher accuracies. 
In the inference phase, DO-Conv can be fused into a conventional convolutional layer, so the amount of computation is exactly the same as that of a conventional convolutional layer.\n\nPlease see our \u003ca href=\"https://ieeexplore.ieee.org/document/9779456\" target=\"_blank\"\u003epaper\u003c/a\u003e for more details, where we demonstrated the advantages of DO-Conv on various benchmark datasets/tasks.\n\n## We Highly Welcome Issues\n\n**We highly welcome issues, rather than emails, for DO-Conv related questions.**\n\nMoreover, it would be great if a **minimal reproducible example** is provided in the issue.\n\n## News\n\n1 . We have open-sourced the FUSION code in PyTorch. Run the demo example (see the **save_with_fusion** function for details) to fold D into W when saving a model trained with DOConv.\n````\npython sample_pt_with_fusion.py\n````\nThe saved models are in the model folder. The number of model parameters is the same as when using conventional convolutional layers, and no extra computation is introduced in the inference phase. See the **load_model_with_fusion** function for model loading; note that the network structure must be exactly the same as the original model, but with conventional convolutional layers.\n\n2 . We provide DOConv for a newer PyTorch version (pytorch==1.10.2, torchvision==0.11.3).\n\nReplace this line:\n````\nfrom do_conv_pytorch import DOConv2d\n````\nwith\n````\nfrom do_conv_pytorch_1_10 import DOConv2d\n````\nto apply this version of DOConv without any other changes.\n\n## ImageNet Classification Performance\n\nWe take the \u003ca href=\"https://gluon-cv.mxnet.io/model_zoo/classification.html\" target=\"_blank\"\u003emodel zoo\u003c/a\u003e of \u003ca href=\"https://gluon-cv.mxnet.io/contents.html\" target=\"_blank\"\u003eGluonCV\u003c/a\u003e as baselines. The settings in the baselines have been tuned to favor the baselines, and they are not touched during the switch to DO-Conv. 
In other words, DO-Conv is the one and only change over the baselines, and no hyper-parameter tuning is conducted to favor DO-Conv. We consider GluonCV highly reproducible, but still, to exclude confounding factors as much as possible, we train the baselines ourselves and compare the DO-Conv versions with them, while reporting the performance provided by GluonCV as a reference. The results are summarized in the table below, where the “DO-Conv” column shows the performance gain over the baselines.\n\u003ctable\u003e\n\u003cthead\u003e\n  \u003ctr\u003e\n    \u003cth\u003eNetwork\u003c/th\u003e\n    \u003cth\u003eDepth\u003c/th\u003e\n    \u003cth\u003eReference\u003c/th\u003e\n    \u003cth\u003eBaseline\u003c/th\u003e\n    \u003cth\u003eDO-Conv\u003c/th\u003e\n  \u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ePlain\u003c/td\u003e\n    \u003ctd\u003e18\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e69.97\u003c/td\u003e\n    \u003ctd\u003e+1.01\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd rowspan=\"5\"\u003eResNet-v1\u003c/td\u003e\n    \u003ctd\u003e18\u003c/td\u003e\n    \u003ctd\u003e70.93\u003c/td\u003e\n    \u003ctd\u003e70.87\u003c/td\u003e\n    \u003ctd\u003e+0.82\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e34\u003c/td\u003e\n    \u003ctd\u003e74.37\u003c/td\u003e\n    \u003ctd\u003e74.49\u003c/td\u003e\n    \u003ctd\u003e+0.49\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e50\u003c/td\u003e\n    \u003ctd\u003e77.36\u003c/td\u003e\n    \u003ctd\u003e77.32\u003c/td\u003e\n    \u003ctd\u003e+0.08\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e101\u003c/td\u003e\n    \u003ctd\u003e78.34\u003c/td\u003e\n    \u003ctd\u003e78.16\u003c/td\u003e\n    \u003ctd\u003e+0.46\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e152\u003c/td\u003e\n    \u003ctd\u003e79.22\u003c/td\u003e\n    
\u003ctd\u003e79.34\u003c/td\u003e\n    \u003ctd\u003e+0.07\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd rowspan=\"5\"\u003eResNet-v1b\u003c/td\u003e\n    \u003ctd\u003e18\u003c/td\u003e\n    \u003ctd\u003e70.94\u003c/td\u003e\n    \u003ctd\u003e71.08\u003c/td\u003e\n    \u003ctd\u003e+0.71\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e34\u003c/td\u003e\n    \u003ctd\u003e74.65\u003c/td\u003e\n    \u003ctd\u003e74.35\u003c/td\u003e\n    \u003ctd\u003e+0.77\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e50\u003c/td\u003e\n    \u003ctd\u003e77.67\u003c/td\u003e\n    \u003ctd\u003e77.56\u003c/td\u003e\n    \u003ctd\u003e+0.44\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e101\u003c/td\u003e\n    \u003ctd\u003e79.20\u003c/td\u003e\n    \u003ctd\u003e79.14\u003c/td\u003e\n    \u003ctd\u003e+0.25\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e152\u003c/td\u003e\n    \u003ctd\u003e79.69\u003c/td\u003e\n    \u003ctd\u003e79.60\u003c/td\u003e\n    \u003ctd\u003e+0.10\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd rowspan=\"5\"\u003eResNet-v2\u003c/td\u003e\n    \u003ctd\u003e18\u003c/td\u003e\n    \u003ctd\u003e71.00\u003c/td\u003e\n    \u003ctd\u003e70.80\u003c/td\u003e\n    \u003ctd\u003e+0.64\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e34\u003c/td\u003e\n    \u003ctd\u003e74.40\u003c/td\u003e\n    \u003ctd\u003e74.76\u003c/td\u003e\n    \u003ctd\u003e+0.22\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e50\u003c/td\u003e\n    \u003ctd\u003e77.17\u003c/td\u003e\n    \u003ctd\u003e77.17\u003c/td\u003e\n    \u003ctd\u003e+0.31\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e101\u003c/td\u003e\n    \u003ctd\u003e78.53\u003c/td\u003e\n    \u003ctd\u003e78.56\u003c/td\u003e\n    \u003ctd\u003e+0.11\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    
\u003ctd\u003e152\u003c/td\u003e\n    \u003ctd\u003e79.21\u003c/td\u003e\n    \u003ctd\u003e79.24\u003c/td\u003e\n    \u003ctd\u003e+0.14\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eResNeXt\u003c/td\u003e\n    \u003ctd\u003e50_32x4d\u003c/td\u003e\n    \u003ctd\u003e79.32\u003c/td\u003e\n    \u003ctd\u003e79.21\u003c/td\u003e\n    \u003ctd\u003e+0.40\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMobileNet-v1\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e73.28\u003c/td\u003e\n    \u003ctd\u003e73.30\u003c/td\u003e\n    \u003ctd\u003e+0.03\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMobileNet-v2\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e72.04\u003c/td\u003e\n    \u003ctd\u003e71.89\u003c/td\u003e\n    \u003ctd\u003e+0.16\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMobileNet-v3\u003c/td\u003e\n    \u003ctd\u003e-\u003c/td\u003e\n    \u003ctd\u003e75.32\u003c/td\u003e\n    \u003ctd\u003e75.16\u003c/td\u003e\n    \u003ctd\u003e+0.14\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n## DO-Conv Usage\n\nIn this repo, we provide reference implementations of DO-Conv in \u003ca href=\"https://www.tensorflow.org/\" target=\"_blank\"\u003eTensorFlow\u003c/a\u003e (tensorflow-gpu==2.2.0), \u003ca href=\"https://pytorch.org/\" target=\"_blank\"\u003ePyTorch\u003c/a\u003e (pytorch==1.4.0, torchvision==0.5.0) and \u003ca href=\"https://gluon-cv.mxnet.io/contents.html\" target=\"_blank\"\u003eGluonCV\u003c/a\u003e (mxnet-cu100==1.5.1.post0, gluoncv==0.6.0), as replacements for \u003ca href=\"https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D\" target=\"_blank\"\u003etf.keras.layers.Conv2D\u003c/a\u003e, \u003ca href=\"https://pytorch.org/docs/master/generated/torch.nn.Conv2d.html\" target=\"_blank\"\u003etorch.nn.Conv2d\u003c/a\u003e and \u003ca 
href=\"https://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.nn.Conv2D.html\" target=\"_blank\"\u003emxnet.gluon.nn.Conv2D\u003c/a\u003e, respectively. Please see the code for more details.\n\nWe highly welcome pull requests that add support for different versions of PyTorch/TensorFlow/GluonCV.\n\n## Example Usage: TensorFlow (tensorflow-gpu==2.2.0)\nWe show how to use DO-Conv based on the examples provided in the \u003ca href=\"https://www.tensorflow.org/tutorials/quickstart/advanced\" target=\"_blank\"\u003eTutorial\u003c/a\u003e of TensorFlow with the MNIST dataset.\n\n1 . Run the demo example first to get the accuracy of the baseline.\n````\npython sample_tf.py\n````\nIf anything goes wrong at this step, please check whether the tensorflow version meets the requirements.\n\n2 . Replace these lines:\n````\nself.conv1 = Conv2D(32, 3, activation='relu')\nself.conv2 = Conv2D(8, 3, activation='relu')\n````\nwith\n````\nself.conv1 = DOConv2D(32, 3, activation='relu')\nself.conv2 = DOConv2D(8, 3, activation='relu')\n````\nto apply DO-Conv without any other changes. \n````\npython sample_tf.py\n````\n3 . The performance improvement in this demo example is as follows (accuracy (%) of five runs and their average).\n\n|          | run1  | run2  | run3  | run4  | run5  | avg    | +     |\n|----------|-------|-------|-------|-------|-------|--------|-------|\n| Baseline | 98.50 | 98.51 | 98.54 | 98.46 | 98.51 | 98.504 | -     |\n| DO-Conv  | 98.71 | 98.62 | 98.67 | 98.75 | 98.66 | 98.682 | 0.178 |\n\n4 . You can then use DO-Conv in your own network in the same way.\n\n## Example Usage: PyTorch (pytorch==1.4.0, torchvision==0.5.0)\nWe show how to use DO-Conv based on the examples provided in the \u003ca href=\"https://pytorch.org/tutorials/beginner/nn_tutorial.html?highlight=mnist\" target=\"_blank\"\u003eTutorial\u003c/a\u003e of PyTorch with the MNIST dataset.\n\n1 . 
Run the demo example first to get the accuracy of the baseline.\n````\npython sample_pt.py\n````\nIf anything goes wrong at this step, please check whether the pytorch and torchvision versions meet the requirements.\n\n2 . Replace these lines:\n````\nmodel = nn.Sequential(\n    Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    nn.AdaptiveAvgPool2d(1),\n    Lambda(lambda x: x.view(x.size(0), -1)),\n)\n````\nwith\n````\nmodel = nn.Sequential(\n    DOConv2d(1, 16, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    DOConv2d(16, 16, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    DOConv2d(16, 10, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    nn.AdaptiveAvgPool2d(1),\n    Lambda(lambda x: x.view(x.size(0), -1)),\n)\n````\nto apply DO-Conv without any other changes. \n````\npython sample_pt.py\n````\n3 . The performance improvement in this demo example is as follows (accuracy (%) of five runs and their average).\n\n|          | run1  | run2  | run3  | run4  | run5  | avg    | +     |\n|----------|-------|-------|-------|-------|-------|--------|-------|\n| Baseline | 94.63 | 95.31 | 95.23 | 95.24 | 95.37 | 95.156 | -     |\n| DO-Conv  | 95.59 | 95.73 | 95.68 | 95.70 | 95.67 | 95.674 | 0.518 |\n\n4 . You can then use DO-Conv in your own network in the same way.\n\n## Example Usage: GluonCV (mxnet-cu100==1.5.1.post0, gluoncv==0.6.0)\nWe show how to use DO-Conv based on the examples provided in the \u003ca href=\"https://mxnet.apache.org/versions/1.6/api/python/docs/tutorials/packages/gluon/image/mnist.html\" target=\"_blank\"\u003eTutorial\u003c/a\u003e of GluonCV with the MNIST dataset.\n\n1 . 
Run the demo example first to get the accuracy of the baseline.\n````\npython sample_gluoncv.py\n````\nIf anything goes wrong at this step, please check whether the mxnet and gluoncv versions meet the requirements.\n\n2 . Replace these lines:\n````\nself.conv1 = Conv2D(20, kernel_size=(5,5))\nself.conv2 = Conv2D(50, kernel_size=(5,5))\n````\nwith\n````\nself.conv1 = DOConv2D(1, 20, kernel_size=(5, 5))\nself.conv2 = DOConv2D(20, 50, kernel_size=(5, 5))\n````\nto apply DO-Conv. Note that 'in_channels' must be set explicitly in the GluonCV version of DOConv2D. \n````\npython sample_gluoncv.py\n````\n3 . The performance improvement in this demo example is as follows (accuracy (%) of five runs and their average).\n\n|          | run1  | run2  | run3  | run4  | run5  | avg   | +    |\n|----------|-------|-------|-------|-------|-------|-------|------|\n| Baseline | 98.10 | 98.10 | 98.10 | 98.10 | 98.10 | 98.10 | -    |\n| DO-Conv  | 98.26 | 98.26 | 98.26 | 98.26 | 98.26 | 98.26 | 0.16 |\n\n4 . You can then use DO-Conv in your own network in the same way.\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyangyanli%2Fdo-conv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyangyanli%2Fdo-conv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyangyanli%2Fdo-conv/lists"}