{"id":20161901,"url":"https://github.com/ailab-cvc/unireplknet","last_synced_at":"2025-05-15T04:04:39.080Z","repository":{"id":208972066,"uuid":"722886816","full_name":"AILab-CVC/UniRepLKNet","owner":"AILab-CVC","description":"[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition","archived":false,"fork":false,"pushed_at":"2024-10-24T07:29:55.000Z","size":5058,"stargazers_count":980,"open_issues_count":5,"forks_count":57,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-04-14T05:55:25.475Z","etag":null,"topics":["architecture","artificial-intelligence","convolutional-neural-networks","deep-learning","multimodal-learning"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2311.15599","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AILab-CVC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-24T07:28:22.000Z","updated_at":"2025-04-05T14:17:23.000Z","dependencies_parsed_at":"2023-12-23T17:16:38.786Z","dependency_job_id":"e6d93978-ce0e-4dba-a5f5-540ab7c90ea1","html_url":"https://github.com/AILab-CVC/UniRepLKNet","commit_stats":{"total_commits":57,"total_committers":2,"mean_commits":28.5,"dds":0.08771929824561409,"last_synced_commit":"90209860873f5be978a840f5f31f18f17de480da"},"previous_names":["ailab-cvc/unireplknet"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FUniRepLKNet","tags_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FUniRepLKNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FUniRepLKNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FUniRepLKNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AILab-CVC","download_url":"https://codeload.github.com/AILab-CVC/UniRepLKNet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254270641,"owners_count":22042858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["architecture","artificial-intelligence","convolutional-neural-networks","deep-learning","multimodal-learning"],"created_at":"2024-11-14T00:21:40.765Z","updated_at":"2025-05-15T04:04:39.057Z","avatar_url":"https://github.com/AILab-CVC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video,Point Cloud, Time-Series and Image Recognition\n\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003cimg src=\"assets/UniRepLKNet.png\"  width=\"100%\" height=\"60%\"\u003e\n\u003c/p\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca href=\"https://dingxiaohan.xyz/\" target=\"_blank\"\u003eXiaohan Ding\u003c/a\u003e\u003csup\u003e1*\u003c/sup\u003e,\n    \u003c/span\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca href=\"https://invictus717.github.io/\" target=\"_blank\"\u003eYiyuan 
Zhang\u003c/a\u003e\u003csup\u003e2*\u003c/sup\u003e,\u003c/span\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003c/span\u003e\n    \u003ca href=\"https://geyixiao.com/\" target=\"_blank\"\u003eYixiao Ge\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e,\n    \u003c/span\u003e\n    \u003c/br\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca target=\"_blank\"\u003eSijie Zhao\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e,\n    \u003c/span\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca href=\"https://scholar.google.com/citations?user=6Ra2TgQAAAAJ\u0026hl=en\u0026oi=ao\" target=\"_blank\"\u003eLin Song\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e,\n    \u003c/span\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca href=\"http://people.eecs.berkeley.edu/~xyyue/\" target=\"_blank\"\u003eXiangyu Yue\u003c/a\u003e\u003csup\u003e2\u003c/sup\u003e,\n    \u003c/span\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca href=\"https://scholar.google.com/citations?user=4oXBp9UAAAAJ\u0026hl=en\u0026oi=ao\" target=\"_blank\"\u003eYing Shan\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e\n    \u003c/span\u003e\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n    \u003csup\u003e1\u003c/sup\u003e \u003ca href='https://ai.tencent.com/' target='_blank'\u003eTencent AI Lab\u003c/a\u003e\n    \u003csup\u003e2\u003c/sup\u003e\n    \u003ca href='http://mmlab.ie.cuhk.edu.hk/' target='_blank'\u003eThe Chinese University of Hong Kong\u003c/a\u003e\u0026emsp;\n    \u003c/br\u003e\n    \u003csup\u003e*\u003c/sup\u003e Equal 
Contribution\u0026emsp;\n\u003c/div\u003e\n\n[![arXiv](https://img.shields.io/badge/arxiv-2311.15599-b31b1b?style=plastic\u0026color=b31b1b\u0026link=https%3A%2F%2Farxiv.org%2Fabs%2F2311.15599)](https://arxiv.org/abs/2311.15599)\n[![arXiv](https://img.shields.io/badge/arxiv-2410.08049-b31b1b?style=plastic\u0026color=b31b1b\u0026link=https%3A%2F%2Farxiv.org%2Fabs%2F2410.08049)](https://arxiv.org/abs/2410.08049)\n[![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/DingXiaoH/UniRepLKNet/tree/main)\n[![website](https://img.shields.io/badge/Project-Website-blueviolet)](https://invictus717.github.io/UniRepLKNet/)\n\u003ca href=\"#LICENSE--citation\"\u003e\n  \u003cimg alt=\"License: Apache2.0\" src=\"https://img.shields.io/badge/LICENSE-Apache%202.0-blue.svg\"/\u003e\n\u003c/a\u003e\n\n🌟🌟🌟 ***News***: The extended journal version of UniRepLKNet has been released on [arXiv](https://arxiv.org/abs/2410.08049).\n\n\n## Motivation \n* We note that most architectures of the existing large-kernel ConvNets simply follow other models. **The architectural design for large-kernel ConvNets remains under-explored.**\n* The *universal perception ability* of Transformers is sparking interest across multimodal research areas (image, audio, video, time-series, *etc*). We are curious whether ConvNets can also deliver **universal perception ability across multiple modalities with a unified architecture**.\n\n## Highlights\n\nA **ConvNet unifies multiple modalities and outperforms modality-specific models**. This paper summarizes architectural guidelines for building large-kernel CNNs, which work amazingly well with images and other modalities. This is the latest contribution to two influential areas - **Structural Re-param** (since RepVGG, Ding et al. 2021) and **very-large-kernel ConvNet** (since RepLKNet, Ding et al. 2022). **ImageNet accuracy of 88.0%, COCO AP of 56.4, ADE20K mIoU of 55.6 with only ImageNet-22K pretraining**. 
Higher actual speed and performance than recent models like ConvNeXt v2 and InternImage. With a unified architecture and extremely simple modality-specific preprocessing, achieves state-of-the-art performances on audio recognition and, most amazingly, **Global Temperature \u0026 Wind Speed Forecasting** (a challenging huge-scale time-series forecasting task), outperforming the existing global forecasting system.\n\nMore specifically, we contribute from two aspects:\n* We propose four architectural guidelines for designing\nlarge-kernel ConvNets, the core of which is to exploit\nthe essential characteristics of large kernels that distinguish\nthem from small kernels - they can see wide without\ngoing deep. Following such guidelines, our proposed\nlarge-kernel ConvNet shows leading performance in image\nrecognition.  \n* We discover that large kernels are the key to\nunlocking the exceptional performance of ConvNets in domains\nwhere they were originally not proficient. With certain\nmodality-related preprocessing approaches, the proposed\nmodel achieves state-of-the-art performance on time-series\nforecasting and audio recognition tasks even without\nmodality-specific customization to the architecture.\n\n**UniRepLKNet not only signifies a \"comeback\" for ConvNet in its original domain but also showcases large-kernel ConvNet’s potential to \"conquer\" new territories, highlighting further adaptability and broad utility across different modalities and tasks.**\n\n## TODOs\n\n- [x] Model code\n- [x] Most of the ImageNet-1K and ImageNet-22K pretrained weights\n- [x] Weights released on both Google Drive (see this page) and hugging face (see unireplknet.py)\n- [x] PyTorch efficient large-kernel conv implementation\n- [x] ImageNet training code\n- [x] Code and documents of audio, video, point cloud, and time-series tasks\n- [x] Semantic segmentation code, document, and all the checkpoints\n- [x] Object detection code, document, and all the checkpoints\n- [ ] Checkpoints 
of audio, video, point cloud, and time-series tasks\n\nThe ImageNet, COCO, and ADE20K checkpoints have been released (see the huggingface repo shown below), except the ImageNet-22K pretrained UniRepLKNet-S, and UperNet with UniRepLKNet-XL, which were lost, and we are reproducing them.\n\nLatest news: fixed a bug, which results from [this commit](https://github.com/AILab-CVC/UniRepLKNet/commit/920b7251ea3d52ab476d0f40ba722db56d9a7e03) on Dec 1st, 2023. [Now it is fixed ](https://github.com/AILab-CVC/UniRepLKNet/commit/5349bcee9a8202c62c8c169220f8cc613914baac). If you used unireplknet.py after Dec 1st, 2023, please check your code.\n\n\n\n\n## Code design\n\n1. There is some MMDetection- and MMSegmentation-related code in ```unireplknet.py``` so that you can directly copy-paste it into your MMDetection or MMSegmentation, e.g., [here](unireplknet.py#L29) and [here](unireplknet.py#L617). If you do not want to use it with MMDetection or MMSegmentation, you can safely delete those lines of code.\n2. We have provided code to automatically build our models and load our released weights. See the functions [here](unireplknet.py#L726). You can also use ```timm.create_model``` to build the models. For example, ```model = timm.create_model('unireplknet_l', num_classes=num_classes_of_your_task, in_22k_pretrained=True)``` will call the function ```unireplknet_l``` defined [here](https://github.com/AILab-CVC/UniRepLKNet/blob/main/unireplknet.py#L745), which will build a UniRepLKNet-L and automatically download our checkpoints and load the weights.\n   ```\n   # The simplest way to use our model in your project is to copy-paste unireplknet.py into your working directory and create models. For example\n   from unireplknet import *\n   model = timm.create_model('unireplknet_l', num_classes=num_classes_of_your_task, in_22k_pretrained=True)\n   ```\n4. 
As UniRepLKNet also uses the Structural Re-parameterization methodology, we provide a function ```reparameterize_unireplknet()``` that converts a trained UniRepLKNet into the inference structure, which equivalently removes the parallel branches in Dilated Reparam Blocks, Batch Norm layers, and the bias term in GRN. The pseudo-code of the full pipeline looks like this:\n    ```python\n    training_model = unireplknet_l(...,  deploy=False)\n    train(training_model)\n    trained_results = evaluate(training_model)\n    training_model.reparameterize_unireplknet()\n    inference_results = evaluate(training_model)\n    # you will see inference_results are identical to trained_results\n    save(training_model, 'converted_weights.pth')\n    # use the converted model\n    deploy_model = unireplknet_l(..., deploy=True)\n    load_weights(deploy_model, 'converted_weights.pth')\n    deploy_results = evaluate(deploy_model)\n    # you will see deploy_results == inference_results == trained_results\n    ```\n5. You may want to read this if you are familiar with the [timm](https://github.com/huggingface/pytorch-image-models/tree/main) library. We sincerely thank timm for providing a convenient [re-parameterize function](https://github.com/huggingface/pytorch-image-models/blob/main/timm/utils/model.py#L225). The code design of UniRepLKNet is compatible with it. That is, calling ```some_unireplknet_model.reparameterize_unireplknet()``` is equivalent to calling ```timm.utils.reparameterize_model(some_unireplknet_model)```. So if you use our code with timm's codebase, e.g., timm's evaluation code, just add ```--reparam``` to your command so that ```timm.utils.reparameterize_model``` will be called (see [here](https://github.com/huggingface/pytorch-image-models/blob/main/validate.py#L128)).\n\n\n\n## Models\n\nWe have provided five ways to download our checkpoints.\n\n1. Download via the Google Drive links shown below.\n2. 
Visit our huggingface repo at https://huggingface.co/DingXiaoH/UniRepLKNet/tree/main and click the download icons.\n3. Use huggingface-hub in your python code. First, install huggingface_hub\n```\npip install huggingface_hub\n```\nThen, use huggingface_hub like this in your python code, for example,\n```python\nfrom huggingface_hub import hf_hub_download\nrepo_id = 'DingXiaoH/UniRepLKNet'\ncache_file = hf_hub_download(repo_id=repo_id, filename=FILE_NAME)\ncheckpoint = torch.load(cache_file, map_location='cpu')\nmodel.load_state_dict(checkpoint)\n```\nSee our [huggingface repo](https://huggingface.co/DingXiaoH/UniRepLKNet/tree/main) or [our code](unireplknet.py#L670) for FILE_NAME (e.g., ```unireplknet_xl_in22k_pretrain.pth```).\n\n4. Use the huggingface CLI. Check the [tutorial](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).\n\n5. Automatically download our checkpoints by passing ```in_1k_pretrained=True```, ```in_22k_pretrained=True```, or ```in_22k_to_1k=True``` while calling our provided functions. 
See the [code here](unireplknet.py#L738).\n\n### ImageNet-1K Pretrained Weights\n\n| name | resolution |acc@1 | #params | FLOPs | Weights |\n|:---:|:---:|:---:|:---:| :---:|:---:|\n| UniRepLKNet-A | 224x224 | 77.0 | 4.4M  | 0.6G | [ckpt](https://drive.google.com/file/d/1jUB-lq6NMTbeBvGTDvAarKWh-ZfMMZWt/view?usp=drive_link) |\n| UniRepLKNet-F | 224x224 | 78.6 | 6.2M  | 0.9G | [ckpt](https://drive.google.com/file/d/1vYqhCNx3q-z0fVT4UZecFTUmb9IDaYh9/view?usp=drive_link) |\n| UniRepLKNet-P | 224x224 | 80.2 | 10.7M  | 1.6G | [ckpt](https://drive.google.com/file/d/1D7rljWnnzEGDn8MDkvAWJ8qd1SCix6Vm/view?usp=drive_link) |\n| UniRepLKNet-N | 224x224 | 81.6 | 18.3M | 2.8G | [ckpt](https://drive.google.com/file/d/1tMHOl55C7h44ag8SLUuaP0bBUUpVXhKj/view?usp=drive_link) |\n| UniRepLKNet-T | 224x224 | 83.2 | 31M | 4.9G | [ckpt](https://drive.google.com/file/d/12Xon3FWkzEQV1nnNsF2U8XDMD-7NO2cJ/view?usp=drive_link) |\n| UniRepLKNet-S | 224x224 | 83.9 | 56M   | 9.1G | [ckpt](https://drive.google.com/file/d/11YEOcKs4WNprRzCvKe-fB5z-l7zQv3kb/view?usp=drive_link) |\n\n### ImageNet-22K Pretrained Weights\n\n| name | resolution | #params | FLOPs | ckpt |\n|:---:|:---:|:---:|:---:| :---:|\n| UniRepLKNet-S | 224x224 | 56M | 26.7G  | [ckpt](https://drive.google.com/file/d/18ljg1qZKNxNvpUuOnrZSZ_DS81vGVU_9/view?usp=drive_link) |\n| UniRepLKNet-B | 224x224 | 98M   | 47.2G   | [ckpt](https://drive.google.com/file/d/1t1txZOTpwXGUMVsqyUxpzE5EGLqMX5li/view?usp=drive_link)|\n| UniRepLKNet-L | 192x192 | 218M  | 105.4G   | [ckpt](https://drive.google.com/file/d/1PEY474n6a7pZ3vJitsU7ZLzwBI00pf7u/view?usp=drive_link)|\n| UniRepLKNet-XL | 192x192 | 386M  | 187G  | [ckpt](https://drive.google.com/file/d/1OP7I0jabljm8LKXTypk4HDmF9dQQqYib/view?usp=drive_link)|\n\n### Pretrained on ImageNet-22K then finetuned on ImageNet-1K\n\n| name | resolution |acc@1 | #params | FLOPs | ckpt |\n|:---:|:---:|:---:|:---:| :---:| :---:|\n| UniRepLKNet-S | 384x384 | 86.4 | 56M | 26.7G  | 
[ckpt](https://drive.google.com/file/d/1PzEHFOgEllMRIB-emkX_2VjXyBYC_X0z/view?usp=drive_link)|\n| UniRepLKNet-B | 384x384 | 87.4 | 98M   | 47.2G   | [ckpt](https://drive.google.com/file/d/1T4BB3xx6FsWrK5QpTy7FwBrLuOMcZcEu/view?usp=drive_link)|\n| UniRepLKNet-L | 384x384 | 87.9 | 218M  | 105.4G   | [ckpt](https://drive.google.com/file/d/10jJGzXX3cFRrfk3oAoIoWRnKSAaquQtM/view?usp=drive_link)|\n| UniRepLKNet-XL | 384x384 | 88.0 | 386M  | 187G  | [ckpt](https://drive.google.com/file/d/1bS6DfAdQW-tHfyKOEpE_fXypi45CLnT3/view?usp=sharing)|\n\n### COCO Object Detection\n\nCode, document, and config files have been released. See the [detection guide](detection/README.md) here.\n\nCheckpoints have already been released on hugging face. You can download them right now from https://huggingface.co/DingXiaoH/UniRepLKNet/tree/main.\n\nOr you can download these checkpoints from Google Drive as follows:\n\n\n| name | resolution |box mAP | mask mAP | #params | FLOPs | Weights |\n|:---:|:---:|:---:|:---:| :---:|:---:|:---:|\n| UniRepLKNet-T | 1280x800 | 51.7 | 44.9 | 89M  | 749G | [ckpt](https://drive.google.com/file/d/15LVEXyC8xxOIHhUeSFeolyZ1IVXVjQ4I/view?usp=drive_link) |\n| UniRepLKNet-S | 1280x800 | 53.0 | 45.9 | 113M  | 835G | [ckpt](https://drive.google.com/file/d/1wcdMn35aMLgjIFVEIaJYMjOwMtuoz58I/view?usp=drive_link) |\n| UniRepLKNet-S_22K | 1280x800 | 54.3 | 47.1 | 113M  | 835G | [ckpt](https://drive.google.com/file/d/1pZmrLRbM8bjiQvr_xenReXmtA7M3f_Ii/view?usp=sharing) |\n| UniRepLKNet-B_22K | 1280x800 | 54.8 | 47.4 | 155M  | 978G | [ckpt](https://drive.google.com/file/d/1CCyk0q4E4tuFLWqafHIC-DywdJJ0-pMQ/view?usp=drive_link) |\n| UniRepLKNet-L_22K | 1280x800 | 55.8 | 48.4 | 276M  | 1385G | [ckpt](https://drive.google.com/file/d/1m9WzhfhEF1KKxLH8IxE5vkM7HlucJu4N/view?usp=drive_link) |\n| UniRepLKNet-XL_22K | 1280x800 | 56.4 | 49.0 | 443M  | 1952G | [ckpt](https://drive.google.com/file/d/1np1zCV_34MdOsViKMVdO1l4feLe8evmp/view?usp=drive_link) |\n\n### ADE-20K Semantic 
Segmentation\n\nCode, documentation, and config files have been released. See the [segmentation guide](segmentation/README.md) here.\n\nCheckpoints have already been released on Hugging Face. You can download them right now from https://huggingface.co/DingXiaoH/UniRepLKNet/tree/main.\n\nOr you can download these checkpoints from Google Drive as follows:\n\n| name | resolution | mIoU (ss/ms) | #params | FLOPs | Weights |\n|:---:|:---:|:---:|:---:| :---:|:---:|\n| UniRepLKNet-T | 512x512 | 48.6/49.1 | 61M | 946G  | [ckpt](https://drive.google.com/file/d/1R2teeQt7q48EBBRbeVXShISpOmS5YHjs/view?usp=drive_link) |\n| UniRepLKNet-S | 512x512 | 50.5/51.0 | 86M  | 1036G | [ckpt](https://drive.google.com/file/d/1SBHvbK4zoPSZ827F5Sp209LYIh2T7Iew/view?usp=drive_link) |\n| UniRepLKNet-S_22K | 512x512 | 51.9/52.7 | 86M  | 1036G | [ckpt](https://drive.google.com/file/d/15dNuw34kia5qtt6UijcnutEktY05OrKH/view?usp=drive_link) |\n| UniRepLKNet-S_22K | 640x640 | 52.3/52.7 | 86M  | 1618G | [ckpt](https://drive.google.com/file/d/1WVmAQ8sKDeX0APS9Q88z4dZge31kHx2v/view?usp=sharing) |\n| UniRepLKNet-B_22K | 640x640 | 53.5/53.9 | 130M  | 1850G | [ckpt](https://drive.google.com/file/d/1sflCn8ny-cU5Bk8yBGE3E-yIO8eECE0H/view?usp=drive_link) |\n| UniRepLKNet-L_22K | 640x640 | 54.5/55.0 | 254M  | 2507G | [ckpt](https://drive.google.com/file/d/1Qev75aKZY5bNAM17cLecD2OoZwKf5DA7/view?usp=drive_link) |\n| UniRepLKNet-XL_22K | 640x640 | 55.2/55.6 | 425M  | 3420G | [ckpt](https://drive.google.com/file/d/1Ajwc7ZOk5eK19XX6VzgmAu2Wn0Dkb3jI/view?usp=sharing) |\n\n## ImageNet evaluation and training\n\nWe give an example evaluation command.\n\nSingle-GPU\n```\npython main.py --model unireplknet_b --eval true \\\n--resume unireplknet_b_in22k_to_in1k_384_acc87.40.pth  \\\n--input_size 384 \\\n--data_path /path/to/imagenet-1k\n```\nMulti-GPU\n```\npython -m torch.distributed.launch --nproc_per_node=8 main.py \\\n--model unireplknet_b --eval true \\\n--resume unireplknet_b_in22k_to_in1k_384_acc87.40.pth  
\\\n--input_size 384 \\\n--data_path /path/to/imagenet-1k\n```\nFor training or finetuning UniRepLKNets on ImageNet-1K or 22K, see [this guide](/Image/README.md)\n\n## Universal perception of audio, video, point cloud, and time-series tasks\n\nFor detailed documentation, please refer to these documents as follows:\n\n* [Audio guide](/Audio/README.md)\n* [Point cloud guide](/Point/README.md)\n* [Time-series forecasting guide](/Time-Series/README.md)\n* [Video guide](/Video/README.md)\n\n## Use an efficient large-kernel convolution with PyTorch\n\nWe use a large-kernel conv implementation in **PyTorch** that is more efficient than the native torch.nn.Conv2d . It is implemented based on the iGEMM algorithm and a lightweight tool named cutlass. The installation is very simple and will cost you less than one minute. If you do not install this implementation, you can still use our model anywhere you wish but it will be a bit slower.\n\n1. Download [cutlass.zip](https://github.com/DingXiaoH/RepLKNet-pytorch/blob/main/cutlass.zip), then ```unzip cutlass.zip```, enter the directory. This version of cutlass provided in this repository works fine with our large-kernel implementation and multiple python versions. You may alternatively use the cutlass branch maintained by the MegEngine team (clone https://github.com/MegEngine/cutlass), but you may need to be more careful with your python version (see [this issue](https://github.com/DingXiaoH/RepLKNet-pytorch/issues/34)).\n2. ```cd examples/19_large_depthwise_conv2d_torch_extension```\n3. ```./setup.py install --user```. If you get errors, check your ```CUDA_HOME```.\n4. You may do a quick check to verify that the results of forward/backward computations are the same as torch.nn.Conv2d: ```python depthwise_conv2d_implicit_gemm.py```\n5. 
Add ```PATH_TO_CUTLASS_DIRECTORY/examples/19_large_depthwise_conv2d_torch_extension``` into your ```PYTHONPATH``` so that you can ```from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM``` anywhere. Then you may use ```DepthWiseConv2dImplicitGEMM``` as a replacement for ```nn.Conv2d```.\n\nIt should work with a wide range of GPUs and PyTorch/CUDA versions. We suggest you try it first and check your environment only if you get errors. Our latest tests used both of the following environments:\n\n1. Ubuntu 18.04 + CUDA 11.3 + nvcc 11.3 + cudnn 8.2.0 + python 3.8.12 + pytorch 1.10 + gcc 7.3.0 + nccl 2.10.3 + NVIDIA driver 450.102.04 + V100 and A100 GPUs\n2. Ubuntu 18.04 + CUDA 10.2 + nvcc 10.0 + cudnn 7.6.5 + python 3.6.9 + pytorch 1.9 + gcc 7.5.0 + nccl 2.7.8 + NVIDIA driver 460.32.03 + 2080Ti and V100 GPUs\n\nIt is reported (see [here](https://github.com/DingXiaoH/RepLKNet-pytorch/issues/34)) that a python version mismatch may result in an error (```forward_fp32.cu(212): error: more than one instance of constructor \"cutlass::Tensor4DCoord::Tensor4DCoord\" ...``` or ```cutlass/include/cutlass/fast_math.h(741): error: no suitable conversion function from \"__half\" to \"float\" exists```). Please upgrade or downgrade your python. 
We sincerely thank @sleeplessai and @ewrfcas for sharing their experience.\n\nPull requests (e.g., better or other implementations or implementations on other frameworks) are welcome.\n\n## Citation\n\nIf the code and paper help your research, please kindly cite:\n\n```\n@article{zhang2024scaling,\n  title={Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations},\n  author={Zhang, Yiyuan and Ding, Xiaohan and Yue, Xiangyu},\n  journal={arXiv preprint arXiv:2410.08049},\n  year={2024}\n}\n\n@inproceedings{ding2024unireplknet,\n  title={UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition},\n  author={Ding, Xiaohan and Zhang, Yiyuan and Ge, Yixiao and Zhao, Sijie and Song, Lin and Yue, Xiangyu and Shan, Ying},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  pages={5513--5524},\n  year={2024}\n}\n```\n## License\nThis project is released under the Apache 2.0 license. Please see the [LICENSE](LICENSE) file for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Funireplknet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Failab-cvc%2Funireplknet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Funireplknet/lists"}