{"id":18727822,"url":"https://github.com/jdai-cv/cotnet","last_synced_at":"2025-04-05T17:02:56.046Z","repository":{"id":44874831,"uuid":"364822245","full_name":"JDAI-CV/CoTNet","owner":"JDAI-CV","description":"This is an official implementation for \"Contextual Transformer Networks for Visual Recognition\".","archived":false,"fork":false,"pushed_at":"2021-08-08T11:17:46.000Z","size":462,"stargazers_count":531,"open_issues_count":17,"forks_count":81,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-03-29T16:02:39.544Z","etag":null,"topics":["contextual-transformer","cotnet","image-classification","imagenet","instance-segmentation","mask-rcnn","mscoco","object-detection","semantic-segmentation","vision-transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/2107.12292.pdf","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JDAI-CV.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-05-06T07:27:12.000Z","updated_at":"2025-03-20T18:10:51.000Z","dependencies_parsed_at":"2022-09-23T21:51:56.392Z","dependency_job_id":null,"html_url":"https://github.com/JDAI-CV/CoTNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JDAI-CV%2FCoTNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JDAI-CV%2FCoTNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JDAI-CV%2FCoTNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JDAI-CV%2FCoTNet/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JDAI-CV","download_url":"https://codeload.github.com/JDAI-CV/CoTNet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247369950,"owners_count":20927928,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["contextual-transformer","cotnet","image-classification","imagenet","instance-segmentation","mask-rcnn","mscoco","object-detection","semantic-segmentation","vision-transformer"],"created_at":"2024-11-07T14:18:58.409Z","updated_at":"2025-04-05T17:02:56.026Z","avatar_url":"https://github.com/JDAI-CV.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Introduction\nThis repository is the official implementation of [**Contextual Transformer Networks for Visual Recognition**](https://arxiv.org/pdf/2107.12292.pdf).\n\nCoT is a unified self-attention building block that acts as an alternative to standard convolutions in ConvNets. As a result, convolutions can be replaced with their CoT counterparts to strengthen vision backbones with contextualized self-attention.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/framework.jpg\" width=\"800\"/\u003e\n\u003c/p\u003e\n\n# 2021/3/25-2021/6/5: CVPR 2021 Open World Image Classification Challenge\n**Rank 1** in [Open World Image Classification Challenge](https://eval.ai/web/challenges/challenge-page/1041/leaderboard/2695) @ CVPR 2021. 
(Team name: VARMS)\n\n# Usage\nThe code is mainly based on [timm](https://github.com/rwightman/pytorch-image-models).\n\n### Requirements:\n* PyTorch 1.8.0+\n* Python 3.7\n* CUDA 10.1+\n* [CuPy](https://cupy.dev/)\n\n### Clone the repository:\n```\ngit clone https://github.com/JDAI-CV/CoTNet.git\n```\n\n### Train\nFirst, download the [ImageNet](https://github.com/facebookarchive/fb.resnet.torch/blob/master/INSTALL.md) dataset. To train CoTNet-50 on ImageNet on a single node with 8 GPUs for 350 epochs, run:\n```\npython -m torch.distributed.launch --nproc_per_node=8 train.py --folder ./experiments/cot_experiments/CoTNet-50-350epoch\n```\nThe training scripts for CoTNet (e.g., CoTNet-50) can be found in the [cot_experiments](cot_experiments) folder.\n\n# Inference Time vs. Accuracy\nCoTNet models consistently achieve higher top-1 accuracy with lower inference time than other vision backbones, under both default and advanced training setups. In short, CoTNet models offer better inference time-accuracy trade-offs than existing vision backbones.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/inference_time.jpg\" width=\"800\"/\u003e\n\u003c/p\u003e\n\n## Results on ImageNet\n| name | resolution | #params | FLOPs (G) | Top-1 Acc. (%) | Top-5 Acc. (%) 
| model |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n| CoTNet-50 | 224 | 22.2M | 3.3 | 81.3 | 95.6 | [GoogleDrive](https://drive.google.com/file/d/1SR5ezIu7LN943zHaUh4mC0ehxBVMqtfv/view?usp=sharing) / [Baidu](https://pan.baidu.com/s/1czr00SglgD8dNVK8jT1yLg) |\n| CoTNeXt-50 | 224 | 30.1M | 4.3 | 82.1 | 95.9 | [GoogleDrive](https://drive.google.com/file/d/1j6b5D3xcZ5L_bHiQV0WfqyOieqZLVOCv/view?usp=sharing) / [Baidu](https://pan.baidu.com/s/1CeV9IH_P5N9yuO-wOpdGNw) |\n| SE-CoTNetD-50 | 224 | 23.1M | 4.1 | 81.6 | 95.8 | [GoogleDrive](https://drive.google.com/file/d/1D2b5fr3lxpBpiFcCYBKngmmSgfVHt_56/view?usp=sharing) / [Baidu](https://pan.baidu.com/s/1s5Xg7AqzWuwFJUzOJDoo4Q) |\n| CoTNet-101 | 224 | 38.3M | 6.1 | 82.8 | 96.2 | [GoogleDrive](https://drive.google.com/file/d/11jExbPEg4Eq5PApisZyE5k-1CbRYnsQb/view?usp=sharing) / [Baidu](https://pan.baidu.com/s/1Olpta0AV7N4OoiC8PB4BnA) |\n| CoTNeXt-101 | 224 | 53.4M | 8.2 | 83.2 | 96.4 | [GoogleDrive](https://drive.google.com/file/d/1des5wgkBDUscQAs8IYOmKCKKUA46QLfJ/view?usp=sharing) / [Baidu](https://pan.baidu.com/s/1FM0QRZJee7uY7iKaEiUA-w) |\n| SE-CoTNetD-101 | 224 | 40.9M | 8.5 | 83.2 | 96.5 | [GoogleDrive](https://drive.google.com/file/d/1PWIltQYpYZiDrpfZORRQzGzQeXVd2b2f/view?usp=sharing) / [Baidu](https://pan.baidu.com/s/1WGFzuwio5lWJKiOOJTnjdg) |\n| SE-CoTNetD-152 | 224 | 55.8M | 17.0 | 84.0 | 97.0 | [GoogleDrive](https://drive.google.com/file/d/1MkMx0a8an3ikt6LZwClIOyabBnMfR91v/view?usp=sharing) / [Baidu](https://pan.baidu.com/s/14mNVsSf-6WI3mxLN2WinWw) |\n| SE-CoTNetD-152 | 320 | 55.8M | 26.5 | 84.6 | 97.1 | [GoogleDrive](https://drive.google.com/file/d/1E43T2jS37gR07p_FVWnjJNkMWeYMXgX9/view?usp=sharing) / [Baidu](https://pan.baidu.com/s/1kO5of8IPgL4HOudLeykS6w) |\n\nThe access code for Baidu is **cotn**.\n\n## CoTNet on downstream tasks\nFor **Object Detection and Instance Segmentation**, please see [CoTNet for Object Detection and Instance 
Segmentation](https://github.com/JDAI-CV/CoTNet-ObjectDetection-InstanceSegmentation).\n\n## Citing Contextual Transformer Networks\n```\n@article{cotnet,\n  title={Contextual Transformer Networks for Visual Recognition},\n  author={Li, Yehao and Yao, Ting and Pan, Yingwei and Mei, Tao},\n  journal={arXiv preprint arXiv:2107.12292},\n  year={2021}\n}\n```\n\n## Acknowledgements\nWe thank [timm](https://github.com/rwightman/pytorch-image-models) and the awesome PyTorch team for their contributions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjdai-cv%2Fcotnet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjdai-cv%2Fcotnet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjdai-cv%2Fcotnet/lists"}