{"id":18322474,"url":"https://github.com/tencentarc/conmim","last_synced_at":"2025-04-05T23:31:02.719Z","repository":{"id":65712804,"uuid":"542928222","full_name":"TencentARC/ConMIM","owner":"TencentARC","description":"Official codes for ConMIM (ICLR 2023)","archived":false,"fork":false,"pushed_at":"2023-02-08T18:27:22.000Z","size":3672,"stargazers_count":58,"open_issues_count":4,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-21T13:23:05.032Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TencentARC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-29T05:04:35.000Z","updated_at":"2024-11-24T02:46:39.000Z","dependencies_parsed_at":"2023-02-18T07:30:44.348Z","dependency_job_id":null,"html_url":"https://github.com/TencentARC/ConMIM","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FConMIM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FConMIM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FConMIM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FConMIM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TencentARC","download_url":"https://codeload.github.com/TencentARC/ConMIM/tar.gz/refs/heads/main","host":{"name"
:"GitHub","url":"https://github.com","kind":"github","repositories_count":247415783,"owners_count":20935383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T18:24:47.244Z","updated_at":"2025-04-05T23:30:59.701Z","avatar_url":"https://github.com/TencentARC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [Masked Image Modeling with Denoising Contrast](https://arxiv.org/abs/2205.09616)\n\nOfficial PyTorch implementation and pretrained models of \"Masked Image Modeling with Denoising Contrast\" in International Conference on Learning Representations (ICLR) 2023.\n\n---\n\n![Overview](./imgs/framework.png)\n\n\n## Model Zoo\n+ We provide the models **fine-tuned** on ImageNet1k. 
\n\n|   Arch   | Epochs | Resolution | Acc@1 | Fine-tuned model |\n|:--------:|:------:|:----------:|:-----:| :---: |\n| ViT-S/16 |  300   |  224x224   | 82.0  | [model](https://drive.google.com/file/d/1nI9IohDZ1KpBm4sUgLFyVoy6lHAu4LfF/view?usp=share_link) |\n| ViT-B/16 |  800   |  224x224   | 83.7  | [model](https://drive.google.com/file/d/18MWukX2CZp_Eu6RiDVTIBzSt_0K996ri/view?usp=share_link) |\n| ViT-L/16 |  800   |  224x224   | 85.2  | [model](https://drive.google.com/file/d/1adbm7ewm8uAEcdDklupGrrLAdoYeUDlN/view?usp=share_link) |\n| ViT-L/16 |  1600  |  224x224   | 85.5  | [model](https://drive.google.com/file/d/1NXCA_oZ0mUiDbR3fFO8V8DixzZ4oRa-z/view?usp=share_link) |\n\n## Results on ImageNet1K\n![Result](./imgs/results.png)\n\n## Visualization\nWe visualize the self-attention map between the [CLS] token and the local tokens of the ViT-B/16 model pre-trained on ImageNet-1K, where (a) shows ConMIM pre-training and (b) shows vanilla instance-level contrastive pre-training. The self-attention maps of the 12 attention heads are averaged. 
It can be observed that ConMIM-pretrained models are much more locally discriminative and aware of the visual context.\n![Vis](./imgs/vis.png)\n\n## Setup\nClone the GitHub repo and install the required packages.\n```\ngit clone https://github.com/TencentARC/ConMIM.git\ncd ConMIM\npip install -r requirements.txt\n```\nFor mixed-precision training, please install [apex](https://github.com/NVIDIA/apex):\n\n```\ngit clone https://github.com/NVIDIA/apex\ncd apex\npip install -v --disable-pip-version-check --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ./\n```\n## Data Preparation\n+ We use the standard ImageNet-1K dataset (http://image-net.org/) for pre-training.\n+ Images are read through the train and val lists (available at this [link](https://drive.google.com/drive/folders/1Kmu3VHw1Ssqh6jwrWaUL1ihVx9KakKZv?usp=sharing)) to speed up reading from a massive number of small files:\n```\n/dataset\n└── imagenet1k\n    ├── train\n    ├── val\n    ├── train_map.txt\n    └── val_map.txt\n```\n+ `train_map.txt` and `val_map.txt` store the relative path of each image in the corresponding zip file together with its ground-truth label; both can be downloaded from this [link](https://drive.google.com/drive/folders/1Kmu3VHw1Ssqh6jwrWaUL1ihVx9KakKZv?usp=sharing).\n## Pre-training on ImageNet-1K\n+ We pre-train the ViT-L/16 model with 32 NVIDIA A100 GPUs on ImageNet-1K as follows:\n\n```\nOUTPUT_DIR=\"./output/conmim_pretrained\"\nDATA_PATH=\"./dataset/imagenet1k\"\nmkdir -p $OUTPUT_DIR\n\npython -m torch.distributed.launch $@ run_conmim_pretraining.py \\\n        --data_path ${DATA_PATH} --output_dir ${OUTPUT_DIR} --mask_ratio 0.75 \\\n        --model conmim_large_patch16_224 \\\n        --batch_size 64 --lr 7.5e-4 --warmup_epochs 10 --epochs 1600 \\\n        --clip_grad 1.0 --drop_path 0 --layer_scale_init_value 1e-5 \\\n        --mask_type 'random_mps32' \\\n        --imagenet_default_mean_and_std \\\n        --save_ckpt_freq 20\n```\n\n## Fine-tuning on ImageNet-1K Classification\n+ We fine-tune the 
pre-trained ViT-Base model with 8 NVIDIA A100/V100 GPUs as follows:\n```\nCKP=\"./output/conmim_pretrained/checkpoint_copy-799.pth\"\nOUTPUT_DIR=\"./output/conmim_finetuned/\"\nDATA_PATH=\"/dataset/imagenet1k/\"\nmkdir -p $OUTPUT_DIR\n\npython -m torch.distributed.launch --nproc_per_node=8 run_class_finetuning.py \\\n    --model beit_base_patch16_224 --data_path ${DATA_PATH} \\\n    --finetune ${CKP} \\\n    --output_dir ${OUTPUT_DIR} --batch_size 128 --lr 4e-3 --update_freq 1 \\\n    --warmup_epochs 20 --epochs 100 --layer_decay 0.65 --drop_path 0.1 \\\n    --weight_decay 0.05 --mixup 0.8 --cutmix 1.0 --nb_classes 1000 --enable_deepspeed \\\n    --imagenet_default_mean_and_std\n```\n## Fine-tuning on ADE20K Semantic Segmentation\nWe follow [BEiT](https://github.com/microsoft/unilm/tree/master/beit) to complete our experiments.\n\n## Fine-tuning on COCO Detection and Segmentation\nWe follow [MIMDet](https://github.com/hustvl/MIMDet) to complete our experiments.\n\n## Acknowledgement\n\nThis repository is built using the [BEiT](https://github.com/microsoft/unilm/tree/master/beit) repository, the [mc-BEiT](https://github.com/lixiaotong97/mc-BEiT) repository, the [timm](https://github.com/rwightman/pytorch-image-models) library, the [DeiT](https://github.com/facebookresearch/deit) repository, and the [MIMDet](https://github.com/hustvl/MIMDet) repository.\n\n## Citation\nIf you find our work useful for your research, please cite our paper.\n```\n@article{yi2022masked,\n  title={Masked image modeling with denoising contrast},\n  author={Yi, Kun and Ge, Yixiao and Li, Xiaotong and Yang, Shusheng and Li, Dian and Wu, Jianping and Shan, Ying and Qie, Xiaohu},\n  journal={International Conference on Learning Representations},\n  year={2023}\n}\n```\n## Contact\nIf you have any questions, you can contact me by email at kunyi@tencent.com or 
laneyikun@foxmail.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fconmim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftencentarc%2Fconmim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fconmim/lists"}