{"id":13709068,"url":"https://github.com/Sense-X/MixMIM","last_synced_at":"2025-05-06T15:32:12.701Z","repository":{"id":42399165,"uuid":"493976030","full_name":"Sense-X/MixMIM","owner":"Sense-X","description":"MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning","archived":false,"fork":false,"pushed_at":"2023-07-02T11:28:41.000Z","size":665,"stargazers_count":128,"open_issues_count":21,"forks_count":6,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-11-13T19:39:34.289Z","etag":null,"topics":["masked-image-modeling","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Sense-X.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-19T07:59:57.000Z","updated_at":"2024-09-11T13:28:44.000Z","dependencies_parsed_at":"2024-11-13T19:33:24.574Z","dependency_job_id":"b2ac4bdd-cd60-4943-a275-d6528a0642cb","html_url":"https://github.com/Sense-X/MixMIM","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sense-X%2FMixMIM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sense-X%2FMixMIM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sense-X%2FMixMIM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sense-X%2FMixMIM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Sense-X","download_url":"
https://codeload.github.com/Sense-X/MixMIM/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252713014,"owners_count":21792410,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["masked-image-modeling","transformer"],"created_at":"2024-08-02T23:00:35.624Z","updated_at":"2025-05-06T15:32:07.693Z","avatar_url":"https://github.com/Sense-X.png","language":"Python","funding_links":[],"categories":["Self-Supervised Learning","Fundamental MIM Methods"],"sub_categories":["**Masked Image Modeling**","MIM for Transformers and CNNs"],"readme":"## PyTorch implementation of [MixMAE](https://arxiv.org/abs/2205.13137) (CVPR 2023)\n\n![tenser](figures/mixmae.png)\n\nThis repo is the official implementation of the paper [MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers](https://arxiv.org/abs/2205.13137).\n\n```\n@article{MixMAE,\n  author  = {Jihao Liu, Xin Huang, Jinliang Zheng, Yu Liu, Hongsheng Li},\n  journal = {arXiv:2205.13137},\n  title   = {MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers},\n  year    = {2022},\n}\n```\n\n\n### Available pretrained models\n| Models | Params (M) | FLOPs (G) | Pretrain Epochs | Top-1 Acc. 
| Pretrain_ckpt | Finetune_ckpt |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n| Swin-B/W14 | 88 | 16.3 | 600 | 85.1 | [base_600ep](https://drive.google.com/file/d/1pZYmTv08xK_kOe2kk6ahuvgJVkHm-ZIa/view?usp=sharing) | [base_600ep_ft](https://drive.google.com/file/d/1zkOyh8jnFW7iYG3sOfp6LLG5wu4VbiXb/view?usp=sharing)| \n| Swin-B/W16-384x384 | 89.6 | 52.6 | 600 | 86.3 | [base_600ep](https://drive.google.com/file/d/1pZYmTv08xK_kOe2kk6ahuvgJVkHm-ZIa/view?usp=sharing) | [base_600ep_ft_384x384](https://drive.google.com/file/d/1MIng19USn5T770YZ6mFfqTgNCCz_kEGL/view?usp=sharing)| \n| Swin-L/W14 | 197 | 35.9 | 600 | 85.9 | [large_600ep](https://drive.google.com/file/d/1dM8Lu2nVEukxPwn7PLmDmRAYwQV59ttx/view?usp=sharing) | [large_600ep_ft](https://drive.google.com/file/d/1b1BxGAewK1ICxxCEwF24YEDSjlQ9Ts9n/view?usp=sharing) |\n| Swin-L/W16-384x384 | 199 | 112 | 600 | 86.9 | [large_600ep](https://drive.google.com/file/d/1dM8Lu2nVEukxPwn7PLmDmRAYwQV59ttx/view?usp=sharing) | [large_600ep_ft_384x384](https://drive.google.com/file/d/1_IfqoQvAe2Z2jC7HBKi3umKD6c8qOu0P/view?usp=sharing)| \n\n\n### Training and evaluation\n\nWe use [Slurm](https://slurm.schedmd.com/documentation.html) for multi-node distributed pretraining and finetuning. \n\n#### Pretrain\n```\nsh exp/base_600ep/pretrain.sh partition 16 /path/to/imagenet\n```\n- Training with 16 GPUs on your partition.\n- Batch size is 128 * 16 = 2048.\n- Default setting is to train for 600 epochs with mask ratio of 0.5.\n\n#### Finetune\n```\nsh exp/base_600ep/finetune.sh partition 8 /path/to/imagenet\n```\n- Training with 8 GPUs on your partition.\n- Batch size is 128 * 8 = 1024.\n- Default setting is to finetune for 100 epochs.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSense-X%2FMixMIM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSense-X%2FMixMIM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSense-X%2FMixMIM/lists"}