{"id":13737014,"url":"https://github.com/bytedance/ibot","last_synced_at":"2025-04-04T14:05:12.940Z","repository":{"id":37377556,"uuid":"436574334","full_name":"bytedance/ibot","owner":"bytedance","description":"iBOT :robot:: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)","archived":false,"fork":false,"pushed_at":"2022-04-14T12:32:13.000Z","size":8567,"stargazers_count":717,"open_issues_count":8,"forks_count":82,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-28T13:06:39.765Z","etag":null,"topics":["ibot","research","ssl"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2111.07832","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bytedance.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-09T10:28:34.000Z","updated_at":"2025-03-23T13:14:09.000Z","dependencies_parsed_at":"2022-08-08T20:15:11.360Z","dependency_job_id":null,"html_url":"https://github.com/bytedance/ibot","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2Fibot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2Fibot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2Fibot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2Fibot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bytedance","download_url":"https://codeload.github.com/bytedance/ibot/tar.gz/refs/heads/main","host":{"name":"GitHub","url
":"https://github.com","kind":"github","repositories_count":247190188,"owners_count":20898697,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ibot","research","ssl"],"created_at":"2024-08-03T03:01:33.569Z","updated_at":"2025-04-04T14:05:12.912Z","avatar_url":"https://github.com/bytedance.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","对象检测、分割","Fundamental MIM Methods"],"sub_categories":["网络服务_其他","MIM for Transformers"],"readme":"# Image BERT Pre-Training with iBOT \u003cimg width=\"32\" alt=\"iBOT Icon\" src=\".github/ibot.png\"\u003e\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ibot-image-bert-pre-training-with-online/unsupervised-image-classification-on-imagenet)](https://paperswithcode.com/sota/unsupervised-image-classification-on-imagenet?p=ibot-image-bert-pre-training-with-online) \\\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ibot-image-bert-pre-training-with-online/self-supervised-image-classification-on)](https://paperswithcode.com/sota/self-supervised-image-classification-on?p=ibot-image-bert-pre-training-with-online) \\\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ibot-image-bert-pre-training-with-online/self-supervised-image-classification-on-1)](https://paperswithcode.com/sota/self-supervised-image-classification-on-1?p=ibot-image-bert-pre-training-with-online)\n\nOfficial PyTorch implementation and pre-trained models for paper **iBOT: Image BERT Pre-Training with Online Tokenizer**. 
\n\n[[`arXiv`](https://arxiv.org/abs/2111.07832)] [[`Colab`](https://colab.research.google.com/github/bytedance/ibot/blob/main/notebooks/iBOT_demo.ipynb)] [[`BibTex`](https://github.com/bytedance/ibot#citing-ibot)]\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"90%\" alt=\"iBOT framework\" src=\".github/framework.png\"\u003e\n\u003c/div\u003e\n\niBOT is a novel self-supervised pre-training framework that performs masked image modeling with self-distillation. iBOT pre-trained models show local semantic features, which help them transfer well to downstream tasks at both global and local scales. For example, iBOT achieves strong performance on COCO object detection (**51.2 box AP** and **44.2 mask AP**) and ADE20K semantic segmentation (**50.0 mIoU**) with vanilla ViT-B/16. iBOT can also extract semantically meaningful local parts, like a **dog's ear :dog:**.\n\n## News :tada:\n- January 2022 - The paper was accepted by ICLR 2022.\n- Update - ViT-L/16 with ImageNet-1K pre-training achieves **81.0%** in linear probing accuracy. ViT-L/16 with ImageNet-22K pre-training achieves **87.8%** in 512x fine-tuning accuracy.\n- Update - Random masking with a relatively larger prediction ratio performs slightly better than block-wise masking. For example, ViT-B/16 achieves an **84.1%** fine-tuning accuracy and a **51.5 box AP** in object detection. \n- December 2021 - Release the code and pre-trained [models](https://github.com/bytedance/ibot#pre-trained-models).\n- November 2021 - Release the pre-print on [arXiv](https://arxiv.org/abs/2111.07832).\n\n## Installation\n\nSee [installation instructions](https://github.com/bytedance/ibot/blob/main/INSTALL.md) for details.\n\n## One-Line Command by Using `run.sh`\n\nWe provide `run.sh` with which you can complete the pre-training + fine-tuning experiment cycle in a one-line command.\n\n### Arguments\n\n- `TYPE` is named by the rule of dataset_task. 
For example, pre-training on ImageNet-1K has a TYPE of imagenet_pretrain, and linear probing evaluation on ImageNet-1K has a TYPE of imagenet_linear. Different types of tasks can be appended in one command.\n- `JOB_NAME` is a custom job name used to distinguish different groups of experiments.\n- `ARCH` is the architecture of the pre-trained models.\n- `KEY` chooses which pre-trained model is evaluated and can be set to either teacher (generally better) or student for one model. \n- `GPUS` is the number of GPUs needed for each node, and will be clamped by `MAX_GPUS` (default: 8).\n- Other additional arguments can be appended directly after these required ones. For example, `--lr 0.001`.\n\nFor example, the following command will automatically evaluate the models on the k-NN and linear probing benchmarks after pre-training, using both the `student` and `teacher` models, distributed across 2 nodes:\n```\nTOTAL_NODES=2 NODE_ID=0 ./run.sh imagenet_pretrain+imagenet_knn+imagenet_linear vit_small student,teacher 16 # the first node\nTOTAL_NODES=2 NODE_ID=1 ./run.sh imagenet_pretrain+imagenet_knn+imagenet_linear vit_small student,teacher 16 # the second node\n```\n\n## Training\n\nFor a glimpse at the full documentation of iBOT pre-training, please run:\n```\npython main_ibot.py --help\n```\n\n### iBOT Pre-Training with ViTs\n\nTo start the iBOT pre-training with Vision Transformer (ViT), simply run the following commands. `JOB_NAME` is a customized argument to distinguish different experiments; checkpoints are automatically saved into separate folders.\n```\n./run.sh imagenet_pretrain $JOB_NAME vit_{small,base,large} teacher {16,24,64}\n```\nThe exact arguments to reproduce the models presented in our paper can be found in the `args` column of the pre-trained [models](https://github.com/bytedance/ibot#pre-trained-models). 
We also provide the logs for pre-training to help reproducibility.\n\nFor example, run iBOT with ViT-S/16 network on two nodes with 8 GPUs for 800 epochs with the following command. The resulting checkpoint should reach 75.2% on k-NN accuracy, 77.9% on linear probing accuracy, and 82.3% on fine-tuning accuracy.\n\n```\n./run.sh imagenet_pretrain $JOB_NAME vit_small teacher 16 \\\n  --teacher_temp 0.07 \\\n  --warmup_teacher_temp_epochs 30 \\\n  --norm_last_layer false \\\n  --epochs 800 \\\n  --batch_size_per_gpu 64 \\\n  --shared_head true \\\n  --out_dim 8192 \\\n  --local_crops_number 10 \\\n  --global_crops_scale 0.25 1 \\\n  --local_crops_scale 0.05 0.25 \\\n  --pred_ratio 0 0.3 \\\n  --pred_ratio_var 0 0.2\n```\n\n### iBOT Pre-Training with Swins\nThis code also works for training iBOT on Swin Transformer (Swin). In the paper, we only conduct experiments on Swin-T with different window sizes:\n```\n./run.sh imagenet_pretrain $JOB_NAME swin_tiny teacher {16,40} \\\n  --patch_size 4 \\\n  --window_size {7,14}\n```\n\nFor example, run iBOT with Swin-T/14 network on five nodes with 8 GPUS for 300 epochs with the following command. The resulting checkpoint should reach 76.2% on k-NN accuracy, 79.3% on linear probing accuracy.\n\n```\n./run.sh imagenet_pretrain $JOB_NAME swin_tiny teacher 40 \\\n  --teacher_temp 0.07 \\\n  --warmup_teacher_temp_epochs 30 \\\n  --norm_last_layer false \\\n  --epochs 300 \\\n  --batch_size_per_gpu 26 \\\n  --shared_head true \\\n  --out_dim 8192 \\\n  --local_crops_number 10 \\\n  --global_crops_scale 0.25 1 \\\n  --local_crops_scale 0.05 0.25 \\\n  --pred_ratio 0 0.3 \\\n  --pred_ratio_var 0 0.2 \\\n  --pred_start_epoch 50 \\\n  --patch_size 4 \\\n  --window_size 14 \n```\n\n## Pre-Trained Models\n\nYou can choose to download only the weights of the pre-trained `backbone` used for downstream tasks, and the `full ckpt` which contains backbone and projection head weights for both student and teacher networks. 
For the `backbone`, `s` denotes that the student network is selected while `t` denotes that the teacher network is selected. `PS` denotes prediction shape.\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth\u003eArch.\u003c/th\u003e\n    \u003cth\u003ePar.\u003c/th\u003e\n    \u003cth\u003ePS\u003c/th\u003e\n    \u003cth\u003ek-NN\u003c/th\u003e\n    \u003cth\u003eLin.\u003c/th\u003e\n    \u003cth\u003eFin.\u003c/th\u003e\n    \u003cth colspan=\"6\"\u003edownload\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eViT-S/16\u003c/td\u003e\n    \u003ctd\u003e21M\u003c/td\u003e\n    \u003ctd\u003eBlock\u003c/td\u003e\n    \u003ctd\u003e75.2%\u003c/td\u003e\n    \u003ctd\u003e77.9%\u003c/td\u003e\n    \u003ctd\u003e82.3%\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vits_16/checkpoint_teacher.pth\"\u003ebackbone (t)\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vits_16/checkpoint.pth\"\u003efull ckpt\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vits_16/args.txt\"\u003eargs\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vits_16/log.txt\"\u003elogs\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eSwin-T/7\u003c/td\u003e\n    \u003ctd\u003e28M\u003c/td\u003e\n    \u003ctd\u003eBlock\u003c/td\u003e\n    \u003ctd\u003e75.3%\u003c/td\u003e\n    \u003ctd\u003e78.6%\u003c/td\u003e\n    \u003ctd\u003e\\\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/swint_7/checkpoint_teacher.pth\"\u003ebackbone (t)\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca 
href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/swint_7/checkpoint.pth\"\u003efull ckpt\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/swint_7/args.txt\"\u003eargs\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/swint_7/log.txt\"\u003elogs\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eSwin-T/14\u003c/td\u003e\n    \u003ctd\u003e28M\u003c/td\u003e\n    \u003ctd\u003eBlock\u003c/td\u003e\n    \u003ctd\u003e76.2%\u003c/td\u003e\n    \u003ctd\u003e79.3%\u003c/td\u003e\n    \u003ctd\u003e\\\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/swint_14/checkpoint_teacher.pth\"\u003ebackbone (t)\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/swint_14/checkpoint.pth\"\u003efull ckpt\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/swint_14/args.txt\"\u003eargs\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/swint_14/log.txt\"\u003elogs\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eViT-B/16\u003c/td\u003e\n    \u003ctd\u003e85M\u003c/td\u003e\n    \u003ctd\u003eBlock\u003c/td\u003e\n    \u003ctd\u003e77.1%\u003c/td\u003e\n    \u003ctd\u003e79.5%\u003c/td\u003e\n    \u003ctd\u003e84.0%\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16/checkpoint_teacher.pth\"\u003ebackbone (t)\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca 
href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16/checkpoint.pth\"\u003efull ckpt\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16/args.txt\"\u003eargs\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16/log.txt\"\u003elogs\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eViT-B/16\u003c/td\u003e\n    \u003ctd\u003e85M\u003c/td\u003e\n    \u003ctd\u003eRand\u003c/td\u003e\n    \u003ctd\u003e77.3%\u003c/td\u003e\n    \u003ctd\u003e79.8%\u003c/td\u003e\n    \u003ctd\u003e84.1%\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16_rand_mask/checkpoint_teacher.pth\"\u003ebackbone (t)\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16_rand_mask/checkpoint.pth\"\u003efull ckpt\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16_rand_mask/args.txt\"\u003eargs\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16_rand_mask/log.txt\"\u003elogs\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eViT-L/16\u003c/td\u003e\n    \u003ctd\u003e307M\u003c/td\u003e\n    \u003ctd\u003eBlock\u003c/td\u003e\n    \u003ctd\u003e78.0%\u003c/td\u003e\n    \u003ctd\u003e81.0%\u003c/td\u003e\n    \u003ctd\u003e84.8%\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16/checkpoint_teacher.pth\"\u003ebackbone (t)\u003c/a\u003e\u003c/td\u003e\n    
\u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16/checkpoint.pth\"\u003efull ckpt\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16/args.txt\"\u003eargs\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16/log.txt\"\u003elogs\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eViT-L/16\u003c/td\u003e\n    \u003ctd\u003e307M\u003c/td\u003e\n    \u003ctd\u003eRand\u003c/td\u003e\n    \u003ctd\u003e77.7%\u003c/td\u003e\n    \u003ctd\u003e81.3%\u003c/td\u003e\n    \u003ctd\u003e85.0%\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16_rand_mask/checkpoint_teacher.pth\"\u003ebackbone (t)\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16_rand_mask/checkpoint.pth\"\u003efull ckpt\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16_rand_mask/args.txt\"\u003eargs\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16_rand_mask/log.txt\"\u003elogs\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\nWe also provide the ViT-{B,L}/16 model pre-trained on ImageNet-22K dataset.\n\n \u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth rowspan=\"2\"\u003eArch.\u003c/th\u003e\n    \u003cth rowspan=\"2\"\u003ePar.\u003c/th\u003e\n    \u003cth rowspan=\"2\"\u003ePS\u003c/th\u003e\n    \u003cth rowspan=\"2\"\u003ek-NN\u003c/th\u003e\n    \u003cth rowspan=\"2\"\u003eLin.\u003c/th\u003e\n    \u003cth 
colspan=\"3\"\u003eFin.\u003c/th\u003e\n    \u003cth rowspan=\"2\" colspan=\"6\"\u003edownload\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n  \u003cth\u003e256\u003c/th\u003e\n  \u003cth\u003e384\u003c/th\u003e\n  \u003cth\u003e512\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eViT-B/16\u003c/td\u003e\n    \u003ctd\u003e85M\u003c/td\u003e\n    \u003ctd\u003eBlock\u003c/td\u003e\n    \u003ctd\u003e71.1%\u003c/td\u003e\n    \u003ctd\u003e79.0%\u003c/td\u003e\n    \u003ctd\u003e84.4%\u003c/td\u003e\n    \u003ctd\u003e\\\u003c/td\u003e\n    \u003ctd\u003e\\\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16_pt22k/checkpoint_student.pth\"\u003ebackbone (s)\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16_pt22k/checkpoint.pth\"\u003efull ckpt\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16_pt22k/args.txt\"\u003eargs\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitb_16_pt22k/log.txt\"\u003elogs\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eViT-L/16\u003c/td\u003e\n    \u003ctd\u003e307M\u003c/td\u003e\n    \u003ctd\u003eBlock\u003c/td\u003e\n    \u003ctd\u003e72.9%\u003c/td\u003e\n    \u003ctd\u003e82.3%\u003c/td\u003e\n    \u003ctd\u003e86.6%\u003c/td\u003e\n    \u003ctd\u003e87.5%\u003c/td\u003e\n    \u003ctd\u003e87.8%\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16_pt22k/checkpoint_student.pth\"\u003ebackbone (s)\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca 
href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16_pt22k/checkpoint.pth\"\u003efull ckpt\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16_pt22k/args.txt\"\u003eargs\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://lf3-nlp-opensource.bytetos.com/obj/nlp-opensource/archive/2022/ibot/vitl_16_pt22k/log.txt\"\u003elogs\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\nTo extract the backbone from the full checkpoint yourself, please run the following command, where `KEY` is either student or teacher.\n```\nWEIGHT_FILE=$OUTPUT_DIR/checkpoint_$KEY.pth\n\npython extract_backbone_weights.py \\\n  --checkpoint_key $KEY \\\n  $PRETRAINED \\\n  $WEIGHT_FILE\n```\n\n## Downstream Evaluation\n\nSee [Evaluating iBOT on Downstream Tasks](https://github.com/bytedance/ibot/blob/main/evaluation/README.md) for details.\n\n## Property Analysis\n\nSee [Analyzing iBOT's Properties](https://github.com/bytedance/ibot/blob/main/analysis/README.md) for robustness tests and visualizing self-attention maps:\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"100%\" alt=\"iBOT Global Pattern Layout\" src=\".github/attnmap.png\"\u003e\n\u003c/div\u003e\n\nor extracting sparse correspondence pairs between two images: \n\u003cdiv align=\"center\"\u003e\n  \u003cimg height=\"85%\" width=\"75%\" alt=\"iBOT Global Pattern Layout\" src=\".github/corresp.png\"\u003e\n\u003c/div\u003e\n\nWe also provide a [Colab page](https://colab.research.google.com/github/bytedance/ibot/blob/main/notebooks/iBOT_demo.ipynb) :bookmark_tabs: where you can play around with iBOT pre-trained models.\n\n## Extracting Semantic Patterns\n\nWe extract top-k numbered local classes based on patch tokens with their corresponding patches and contexts by running the following command. 
We identify very diverse behaviour like shared **low-level textures** and **high-level semantics**.\n```\npython3 -m torch.distributed.launch --nproc_per_node=8 \\\n    --master_port=${MASTER_PORT:-29500} \\\n    analysis/extract_pattern/extract_topk_cluster.py \\\n    --pretrained_path $PRETRAINED \\\n    --checkpoint {student,teacher} \\\n    --type patch \\\n    --topk 36 \\\n    --patch_window 5 \\\n    --show_pics 20 \\\n    --arch vit_small \\\n    --save_path memory_bank_patch.pth \\\n    --data_path data/imagenet/val\n```\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"100%\" alt=\"iBOT Local Part-Level Pattern Layout\" src=\".github/local_semantic_parts.png\"\u003e\n\u003c/div\u003e\n\nThe script also supports extracting the pattern layout on the [CLS] token, which amounts to clustering or unsupervised classification. This property is not induced by the MIM objective, since we also observe it in DINO.\n\n```\npython3 -m torch.distributed.launch --nproc_per_node=8 \\\n    --master_port=${MASTER_PORT:-29500} \\\n    analysis/extract_pattern/extract_topk_cluster.py \\\n    --pretrained_path $PRETRAINED \\\n    --checkpoint {student,teacher} \\\n    --type cls \\\n    --topk 36 \\\n    --show_pics 20 \\\n    --arch vit_small \\\n    --save_path memory_bank_cls.pth \\\n    --data_path data/imagenet/val\n```\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"75%\" alt=\"iBOT Global Pattern Layout\" src=\".github/global_semantics.png\"\u003e\n\u003c/div\u003e\n\n\n## Acknowledgement\n\nThis repository is built using the [DINO](https://github.com/facebookresearch/dino) repository and the [BEiT](https://github.com/microsoft/unilm/tree/master/beit) repository.\n\n## License\nThis repository is released under the Apache 2.0 license as found in the [LICENSE](LICENSE) file.\n\n## Citing iBOT\nIf you find this repository useful, please consider giving a star :star: and citation:\n```\n@article{zhou2021ibot,\n  title={iBOT: Image BERT Pre-Training 
with Online Tokenizer},\n  author={Zhou, Jinghao and Wei, Chen and Wang, Huiyu and Shen, Wei and Xie, Cihang and Yuille, Alan and Kong, Tao},\n  journal={International Conference on Learning Representations (ICLR)},\n  year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2Fibot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytedance%2Fibot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2Fibot/lists"}