{"id":13738076,"url":"https://github.com/CVMI-Lab/KDEP","last_synced_at":"2025-05-08T15:32:31.586Z","repository":{"id":37925707,"uuid":"466690322","full_name":"CVMI-Lab/KDEP","owner":"CVMI-Lab","description":"(CVPR2022) Official PyTorch Implementation of KDEP. Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability","archived":false,"fork":false,"pushed_at":"2022-07-21T09:07:13.000Z","size":8151,"stargazers_count":61,"open_issues_count":1,"forks_count":8,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-12-16T13:24:03.957Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CVMI-Lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-06T09:24:02.000Z","updated_at":"2024-08-26T16:55:50.000Z","dependencies_parsed_at":"2022-07-14T02:10:35.633Z","dependency_job_id":null,"html_url":"https://github.com/CVMI-Lab/KDEP","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CVMI-Lab%2FKDEP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CVMI-Lab%2FKDEP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CVMI-Lab%2FKDEP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CVMI-Lab%2FKDEP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CVMI-Lab","download_url":"https://codeload.github.com/CVMI-Lab/KDEP/tar.gz/refs/heads/main","host
":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253096429,"owners_count":21853601,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T03:02:10.546Z","updated_at":"2025-05-08T15:32:26.568Z","avatar_url":"https://github.com/CVMI-Lab.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Knowledge Distillation as Efficient Pretraining: Faster Convergence, Higher Data-efficiency, and Better Transferability\n\nThis repository contains the code and models necessary to replicate the results of our paper:\n\n```bibtex\n@inproceedings{he2022knowledge,\n  title={Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability\n},\n  author={He, Ruifei and Sun, Shuyang, and Yang, Jihan, and Bai, Song and Qi, Xiaojuan},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  year={2022}\n}\n```\n\n## Abstract\n\nLarge-scale pre-training has been proven to be crucial for various computer vision tasks.\nHowever, with the increase of pre-training data amount, model architecture amount, and the private/inaccessible data, it is not very efficient or possible to pre-train all the model architectures on large-scale datasets. 
In this work, we investigate an alternative strategy for pre-training, namely Knowledge Distillation as Efficient Pre-training (**KDEP**), aiming to efficiently transfer the learned feature representation from existing pre-trained models to new student models for future downstream tasks. We observe that existing Knowledge Distillation (KD) methods are unsuitable towards pre-training since they normally distill the logits that are going to be discarded when transferred to downstream tasks. To resolve this problem, we propose a feature-based KD method with non-parametric feature dimension aligning. Notably, our method performs comparably with supervised pre-training counterparts in 3 downstream tasks and 9 downstream datasets requiring **10x** less data and **5x** less pre-training time.\n\n![pics-cropped-1](pics-cropped-1.png)\n\n## Getting started\n1.  Clone our repo: `git clone https://github.com/CVMI-Lab/KDEP.git`\n\n2.  Install dependencies:\n    ```sh\n    conda create -n KDEP python=3.7\n    conda activate KDEP\n    pip install -r requirements.txt\n    ```\n\n## Data preparation\n\n* ImageNet-1K ([Download](https://www.image-net.org/)) \n* Caltech256 ([Download](http://www.vision.caltech.edu/Image_Datasets/Caltech256/))\n* Cifar100 **(Automatically downloaded when you run the code)**\n* DTD ([Download]( https://www.robots.ox.ac.uk/~vgg/data/dtd/))\n* CUB-200 ([Download](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html))\n* Cityscapes ([Download](https://www.cityscapes-dataset.com/))\n* VOC (segmentation and detection, [Download](http://host.robots.ox.ac.uk/pascal/VOC/))\n* ADE20K ([Download](http://groups.csail.mit.edu/vision/datasets/ADE20K/))\n* COCO ([Download](https://cocodataset.org/))\n\nFor image classification datasets (except for Caltech256), the folder structure should follow ImageNet:\n\n```\ndata root\n├─ train/\n  ├── n01440764\n  │   ├── n01440764_10026.JPEG\n  │   ├── n01440764_10027.JPEG\n  │   ├── ......\n  ├── ......\n├─ val/\n  ├── 
n01440764\n  │   ├── ILSVRC2012_val_00000293.JPEG\n  │   ├── ILSVRC2012_val_00002138.JPEG\n  │   ├── ......\n  ├── ......\n```\n\nFor semantic segmentation datasets, please refer to [PyTorch Semantic Segmentation](https://github.com/hszhao/semseg).\n\nFor object detection datasets, please refer to [Detectron2](https://github.com/facebookresearch/Detectron2).\n\n## Pre-training with KDEP\n\n1. Download teacher models ([Download](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/ruifeihe_connect_hku_hk/Es4xT6QzDi9JtkFmmqXIWF4B-MAxG7z6oIDqtrZ88m5MpA?e=Jh19d1)), and put them under `pretrained-models/` .\n\n2. You can use a provided python file `scripts/make-imgnet-subset.py` to create the 10% of ImageNet-1K data.\n\n3. Update the path of the dataset for KDEP (10% or 100% of ImageNet-1K) in `src/utils/constants.py`.\n\n4. Prepare the SVD weights for teacher models. You can download the weights we provide ([Download](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/ruifeihe_connect_hku_hk/Es4xT6QzDi9JtkFmmqXIWF4B-MAxG7z6oIDqtrZ88m5MpA?e=Jh19d1)) or generate using our provided script `scripts/gen_svd_weights.sh` .\n\n   ```sh\n   sh scripts/gen_svd_weights.sh imgnet_128k ex_gen_svd 0\n   ```\n\n5. Scripts of pre-training with KDEP are in `scripts/`. For example, you can use teacher-student pair of Microsoft ResNet50 -\u003e ResNet18 with `scripts/KDEP_MS-R50_R18.sh` by:\n\n   ```sh\n   sh scripts/KDEP_MS-R50_R18.sh imgnet_128k exp_name 90 30 5e-4 0,1,2,3\n   ### imgnet_128k or imgnet_full to select 10% or 100% ImageNet-1K data\n   ### 90 is #epoch, 30 is step-lr\n   ### 5e-4 is weight decay\n   ### 0,1,2,3 is GPU id\n   ```\n\n   You can run KDEP with different data amount and training schedules by changing the data name (imgnet_128k or imgnet_full), #epoch and step-lr, and weight decay. 
\n\n   Note that we do not generate SVD weights for 100% ImageNet-1K data; we directly reuse the SVD weights generated from the 10% subset.\n\n## Transfer learning experiments\n\n### Image classification\n\n1. We use four image classification tasks: CIFAR100, DTD, Caltech256, CUB-200.\n\n2. Scripts (`scripts/TL_img-cls_R18.sh` and `scripts/TL_img-cls_mnv2.sh`) are provided for running all four tasks twice for a distilled student (R18/mnv2).\n\n   ```sh\n   sh scripts/TL_img-cls_R18.sh exp_name\n   # note the exp_name here should be identical to that of the distilled student\n   ```\n\n### Semantic segmentation\n\n1. We use three semantic segmentation tasks: Cityscapes, VOC2012, ADE20K.\n\n2. Transform the checkpoint into the segmentation code format with `src/transform_ckpt_custom2seg.py`:\n\n   ```sh\n   cd src\n   python3 transform_ckpt_custom2seg.py exp_name\n   # note the exp_name here should be identical to that of the distilled student\n   ```\n\n   Move the transformed checkpoint to `semseg/initmodel/`.\n\n3. Scripts (`semseg/tool/TL_seg_R18.sh` and `semseg/tool/TL_seg_mnv2.sh`) are provided for running all three tasks twice for a distilled student (R18/mnv2).\n\n   ```sh\n   cd semseg\n   sh tool/TL_seg_R18.sh ckpt_name\n   # note the ckpt_name should be the checkpoint you moved into semseg/initmodel/ in step 2.\n   ```\n\n### Object detection\n\n1. We use two object detection tasks: COCO and VOC.\n\n2. Transform the checkpoint into Detectron2 format with `src/transform_ckpt_custom2det.py`:\n\n   ```sh\n   cd src\n   python3 transform_ckpt_custom2det.py exp_name R18\n   # note the exp_name here should be identical to that of the distilled student\n   # R18 could be changed to mnv2\n   ```\n\n   Move the transformed checkpoint to `detectron2/ckpts/`.\n\n3. Install Detectron2 and export the dataset path:\n\n   ```sh\n   python3 -m pip install -e detectron2\n   export DETECTRON2_DATASETS='path/to/datasets'\n   ```\n\n4. 
Scripts (`detectron2/tool/TL_det_R18.sh` and `detectron2/tool/TL_det_mnv2.sh`) are provided for running both tasks twice for a distilled student (R18/mnv2).\n\n   ```sh\n   cd detectron2/tool\n   sh TL_det_R18.sh ckpt_name\n   # note the ckpt_name should be the checkpoint you moved into detectron2/ckpts/ in step 2.\n   ```\n\n## Distilled models of KDEP\nWe provide some distilled models of KDEP here.\n1. ([Download](https://connecthkuhk-my.sharepoint.com/:u:/g/personal/ruifeihe_connect_hku_hk/ES1ZvPYyRRlAoMdtwwYh_d0B7Sfl1ghBMs09mLEHVY5HqA?e=P7dpdJ)) ResNet18, KDEP(SVD+PTS) from MS-R50 teacher on 100% ImageNet-1K data for 90 epochs.\n2. ([Download](https://connecthkuhk-my.sharepoint.com/:u:/g/personal/ruifeihe_connect_hku_hk/EeteR1gfIJZAqIypbG0LgJwBJ5sRT-GAvWU8M2WOkUlsHA?e=PiN74K)) MobileNet-V2, KDEP(SVD+PTS) from MS-R50 teacher on 100% ImageNet-1K data for 90 epochs.\n\n\n## Acknowledgement\n\nOur code is mainly based on [robust-models-transfer](https://github.com/microsoft/robust-models-transfer). We also thank the authors of the open-source code from [PyTorch Semantic Segmentation](https://github.com/hszhao/semseg) and [Detectron2](https://github.com/facebookresearch/Detectron2).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCVMI-Lab%2FKDEP","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FCVMI-Lab%2FKDEP","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCVMI-Lab%2FKDEP/lists"}