https://github.com/salesforce/must
PyTorch code for MUST
- Host: GitHub
- URL: https://github.com/salesforce/must
- Owner: salesforce
- License: bsd-3-clause
- Created: 2022-05-24T00:05:36.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-03-08T07:25:18.000Z (over 2 years ago)
- Last Synced: 2024-08-04T03:11:07.548Z (about 1 year ago)
- Topics: clip, masked-image-modeling, self-training, unsupervised-learning, zero-shot-classification, zero-shot-learning
- Language: Python
- Homepage:
- Size: 1.33 MB
- Stars: 103
- Watchers: 6
- Forks: 12
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: CODEOWNERS
- Security: SECURITY.md
README
# Masked Unsupervised Self-training for Zero-shot Image Classification
This is the PyTorch code of the [MUST paper](https://arxiv.org/abs/2206.02967). The repository supports finetuning a CLIP model on unlabeled images from a target domain.

### Requirements
* pytorch 1.10.0
* timm 0.4.12
* tensorboardX
* ftfy

### Dataset Setup
Dataset paths are stored in [dataset_catalog.json](https://github.com/salesforce/MUST/blob/main/dataset_catalog.json), which needs to be edited to point to your local paths. The ImageNet dataset follows the standard folder structure. For the other datasets, please refer to the scripts from [VISSL](https://github.com/facebookresearch/vissl/tree/main/extra_scripts/datasets) to download and prepare them. CLIP's class labels and prompt templates are stored in [classes.json](https://github.com/salesforce/MUST/blob/main/classes.json) and [templates.json](https://github.com/salesforce/MUST/blob/main/templates.json), respectively; a sketch of how such files are typically used is shown below.
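The following is a minimal sketch of how class names and prompt templates like those in classes.json and templates.json are commonly combined into zero-shot classifier weights with CLIP. It assumes the openai/CLIP package and assumes both JSON files map a dataset name to a list of strings; the repo's actual schema and loading code may differ.

```python
import json

import clip  # assumes the openai/CLIP package is installed
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/16", device=device)

# Assumed schema: {"imagenet": [...]} in both files; check the repo's
# classes.json / templates.json for the real layout.
classes = json.load(open("classes.json"))["imagenet"]
templates = json.load(open("templates.json"))["imagenet"]

weights = []
with torch.no_grad():
    for name in classes:
        # Fill each template (e.g. "a photo of a {}.") with the class name.
        prompts = clip.tokenize([t.format(name) for t in templates]).to(device)
        emb = model.encode_text(prompts).float()
        emb = emb / emb.norm(dim=-1, keepdim=True)
        weights.append(emb.mean(dim=0))  # average over prompt templates

# (num_classes, embed_dim) matrix; image_features @ zero_shot_weights.T
# then gives per-class logits for zero-shot classification.
zero_shot_weights = torch.stack(weights)
```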
### Training

Run the following command on 16 A100 GPUs:

    python -m torch.distributed.run --nproc_per_node=16 train.py --dataset [name_of_dataset] --clip_model ViT-B/16
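For intuition, here is a heavily simplified, hypothetical sketch of the self-training idea behind MUST: an EMA teacher produces pseudo-labels that supervise the student. It is not the repo's actual train.py, which additionally uses masked image modeling and other objectives; `student` and `teacher` are assumed to be classification heads on top of the CLIP image encoder.

```python
import torch
import torch.nn.functional as F

def self_training_loss(student, teacher, images, threshold=0.7):
    """Pseudo-label confident teacher predictions and train the student on them."""
    with torch.no_grad():
        probs = F.softmax(teacher(images), dim=-1)
        conf, pseudo_labels = probs.max(dim=-1)
        keep = conf > threshold  # only trust confident pseudo-labels
    if keep.sum() == 0:
        return images.new_zeros(())  # no confident samples in this batch
    return F.cross_entropy(student(images[keep]), pseudo_labels[keep])

@torch.no_grad()
def ema_update(student, teacher, momentum=0.999):
    # The teacher is an exponential moving average of the student.
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)
```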
### Results

ViT-B/16:

Method | ImageNet | SUN397 | Food101 | GTSRB | DTD | UCF101
--- | :---: | :---: | :---: | :---: | :---: | :---:
CLIP | 68.3 | 64.4 | 88.7 | 43.4 | 44.7 | 68.8
MUST | 77.7 | 71.8 | 92.7 | 65.5 | 54.1 | 81.1

ViT-L/14:

Method | ImageNet | SUN397 | Food101 | GTSRB | DTD | UCF101
--- | :---: | :---: | :---: | :---: | :---: | :---:
CLIP | 75.5 | 67.4 | 92.9 | 50.6 | 55.4 | 77.0
MUST | 82.1 | 74.6 | 95.3 | 68.7 | 62.6 | 85.7

### Citation

    @inproceedings{li2022masked,
        title={Masked Unsupervised Self-training for Label-Free Image Classification},
        author={Junnan Li and Silvio Savarese and Steven C. H. Hoi},
        year={2023},
        booktitle={ICLR},
    }