{"id":13737589,"url":"https://github.com/Han-Jia/UNICORN-MAML","last_synced_at":"2025-05-08T14:33:11.515Z","repository":{"id":58936655,"uuid":"373512187","full_name":"Han-Jia/UNICORN-MAML","owner":"Han-Jia","description":"PyTorch implementation of \"How to Train Your MAML to Excel in Few-Shot Classification\"","archived":false,"fork":false,"pushed_at":"2024-06-29T17:16:56.000Z","size":4165,"stargazers_count":38,"open_issues_count":1,"forks_count":7,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-15T06:32:07.500Z","etag":null,"topics":["few-shot-learning","iclr2022","maml","meta-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Han-Jia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-03T13:12:46.000Z","updated_at":"2024-08-20T09:03:07.000Z","dependencies_parsed_at":"2024-01-06T19:53:28.603Z","dependency_job_id":"34f7fc51-f888-459a-9a9d-4590f38237ee","html_url":"https://github.com/Han-Jia/UNICORN-MAML","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Han-Jia%2FUNICORN-MAML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Han-Jia%2FUNICORN-MAML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Han-Jia%2FUNICORN-MAML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Han-Jia%2FUNICORN-MAML/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/Han-Jia","download_url":"https://codeload.github.com/Han-Jia/UNICORN-MAML/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253085774,"owners_count":21851699,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["few-shot-learning","iclr2022","maml","meta-learning"],"created_at":"2024-08-03T03:01:54.261Z","updated_at":"2025-05-08T14:33:06.494Z","avatar_url":"https://github.com/Han-Jia.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# How to Train Your MAML to Excel in Few-Shot Classification\n\nThe code repository for \"[How to Train Your MAML to Excel in Few-Shot Classification](https://arxiv.org/abs/2106.16245)\" (Accepted by ICLR 2022) in PyTorch. \n\nIf you use any content of this repo for your work, please cite the following bib entry:\n\n    @inproceedings{ye2021UNICORN,\n      author    = {Han-Jia Ye and\n                   Wei-Lun Chao},\n\t  title     = {How to Train Your {MAML} to Excel in Few-Shot Classification},\n\t  booktitle = {10th International Conference on Learning Representations ({ICLR})},\n\t  year      = {2022}\n\t}\n\n\n## Main idea of UNICORN-MAML\n\nModel-agnostic meta-learning (MAML) is arguably the most popular meta-learning algorithm nowadays, given its flexibility to incorporate various model architectures and to be applied to different problems. Nevertheless, its performance on few-shot classification is far behind many recent algorithms dedicated to the problem. 
In this paper, we point out several key facets of how to train MAML to excel in few-shot classification. First, we find that a large number of gradient steps are needed for the inner loop update, which contradicts the common usage of MAML for few-shot classification. Second, we find that MAML is sensitive to the permutation of class assignments in meta-testing: for a few-shot task of N classes, there are exponentially many ways to assign the learned initialization of the N-way classifier to the N classes, leading to an unavoidably huge variance. Third, we investigate several ways for permutation invariance and find that learning a shared classifier initialization for all the classes performs the best. On benchmark datasets such as *Mini*ImageNet and *Tiered*ImageNet, our approach, which we name UNICORN-MAML, performs on a par with or even outperforms state-of-the-art algorithms, **while keeping the simplicity of MAML without adding any extra sub-networks**.\n\n\u003cimg src='imgs/mean-MAML.png' width='1040' height='280'\u003e\n\n## Standard Few-shot Learning Results\n\nExperimental results on few-shot learning datasets with ResNet-12 backbone (Same as the [MetaOptNet](https://github.com/kjunelee/MetaOptNet)). 
We report average results with 10,000 randomly sampled few-shot learning episodes for stabilized evaluation.\n\n**MiniImageNet Dataset**\n\n|  Setups  | 1-Shot 5-Way | 5-Shot 5-Way |\n|:--------:|:------------:|:------------:|\n| ProtoMAML |     62.62    |     79.24    |\n|  MetaOptNet  |     62.64    |     78.63    |\n| DeepEMD |     65.91    |     82.41    |\n|    FEAT   |     **66.78**  |     82.05    |\n|    MAML   |     64.42  |     83.44    |\n|   UNICORN-MAML   |   [65.17](https://drive.google.com/file/d/15496NKRBNrOpyyx3tQ_wD9fB2tx4BeT1/view?usp=sharing)  |   **[84.30](https://drive.google.com/file/d/1gjjQYOAyzoePKL4tvoag-bHPkGi6CmEQ/view?usp=sharing)**  |\n\n**TieredImageNet Dataset**\n\n|  Setups  | 1-Shot 5-Way | 5-Shot 5-Way |\n|:--------:|:------------:|:------------:|\n| ProtoMAML |     67.10   |     81.18    |\n|  MetaOptNet  |     65.99    |     81.56   |\n| DeepEMD |     **71.52**    |     86.03    |\n|   FEAT   |   70.80  |   84.79  |\n|    MAML   |     65.72    |     84.37    |\n| UNICORN-MAML   |     69.24    |     **86.06**    |\n\n## Prerequisites\n\nThe following packages are required to run the scripts:\n\n- [PyTorch-1.6 and torchvision](https://pytorch.org)\n\n- Package [tensorboardX](https://github.com/lanpa/tensorboardX)\n\n- Dataset: please download the dataset and put images into the folder data/[name of the dataset, miniimagenet or cub]/images\n\n- Pre-trained weights: The pre-trained weights (used for initialization) can be downloaded [here](https://drive.google.com/drive/folders/1WiNF-qKm8yBH4KcC1cdW3gpEwrxTQ0qN?usp=sharing).\n\n## Dataset\n\n### MiniImageNet Dataset\n\nThe MiniImageNet dataset is a subset of ImageNet that includes a total of 100 classes and 600 examples per class. 
We follow the [previous setup](https://github.com/twitter/meta-learning-lstm), and use 64 classes as *base* categories, 16 and 20 as two sets of *novel* categories for model validation and evaluation, respectively.\n\n### TieredImageNet Dataset\n\n[TieredImageNet](https://github.com/renmengye/few-shot-ssl-public) is a large-scale dataset with more categories, which contains 351, 97, and 160 categories for model training, validation, and evaluation, respectively.\n\n## Code Structures\nTo reproduce our experiments with UNICORN-MAML, please use **train_fsl.py**. There are four parts in the code.\n - `model`: It contains the main files of the code, including the few-shot learning trainer, the dataloader, the network architectures, and baseline and comparison models.\n - `data`: Images and splits for the datasets.\n - `saves`: The pre-trained weights of different networks.\n - `checkpoints`: To save the trained models.\n\n## Model Training and Evaluation\nPlease use **train_fsl.py** and follow the instructions below. 
The script will automatically evaluate the model on the meta-test set with 10,000 tasks after the given number of epochs.\n\n## Arguments\n`train_fsl.py` takes the following command line options (details are in `model/utils.py`):\n\n**Task Related Arguments**\n- `dataset`: Option for the dataset (`MiniImageNet`, `TieredImageNet`, or `CUB`), default to `MiniImageNet`\n\n- `way`: The number of classes in a few-shot task during meta-training, default to `5`\n\n- `eval_way`: The number of classes in a few-shot task during meta-test, default to `5`\n\n- `shot`: Number of instances in each class in a few-shot task during meta-training, default to `1`\n\n- `eval_shot`: Number of instances in each class in a few-shot task during meta-test, default to `1`\n\n- `query`: Number of instances in each class to evaluate the performance during meta-training, default to `15`\n\n- `eval_query`: Number of instances in each class to evaluate the performance during meta-test, default to `15`\n\n**Optimization Related Arguments**\n- `max_epoch`: The maximum number of training epochs, default to `200`\n\n- `episodes_per_epoch`: The number of tasks sampled in each epoch, default to `100`\n\n- `num_eval_episodes`: The number of tasks sampled from the meta-val set to evaluate the performance of the model (note that we always sample 10,000 tasks from the meta-test set during final evaluation), default to `200`\n\n- `lr`: Learning rate for the model, default to `0.001` with pre-trained weights\n\n- `lr_mul`: This is specially designed for set-to-set functions like FEAT. The learning rate for the top layer will be multiplied by this value (usually giving it a faster learning rate). Default to `10`\n\n- `lr_scheduler`: The scheduler to set the learning rate (`step`, `multistep`, or `cosine`), default to `step`\n\n- `step_size`: The epoch step(s) used by the scheduler to decrease the learning rate. Set it to a single value when choosing the `step` scheduler and provide multiple values when choosing the `multistep` scheduler. 
Default to `20`\n\n- `gamma`: Learning rate ratio for `step` or `multistep` scheduler, default to `0.1`\n\n- `fix_BN`: Set the encoder to the evaluation mode during the meta-training. This parameter is useful when meta-learning with the WRN. Default to `False`\n\n- `mom`: The momentum value for the SGD optimizer, default to `0.9`\n\n- `weight_decay`: The weight_decay value for SGD optimizer, default to `0.0005`\n\n**Model Related Arguments**\n- `model_class`: The model to use during meta-learning. We provide implementations for `MAML` and our `MAMLUnicorn`. Default to `MAML`\n\n- `backbone_class`: Types of the encoder, e.g., ResNet-12 (`Res12`), default to `ConvNet`\n\n- `temperature`: Temperature over the logits; we divide the logits by this value. It is useful when meta-learning with pre-trained weights. Default to `0.5`\n\n**Other Arguments** \n\n- `gpu`: The index of GPU to use. Please provide multiple indexes when choosing `multi_gpu`. Default to `0`\n\n- `log_interval`: How often to log the meta-training information, default to every `50` tasks\n\n- `eval_interval`: How often to validate the model over the meta-val set, default to every `1` epoch\n\n- `save_dir`: The path to save the learned models, default to `./checkpoints`\n\nRunning the command without arguments will train the models with the default hyper-parameter values. 
Loss changes will be recorded as a tensorboard file.\n\n## Training scripts for UNICORN-MAML\n\nFor example, to train the 1-shot/5-shot 5-way MAML/UNICORN-MAML model with ResNet-12 backbone on MiniImageNet:\n\n    $ python train_fsl.py --max_epoch 100 --way 5 --eval_way 5 --lr_scheduler step --model_class MAML --lr_mul 10 --backbone_class Res12 --dataset MiniImageNet --gpu 0 --query 15 --step_size 20 --gamma 0.1 --para_init './saves/initialization/miniimagenet/Res12-pre.pth' --lr 0.001 --shot 1 --eval_shot 1  --temperature 0.5 --gd_lr 0.05 --inner_iters 15\n\t$ python train_fsl.py --max_epoch 100 --way 5 --eval_way 5 --lr_scheduler step --model_class MAML --lr_mul 10 --backbone_class Res12 --dataset MiniImageNet --gpu 0 --query 15 --step_size 20 --gamma 0.1 --para_init './saves/initialization/miniimagenet/Res12-pre.pth' --lr 0.001 --shot 5 --eval_shot 5  --temperature 0.5 --gd_lr 0.1 --inner_iters 20 \n\t$ python train_fsl.py --max_epoch 100 --way 5 --eval_way 5 --lr_scheduler step --model_class MAMLUnicorn --lr_mul 10 --backbone_class Res12 --dataset MiniImageNet --gpu 0 --query 15 --step_size 20 --gamma 0.1 --para_init './saves/initialization/miniimagenet/Res12-pre.pth' --lr 0.001 --shot 1 --eval_shot 1  --temperature 0.5 --gd_lr 0.1 --inner_iters 5 \n\t$ python train_fsl.py --max_epoch 100 --way 5 --eval_way 5 --lr_scheduler step --model_class MAMLUnicorn --lr_mul 10 --backbone_class Res12 --dataset MiniImageNet --gpu 0 --query 15 --step_size 20 --gamma 0.1 --para_init './saves/initialization/miniimagenet/Res12-pre.pth' --lr 0.001 --shot 5 --eval_shot 5  --temperature 0.5 --gd_lr 0.1 --inner_iters 20 \n\nto train the 1-shot/5-shot 5-way MAML/UNICORN-MAML model with ResNet-12 backbone on TieredImageNet:\n\n    $ python train_fsl.py --max_epoch 100 --way 5 --eval_way 5 --lr_scheduler step --model_class MAML --lr_mul 10 --backbone_class Res12 --dataset TieredImageNet --gpu 0 --query 15 --step_size 20 --gamma 0.1 --para_init 
'./saves/initialization/tieredimagenet/Res12-pre.pth' --lr 0.001 --shot 1 --eval_shot 1  --temperature 0.5 --gd_lr 0.01 --inner_iters 20\n\t$ python train_fsl.py --max_epoch 100 --way 5 --eval_way 5 --lr_scheduler step --model_class MAML --lr_mul 10 --backbone_class Res12 --dataset TieredImageNet --gpu 0 --query 15 --step_size 20 --gamma 0.1 --para_init './saves/initialization/tieredimagenet/Res12-pre.pth' --lr 0.001 --shot 1 --eval_shot 5  --temperature 0.5 --gd_lr 0.05 --inner_iters 15\n\t$ python train_fsl.py --max_epoch 100 --way 5 --eval_way 5 --lr_scheduler step --model_class MAMLUnicorn --lr_mul 10 --backbone_class Res12 --dataset TieredImageNet --gpu 0 --query 15 --step_size 20 --gamma 0.1 --para_init './saves/initialization/tieredimagenet/Res12-pre.pth' --lr 0.001 --shot 5 --eval_shot 1  --temperature 0.5 --gd_lr 0.02 --inner_iters 10\n\t$ python train_fsl.py --max_epoch 100 --way 5 --eval_way 5 --lr_scheduler step --model_class MAMLUnicorn --lr_mul 10 --backbone_class Res12 --dataset TieredImageNet --gpu 0 --query 15 --step_size 20 --gamma 0.1 --para_init './saves/initialization/tieredimagenet/Res12-pre.pth' --lr 0.001 --shot 1 --eval_shot 5  --temperature 0.5 --gd_lr 0.05 --inner_iters 20 \n\n## Verifying the permutation variance of a learned MAML model\n\nWe can evaluate a learned MAML model and check whether the permutation will introduce large variance. 
For example, for the 1-shot/5-shot 5-way models with a ResNet-12 backbone on MiniImageNet:\n\n\t$ python eval_maml_permutation.py --shot_list 1 --model_path './MAML-1-shot.pth' --gpu 0 --gd_lr 0.05 --inner_iters 15  --model_class MAML --dataset MiniImageNet\n\t$ python eval_maml_permutation.py --shot_list 5 --model_path './MAML-5-shot.pth' --gpu 0 --gd_lr 0.1 --inner_iters 20  --model_class MAML --dataset MiniImageNet\n\n\n## Acknowledgment\nWe thank the following repos for providing helpful components/functions in our work.\n\n- [FEAT](https://github.com/Sha-Lab/FEAT)\n\n- [AVIATOR](https://github.com/Han-Jia/AVIATOR)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHan-Jia%2FUNICORN-MAML","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHan-Jia%2FUNICORN-MAML","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHan-Jia%2FUNICORN-MAML/lists"}