{"id":13738450,"url":"https://github.com/google/mentornet","last_synced_at":"2025-05-08T16:34:13.384Z","repository":{"id":39854528,"uuid":"144607074","full_name":"google/mentornet","owner":"google","description":"Code for MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks","archived":true,"fork":false,"pushed_at":"2023-03-25T00:03:02.000Z","size":5656,"stargazers_count":320,"open_issues_count":5,"forks_count":63,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-08-04T03:12:31.766Z","etag":null,"topics":["deep-learning","google","label","noisy","noisy-data"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2018-08-13T16:40:50.000Z","updated_at":"2024-05-27T11:54:55.000Z","dependencies_parsed_at":"2024-04-16T22:03:55.667Z","dependency_job_id":"5037348c-1c24-4f12-aa25-ccea17150d06","html_url":"https://github.com/google/mentornet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fmentornet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fmentornet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fmentornet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fmentornet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google",
"download_url":"https://codeload.github.com/google/mentornet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224746740,"owners_count":17363103,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","google","label","noisy","noisy-data"],"created_at":"2024-08-03T03:02:22.851Z","updated_at":"2024-11-15T07:31:09.349Z","avatar_url":"https://github.com/google.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks\n\nThis is the code for the paper:\n\n**\u003ca href=\"https://arxiv.org/abs/1712.05055\"\u003eMentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels\n\u003c/a\u003e**\n\u003cbr\u003e\nLu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei\n\u003cbr\u003e\nPresented at [ICML 2018](https://icml.cc/Conferences/2018)\n\n\n*Please note that this is not an officially supported Google product.*\n\nIf you find this code useful in your research then please cite\n\n```\n@inproceedings{jiang2018mentornet,\n  title={MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels},\n  author={Jiang, Lu and Zhou, Zhengyuan and Leung, Thomas and Li, Li-Jia and Fei-Fei, Li},\n  booktitle={ICML},\n  year={2018}\n}\n```\n\n\n## Introduction\n\nWe are interested in training a deep network using curriculum learning (Bengio et al., 2009), i.e. 
learning examples with focus.\nEach curriculum is implemented as a network (called **MentorNet**).\n\n- During training, MentorNet supervises the training of the base network (called **StudentNet**).\n- At test time, StudentNet makes predictions alone, without MentorNet.\n\n\n![Training Overview](images/overview.png)\n\n\n## Setups\n\nAll code was developed and tested on Nvidia V100/P100 (16GB) GPUs in the following environment.\n\n- Ubuntu 18.04\n- Python 2.7.15\n- TensorFlow 1.8.0\n- numpy 1.13.3\n- imageio 2.3.0\n\nInstall the [Cloud SDK](https://cloud.google.com/sdk/) to get the data and models. Then download the dataset and the pre-trained MentorNet models, and put them into the same directory as the `code` directory. \n\n```bash\ngsutil -m cp -r gs://mentornet_project/data .\ngsutil -m cp -r gs://mentornet_project/mentornet_models .\n```\n\nAlternatively, you may download the zip files: [data](https://storage.cloud.google.com/mentornet_project/data.zip) and [models](https://storage.cloud.google.com/mentornet_project/mentornet_models.zip).\n\n\n\n## Running MentorNet on CIFAR\n\n\n```bash\nexport PYTHONPATH=\"$PYTHONPATH:$PWD/code/\"\n\npython code/cifar_train_mentornet.py \\\n  --dataset_name=cifar10   \\\n  --trained_mentornet_dir=mentornet_models/models/mentornet_pd1_g_1/mentornet_pd \\\n  --loss_p_precentile=0.75  \\\n  --nofixed_epoch_after_burn_in  \\\n  --burn_in_epoch=0  \\\n  --example_dropout_rates=\"0.5,17,0.05,83\" \\\n  --data_dir=data/cifar10/0.2 \\\n  --train_log_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1/train \\\n  --studentnet=resnet101 \\\n  --max_number_of_steps=39000\n```\n\nA full list of commands can be found in this file.\nThe training script has a number of command-line flags that you can use to configure the model architecture, hyperparameters, and input / output settings:\n\n- `--trained_mentornet_dir`: Directory containing the trained MentorNet model, created by `mentornet_learning/train.py`.\n- `--loss_p_percentile`: 
p-percentile used to compute the loss moving average. Default is `0.7`.\n- `--burn_in_epoch`: Number of initial epochs used for burn-in. During the burn-in period, every sample has a fixed weight of 1.0. Default is `0`.\n- `--fixed_epoch_after_burn_in`: Whether to use the fixed epoch as the MentorNet input feature after the burn-in period. Set True for MentorNet DD. Default is `False`.\n- `--loss_moving_average_decay`: Decay factor used in the moving average. Default is `0.5`.\n- `--example_dropout_rates`: Comma-separated list indicating the example drop-out rate over the total of 100 epochs. The format is [dropout rate, epoch_num]+, which defines a piecewise drop-out rate from boundaries and values. The sum of epoch_num is 100. Drop-out here means the probability of setting a sample weight to zero, as proposed in (Liang et al., 2016). Default is `0.5, 17, 0.05, 78, 1.0, 5`.\n\nTo evaluate a model, run the evaluation job in parallel with the training job (on a different GPU).\n\n```bash\npython cifar/cifar_eval.py \\\n --dataset_name=cifar10 \\\n --data_dir=cifar/data/cifar10/val/ \\\n --checkpoint_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1/train \\\n --eval_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1//eval_val \\\n --studentnet=resnet101 \\\n --device_id=1\n```\n\nA complete list of commands for running experiments can be found at `commands/train_studentnet_resnet.sh` and `commands/train_studentnet_inception.sh`.\n\n## MentorNet Framework\n\nMentorNet is a **general** framework for curriculum learning, where various curriculums can be learned by the same MentorNet structure with different parameters.\n\nIt is **flexible** as we can switch curriculums by attaching different MentorNets without modifying the pipeline.\n\nWe train a few MentorNets listed below. 
We can think of a MentorNet as a hyper-parameter that will be tuned for different problems.\n\n\n| Curriculum                            |                                           Visualization                                              |                   Intuition                      |       Model Name      |\n| :-------------------------------------| :----------------------------------------------------------------------------------------------------| :------------------------------------------------| :-------------------- |\n| No curriculum         |  ![image](images/no_curriculum.gif)     |  Assign uniform weight to every sample.                        |`baseline_mentornet` |\n| Self-paced \u003cbr/\u003e(Kumar et al., 2010)         |  ![image](images/self_paced.gif)    |  Favor samples of smaller loss.                        |`self_paced_mentornet` |\n| SPCL linear \u003cbr/\u003e(Jiang et al., 2015)       |  ![image](images/spcl_linear.gif)     |  Discount the weight by loss linearly.                 |`spcl_linear_mentornet`|\n| Hard example mining \u003cbr/\u003e(Felzenszwalb et al., 2008) | ![image](images/hard_example_mining.gif) | Favor samples of greater loss.         | `hard_example_mining_mentornet` |\n| Focal loss \u003cbr/\u003e(Lin et al., 2017)         |  ![image](images/focal_loss.gif)     |  Increase the weight by loss by the exponential CDF.        | `focal_loss_mentornet`|\n| Predefined Mixture         |  ![image](images/mentornet_pd.gif) |  Mixture of SPL and SPCL changing by epoch.       |   `mentornet_pd`       |\n| MentorNet Data-driven     |  ![image](images/mentornet_dd.gif)  | Learned on a small subset of the CIFAR data.  | `mentornet_dd`         |\n\n\n\nNote that many more curriculums can be trained by MentorNet, for example,\nprediction variance (Chang et al., 2017), implicit regularizer (Fan et al., 2017), self-paced with diversity (Jiang et al. 
2014),\nsample re-weighting (Dehghani et al., 2018, Ren et al., 2018), etc.\n\n\n\n### Performance\n\n*The numbers are slightly different from the ones reported in the paper due to\nthe re-implementation on the third party library.*\n\n\n**CIFAR-10 ResNet**\n\n| noise_fraction| baseline | self_paced | focal_loss | mentornet_pd| mentornet_dd |\n| ----------: | -------: | ---------: | ---------: | ---------: | -----------: |\n| 0.2         | 0.796    | 0.822      | 0.797      | 0.910       | **0.914**        |\n| 0.4         | 0.568    | 0.802      | 0.634      | 0.776      | **0.887**        |\n| 0.8         | 0.238    | 0.297      | 0.25       | 0.283      | **0.463**        |\n\n**CIFAR-100 ResNet**\n\n| noise_fraction| baseline | self_paced | focal_loss | mentornet_pd| mentornet_dd |\n| ----------: | -------: | ---------: | ---------: | ---------: | -----------: |\n| 0.2         | 0.624    | 0.652      | 0.613      | **0.733**      | 0.726        |\n| 0.4         | 0.448    | 0.509      | 0.467      | 0.567      | **0.675**        |\n| 0.8         | 0.084    | 0.089      | 0.079      | 0.193      | **0.301**        |\n\n**CIFAR-10 Inception**\n\n| noise_fraction| baseline | self_paced | focal_loss | mentornet_pd | mentornet_dd|\n| ----------: | -------: | ---------: | ---------: | -----------: | ---------: |\n| 0.2         | 0.775    | 0.784      | 0.747      | 0.798        | **0.800**        |\n| 0.4         | 0.72     | 0.733      | 0.695      | 0.731        | **0.763**      |\n| 0.8         | 0.29     | 0.272      | 0.309      | 0.312        | **0.461**      |\n\n**CIFAR-100 Inception**\n\n| noise_fraction| baseline | self_paced | focal_loss | mentornet_pd | mentornet_dd|\n| ----------: | -------: | ---------: | ---------: | -----------: | ---------: |\n| 0.2         | 0.42     | 0.408      | 0.391      | 0.451        | **0.466**      |\n| 0.4         | 0.346    | 0.32       | 0.313      | 0.386        | **0.411**      |\n| 0.8         | 0.108    | 0.091      
| 0.107      | 0.125        | **0.203**      |\n\n\n### Algorithm\n\nWe propose an algorithm to optimize the StudentNet model parameter w jointly with a given MentorNet. Unlike alternating minimization, it minimizes over w (StudentNet parameter) and v (sample weight) **stochastically over mini-batches**.\n\nThe curriculum can change during training, and MentorNet is updated a few times in the algorithm.\n\n\n![Algorithm](images/alg.gif)\n\nTo learn new curriculums (Step 6), see [this page](TRAINING.md).\n\n*We found that the specific MentorNet architecture does not matter that much.*\n\n\n## References\n- Bengio, Yoshua, et al. \"Curriculum learning.\" In ICML, 2009.\n- Kumar, M. Pawan, Benjamin Packer, and Daphne Koller. \"Self-paced learning for latent variable models.\" In NIPS, 2010.\n- Jiang, Lu, et al. \"Self-paced Learning with Diversity.\" In NIPS, 2014.\n- Jiang, Lu, et al. \"Self-Paced Curriculum Learning.\" In AAAI, 2015.\n- Liang, Junwei, et al. \"Learning to Detect Concepts from Webly-Labeled Video Data.\" In IJCAI, 2016.\n- Lin, Tsung-Yi, et al. \"Focal loss for dense object detection.\" In ICCV, 2017.\n- Fan, Yanbo, et al. \"Self-Paced Learning: an Implicit Regularization Perspective.\" In AAAI, 2017.\n- Felzenszwalb, Pedro, et al. \"A discriminatively trained, multiscale, deformable part model.\" In CVPR, 2008.\n- Dehghani, Mostafa, et al. \"Fidelity-Weighted Learning.\" In ICLR, 2018.\n- Ren, Mengye, et al. \"Learning to reweight examples for robust deep learning.\" In ICML, 2018.\n- Fan, Yang, et al. \"Learning to Teach.\" In ICLR, 2018.\n- Chang, Haw-Shiuan, et al. \"Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples.\" In NIPS, 2017.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle%2Fmentornet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle%2Fmentornet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle%2Fmentornet/lists"}