{"id":19932260,"url":"https://github.com/amazon-science/regression-constraint-model-upgrade","last_synced_at":"2025-07-22T23:03:17.468Z","repository":{"id":139012838,"uuid":"569035547","full_name":"amazon-science/regression-constraint-model-upgrade","owner":"amazon-science","description":"Regression Constraint Model Upgrade","archived":false,"fork":false,"pushed_at":"2023-12-08T20:26:18.000Z","size":1809,"stargazers_count":9,"open_issues_count":0,"forks_count":1,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-05-03T11:35:53.789Z","etag":null,"topics":["computer-vision","deep-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amazon-science.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-11-22T00:00:18.000Z","updated_at":"2024-09-23T09:23:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"56062129-2d6f-4991-95c0-afd16a47f2bd","html_url":"https://github.com/amazon-science/regression-constraint-model-upgrade","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/amazon-science/regression-constraint-model-upgrade","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fregression-constraint-model-upgrade","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fregression-constraint-model-upgrade/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/amazon-science%2Fregression-constraint-model-upgrade/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fregression-constraint-model-upgrade/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amazon-science","download_url":"https://codeload.github.com/amazon-science/regression-constraint-model-upgrade/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fregression-constraint-model-upgrade/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266586905,"owners_count":23952205,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning"],"created_at":"2024-11-12T23:09:31.441Z","updated_at":"2025-07-22T23:03:17.457Z","avatar_url":"https://github.com/amazon-science.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Regression Constraint Model Upgrade \n\n## Introduction\n\nThe inconsistency in the behaviour of different versions of an AI module may bring significant instability to the overall system. While an improved module usually reduces the average number of errors, it always introduces new ones when compared to its predecessor. This phenomenon is known as **regression**. 
\n\nThis repository holds the codebase for exploring the regression problem in classification tasks. It provides training scripts for mainstream network architectures and analysis tools for evaluating the extent of regression among them. \n\n![Regression in model update](img/teaser.png)\n*Regression in model update: When updating an old classifier (red) to a new one (dashed blue line), we correct\nmistakes (top-right, white), but we also introduce errors that the old classifier did not make (negative flips, bottom-left, red). While on average the errors decrease (from 57% to 42% in this toy example), regression can wreak havoc with downstream processing, nullifying the benefit of the update.*\n## Installation\n\n- Install [Anaconda (with Python 3.7)](https://www.anaconda.com/products/individual) \n- Install the dependencies with `pip install -r ConstrainedUpgrade/requirements.txt`\n- Download the ImageNet dataset from http://www.image-net.org/\n    - Then move the validation images into labeled subfolders using [this shell script](https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh).\n    \n## Training\n\nTo train a model, run `ConstrainedUpgrade/train.py` with the desired model architecture and the path to the ImageNet dataset:\n\n```bash\npython ConstrainedUpgrade/train.py -a resnet18 [imagenet-folder with train and val folders]\n```\n\nThe default learning rate schedule starts at 0.1 and decays by a factor of 10 every 30 epochs. This is appropriate for ResNet and models with batch normalization, but too high for AlexNet and VGG. 
Use 0.01 as the initial learning rate for AlexNet or VGG:\n\n```bash\npython ConstrainedUpgrade/train.py -a alexnet --lr 0.01 [imagenet-folder with train and val folders]\n```\n\n## Multi-processing Distributed Data Parallel Training\n\nYou should always use the NCCL backend for multi-processing distributed training since it currently provides the best distributed training performance.\n\n### Single node, multiple GPUs:\n\n```bash\npython ConstrainedUpgrade/train.py -a resnet50 --dist-url 'tcp://127.0.0.1:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]\n```\n\n### Multiple nodes:\n\nNode 0:\n```bash\npython ConstrainedUpgrade/train.py -a resnet50 --dist-url 'tcp://IP_OF_NODE0:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 0 [imagenet-folder with train and val folders]\n```\n\nNode 1:\n```bash\npython ConstrainedUpgrade/train.py -a resnet50 --dist-url 'tcp://IP_OF_NODE0:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 1 [imagenet-folder with train and val folders]\n```\n\n## Evaluation\nThe best model from training is evaluated automatically. Its prediction outputs on the validation set are saved in the working folder given by `--work_dir`.\n\nYou can also evaluate a model after training:\n```bash\npython ConstrainedUpgrade/train.py --evaluate --resume $MODEL_PATH [other options]\n```\nThe results are stored as `evaluate.result` in the working folder by default.\n\nWe also provide analysis tools to calculate statistics such as accuracy and negative flip rate (NFR). Please refer to `ConstrainedUpgrade/analysis`. 
There is an example code snippet:\n```python\nfrom analysis.utils import ModelAnalyzer\nold_model = ModelAnalyzer('{}/model_best.result'.format(work_dir_1))\nnew_model = ModelAnalyzer('{}/model_best.result'.format(work_dir_2))\nensemble_model = old_model + new_model\nprint('Accuracy: {}, NFR: {}'.format(new_model.Acc(), new_model.NFR(old_model)))\nprint('Ensemble Accuracy: {}'.format(ensemble_model.Acc()))\n```\n\n## Commands\n\n```\nusage: ConstrainedUpgrade/train.py [-h] [-d DIR] [-w DIR] [-a ARCH]\n                [--model_kwargs KEY=VAL [KEY=VAL ...]] [-j N] [--epochs N]\n                [--start-epoch N] [-b N] [--lr LR] [--lr_step LR_STEP]\n                [--momentum M] [--wd W] [-p N] [--resume PATH] [-e]\n                [--evaluate_aux EVALUATE_AUX]\n                [--evaluate_results_name EVALUATE_RESULTS_NAME] [--pretrained]\n                [--world-size WORLD_SIZE] [--rank RANK] [--dist-url DIST_URL]\n                [--dist-backend DIST_BACKEND] [--seed SEED] [--gpu GPU]\n                [--multiprocessing-distributed]\n                [--bct_loss_weight BCT_LOSS_WEIGHT]\n                [--bct_old_model BCT_OLD_MODEL]\n                [--bct_eval_alpha BCT_EVAL_ALPHA]\n                [--kd_model_arch KD_MODEL_ARCH]\n                [--kd_model_path KD_MODEL_PATH]\n                [--kd_loss_weight KD_LOSS_WEIGHT]\n                [--kd_temperature KD_TEMPERATURE]\n                [--cna_temperature CNA_TEMPERATURE]\n                [--kd_filter {all_pass,neg_flip,old_correct,new_incorrect}]\n                [--kd_loss_mode {normal,gt}] [--auto-scale] [--auto-shotdown]\n                [--save-init-checkpoint]\n\nPyTorch ImageNet Training\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -d DIR, --data DIR    path to dataset\n  -w DIR, --work_dir DIR\n                        path to working folder\n  -a ARCH, --arch ARCH  model architecture: AlexNet | AuxResNet | DenseNet |\n                        GoogLeNet | 
GoogLeNetOutputs | Inception3 |\n                        InceptionOutputs | MBResNet | MNASNet | MobileNetV2 |\n                        ResNet | ResNet_StoDepth_lineardecay | ShuffleNetV2 |\n                        SqueezeNet | VGG | alexnet | aux_resnet101 |\n                        aux_resnet152 | aux_resnet18 | aux_resnet34 |\n                        aux_resnet50 | densenet121 | densenet161 | densenet169\n                        | densenet201 | googlenet | inception_v3 |\n                        mb_resnet101 | mb_resnet152 | mb_resnet18 |\n                        mb_resnet34 | mb_resnet50 | mnasnet0_5 | mnasnet0_75 |\n                        mnasnet1_0 | mnasnet1_3 | mobilenet_v2 | resnet101 |\n                        resnet101_StoDepth_lineardecay | resnet152 |\n                        resnet152_StoDepth_lineardecay | resnet18 |\n                        resnet18_StoDepth_lineardecay | resnet34 |\n                        resnet34_StoDepth_lineardecay | resnet50 |\n                        resnet50_StoDepth_lineardecay | resnext101_32x8d |\n                        resnext50_32x4d | shufflenet_v2_x0_5 |\n                        shufflenet_v2_x1_0 | shufflenet_v2_x1_5 |\n                        shufflenet_v2_x2_0 | squeezenet1_0 | squeezenet1_1 |\n                        vgg11 | vgg11_bn | vgg13 | vgg13_bn | vgg16 | vgg16_bn\n                        | vgg19 | vgg19_bn | wide_resnet101_2 |\n                        wide_resnet50_2 (default: resnet18)\n  --model_kwargs KEY=VAL [KEY=VAL ...]\n                        additional hyper-parameters for model\n  -j N, --workers N     number of data loading workers (default: 4)\n  --epochs N            number of total epochs to run\n  --start-epoch N       manual epoch number (useful on restarts)\n  -b N, --batch-size N  mini-batch size (default: 256), this is the total\n                        batch size of all GPUs on the current node when using\n                        Data Parallel or Distributed Data Parallel\n  --lr LR, 
--learning-rate LR\n                        initial learning rate\n  --lr_step LR_STEP\n  --momentum M          momentum\n  --wd W, --weight-decay W\n                        weight decay (default: 1e-4)\n  -p N, --print-freq N  print frequency (default: 10)\n  --resume PATH         path to latest checkpoint (default: none)\n  -e, --evaluate        evaluate model on validation set\n  --evaluate_aux EVALUATE_AUX\n  --evaluate_results_name EVALUATE_RESULTS_NAME\n                        name for saving the evaluate results\n  --pretrained          use pre-trained model\n  --world-size WORLD_SIZE\n                        number of nodes for distributed training\n  --rank RANK           node rank for distributed training\n  --dist-url DIST_URL   url used to set up distributed training\n  --dist-backend DIST_BACKEND\n                        distributed backend\n  --seed SEED           seed for initializing training.\n  --gpu GPU             GPU id to use.\n  --multiprocessing-distributed\n                        Use multi-processing distributed training to launch N\n                        processes per node, which has N GPUs. 
This is the\n                        fastest way to use PyTorch for either single node or\n                        multi node data parallel training\n  --bct_loss_weight BCT_LOSS_WEIGHT\n                        loss weight for backward compatible representation\n                        learning\n  --bct_old_model BCT_OLD_MODEL\n                        source model for backward compatible representation\n                        learning\n  --bct_eval_alpha BCT_EVAL_ALPHA\n                        ensemble alpha in evaluation stage for bct model\n  --kd_model_arch KD_MODEL_ARCH\n                        model architecture for knowledge distillation source.\n  --kd_model_path KD_MODEL_PATH\n                        model path of knowledge distillation source.\n  --kd_loss_weight KD_LOSS_WEIGHT\n                        loss weight of KD loss\n  --kd_temperature KD_TEMPERATURE\n                        temperature of KD loss (typically 10 - 100)\n  --cna_temperature CNA_TEMPERATURE\n                        temperature of CNA loss (typically 0.01)\n  --kd_filter {all_pass,neg_flip,old_correct,new_incorrect}\n                        the subset of training set applied KD loss\n  --kd_loss_mode {normal,gt, l2, cna}\n                        the supervision in knowledge distillation\n  --auto-scale          auto scale learning rate and batch size by nodes\n  --auto-shotdown       auto shotdown after training. (Only active for 8-gpu\n                        servers)\n  --save-init-checkpoint\n                        save model after initialization before training\n```\n\n## Examples\n\nUse focal distillation in training with PyTorch DDP. It automatically uses all GPUs available on a node. 
\nThe KD loss temperature is set to 100, with alpha=1 and beta=5.\n\n```bash\npython $BASEDIR/ConstrainedUpgrade/train.py \\\n--dist-url 'tcp://127.0.0.1:8000' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \\\n--data ~/resource/imagenet/ --work_dir $SCRIPTDIR --seed 4331 \\\n--auto-scale --workers 2 --batch-size 160 --lr 0.1 --lr_step 30 --epochs 90 \\\n-a resnet18 \\\n--kd_model_arch resnet18 \\\n--kd_model_path FOLDER_OF_RESNET18/model_best.pth.tar \\\n--kd_loss_weight 1 --kd_alpha 0.9 --kd_loss_mode kl --kd_temperature 100 --kd_filter old_correct --filter-base 1 --filter-scale 5 \\\n2\u003e\u00261 | tee -a $SCRIPTDIR/log.txt\n```\n\nUse CNA in training with PyTorch DDP. \nThe CNA loss temperature is set to 0.01. By default it uses the outside log-sum formulation. Beta is set to 0 for the focal distillation. \n\n```bash\npython $BASEDIR/ConstrainedUpgrade/train.py \\\n--dist-url 'tcp://127.0.0.1:8000' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \\\n--data ~/resource/imagenet/ --work_dir $SCRIPTDIR --seed 4331 \\\n--auto-scale --workers 2 --batch-size 160 --lr 0.1 --lr_step 30 --epochs 90 \\\n-a resnet18 \\\n--kd_model_arch resnet18 \\\n--kd_model_path FOLDER_OF_RESNET18/model_best.pth.tar \\\n--kd_loss_weight 1 --kd_alpha 0.9 --kd_loss_mode cna --cna_temperature 0.01 --kd_temperature 100 --kd_filter old_correct --filter-base 1 --filter-scale 0 \\\n2\u003e\u00261 | tee -a $SCRIPTDIR/log.txt\n```\n\nUse LDI in training with PyTorch DDP.\nBy default, we use an LDI margin of 0.5 and p of 2. We can additionally set \"--li_compute_topk 10\" to calculate over the classes whose new logits are ranked highest. 
\n\n```bash\npython $BASEDIR/ConstrainedUpgrade/train.py \\\n--dist-url 'tcp://127.0.0.1:8000' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \\\n--data ~/resource/imagenet/ --work_dir $SCRIPTDIR --seed 1 \\\n--auto-scale --workers 4 --batch-size 128 --lr 0.1 --lr_step 30 --epochs 90 \\\n-a resnet50 \\\n--kd_model_arch resnet18 \\\n--kd_model_path FOLDER_OF_RESNET18/model_best.pth.tar \\\n--kd_loss_weight 1 --kd_alpha 0.5 --kd_loss_mode li --kd_filter all_pass \\\n--li_p 2 --li_margin 0.5 \\\n2\u003e\u00261 | tee -a $SCRIPTDIR/log.txt\n```\n\nUse Ensemble Distillation with LDI in training with PyTorch DDP.\nCompared with a single model + LDI, we set a smaller margin (--li_margin 0.2 or even 0) and a larger KD loss weight (--kd_alpha 0.8).\n\n```bash\npython $BASEDIR/ConstrainedUpgrade/train.py \\\n--dist-url 'tcp://127.0.0.1:8000' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \\\n--data ~/resource/imagenet/ --work_dir $SCRIPTDIR --seed 1 \\\n--auto-scale --workers 4 --batch-size 80 --lr 0.1 --lr_step 30 --epochs 90 \\\n-a resnet50 --kd_model_arch resnet50 \\\n--kd_model_path \\\nFOLDER_OF_RESNET18@SEED=1/model_best.pth.tar FOLDER_OF_RESNET18@SEED=2/model_best.pth.tar \\\nFOLDER_OF_RESNET18@SEED=3/model_best.pth.tar FOLDER_OF_RESNET18@SEED=4/model_best.pth.tar \\\nFOLDER_OF_RESNET18@SEED=5/model_best.pth.tar FOLDER_OF_RESNET18@SEED=6/model_best.pth.tar \\\nFOLDER_OF_RESNET18@SEED=7/model_best.pth.tar FOLDER_OF_RESNET18@SEED=8/model_best.pth.tar \\\n--kd_loss_weight 1 --kd_alpha 0.8 --kd_loss_mode li --kd_filter all_pass \\\n--li_p 2 --li_margin 0.0 \\\n--li_exclude_gt \\\n--save-init-checkpoint \\\n--epochs_per_save 1 \\\n2\u003e\u00261 | tee -a $SCRIPTDIR/log.txt\n```\n\n## Citation\n\nIf this code helps your research or project, please cite\n\n```\n@inproceedings{yan2021positive,\n  title={Positive-congruent training: Towards regression-free model updates},\n  author={Yan, Sijie and Xiong, Yuanjun and 
Kundu, Kaustav and Yang, Shuo and Deng, Siqi and Wang, Meng and Xia, Wei and Soatto, Stefano},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  pages={14299--14308},\n  year={2021}\n}\n\n@article{zhao2022elodi,\n  title={ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training},\n  author={Zhao, Yue and Shen, Yantao and Xiong, Yuanjun and Yang, Shuo and Xia, Wei and Tu, Zhuowen and Schiele, Bernt and Soatto, Stefano},\n  journal={arXiv preprint arXiv:2205.06265},\n  year={2022}\n}\n\n@article{zhu2022contrastive,\n  title={Contrastive Neighborhood Alignment},\n  author={Zhu, Pengkai and Cai, Zhaowei and Xiong, Yuanjun and Tu, Zhuowen and Goncalves, Luis and Mahadevan, Vijay and Soatto, Stefano},\n  journal={arXiv preprint arXiv:2201.01922},\n  year={2022}\n}\n```\n\n## Security\n\nSee [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.\n\n## License\n\nThis project is licensed under the Apache-2.0 License.\n\n## Contribution\nThis codebase was built by [Sijie Yan](https://github.com/yysijie) and [Yue Zhao](https://github.com/zhaoyue-zephyrus) during their internships with the [AWS Rekognition](https://aws.amazon.com/rekognition/) team, mentored by [Yuanjun Xiong](https://github.com/yjxiong).\n```\nSijie Yan: yysijie@gmail.com\nYue Zhao: yzhao@cs.utexas.edu\nYuanjun Xiong (mentor): yjxiong at ie.cuhk.edu.hk\n```\n\n## Contact\nFor any questions, feel free to contact:\n```\nYantao Shen: ytshen@link.cuhk.edu.hk\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fregression-constraint-model-upgrade","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famazon-science%2Fregression-constraint-model-upgrade","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fregression-constraint-model-upgrade/lists"}