{"id":20663776,"url":"https://github.com/vita-group/optimizeramalgamation","last_synced_at":"2025-03-10T09:09:23.133Z","repository":{"id":107046893,"uuid":"386694194","full_name":"VITA-Group/OptimizerAmalgamation","owner":"VITA-Group","description":"[ICLR 2022] \"Optimizer Amalgamation\" by Tianshu Huang, Tianlong Chen, Sijia Liu, Shiyu Chang, Lisa Amini, Zhangyang Wang","archived":false,"fork":false,"pushed_at":"2022-01-25T16:14:58.000Z","size":2052,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-01-17T12:35:35.142Z","etag":null,"topics":["generalization","knowledge-amalgamation","learning-to-optimize","optimization","stability"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VITA-Group.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-16T16:06:04.000Z","updated_at":"2023-02-09T20:57:03.000Z","dependencies_parsed_at":"2023-04-22T01:37:17.364Z","dependency_job_id":null,"html_url":"https://github.com/VITA-Group/OptimizerAmalgamation","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FOptimizerAmalgamation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FOptimizerAmalgamation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FOptimizerAmalgamation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FOptimizerAmalgamation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VITA-Group","download_url":"https://codeload.github.com/VITA-Group/OptimizerAmalgamation/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242821496,"owners_count":20190654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["generalization","knowledge-amalgamation","learning-to-optimize","optimization","stability"],"created_at":"2024-11-16T19:19:48.294Z","updated_at":"2025-03-10T09:09:23.124Z","avatar_url":"https://github.com/VITA-Group.png","language":"Python","readme":"# Optimizer Amalgamation\n\nCode for [ICLR 2022] [\"Optimizer Amalgamation\"](https://openreview.net/pdf?id=VqzXzA9hjaX) by Tianshu Huang, Tianlong Chen, Sijia Liu, Shiyu Chang, Lisa Amini, Zhangyang Wang\n\n## Setup and Basic Usage\n\n### Basic Setup\n\n1. Clone repository and submodules\n```\ngit clone --recursive https://github.com/VITA-Group/OptimizerDistillation\n```\n\n2. 
### Load pre-trained optimizer

Pre-trained weights can be found in the "Releases" tab on GitHub.
After downloading and unzipping, the optimizers can be loaded through the `l2o` framework as optimizers extending `tf.keras.optimizers.Optimizer`:
```python
import tensorflow as tf
import l2o

# Folders are organized as pre-trained/{distillation type}/{replicate #}
opt = l2o.load("pre-trained/choice-large/7")
# The following is True
isinstance(opt, tf.keras.optimizers.Optimizer)
```

Pre-trained weights for Mean distillation (small pool), Min-max distillation (small pool), Choice distillation (small pool), and Choice distillation (large pool) are included.
Each folder contains 8 replicates with varying performance.
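Since the loaded optimizer is a standard `tf.keras.optimizers.Optimizer`, it can be dropped into an ordinary Keras training loop. A minimal sketch; the MNIST classifier below is illustrative and not from the repository:

```python
import tensorflow as tf
import l2o

# Load an amalgamated optimizer (choice distillation, large pool, replicate 7).
opt = l2o.load("pre-trained/choice-large/7")

# Any Keras workflow accepts the loaded optimizer; this small model is
# purely illustrative.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer=opt,
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(x_train, y_train, batch_size=32, epochs=1)
```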
### Included scripts

See the docstring of each script for a full list of arguments (debug and other testing args).

Common (technical) arguments:

| Arg | Type | Description |
| - | - | - |
| `gpus` | `int[]` | Comma-separated list of GPUs (1) |
| `cpu` | `bool` | Whether to run on CPU instead of GPU |

(1) GPUs are specified by GPU index (i.e. as returned by `gpustat`). If no `--gpus` are provided, all GPUs on the system are used. If no GPUs are installed, the CPU will be used.

`evaluate.py`:

| Arg | Type | Description |
| - | - | - |
| `problem` | `str` | Problem to evaluate on. Can pass a comma-separated list. |
| `directory` | `str` | Target directory to load from. Can pass a comma-separated list. |
| `repeat` | `int` | Number of times to run evaluation. Default: 10 |

`train.py`:

| Arg | Type | Description |
| - | - | - |
| `strategy` | `str` | Training strategy to use. |
| `policy` | `str` | Policy to train. |
| `presets` | `str[]` | Comma-separated list of presets to apply. |
| (all other args) | - | Passed as overrides to strategy/policy building. |

`baseline.py`:

| Arg | Type | Description |
| - | - | - |
| `problem` | `str` | Problem to evaluate on. Can pass a comma-separated list. |
| `optimizer` | `str` | Name of optimizer to use. |

### Experiment folder structure

Experiment file path:
```
results/{policy_name}/{experiment_name}/{replicate_number}
```

Experiment file structure:
```
[root]
  > [checkpoint]
      > stage_{stage_0.0.0}.index
      > stage_{stage_0.0.0}.data-00000-of-00001
      > stage_{stage_0.1.0}.index
      > ....
  > [eval]
      > [{eval_problem_1}]
          > stage_{x.x.x}.npz
      > ....
  > [log]
      > stage_{stage_0.0.0}.npz
      > stage_{stage_0.1.0}.npz
      > ....
  > config.json
  > summary.csv
```

Key files:
- `config.json`: experiment configuration (hyperparameters, technical details, etc.)
- `summary.csv`: log of training details (losses, training time, etc.)

## Experiments

### Mean, min-max distillation

Training with min-max distillation, using rnnprop as the target, the small pool, and a convolutional network as the training problem:
```
python train.py \
    --presets=conv_train,adam,rmsprop,il_more \
    --strategy=curriculum \
    --policy=rnnprop \
    --directory=results/rnnprop/min-max/1
```

Evaluation:
```
python evaluate.py \
    --problem=conv_train \
    --directory=results/rnnprop/min-max/1 \
    --repeat=10
```

Min-max distillation is the default setting. To use mean distillation, add the `reduce_mean` preset.

### Choice distillation

Train the choice policy:
```
python train.py \
    --presets=conv_train,cl_fixed \
    --strategy=repeat \
    --policy=less_choice \
    --directory=results/less-choice/base/1
```

Train for the final distillation step:
```
python train.py \
    --presets=conv_train,less_choice,il_more \
    --strategy=curriculum \
    --policy=rnnprop \
    --directory=results/rnnprop/choice2/1
```

Evaluation:
```
python evaluate.py \
    --problem=conv_train \
    --directory=results/rnnprop/choice2/1 \
    --repeat=10
```

### Stability-Aware Optimizer Distillation

FGSM, PGD, Adaptive PGD, Gaussian, and Adaptive Gaussian perturbations are implemented.

| Perturbation | Description | Preset Name | Magnitude Parameter |
| - | - | - | - |
| FGSM | Fast Gradient Sign Method | `fgsm` | `step_size` |
| PGD | Projected Gradient Descent | `pgd` | `magnitude` |
| Adaptive PGD | Adaptive PGD / "Clipped" GD | `cgd` | `magnitude` |
| Random | Random Gaussian | `gaussian` | `noise_stddev` |
| Adaptive Random | Random Gaussian, Adaptive Magnitude | `gaussian_rel` | `noise_stddev` |

Modify the magnitude of the perturbation by passing
```
--policy/perturbation/config/[Magnitude Parameter]=[Desired Magnitude]
```

For PGD variants, the number of adversarial attack steps can also be modified:
```
--policy/perturbation/config/steps=[Desired Steps]
```
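Putting these pieces together, a hypothetical training invocation with PGD perturbations; the preset combination and magnitude values are illustrative, assuming the perturbation presets stack with the min-max training presets shown above:

```
python train.py \
    --presets=conv_train,adam,rmsprop,il_more,pgd \
    --strategy=curriculum \
    --policy=rnnprop \
    --policy/perturbation/config/magnitude=0.01 \
    --policy/perturbation/config/steps=5 \
    --directory=results/rnnprop/pgd/1
```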