{"id":30456507,"url":"https://github.com/alexdremov/optml","last_synced_at":"2025-10-16T22:29:00.609Z","repository":{"id":298894498,"uuid":"1001463117","full_name":"alexdremov/optml","owner":"alexdremov","description":"This repository contains the materials for the semester project \"Gradient Clipping for Coping with Heavy-Tailed Noise in Neural Networks.\" The project investigates the impact of gradient clipping on the training of neural networks, particularly in the context of heavy-tailed noise.","archived":false,"fork":false,"pushed_at":"2025-06-13T13:43:55.000Z","size":6,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-13T13:44:19.959Z","etag":null,"topics":["epfl","ml","optimization","optml"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alexdremov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-13T12:26:50.000Z","updated_at":"2025-06-13T13:43:59.000Z","dependencies_parsed_at":"2025-06-13T13:44:21.465Z","dependency_job_id":"0e8027a8-b162-4bba-815a-3ac05eafeca5","html_url":"https://github.com/alexdremov/optml","commit_stats":null,"previous_names":["alexdremov/optml"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/alexdremov/optml","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexdremov%2Foptml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexdremov%2Foptml/tags","releases_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexdremov%2Foptml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexdremov%2Foptml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alexdremov","download_url":"https://codeload.github.com/alexdremov/optml/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexdremov%2Foptml/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271755687,"owners_count":24815459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["epfl","ml","optimization","optml"],"created_at":"2025-08-23T16:33:43.708Z","updated_at":"2025-10-16T22:28:55.557Z","avatar_url":"https://github.com/alexdremov.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Gradient Clipping for Coping with Heavy-Tailed Noise in Neural Networks\n\nThis repository contains the materials for the semester project \"Gradient Clipping for Coping with Heavy-Tailed Noise in Neural Networks.\" The project investigates the impact of gradient clipping on the training of neural networks, particularly in the context of heavy-tailed noise.\n\n## Authors\n\n- **Vsevolod Skorokhodov**\n  - SCIPER: 389703\n  - Email: 
vsevolod.skorokhodov@epfl.ch\n\n- **Andrei Semenov**\n  - SCIPER: 388983\n  - Email: andrii.semenov@epfl.ch\n\n- **Aleksandr Dremov**\n  - SCIPER: 387716\n  - Email: aleksandr.dremov@epfl.ch\n\n## Abstract\n\nUnlike SGD, methods with adaptive step sizes — such as Adam — are essential for training modern deep learning models, especially large language models.\nTypically, the noise in the stochastic gradients is heavy-tailed for the latter.\nGradient clipping helps achieve good high-probability convergence for such noises.\nMoreover, clipping fixes the provably poor high-probability convergence of Adam and AdaGrad.\nIn this project, we show that the distribution of gradient norms for NLP problems is significantly heavy-tailed, whereas this is not the case for computer vision settings.\nWe then investigate how different types of clipping affect model convergence in these two settings.\nOur results demonstrate that clipping largely improves training under heavy-tailed noise scenarios, while it is not critical when the noise is sub-Gaussian.\n\n## Code\n\nCode is split into two parts: llm-based code and resnet-based code. They refer to the same repository, but different branches. 
We made this choice for reproducibility reasons, as some changes to the ResNet experiments may be breaking.\n\nWe provide a bash file `run.sh` that should reproduce the reported results.\n\nTo fetch the git submodules, use\n\n```\ngit submodule update --init\n```\n\n## Requirements\n\n**Packages.** \nJupyter notebooks and PyTorch.\n\n## Organization of the code\n\nThe code is divided into three parts:\n- Code for experiments with different versions of AdaGrad on a synthetic quadratic problem is given in the folder \"Quadratic problem\".\n- Code for experiments with different versions of Adam on fine-tuning the RoBERTa Large and ALBERT Base models is given in the folder \"ALBERT fine-tuning\".\n- Code for experiments with different versions of Adam and SGD on the ResNet training problem.\n\n## How to install\n\n```bash\ngit clone https://github.com/Andron00e/Clipped-AdaGrad-and-Adam\ncd Clipped-AdaGrad-and-Adam\npip install -r requirements.txt\n```\n\n## How to run\n\nThe repository contains two main folders: \"Quadratic problem\" and \"ALBERT fine-tuning\". \nTo check the RoBERTa fine-tuning experiments, you first need to set the hyperparameter values. 
Depending on the task, go to ```config_cola.yaml``` or ```config_qnli.yaml``` in the ```configs``` subfolder.\nWe use the same set of hyperparameters for both CoLA and QNLI datasets: \n- Optimizer hyperparameters (```opt```): ```lr```, ```betas```, ```eps```, ```weight_decay```, ```correct_bias```, ```clipping``` (use ```local```), ```max_grad_norm``` (i.e., clipping level), ```exp_avg_sq_value``` ($\\epsilon$), ```etta``` ($\\eta$).\n- Training hyperparameters (```train```): ```model_checkpoint``` (```roberta-large```, ```albert-base-v2```), ```max_epoch```, ```batch_size``` (we suggest using ```8``` for RTE and ```16``` for CoLA), ```seed```, ```classifier_dropout```, ```val_check_interval``` (pick ```12``` for RTE and ```20``` for CoLA).\n- Data hyperparameters (```data```): ```task``` (use ```cola``` or ```rte```).\n\nOnce you have picked all the hyperparameters, run the following scripts in the \"ALBERT fine-tuning\" directory (```dataset_name``` is ```cola```, ```rte```, or ```qnli```):\n1. To see how the selected hyperparameters affect training, run ```one_run_{dataset_name}.py```.\n2. To conduct many experiments, use the ```multi_runs_{dataset_name}.py``` script.\n3. To check for heavy tails during training, use ```check_tails_{dataset_name}.py```.\n4. Finally, run ```visualization.ipynb``` to reproduce the figures from our report.\n\n**We believe the details provided are clear enough to reproduce the experimental part of our project.**","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexdremov%2Foptml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexdremov%2Foptml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexdremov%2Foptml/lists"}