{"id":19066174,"url":"https://github.com/epfml/localsgd-code","last_synced_at":"2025-06-27T00:35:57.603Z","repository":{"id":94154250,"uuid":"237430569","full_name":"epfml/LocalSGD-Code","owner":"epfml","description":null,"archived":false,"fork":false,"pushed_at":"2020-03-04T08:21:58.000Z","size":114,"stargazers_count":46,"open_issues_count":0,"forks_count":6,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-18T16:15:44.131Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epfml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-01-31T12:58:50.000Z","updated_at":"2025-03-02T03:11:12.000Z","dependencies_parsed_at":"2023-04-04T14:46:52.031Z","dependency_job_id":null,"html_url":"https://github.com/epfml/LocalSGD-Code","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2FLocalSGD-Code","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2FLocalSGD-Code/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2FLocalSGD-Code/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2FLocalSGD-Code/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epfml","download_url":"https://codeload.github.com/epfml/LocalSGD-Code/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251312288,"owners_count":21569207,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T00:55:02.232Z","updated_at":"2025-04-28T12:25:10.996Z","avatar_url":"https://github.com/epfml.png","language":"Python","readme":"# Don't Use Large Mini-batches, Use Local SGD\nWe present here the code of the experimental parts of the paper [Don't Use Large Mini-batches, Use Local SGD](https://openreview.net/forum?id=B1eyO1BFPr).\n\nAbstract:\nMini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep neural networks. \nDrastic increases in the mini-batch sizes have lead to key efficiency and scalability gains in recent years. \nHowever, progress faces a major roadblock, as models trained with large batches often do not generalize well, i.e. they do not show good accuracy on new data.\nAs a remedy, we propose a post-local SGD and show that it significantly improves the generalization performance compared to large-batch training on standard benchmarks while enjoying the same efficiency (time-to-accuracy) and scalability. 
# Code usage

We rely on `Docker` for our experimental environments. Please refer to the folder `distributed_code/environments/docker` for more details.

The script below trains `ResNet-20` on `CIFAR-10` with the centralized training algorithm `(post-)local SGD`, as an example.
For detailed instructions and more examples, please refer to the file `distributed_code/README.md`.
```bash
OMP_NUM_THREADS=2 MKL_NUM_THREADS=2 $HOME/conda/envs/pytorch-py3.6/bin/python run.py \
    --arch resnet20 --optimizer local_sgd \
    --avg_model True --experiment demo --manual_seed 6 \
    --data cifar10 --pin_memory True \
    --batch_size 128 --base_batch_size 64 --num_workers 2 \
    --num_epochs 300 --partition_data random --reshuffle_per_epoch True --stop_criteria epoch \
    --n_mpi_process 16 --n_sub_process 1 --world 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 \
    --on_cuda True --use_ipc False \
    --lr 0.1 --lr_scaleup True --lr_warmup True --lr_warmup_epochs 5 \
    --lr_scheduler MultiStepLR --lr_decay 0.1 --lr_milestones 150,225 \
    --local_step 16 --turn_on_local_step_from 150 \
    --weight_decay 1e-4 --use_nesterov True --momentum_factor 0.9 \
    --hostfile hostfile --graph_topology complete --track_time True --display_tracked_time True \
    --python_path $HOME/conda/envs/pytorch-py3.6/bin/python --mpi_path $HOME/.openmpi/
```

# Reference

If you use this code, please cite the following [paper](https://openreview.net/forum?id=B1eyO1BFPr):

```
@inproceedings{lin2020dont,
    title={Don't Use Large Mini-batches, Use Local {SGD}},
    author={Tao Lin and Sebastian U. Stich and Kumar Kshitij Patel and Martin Jaggi},
    booktitle={ICLR - International Conference on Learning Representations},
    year={2020},
    url={https://openreview.net/forum?id=B1eyO1BFPr}
}
```