{"id":15662396,"url":"https://github.com/rasbt/b3-basic-batchsize-benchmark","last_synced_at":"2025-05-05T23:27:48.646Z","repository":{"id":41424568,"uuid":"509544750","full_name":"rasbt/b3-basic-batchsize-benchmark","owner":"rasbt","description":"Experiments for the blog post \"No, We Don't Have to Choose Batch Sizes As Powers Of 2\"","archived":false,"fork":false,"pushed_at":"2022-07-05T18:53:12.000Z","size":19,"stargazers_count":19,"open_issues_count":0,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-31T00:41:21.389Z","etag":null,"topics":["deep-learning","deep-neural-networks","machine-learning","neural-networks"],"latest_commit_sha":null,"homepage":"https://sebastianraschka.com/blog/2022/batch-size-2.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rasbt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-07-01T17:53:49.000Z","updated_at":"2024-12-26T18:00:00.000Z","dependencies_parsed_at":"2022-09-08T12:10:21.009Z","dependency_job_id":null,"html_url":"https://github.com/rasbt/b3-basic-batchsize-benchmark","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fb3-basic-batchsize-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fb3-basic-batchsize-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fb3-basic-batchsize-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fb3-basic-batchsize-benchmark/m
anifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rasbt","download_url":"https://codeload.github.com/rasbt/b3-basic-batchsize-benchmark/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252592261,"owners_count":21773230,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deep-neural-networks","machine-learning","neural-networks"],"created_at":"2024-10-03T13:32:18.590Z","updated_at":"2025-05-05T23:27:48.627Z","avatar_url":"https://github.com/rasbt.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# B3 -- Basic Batchsize Benchmark\n\n\n\nA quick benchmark with different batch sizes that was prompted by the discussion [here](https://twitter.com/rasbt/status/1542882893181108227?s=20\u0026t=96dUITuyaNJUfw1TWxDLng), which was in turn prompted by the [Do Batch Sizes Actually Need to be Powers of 2?](https://wandb.ai/datenzauberai/Batch-Size-Testing/reports/Do-Batch-Sizes-Actually-Need-to-be-Powers-of-2---VmlldzoyMDkwNDQx) article.\n\n\n\nRight now, this benchmark is a [MobileNetV3 (large)](https://arxiv.org/abs/1905.02244) on CIFAR-10 (the images are resized to 224 to reach proper GPU utilization). 
You can run it as follows:\n\n\n\n**Step 1: Initial Setup**\n\n```bash\ngit clone https://github.com/rasbt/b3-basic-batchsize-benchmark.git\ncd b3-basic-batchsize-benchmark\nconda create -n benchmark python=3.8\nconda activate benchmark\npip install -r requirements.txt\n```\n\n\n\n**Step 2: Running the Training Script**\n\n\n```bash\npython main.py --num_epochs 10 --batch_size 127 --mixed_precision true\n```\n\n\n\n### Additional Resources\n\n- [Ross Wightman mentioning](https://twitter.com/wightmanr/status/1542917523556904960?s=20\u0026t=96dUITuyaNJUfw1TWxDLng) that it might matter more for TPUs\n- [Nvidia's Deep Learning Performance Documentation on matrix multiplication](https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html) explaining the theoretical rationale behind choosing batch sizes as multiples of 8 for tensor cores\n\n\n\n### Results\n\n\n\n\n| batch size | train time | inf. time  | epochs | GPU  | mixed prec. |\n| ---------- | ---------- | --------- | ------ | ---- | ----------- |\n| 100        | 10.50 min  | 0.15 min  | 10     | V100 | Yes         |\n| 127        | 9.80 min   | 0.15 min  | 10     | V100 | Yes         |\n| 128        | 9.78 min   | 0.15 min  | 10     | V100 | Yes         |\n| 129        | 9.92 min   | 0.15 min  | 10     | V100 | Yes         |\n| 156        | 9.38 min   | 0.16 min  | 10     | V100 | Yes         |\n|            |            |           |        |      |             |\n| 511        | 8.74 min   | 0.17 min  | 10     | V100 | Yes         |\n| 512        | 8.71 min   | 0.17 min  | 10     | V100 | Yes         |\n| 513        | 8.72 min   | 0.17 min  | 10     | V100 | Yes         |\n\n\nBelow, I trained the same neural network using 4 V100 GPUs with the distributed data parallel strategy:\n\n```bash\npython main.py --num_epochs 10 --batch_size 255 --mixed_precision true --num_workers 4 --strategy ddp\n```\n\n| batch size | train time | epochs | GPU    | mixed prec. 
|\n| ---------- | ---------- | ------ | ------ | ----------- |\n| 255        |  2.95 min  |  10    | 4xV100 | Yes         |\n| 256        |  2.87 min  |  10    | 4xV100 | Yes         |\n| 257        |  2.86 min  |  10    | 4xV100 | Yes         |\n\nNote that I removed the inference time (here: evaluation on the test set) from this table, because in practice, you would still use a single V100 for inference purposes. \n\n\n\n\nNote that this is all from one run each. To get more reliable stats, repeating the runs many times and reporting the average and standard deviation might be worthwhile. However, even from the numbers above, it is probably apparent that there is only a small, barely noticeable difference between 127, 128, and 129.\n\n\n\n**Or in other words, do you have a batch size of 128 that you would like to run, but it doesn't fit into memory? It's probably okay to train that model with a batch size of 120 or 100 before scaling it down to 64** 😊.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frasbt%2Fb3-basic-batchsize-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frasbt%2Fb3-basic-batchsize-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frasbt%2Fb3-basic-batchsize-benchmark/lists"}