{"id":26711747,"url":"https://github.com/shermanlo77/multinode-gpu-study","last_synced_at":"2026-05-17T19:06:47.820Z","repository":{"id":226192229,"uuid":"768011268","full_name":"shermanlo77/multinode-gpu-study","owner":"shermanlo77","description":"These are some exercises and implementations to use multiple nodes of GPUs","archived":false,"fork":false,"pushed_at":"2024-09-06T11:29:47.000Z","size":41,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-27T10:39:48.477Z","etag":null,"topics":["bagging","distributed-computing","gpu-computing","multinode","parallel-computing","pytorch"],"latest_commit_sha":null,"homepage":"https://blog.hpc.qmul.ac.uk/pleasingly-parallel-gpu-case-studies-ml/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shermanlo77.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-06T10:01:20.000Z","updated_at":"2024-09-06T11:29:51.000Z","dependencies_parsed_at":"2024-03-12T10:15:21.735Z","dependency_job_id":null,"html_url":"https://github.com/shermanlo77/multinode-gpu-study","commit_stats":null,"previous_names":["shermanlo77/multinode-gpu-study"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/shermanlo77/multinode-gpu-study","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shermanlo77%2Fmultinode-gpu-study","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shermanlo77%2Fmultinode-gpu-study/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sher
manlo77%2Fmultinode-gpu-study/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shermanlo77%2Fmultinode-gpu-study/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shermanlo77","download_url":"https://codeload.github.com/shermanlo77/multinode-gpu-study/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shermanlo77%2Fmultinode-gpu-study/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33151625,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-17T09:28:26.183Z","status":"ssl_error","status_checked_at":"2026-05-17T09:27:52.702Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bagging","distributed-computing","gpu-computing","multinode","parallel-computing","pytorch"],"created_at":"2025-03-27T10:33:41.930Z","updated_at":"2026-05-17T19:06:47.786Z","avatar_url":"https://github.com/shermanlo77.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multiple Nodes of GPUs Examples in Machine Learning\n\nSherman Lo 2023-24\n\nQueen Mary, University of London\n\nThese are some exercises and implementations to use multiple nodes of GPUs. 
See\nthe [blog](https://blog.hpc.qmul.ac.uk/pleasingly-parallel-gpu-case-studies-ml/)\nfor more details.\n\n## Case Studies\n\nThe dataset used is the MNIST dataset, available in\n[`torchvision`](https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html).\n\n### Grid Search and SVM\n\nA GPU implementation of an SVM is available in\n[RAPIDS' `cuML`](https://docs.rapids.ai/api/cuml/stable/). The exercise is to\nimplement a grid search to tune the parameters `C` and `gamma`. This can be\nparallelised by splitting the search space between GPUs and nodes.\n\nThis is implemented in `mnist_svm`.\n\n![](assets/svm.svg)\n\nFigure 1: A box plot of validation errors for a fixed `gamma` when fitting a\n`cuml.svm.SVC` on the MNIST dataset. The box plot captures the 5 validation\nerrors from 5-fold cross-validation. The minimum is somewhere around\n`C=math.log(1.3, 10)`.\n\n### Bagging and Neural Networks\n\nA [PyTorch](https://pytorch.org/) neural network `Net` is defined in\n`mnist_nn/model.py` and can be trained using the optimiser `torch.optim.SGD`\nwith loss function `torch.nn.CrossEntropyLoss()`.\n\nThe exercise is to implement bagging and random search to tune and quantify the\nuncertainty of the neural network's parameters `n_conv_layer`, `kernel_size`,\n`n_hidden_layer` and the optimiser's parameters `lr` and `momentum`. This can\nbe parallelised by distributing the bootstrap samples, or a seed, between GPUs\nand nodes.\n\nThis is implemented in `mnist_nn`.\n\n![](assets/nn.webp)\n\nFigure 2: Box plot of the certainty of each model's prediction of each digit.\nThe handwriting to predict is shown in the top left.\n\n### Parallel Strategies\n\nA hybrid approach was implemented here, using both `multiprocessing`, to use\nmultiple GPUs on a node, and `mpi4py`, to use multiple nodes.\n\nIn the single node case, having MPI and `mpi4py` installed is optional.\n\nIn the multiple node case, it is required to execute `python` with MPI. 
This\ncan be done with commands such as `mpirun` and `srun`. You must allocate one\nprocess per node. This can be done, for example:\n\n- For OpenMPI, supply the option `--map-by ppr:1:node`\n- For IntelMPI, supply the option `-ppn 1`\n- On Slurm, supply the option `#SBATCH --ntasks-per-node=1`\n\n## Getting Started\n\nThe `pip` requirement files are available in `requirements-cu*.txt` where the\nsuffix is the CUDA version, for example `requirements-cu11.txt` for CUDA 11.\nThese packages can be installed, for example in a virtual environment, using:\n\n```shell\npip install -r requirements-cu11.txt\n```\n\nRun `python mnist.py` to download the data.\n\n### Reproducibility\n\nThe requirements file `requirements-pin.txt` has pinned versions, should you\nwish to exactly reproduce results on the blog. Furthermore, Python 3.10.7 was\nused on Apocrita and Python 3.10.4 on Sulis.\n\n### `mnist_svm`\n\nTo use multiple nodes, launch with, for example, `mpirun` as explained\npreviously.\n\n```text\npython -m mnist_svm [-h] [--gpu GPU] [--batch BATCH] [--results RESULTS] n_tuning\n\npositional arguments:\n  n_tuning           Number of tuning parameters\n\noptions:\n  -h, --help         show this help message and exit\n  --gpu GPU          What GPUs to use. Indicate individual GPUs using integers separated by\n                     a comma, eg 0,1,2. Or provide 'all' to use all available GPUs. Defaults\n                     to device 0\n  --batch BATCH      Number of tuning parameters to validate for a worker before a new one\n                     instantiates. 
Defaults to no re-instantiation\n  --results RESULTS  Where to save figures and results, defaults to here\n```\n\n### `mnist_nn`\n\nTo use multiple nodes, launch with, for example, `mpirun` as explained\npreviously.\n\n```text\npython -m mnist_nn [-h] [--gpu GPU] [--results RESULTS] [--seed SEED] n_bootstrap n_tuning\n\npositional arguments:\n  n_bootstrap        Number of bootstrap samples\n  n_tuning           Number of tuning parameters to search\n\noptions:\n  -h, --help         show this help message and exit\n  --gpu GPU          What GPUs to use. Indicate individual GPUs using integers separated by\n                     a comma, eg 0,1,2. Or provide 'all' to use all available GPUs. Defaults\n                     to device 0\n  --results RESULTS  Where to save figures and results, defaults to here\n  --seed\n```\n\n### Apptainers\n\nApptainer definition files `requirements-cu*.def` are also provided if you would\nlike to use a container. They can be built using, for example,\n\n```shell\napptainer build container-cu11.sif requirements-cu11.def\n```\n\nand run using\n\n```shell\napptainer run --nv container-cu11.sif [OPTIONS]\n```\n\nwhere `[OPTIONS]` are options and commands you would usually put after `python`.\n\nRunning MPI with these containers can be tricky; please refer\nto the [Apptainer manual](https://apptainer.org/docs/user/latest/mpi.html).\n\n### Reducing Code For Testing\n\nIf you would like to run small tests to verify the code works, please see the\nsuggestions below.\n\n- `mnist_svm`\n  - Run for a small grid, e.g. `python -m mnist_svm 1`\n- `mnist_nn`\n  - Run for fewer bootstrap samples and search less, e.g. `python -m mnist_nn 1 1`\n  - You can reduce the number of iterations of the dataset when doing stochastic\n    gradient descent. In `mnist_nn/train.py`, you can reduce `N_MAX_EPOCH` to\n    something small, e.g. `N_MAX_EPOCH = 1`\n  - You can reduce the number of epochs used in stochastic gradient descent. 
In\n    `mnist_nn/train.py`, see the function `train_model()`. The loop `for data in\n    data_loader:` iterates for each epoch.\n  - You can reduce the complexity of the neural network when doing random\n    search. You can modify the distribution of the tuning parameters by\n    modifying the function `random_parameter()` in `mnist_nn/model.py`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshermanlo77%2Fmultinode-gpu-study","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshermanlo77%2Fmultinode-gpu-study","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshermanlo77%2Fmultinode-gpu-study/lists"}