{"id":17632688,"url":"https://github.com/seieric/pytorch-mpi-singularity","last_synced_at":"2026-04-18T00:01:56.155Z","repository":{"id":253090771,"uuid":"840906597","full_name":"seieric/pytorch-mpi-singularity","owner":"seieric","description":"Singularity Container including PyTorch with CUDA and mpi backend for DistributedDataParallel","archived":false,"fork":false,"pushed_at":"2024-08-14T10:44:12.000Z","size":5,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-30T03:31:46.296Z","etag":null,"topics":["cuda","hpc","nvidia","openmpi","pytorch","singularity","utokyo"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/seieric.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-11T03:52:07.000Z","updated_at":"2024-08-14T10:44:16.000Z","dependencies_parsed_at":"2024-08-14T12:29:24.099Z","dependency_job_id":null,"html_url":"https://github.com/seieric/pytorch-mpi-singularity","commit_stats":null,"previous_names":["seieric/pytorch-mpi-singularity"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/seieric/pytorch-mpi-singularity","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seieric%2Fpytorch-mpi-singularity","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seieric%2Fpytorch-mpi-singularity/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seieric%2Fpytorch-mpi-singularity/releases","manifests_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seieric%2Fpytorch-mpi-singularity/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/seieric","download_url":"https://codeload.github.com/seieric/pytorch-mpi-singularity/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seieric%2Fpytorch-mpi-singularity/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31950891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-17T17:29:20.459Z","status":"ssl_error","status_checked_at":"2026-04-17T17:28:47.801Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","hpc","nvidia","openmpi","pytorch","singularity","utokyo"],"created_at":"2024-10-23T01:45:08.016Z","updated_at":"2026-04-18T00:01:56.119Z","avatar_url":"https://github.com/seieric.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PyTorch with OpenMPI singularity container\n\nThis is a singularity container which includes PyTorch with MPI backend support. The container is aimed to be used in HPC environments where MPI is the standard for parallel computing. 
Tested on the [Wisteria/BDEC-01(Aquarius)](https://www.cc.u-tokyo.ac.jp/en/supercomputer/wisteria/system.php) cluster at The University of Tokyo.\n\nFor use with the Wisteria/BDEC-01(Aquarius) cluster, see [wisteria/README.md](wisteria/README.md).\n\nThe official PyTorch packages do not support the MPI backend for distributed learning. With this image, you can run multi-node distributed learning with PyTorch's DistributedDataParallel module on the MPI backend.\n\n## Software versions\n\nThis container is based on the `nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04` Docker image. The following software versions are included:\n\n- PyTorch v2.4.0\n- torchvision v0.19.0\n- OpenMPI v4.1.4\n- UCX v1.17.0\n- CUDA v12.4.1\n\n## Usage\n\n### Build\n\nTo build the container, run the following command:\n\n```bash\nsingularity build --fakeroot container.sif container.def\n```\n\nBy default, the number of build processes is set to 72, matching the number of CPU cores available on Wisteria's `prepost` resource group. Change this number to suit your build machine by modifying the `MAX_JOBS` environment variable in `container.def`.\n\n**Warning:** The build process may take a long time and consume a lot of memory. Make sure you have enough resources to build the container. On the Wisteria/BDEC-01(Aquarius) cluster, use the `prepost` resource group to prevent the system from killing the build process.\n\n### Run\n\nYou can run the container with the following command:\n\n```bash\nmpirun -np 4 singularity exec container.sif python3 /path/to/your/script.py\n```\n\n## Notes\n\n- To reduce build time, backends other than MPI (`gloo` and `nccl`) are not built.\n- The build process occasionally fails for unexpected reasons. 
In that case, just try again.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseieric%2Fpytorch-mpi-singularity","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseieric%2Fpytorch-mpi-singularity","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseieric%2Fpytorch-mpi-singularity/lists"}