{"id":19048464,"url":"https://github.com/siboehm/shallowspeed","last_synced_at":"2026-03-02T22:03:03.839Z","repository":{"id":86478537,"uuid":"531047470","full_name":"siboehm/ShallowSpeed","owner":"siboehm","description":"Small scale distributed training of sequential deep learning models, built on Numpy and MPI.","archived":false,"fork":false,"pushed_at":"2023-10-19T22:47:31.000Z","size":22219,"stargazers_count":134,"open_issues_count":1,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-06-21T02:57:54.809Z","etag":null,"topics":["deep-learning","distributed-computing","pipelines"],"latest_commit_sha":null,"homepage":"https://siboehm.com/articles/22/pipeline-parallel-training","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/siboehm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-08-31T11:09:17.000Z","updated_at":"2025-06-06T14:12:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"fa4edcc7-831e-4e3d-9bfd-c98040b80f01","html_url":"https://github.com/siboehm/ShallowSpeed","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/siboehm/ShallowSpeed","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/siboehm%2FShallowSpeed","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/siboehm%2FShallowSpeed/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/siboehm%2FShallowSpeed/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/siboehm%2FShallowSpeed/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/siboehm","download_url":"https://codeload.github.com/siboehm/ShallowSpeed/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/siboehm%2FShallowSpeed/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261973727,"owners_count":23238586,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","distributed-computing","pipelines"],"created_at":"2024-11-08T23:07:00.055Z","updated_at":"2026-03-02T22:03:03.770Z","avatar_url":"https://github.com/siboehm.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Shallowspeed\n![stability-wip](https://img.shields.io/badge/stability-work_in_progress-lightgrey.svg)\n\nA tiny POC implementation of distributed training for sequential deep learning models.\nImplemented using plain Numpy \u0026 mpi4py.\n\n![](.github/assets/title_picture.jpg)\n\n\nCurrently implements:\n- Sequential models / deep MLPs, training using SGD.\n- Data parallel training with interleaved communication \u0026 computation, similar to PyTorch's [DistributedDataParallel](https://arxiv.org/abs/2006.15704).\n- Pipeline parallel training:\n  - Naive schedule without interleaved stages.\n  - [Gpipe](https://arxiv.org/abs/1811.06965) schedule with interleaved FWD \u0026 interleaved BWD.\n  - (soon) [PipeDream Flush](https://arxiv.org/abs/2006.09503) schedule with additional inter-FWD \u0026 BWD interleaving.\n- Any combination of DP \u0026 PP algorithms.\n\n## Setup\n```bash\nconda env create\npip install -e .\n# M1 Macs: conda install \"libblas=*=*accelerate\"\npython download_dataset.py\npytest\n```\n\n## Usage\n```bash\n# Sequential training\npython train.py\n# Data parallel distributed training\nmpirun -n 4 python train.py --dp 4\n# Pipeline parallel distributed training\nmpirun -n 4 python train.py --pp 4 --schedule naive\n# Data \u0026 pipeline parallel distributed training\nmpirun -n 8 python train.py --dp 2 --pp 4 --schedule gpipe\n```\n\n## Internals\n![](.github/assets/PP_pebble_graph.gif)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsiboehm%2Fshallowspeed","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsiboehm%2Fshallowspeed","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsiboehm%2Fshallowspeed/lists"}