https://github.com/daekeun-ml/sm-distributed-training-step-by-step
This repository provides hands-on labs on PyTorch-based Distributed Training and SageMaker Distributed Training. It is aimed at beginners and walks you through step-by-step modifications to the code, based on the most basic BERT use cases.
- Host: GitHub
- URL: https://github.com/daekeun-ml/sm-distributed-training-step-by-step
- Owner: daekeun-ml
- License: MIT
- Created: 2023-01-20T14:33:34.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-18T00:52:39.000Z (almost 3 years ago)
- Last Synced: 2025-04-05T10:33:30.265Z (about 1 year ago)
- Topics: data-parallelism, distributed-training, pytorch-ddp, sagemaker
- Language: Jupyter Notebook
- Homepage:
- Size: 1.3 MB
- Stars: 13
- Watchers: 1
- Forks: 2
- Open Issues: 0