https://github.com/daekeun-ml/sm-distributed-training-step-by-step

This repository provides hands-on labs on PyTorch-based distributed training and SageMaker distributed training. It is written for beginners and walks step by step through the code modifications involved, starting from the most basic BERT use cases.

Host: GitHub
URL: https://github.com/daekeun-ml/sm-distributed-training-step-by-step
Owner: daekeun-ml
License: mit
Created: 2023-01-20T14:33:34.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-07-18T00:52:39.000Z (almost 3 years ago)
Last Synced: 2025-04-05T10:33:30.265Z (about 1 year ago)
Topics: data-parallelism, distributed-training, pytorch-ddp, sagemaker
Language: Jupyter Notebook
Homepage:
Size: 1.3 MB
Stars: 13
Watchers: 1
Forks: 2
Open Issues: 0
