https://github.com/tchaton/pytorch2lightning
https://github.com/tchaton/pytorch2lightning
Last synced: 9 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/tchaton/pytorch2lightning
- Owner: tchaton
- Created: 2021-07-26T13:36:26.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2021-08-03T12:07:21.000Z (almost 5 years ago)
- Last Synced: 2024-12-30T07:51:23.228Z (over 1 year ago)
- Language: Python
- Size: 12.8 MB
- Stars: 15
- Watchers: 2
- Forks: 2
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Converting PyTorch 2 Lightning Examples
The repository will show you how to:
* Convert a pure `PyTorch Convolutional Neural Network Classifier` trained on MNIST to [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning).
* Extend Pure PyTorch trivially with Lightning best practice features.
* Seamlessly scale your training in the cloud with [Grid.ai](https://www.grid.ai/) - No code changes.
* Learn about [Lighting Flash](https://github.com/PyTorchLightning/lightning-flash) and its 15+ production ready tasks.
Find below PyTorch Community Voices | PyTorch Lightning | William Falcon & Thomas Chaton presenting this repository.
[](https://www.youtube.com/watch?v=A1bkh4gNDJA)
## Bare MNIST Classifier

* [PyTorch](bare_mnist/pytorch.py) | 127 lines
* [Lightning](bare_mnist/lightning.py) | 101 lines
## Add DDP Support
* [PyTorch](ddp_mnist/pytorch.py) | 184 lines
* [Lightning](ddp_mnist/lightning.py) | 102 lines: -82 lines
## Add DDP Spawn Support
* [PyTorch](ddp_mnist_spawn/pytorch.py) | 196 lines
* [Lightning](ddp_mnist_spawn/lightning.py) | 105 lines: -91 lines
## Add Accumulated Gradients Support
* [PyTorch](ddp_mnist_accumulate_gradients/pytorch.py) | +198 lines
* [Lightning](ddp_mnist_accumulate_gradients/lightning.py) | 106 lines: -92 lines
## Add Profiling Support
* [PyTorch](ddp_mnist_accumulate_gradients_profiler/pytorch.py) | +226 lines
* [Lightning](ddp_mnist_accumulate_gradients_profiler/lightning.py) | 106 lines: -120 lines
## Add DeepSpeed, FSDP, Multiple Loggers, Mutliple Profilers, TorchScript, Loop Customization, Fault Tolerant Training, etc ....
* [PyTorch](https://github.com/PyTorchLightning/pytorch-lightning) | requires a huge number of addtional lines. You `definitely` do not want to do that :tired_face:
* [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) | Still ~ 106 lines. Let's keep it simple. :rocket:
Learn more with [Lighting Docs](https://pytorch-lightning.readthedocs.io/en/stable/).
PyTorch Lightning 1.4 is out ! Here is our [CHANGELOG](https://github.com/PyTorchLightning/pytorch-lightning/releases/tag/1.4.0).
Don't forget to :star: [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning).
# Training on [Grid.ai](https://www.grid.ai/)
[Grid.ai](https://www.grid.ai/) is a ML Platform from the creators of [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) that enables you to train Machine Learning code without worrying about infrastructure.
Learn more with [Grid.ai Docs](https://docs.grid.ai/platform/about-these-features/multi-node)
### 1. Install Lightning-Grid
```bash
pip install lightning-grid --upgrade
```
### 2. SEAMLESSLY TRAIN 100s OF MACHINE LEARNING MODELS ON THE CLOUD FROM YOUR LAPTOP - NO CODE CHANGES
```bash
grid run --instance_type 4_M60_8gb ddp_mnist_grid/lightning.py --trainer.max_epochs 2 --trainer.gpus 4 --trainer.accelerator ddp
```
With [Grid DataStores](https://docs.grid.ai/products/global-cli-configs/cli-api/grid-datastores), low-latency, highly-scalable auto-versioned dataset.
```bash
grid datastore create --name mnist --source data
grid run --instance_type 4_M60_8gb --datastore_name mnist --datastore_mount_dir data ddp_mnist_grid/lightning.py --trainer.max_epochs 2 --trainer.gpus 4 --trainer.accelerator ddp
```
Pure PyTorch:
```bash
grid datastore create --name mnist --source data
grid run --instance_type g4dn.xlarge --gpus 2 ddp_mnist_grid/boring_pytorch.py
```
Add `--use_spot` to use interruptible machines.
[Grid.ai](https://www.grid.ai/) makes scaling multi node training easy :rocket: Train on 2+ nodes with 4 GPUS using [DDP Sharded](https://medium.com/pytorch/pytorch-lightning-1-1-model-parallelism-training-and-more-logging-options-7d1e47db7b0b) :fire:
```bash
grid run --instance_type 4_M60_8gb --gpus 8 --datastore_name mnist --datastore_mount_dir data ddp_mnist_grid/lightning.py --trainer.max_epochs 2 --trainer.num_nodes 2 --trainer.gpus 4 --trainer.accelerator ddp_sharded
```
Train [Andrej Karpathy](https://karpathy.ai) [minGPT](https://github.com/karpathy/minGPT) converted to [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) by [@williamFalcon](https://github.com/williamFalcon) and bencharmked with DeepSpeed by [@SeanNaren](https://github.com/SeanNaren)
```
git clone https://github.com/SeanNaren/minGPT.git
git checkout benchmark
grid run --instance_type g4dn.12xlarge --gpus 8 benchmark.py --n_layer 6 --n_head 16 --n_embd 2048 --gpus 4 --num_nodes 2 --precision 16 --batch_size 32 --plugins deepspeed_stage_3
```
Learn how to scale your scripts with [PyTorch Lighting + DeepSpeed](https://devblog.pytorchlightning.ai/accessible-multi-billion-parameter-model-training-with-pytorch-lightning-deepspeed-c9333ac3bb59)
# [Lighting Flash](https://github.com/PyTorchLightning/lightning-flash).
[Lighting Flash](https://github.com/PyTorchLightning/lightning-flash) is collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning built on top of PyTorch Lightning.
Train a [PyTorchVideo Classifier](https://github.com/PyTorchLightning/lightning-flash/blob/master/flash_examples/video_classification.py) with [Lighting Flash](https://github.com/PyTorchLightning/lightning-flash). Check out [Grid.ai](https://www.grid.ai/) reproducible button:
[](https://platform.grid.ai/#/runs?script=https://github.com/aribornstein/KineticsDemo/blob/188f1948725506914b67d3814073a7bec152ac0a/train.py&cloud=grid&instance=g4dn.xlarge&accelerators=1&disk_size=200&framework=lightning&script_args=train.py%20--gpus%201%20--max_epochs%203)
```py
import os
import flash
from flash.core.data.utils import download_data
from flash.video import VideoClassificationData, VideoClassifier
# 1. Create the DataModule
# Find more datasets at https://pytorchvideo.readthedocs.io/en/latest/data.html
download_data("https://pl-flash-data.s3.amazonaws.com/kinetics.zip", "./data")
datamodule = VideoClassificationData.from_folders(
train_folder=os.path.join(os.getcwd(), "data/kinetics/train"),
val_folder=os.path.join(os.getcwd(), "data/kinetics/val"),
clip_sampler="uniform",
clip_duration=1,
decode_audio=False,
)
# 2. Build the task
model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes, pretrained=False)
# 3. Create the trainer and finetune the model
trainer = flash.Trainer(max_epochs=3)
trainer.finetune(model, datamodule=datamodule, strategy="freeze")
# 4. Make a prediction
predictions = model.predict(os.path.join(os.getcwd(), "data/kinetics/predict"))
print(predictions)
# 5. Save the model!
trainer.save_checkpoint("video_classification.pt")
```
## Credits
Credit to PyTorch Team for providing the [Bare Mnist example](https://github.com/pytorch/examples/blob/master/mnist/main.py).
Credit to Andrej Karpathy for providing an implementation of minGPT.
## Troubleshooting
Kill ddp processes
```bash
sudo kill -9 $(ps -aef | grep -i 'ddp' | grep -v 'grep' | awk '{ print $2 }')
```