https://github.com/Lightning-AI/lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
- Host: GitHub
- URL: https://github.com/Lightning-AI/lightning-thunder
- Owner: Lightning-AI
- License: apache-2.0
- Created: 2024-03-18T15:30:56.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-29T09:07:43.000Z (6 months ago)
- Last Synced: 2024-10-29T09:22:23.562Z (6 months ago)
- Language: Python
- Homepage:
- Size: 6.45 MB
- Stars: 1,177
- Watchers: 34
- Forks: 77
- Open Issues: 276
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
- AiTreasureBox - Lightning-AI/lightning-thunder - Make PyTorch models Lightning fast! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once. (Repos)
README
# Give your PyTorch models superpowers ⚡
Source-to-source compiler for PyTorch.
Fast. Understandable. Extensible.
**Thunder** makes optimizing PyTorch models easy, augmenting them with custom kernels, fusions, quantization, distributed strategies, and more.
For **end users**, Thunder comes with plugins that provide model speed-ups out of the box, for optimal utilization of the latest generation of hardware.
For **performance experts**, Thunder is the most ergonomic framework for understanding, modifying, and optimizing AI models through composable transformations.
✅ Run PyTorch 40% faster ✅ Quantization ✅ Kernel fusion
✅ Training recipes ✅ FP4/FP6/FP8 precision ✅ Distributed TP/PP/DP
✅ Inference recipes ✅ Ready for NVIDIA Blackwell ✅ CUDA Graphs
✅ LLMs, non-LLMs and more ✅ Custom Triton kernels ✅ Compose all the above

[License](https://github.com/Lightning-AI/lightning-thunder/blob/main/LICENSE)
[CI testing](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-testing.yml)
[General checks](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-checks.yml)
[Documentation](https://lightning-thunder.readthedocs.io/en/latest/?badge=latest)
[pre-commit.ci status](https://results.pre-commit.ci/latest/github/Lightning-AI/lightning-thunder/main)
# Quick start
Install Thunder via pip ([more options](https://lightning.ai/docs/litserve/home/install)):
```bash
pip install torch==2.6.0 torchvision==0.21 nvfuser-cu124-torch26
pip install lightning-thunder
```

## Advanced install options
### Blackwell support
For Blackwell you'll need CUDA 12.8
```bash
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
pip install --pre nvfuser-cu128 --extra-index-url https://pypi.nvidia.com
pip install lightning-thunder
```

### Install additional executors
These are optional; feel free to mix and match.
```bash
# cuDNN SDPA
pip install nvidia-cudnn-frontend

# Float8 support (this will compile from source, be patient)
pip install "transformer_engine[pytorch]"
```

### Install Thunder bleeding edge
```bash
pip install git+https://github.com/Lightning-AI/lightning-thunder.git@main
```

### Install Thunder for development
```bash
git clone https://github.com/Lightning-AI/lightning-thunder.git
cd lightning-thunder
pip install -e .
```

### Hello world
Define a function or a torch module:
```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2048, 4096), nn.ReLU(), nn.Linear(4096, 64))
```

Optimize it with thunder:
```python
import thunder

thunder_model = thunder.compile(model)
x = torch.randn(64, 2048)
y = thunder_model(x)
torch.testing.assert_close(y, model(x))
```

## Examples
### Speed up LLM training
```python
import thunder
import torch
import litgpt

with torch.device("cuda"):
    model = litgpt.GPT.from_name("Llama-3.2-1B").to(torch.bfloat16)

thunder_model = thunder.compile(model)
inp = torch.ones((1, 2048), device="cuda", dtype=torch.int64)
out = thunder_model(inp)
out.sum().backward()
```

### Speed up HuggingFace BERT inference
```python
import thunder
import torch
import transformers

model_name = "bert-large-uncased"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
with torch.device("cuda"):
model = transformers.AutoModelForCausalLM.from_pretrained(
model_name, torch_dtype=torch.bfloat16
)
model.requires_grad_(False)
model.eval()inp = tokenizer(["Hello world!"], return_tensors="pt")
thunder_model = thunder.compile(model, plugins="reduce-overhead")
out = thunder_model(**inp)
print(out)
```

### Speed up HuggingFace DeepSeek R1 distill inference
```python
import torch
import transformers
import thunder

model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
with torch.device("cuda"):
model = transformers.AutoModelForCausalLM.from_pretrained(
model_name, torch_dtype=torch.bfloat16
)
model.requires_grad_(False)
model.eval()inp = tokenizer(["Hello world! Here's a long story"], return_tensors="pt")
thunder_model = thunder.compile(
    model, recipe="hf-transformers", plugins="reduce-overhead"
)

out = thunder_model.generate(
    **inp, do_sample=False, cache_implementation="static", max_new_tokens=100
)
print(out)
```

### Speed up Vision Transformer inference
```python
import thunder
import torch
import torchvision as tv

with torch.device("cuda"):
    model = tv.models.vit_b_16()
    model.requires_grad_(False)
    model.eval()

    inp = torch.randn(128, 3, 224, 224)
out = model(inp)
thunder_model = thunder.compile(model, plugins="reduce-overhead")
out = thunder_model(inp)
```

## Plugins
Plugins are a way to apply optimizations to a model, such as parallelism and quantization.
Thunder comes with a few plugins included out of the box, but it's easy to write new ones.
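A plugin is selected by passing its name to `thunder.compile`, just as in the inference examples above. Here is a minimal sketch reusing the `reduce-overhead` plugin on the Hello world model (the model and shapes are illustrative; other plugin names should be taken from the documentation):

```python
import thunder
import torch
import torch.nn as nn

with torch.device("cuda"):
    model = nn.Sequential(nn.Linear(2048, 4096), nn.ReLU(), nn.Linear(4096, 64))

# Same call as plain compilation, with a plugin selected by name.
thunder_model = thunder.compile(model, plugins="reduce-overhead")

x = torch.randn(64, 2048, device="cuda")
y = thunder_model(x)
```

Out of the box, plugins let you: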
- scale up with distributed strategies such as DDP, FSDP, and TP
- optimize numerical precision with FP8, MXFP8
- save memory with quantization
- reduce latency with CUDAGraphs
- debugging and profiling

## How it works
Thunder works in three stages:
1. ⚡️ It acquires your model by interpreting Python bytecode and producing a straight-line Python program
1. ⚡️ It transforms the computation trace to make it distributed, change precision, and more
1. ⚡️ It routes parts of the trace for execution to:
- fusion (`NVFuser`, `torch.compile`)
- specialized libraries (e.g. `cuDNN SDPA`, `TransformerEngine`)
- custom Triton and CUDA kernels
- PyTorch eager operations
This is what the trace looks like for a simple MLP:
```python
import thunder
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 2048), nn.ReLU(), nn.Linear(2048, 256))
thunder_model = thunder.compile(model)
y = thunder_model(torch.randn(4, 1024))

print(thunder.last_traces(thunder_model)[-1])
```

This is the acquired trace, ready to be transformed and executed:
```python
def computation(input, t_0_bias, t_0_weight, t_2_bias, t_2_weight):
    # input: "cuda:0 f32[4, 1024]"
    # t_0_bias: "cuda:0 f32[2048]"
    # t_0_weight: "cuda:0 f32[2048, 1024]"
    # t_2_bias: "cuda:0 f32[256]"
    # t_2_weight: "cuda:0 f32[256, 2048]"
    t3 = ltorch.linear(input, t_0_weight, t_0_bias)  # t3: "cuda:0 f32[4, 2048]"
    t6 = ltorch.relu(t3, False)  # t6: "cuda:0 f32[4, 2048]"
    t10 = ltorch.linear(t6, t_2_weight, t_2_bias)  # t10: "cuda:0 f32[4, 256]"
    return (t10,)
```

Note how Thunder's intermediate representation is just (a subset of) Python!
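Since `thunder.last_traces` returns the list of traces recorded for the last run (the snippet above prints the last entry), the earlier entries can be inspected as well. A minimal sketch, assuming the first entry is the initial acquired trace before transformations:

```python
traces = thunder.last_traces(thunder_model)

print(len(traces))   # number of trace stages recorded for the last run
print(traces[0])     # first recorded trace (assumed: the initial acquired trace)
print(traces[-1])    # final execution trace, as printed above
```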
## Performance
Thunder is fast. Here are the speed-ups obtained on a pre-training task using LitGPT on H100 and B200 hardware, relative to PyTorch eager.
# Community
Thunder is an open source project, developed in collaboration with the community with significant contributions from NVIDIA.
💬 [Get help on Discord](https://discord.com/invite/XncpTy7DSt)
📋 [License: Apache 2.0](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)