<div align='center'>

# Give your PyTorch models superpowers ⚡

</div>

<div align="center">
<img alt="Thunder" src="docs/source/_static/images/LightningThunderLightModewByline.png#gh-light-mode-only" width="400px" style="max-width: 100%;">
<img alt="Thunder" src="docs/source/_static/images/LightningThunderDarkModewByline.png#gh-dark-mode-only" width="400px" style="max-width: 100%;">
<br/>
<br/>

&#160;

<strong>Source-to-source compiler for PyTorch.</strong>
Fast. Understandable.
Extensible.

</div>

______________________________________________________________________

**Thunder** makes optimizing PyTorch models easy, augmenting them with custom kernels, fusions, quantization, distributed strategies, and more.

For **end users**, Thunder comes with plugins that provide model speed-ups out of the box, for optimal utilization of the latest generation of hardware.

For **performance experts**, Thunder is the most ergonomic framework for understanding, modifying, and optimizing AI models through composable transformations.

<div align='center'>

<pre>
✅ Run PyTorch 40% faster   ✅ Quantization                ✅ Kernel fusion
✅ Training recipes         ✅ FP4/FP6/FP8 precision       ✅ Distributed TP/PP/DP
✅ Inference recipes        ✅ Ready for NVIDIA Blackwell  ✅ CUDA Graphs
✅ LLMs, non-LLMs and more  ✅ Custom Triton kernels       ✅ Compose all the above
</pre>

</div>

<div align='center'>

[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/Lightning-AI/lightning-thunder/blob/main/LICENSE)
[![CI testing](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-testing.yml/badge.svg?event=push)](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-testing.yml)
[![General checks](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-checks.yml/badge.svg?event=push)](https://github.com/Lightning-AI/lightning-thunder/actions/workflows/ci-checks.yml)
[![Documentation Status](https://readthedocs.org/projects/lightning-thunder/badge/?version=latest)](https://lightning-thunder.readthedocs.io/en/latest/?badge=latest)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/Lightning-AI/lightning-thunder/main.svg)](https://results.pre-commit.ci/latest/github/Lightning-AI/lightning-thunder/main)

</div>

<div align="center">
  <div style="text-align: center;">
    <a target="_blank" href="#quick-start" style="margin: 0 10px;">Quick start</a> •
    <a target="_blank" href="#examples" style="margin: 0 10px;">Examples</a> •
    <a target="_blank" href="#performance" style="margin: 0 10px;">Performance</a> •
    <a target="_blank" href="https://lightning.ai/docs/thunder/latest/" style="margin: 0 10px;">Docs</a>
  </div>
</div>

&#160;

<div align="center">
<img alt="Thunder" src="docs/source/_static/images/pretrain_perf.png" width="800px" style="max-width: 100%;">
</div>

# Quick start

Install Thunder via pip ([more options](https://lightning.ai/docs/thunder/latest/fundamentals/installation.html)):

```bash
pip install torch==2.6.0 torchvision==0.21 nvfuser-cu124-torch26

pip install lightning-thunder
```

<details>
  <summary>Advanced install options</summary>

### Blackwell support

For Blackwell you'll need CUDA 12.8:

```bash
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
pip install --pre nvfuser-cu128 --extra-index-url https://pypi.nvidia.com

pip install lightning-thunder
```

### Install additional executors

These are optional; feel free to mix and match.

```bash
# cuDNN SDPA
pip install nvidia-cudnn-frontend

# Float8 support (this will compile from source, be patient)
pip install "transformer_engine[pytorch]"
```

### Install Thunder bleeding edge

```bash
pip install git+https://github.com/Lightning-AI/lightning-thunder.git@main
```

### Install Thunder for development

```bash
git clone https://github.com/Lightning-AI/lightning-thunder.git
cd lightning-thunder
pip install -e .
```

</details>

### Hello world

Define a function or a torch module:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(2048, 4096), nn.ReLU(), nn.Linear(4096, 64))
```

Optimize it with Thunder:

```python
import thunder
import torch

thunder_model = thunder.compile(model)

x = torch.randn(64, 2048)

y = thunder_model(x)

# assert_close raises on mismatch and returns None, so don't wrap it in `assert`
torch.testing.assert_close(y, model(x))
```

## Examples

### Speed up LLM training

Install LitGPT (without updating other dependencies):

```bash
pip install --no-deps 'litgpt[all]'
```

and run:

```python
import thunder
import torch
import litgpt

with torch.device("cuda"):
    model = litgpt.GPT.from_name("Llama-3.2-1B").to(torch.bfloat16)

thunder_model = thunder.compile(model)

inp = torch.ones((1, 2048), device="cuda", dtype=torch.int64)

out = thunder_model(inp)
out.sum().backward()
```

### Speed up HuggingFace BERT inference

Install Hugging Face Transformers (recommended version: `4.50.2` or later):

```bash
pip install -U transformers
```

and run:

```python
import thunder
import torch
import transformers

model_name = "bert-large-uncased"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

with torch.device("cuda"):
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16
    )
    model.requires_grad_(False)
    model.eval()

    inp = tokenizer(["Hello world!"], return_tensors="pt")

thunder_model = thunder.compile(model)

out = thunder_model(**inp)
print(out)
```

### Speed up HuggingFace DeepSeek R1 distill inference

Install Hugging Face Transformers (recommended version: `4.50.2` or later):

```bash
pip install -U transformers
```

and run:

```python
import torch
import transformers
import thunder

model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

with torch.device("cuda"):
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16
    )
    model.requires_grad_(False)
    model.eval()

    inp = tokenizer(["Hello world! Here's a long story"], return_tensors="pt")

thunder_model = thunder.compile(model)

out = thunder_model.generate(
    **inp, do_sample=False, cache_implementation="static", max_new_tokens=100
)
print(out)
```

To get an idea of the speedups, run:

```bash
python examples/quickstart/hf_llm.py
```

Here's what you get on an L4 machine from [Lightning Studio](https://lightning.ai):

```bash
Eager: 2273.22ms
Thunder: 1254.39ms
```

81% faster 🏎️!
Quite the speedup ⚡️

### Speed up Vision Transformer inference

```python
import thunder
import torch
import torchvision as tv

with torch.device("cuda"):
    model = tv.models.vit_b_16()
    model.requires_grad_(False)
    model.eval()

    inp = torch.randn(128, 3, 224, 224)

out = model(inp)

thunder_model = thunder.compile(model)

out = thunder_model(inp)
```

## Plugins

Plugins are a way to apply optimizations to a model, such as parallelism and quantization.

Thunder comes with a few plugins out of the box, and it's easy to write new ones:

- scale up with distributed strategies such as DDP, FSDP, and TP
- optimize numerical precision with FP8 and MXFP8
- save memory with quantization
- reduce latency with CUDA Graphs
- debug and profile

For example, to reduce CPU overheads via CUDA Graphs you can add `"reduce-overhead"`
to the `plugins=` argument of `thunder.compile`:

```python
thunder_model = thunder.compile(model, plugins="reduce-overhead")
```

This may or may not make a big difference. The point of Thunder is that you can easily
swap optimizations in and out and explore the best combination for your setup.

## How it works

Thunder works in three stages:

1. ⚡️ It acquires your model by interpreting Python bytecode and producing a straight-line Python program

1. ️⚡️ It transforms the computation trace, e.g. to make it distributed or change precision

1. ⚡️ It routes parts of the trace for execution

   - fusion (`NVFuser`, `torch.compile`)
   - specialized libraries (e.g. `cuDNN SDPA`, `TransformerEngine`)
   - custom Triton and CUDA kernels
   - PyTorch eager operations

&#160;

<div align="center">
<img alt="Thunder" src="docs/source/_static/images/how_it_works.png" width="800px" style="max-width: 100%;">
</div>

&#160;

This is what the trace looks like for a simple MLP:

```python
import thunder
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 2048), nn.ReLU(), nn.Linear(2048, 256))

thunder_model = thunder.compile(model)
y = thunder_model(torch.randn(4, 1024))

print(thunder.last_traces(thunder_model)[-1])
```

This is the acquired trace, ready to be transformed and executed:

```python
def computation(input, t_0_bias, t_0_weight, t_2_bias, t_2_weight):
    # input: "cuda:0 f32[4, 1024]"
    # t_0_bias: "cuda:0 f32[2048]"
    # t_0_weight: "cuda:0 f32[2048, 1024]"
    # t_2_bias: "cuda:0 f32[256]"
    # t_2_weight: "cuda:0 f32[256, 2048]"
    t3 = ltorch.linear(input, t_0_weight, t_0_bias)  # t3: "cuda:0 f32[4, 2048]"
    t6 = ltorch.relu(t3, False)  # t6: "cuda:0 f32[4, 2048]"
    t10 = ltorch.linear(t6, t_2_weight, t_2_bias)  # t10: "cuda:0 f32[4, 256]"
    return (t10,)
```

Note how Thunder's intermediate representation is just (a subset of) Python!

## Performance

Thunder is fast.
Here are the speed-ups obtained on a pre-training task using LitGPT on H100 and B200 hardware, relative to PyTorch eager.

<div align="center">
<img alt="Thunder" src="docs/source/_static/images/pretrain_perf.png" width="800px" style="max-width: 100%;">
</div>

# Community

Thunder is an open source project, developed in collaboration with the community and with significant contributions from NVIDIA.

💬 [Get help on Discord](https://discord.com/invite/XncpTy7DSt)
📋 [License: Apache 2.0](https://github.com/Lightning-AI/lightning-thunder/blob/main/LICENSE)
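
To measure speed-ups on your own models, a minimal stdlib-only timing helper can wrap any callable. This `benchmark` function is a hypothetical sketch, not part of Thunder's API; on GPU you would also call `torch.cuda.synchronize()` before reading the clock.

```python
import time
from statistics import median

def benchmark(fn, *args, warmup=3, iters=10):
    """Return the median latency of ``fn(*args)`` in milliseconds.

    Hypothetical helper for illustration only (not part of Thunder).
    Warmup calls amortize one-time costs such as compilation.
    """
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        timings.append((time.perf_counter() - start) * 1000.0)
    return median(timings)
```

For example, `benchmark(model, x) / benchmark(thunder_model, x)` gives an approximate speed-up factor for a given input `x`.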