{"id":13477959,"url":"https://github.com/pytorch/torchtune","last_synced_at":"2025-05-13T18:05:24.389Z","repository":{"id":229035857,"uuid":"707869465","full_name":"pytorch/torchtune","owner":"pytorch","description":"PyTorch native post-training library","archived":false,"fork":false,"pushed_at":"2025-05-06T13:27:22.000Z","size":62487,"stargazers_count":5155,"open_issues_count":369,"forks_count":592,"subscribers_count":45,"default_branch":"main","last_synced_at":"2025-05-06T17:14:55.130Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://pytorch.org/torchtune/main/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pytorch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-10-20T21:10:49.000Z","updated_at":"2025-05-06T13:27:26.000Z","dependencies_parsed_at":"2024-04-14T21:23:30.874Z","dependency_job_id":"277b1504-185b-4f80-90fe-ae0ed651559c","html_url":"https://github.com/pytorch/torchtune","commit_stats":{"total_commits":759,"total_committers":81,"mean_commits":9.37037037037037,"dds":0.844532279314888,"last_synced_commit":"ee343e61804f9942b2bd48243552bf17b5d0d553"},"previous_names":["pytorch/torchtune"],"tags_count":49,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Ftorchtune","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Ftorchtune/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytor
ch%2Ftorchtune/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Ftorchtune/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pytorch","download_url":"https://codeload.github.com/pytorch/torchtune/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254000824,"owners_count":21997441,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T16:01:50.468Z","updated_at":"2025-05-13T18:05:24.317Z","avatar_url":"https://github.com/pytorch.png","language":"Python","readme":"\n\n\n# torchtune\n\n[![Unit Test](https://github.com/pytorch/torchtune/actions/workflows/unit_test.yaml/badge.svg?branch=main)](https://github.com/pytorch/torchtune/actions/workflows/unit_test.yaml)\n![Integration Tests](https://github.com/pytorch/torchtune/actions/workflows/gpu_test.yaml/badge.svg)\n[![](https://dcbadge.vercel.app/api/server/4Xsdn8Rr9Q?style=flat)](https://discord.gg/4Xsdn8Rr9Q)\n\n[**Overview**](#overview-) | [**Installation**](#installation-%EF%B8%8F) | [**Get Started**](#get-started-) |  [**Documentation**](https://pytorch.org/torchtune/main/index.html) | [**Community**](#community-) | [**Citing torchtune**](#citing-torchtune-) | [**License**](#license)\n\n### 📣 Recent updates 📣\n* *April 2025*: **Llama4** is now available in torchtune! 
Try out our full and LoRA finetuning configs [here](recipes/configs/llama4)\n* *February 2025*: Multi-node training is officially [open for business in torchtune](https://pytorch.org/torchtune/main/tutorials/multinode.html)! Full finetune on multiple nodes to take advantage of larger batch sizes and models.\n* *December 2024*: torchtune now supports **Llama 3.3 70B**! Try it out by following our installation instructions [here](#installation-%EF%B8%8F), then run any of the configs [here](recipes/configs/llama3_3).\n* *November 2024*: torchtune has released [v0.4.0](https://github.com/pytorch/torchtune/releases/tag/v0.4.0) which includes stable support for exciting features like activation offloading and multimodal QLoRA\n* *November 2024*: torchtune has added [Gemma2](recipes/configs/gemma2) to its models!\n* *October 2024*: torchtune added support for Qwen2.5 models - find the configs [here](recipes/configs/qwen2_5/)\n* *September 2024*: torchtune has support for **Llama 3.2 11B Vision**, **Llama 3.2 3B**, and **Llama 3.2 1B** models! Try them out by following our installation instructions [here](#installation-%EF%B8%8F), then run any of the text configs [here](recipes/configs/llama3_2) or vision configs [here](recipes/configs/llama3_2_vision).\n\n\n\u0026nbsp;\n\n## Overview 📚\n\n\ntorchtune is a PyTorch library for easily authoring, post-training, and experimenting with LLMs. It provides:\n\n- Hackable training recipes for SFT, knowledge distillation, DPO, PPO, GRPO, and quantization-aware training\n- Simple PyTorch implementations of popular LLMs like Llama, Gemma, Mistral, Phi, Qwen, and more\n- Best-in-class memory efficiency, performance improvements, and scaling, utilizing the latest PyTorch APIs\n- YAML configs for easily configuring training, evaluation, quantization or inference recipes\n\n\u0026nbsp;\n\n### Post-training recipes\n\ntorchtune supports [the entire post-training lifecycle](https://pytorch.org/torchtune/main/recipes/recipes_overview.html). 
A successful post-trained model will likely utilize several of the below methods.\n\n#### Supervised Finetuning (SFT)\n\n| Type of Weight Update | 1 Device | \u003e1 Device | \u003e1 Node |\n|-----------------------|:--------:|:---------:|:-------:|\n| Full                  |    ✅    |     ✅    |   ✅    |\n| [LoRA/QLoRA](https://pytorch.org/torchtune/stable/recipes/lora_finetune_single_device.html)            |    ✅    |     ✅    |    ✅    |\n\nExample: ``tune run lora_finetune_single_device --config llama3_2/3B_lora_single_device`` \u003cbr /\u003e\nYou can also run e.g. ``tune ls lora_finetune_single_device`` for a full list of available configs.\n\n#### [Knowledge Distillation (KD)](https://pytorch.org/torchtune/0.4/tutorials/llama_kd_tutorial.html)\n\n| Type of Weight Update | 1 Device | \u003e1 Device | \u003e1 Node |\n|-----------------------|:--------:|:---------:|:-------:|\n| Full                  |    ❌    |     ❌    |    ❌    |\n| LoRA/QLoRA            |    ✅    |     ✅    |    ❌    |\n\nExample: ``tune run knowledge_distillation_distributed --config qwen2/1.5B_to_0.5B_KD_lora_distributed`` \u003cbr /\u003e\nYou can also run e.g. 
``tune ls knowledge_distillation_distributed`` for a full list of available configs.\n\n#### Reinforcement Learning / Reinforcement Learning from Human Feedback (RLHF)\n\n| Method | Type of Weight Update | 1 Device | \u003e1 Device | \u003e1 Node |\n|------------------------------|-----------------------|:--------:|:---------:|:-------:|\n| [DPO](https://pytorch.org/torchtune/stable/recipes/dpo.html)                          | Full                  |    ❌    |     ✅    |    ❌    |\n|                           | LoRA/QLoRA            |    ✅    |     ✅    |    ❌    |\n| PPO                          | Full                  |    ✅    |     ❌    |    ❌    |\n|                           | LoRA/QLoRA            |    ❌    |     ❌    |    ❌    |\n| GRPO                         | Full                  |    🚧    |     ✅    |  ✅   |\n|                           | LoRA/QLoRA            |    ❌    |     ❌    |    ❌    |\n\nExample: ``tune run lora_dpo_single_device --config llama3_1/8B_dpo_single_device`` \u003cbr /\u003e\nYou can also run e.g. ``tune ls full_dpo_distributed`` for a full list of available configs.\n\n#### [Quantization-Aware Training (QAT)](https://pytorch.org/torchtune/main/tutorials/qat_finetune.html)\n\n| Type of Weight Update | 1 Device | \u003e1 Device | \u003e1 Node |\n|-----------------------|:--------:|:---------:|:-------:|\n| [Full](https://pytorch.org/torchtune/stable/recipes/qat_distributed.html)                  |    ❌    |     ✅    |    ❌    |\n| LoRA/QLoRA            |    ❌    |     ✅    |    ❌    |\n\nExample: ``tune run qat_distributed --config llama3_1/8B_qat_lora`` \u003cbr /\u003e\nYou can also run e.g. ``tune ls qat_distributed`` for a full list of available configs.\n\nThe above configs are just examples to get you started. The full list of recipes can be found [here](recipes/). If you'd like to work on one of the gaps you see, please submit a PR! 
If there's an entirely new post-training method you'd like to see implemented in torchtune, feel free to open an Issue.\n\n\u0026nbsp;\n\n### Models\n\nFor the above recipes, torchtune supports many state-of-the-art models available on the [Hugging Face Hub](https://huggingface.co/models) or [Kaggle Hub](https://www.kaggle.com/models). Some of our supported models:\n\n| Model                                         | Sizes     |\n|-----------------------------------------------|-----------|\n| [Llama4](https://www.llama.com/docs/model-cards-and-prompt-formats/llama4)    | Scout (17B x 16E) [[models](torchtune/models/llama4/_model_builders.py), [configs](recipes/configs/llama4/)]        |\n| [Llama3.3](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_3)    | 70B [[models](torchtune/models/llama3_3/_model_builders.py), [configs](recipes/configs/llama3_3/)]        |\n| [Llama3.2-Vision](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-vision-models-(11b/90b)-)    | 11B, 90B [[models](torchtune/models/llama3_2_vision/_model_builders.py), [configs](recipes/configs/llama3_2_vision/)]        |\n| [Llama3.2](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2)    | 1B, 3B [[models](torchtune/models/llama3_2/_model_builders.py), [configs](recipes/configs/llama3_2/)]        |\n| [Llama3.1](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1)    | 8B, 70B, 405B [[models](torchtune/models/llama3_1/_model_builders.py), [configs](recipes/configs/llama3_1/)]        |\n| [Mistral](https://huggingface.co/mistralai)   | 7B [[models](torchtune/models/mistral/_model_builders.py), [configs](recipes/configs/mistral/)] |\n| [Gemma2](https://huggingface.co/docs/transformers/main/en/model_doc/gemma2)   | 2B, 9B, 27B [[models](torchtune/models/gemma2/_model_builders.py), [configs](recipes/configs/gemma2/)] |\n| [Microsoft Phi4](https://huggingface.co/collections/microsoft/phi-4-677e9380e514feb5577a40e4) | 14B 
[[models](torchtune/models/phi4/), [configs](recipes/configs/phi4/)]\n| [Microsoft Phi3](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3) | Mini [[models](torchtune/models/phi3/), [configs](recipes/configs/phi3/)]\n| [Qwen2.5](https://qwenlm.github.io/blog/qwen2.5/) | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B [[models](torchtune/models/qwen2_5/), [configs](recipes/configs/qwen2_5/)]\n| [Qwen2](https://qwenlm.github.io/blog/qwen2/) | 0.5B, 1.5B, 7B [[models](torchtune/models/qwen2/), [configs](recipes/configs/qwen2/)]\n\nWe're always adding new models, but feel free to [file an issue](https://github.com/pytorch/torchtune/issues/new) if there's a new one you would like to see in torchtune.\n\n\u0026nbsp;\n\n### Memory and training speed\n\nBelow is an example of the memory requirements and training speed for different Llama 3.1 models.\n\n\u003e [!NOTE]\n\u003e For ease of comparison, all the below numbers are provided for batch size 2 (without gradient accumulation), a dataset packed to sequence length 2048, and torch compile enabled.\n\nIf you are interested in running on different hardware or with different models, check out our documentation on memory optimizations [here](https://pytorch.org/torchtune/main/tutorials/memory_optimizations.html) to find the right setup for you.\n\n| Model | Finetuning Method | Runnable On | Peak Memory per GPU | Tokens/sec * |\n|:-:|:-:|:-:|:-:|:-:|\n| Llama 3.1 8B | Full finetune | 1x 4090 | 18.9 GiB | 1650 |\n| Llama 3.1 8B | Full finetune | 1x A6000 | 37.4 GiB |  2579|\n| Llama 3.1 8B | LoRA | 1x 4090 |  16.2 GiB | 3083 |\n| Llama 3.1 8B | LoRA | 1x A6000 | 30.3 GiB  | 4699 |\n| Llama 3.1 8B | QLoRA | 1x 4090 | 7.4 GiB | 2413  |\n| Llama 3.1 70B | Full finetune | 8x A100  | 13.9 GiB ** | 1568  |\n| Llama 3.1 70B | LoRA | 8x A100 | 27.6 GiB  | 3497  |\n| Llama 3.1 405B | QLoRA | 8x A100 | 44.8 GB  | 653  |\n\n*= Measured over one full training epoch \u003cbr /\u003e\n**= Uses CPU offload with fused 
optimizer\n\n\u0026nbsp;\n\n### Optimization flags\n\ntorchtune exposes a number of levers for memory efficiency and performance. The table below demonstrates the effects of applying some of these techniques sequentially to the Llama 3.2 3B model. Each technique is added on top of the previous one, except for LoRA and QLoRA, which do not use `optimizer_in_bwd` or `AdamW8bit` optimizer.\n\n\u003e Baseline uses Recipe=**full_finetune_single_device**, Model=**Llama 3.2 3B**, Batch size=**2**, Max sequence length=**4096**, Precision=**bf16**, Hardware=**A100**\n\n| Technique | Peak Memory Active (GiB) | % Change Memory vs Previous | Tokens Per Second | % Change Tokens/sec vs Previous|\n|:--|:-:|:-:|:-:|:-:|\n| Baseline | 25.5 | - | 2091 | - |\n| [+ Packed Dataset](https://pytorch.org/torchtune/main/basics/packing.html) | 60.0 | +135.16% | 7075 | +238.40% |\n| [+ Compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) | 51.0 | -14.93% | 8998 | +27.18% |\n| [+ Chunked Cross Entropy](https://pytorch.org/torchtune/main/generated/torchtune.modules.loss.CEWithChunkedOutputLoss.html) | 42.9 | -15.83% | 9174 | +1.96% |\n| [+ Activation Checkpointing](https://pytorch.org/torchtune/main/tutorials/memory_optimizations.html#activation-checkpointing) | 24.9 | -41.93% | 7210 | -21.41% |\n| [+ Fuse optimizer step into backward](https://pytorch.org/torchtune/main/tutorials/memory_optimizations.html#fusing-optimizer-step-into-backward-pass) | 23.1 | -7.29% | 7309 | +1.38% |\n| [+ Activation Offloading](https://pytorch.org/torchtune/main/tutorials/memory_optimizations.html#activation-offloading) | 21.8 | -5.48% | 7301 | -0.11% |\n| [+ 8-bit AdamW](https://pytorch.org/torchtune/main/tutorials/memory_optimizations.html#lower-precision-optimizers) | 17.6 | -19.63% | 6960 | -4.67% |\n| [LoRA](https://pytorch.org/torchtune/main/tutorials/memory_optimizations.html#glossary-lora) | 8.5 | -51.61% | 8210 | +17.96% |\n| 
[QLoRA](https://pytorch.org/torchtune/main/tutorials/memory_optimizations.html#quantized-low-rank-adaptation-qlora) | 4.6 | -45.71% | 8035 | -2.13% |\n\nThe final row in the table, compared against the baseline + Packed Dataset, uses **81.9%** less memory with a **284.3%** increase in tokens per second.\n\n\u003cdetails\u003e\n\u003csummary\u003eCommand to reproduce final row.\u003c/summary\u003e\n\n```bash\ntune run lora_finetune_single_device --config llama3_2/3B_qlora_single_device \\\ndataset.packed=True \\\ncompile=True \\\nloss=torchtune.modules.loss.CEWithChunkedOutputLoss \\\nenable_activation_checkpointing=True \\\noptimizer_in_bwd=False \\\nenable_activation_offloading=True \\\noptimizer=torch.optim.AdamW \\\ntokenizer.max_seq_len=4096 \\\ngradient_accumulation_steps=1 \\\nepochs=1 \\\nbatch_size=2\n```\n\n\u003c/details\u003e\n\n\u0026nbsp;\n\n## Installation 🛠️\n\n\ntorchtune is **only** tested with the latest stable PyTorch release (currently 2.6.0) as well as the preview nightly version, and leverages\ntorchvision for finetuning multimodal LLMs and torchao for the latest in quantization techniques; you should install these as well.\n\n### Install stable release\n\n```bash\n# Install stable PyTorch, torchvision, torchao stable releases\npip install torch torchvision torchao\npip install torchtune\n```\n\n### Install nightly release\n\n```bash\n# Install PyTorch, torchvision, torchao nightlies.\npip install --pre --upgrade torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu126 # full options are cpu/cu118/cu121/cu124/cu126/xpu\npip install --pre --upgrade torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu\n```\n\nYou can also check out our [install documentation](https://pytorch.org/torchtune/main/install.html) for more information, including installing torchtune from source.\n\n\u0026nbsp;\n\nTo confirm that the package is installed correctly, you can run the following command:\n\n```bash\ntune --help\n```\n\nYou should see the following output:\n\n```bash\nusage: tune [-h] {ls,cp,download,run,validate} ...\n\nWelcome to the torchtune CLI!\n\noptions:\n  -h, --help            show this help message and exit\n\n...\n```\n\n\u0026nbsp;\n\n## Get Started 🚀\n\n\nTo get started with torchtune, see our [First Finetune Tutorial](https://pytorch.org/torchtune/main/tutorials/first_finetune_tutorial.html). Our [End-to-End Workflow Tutorial](https://pytorch.org/torchtune/main/tutorials/e2e_flow.html) will show you how to evaluate, quantize, and run inference with a Llama model. The rest of this section will provide a quick overview of these steps with Llama3.1.\n\n\n### Downloading a model\n\nFollow the instructions on the official [`meta-llama`](https://huggingface.co/meta-llama) repository to ensure you have access to the official Llama model weights. Once you have confirmed access, you can run the following command to download the weights to your local machine. This will also download the tokenizer model and a responsible use guide.\n\nTo download Llama3.1, you can run:\n\n```bash\ntune download meta-llama/Meta-Llama-3.1-8B-Instruct \\\n--output-dir /tmp/Meta-Llama-3.1-8B-Instruct \\\n--ignore-patterns \"original/consolidated.00.pth\" \\\n--hf-token \u003cHF_TOKEN\u003e\n```\n\n\u003e [!Tip]\n\u003e Set your environment variable `HF_TOKEN` or pass in `--hf-token` to the command in order to validate your access. 
You can find your token at https://huggingface.co/settings/tokens\n\n### Running finetuning recipes\n\nYou can finetune Llama3.1 8B with LoRA on a single GPU using the following command:\n\n```bash\ntune run lora_finetune_single_device --config llama3_1/8B_lora_single_device\n```\n\nFor distributed training, tune CLI integrates with [torchrun](https://pytorch.org/docs/stable/elastic/run.html).\nTo run a full finetune of Llama3.1 8B on two GPUs:\n\n```bash\ntune run --nproc_per_node 2 full_finetune_distributed --config llama3_1/8B_full\n```\n\n\u003e [!Tip]\n\u003e Make sure to place any torchrun commands **before** the recipe specification. Any CLI args after this will override the config and not impact distributed training.\n\n### Modify Configs\n\nThere are two ways in which you can modify configs:\n\n**Config Overrides**\n\nYou can directly overwrite config fields from the command line:\n\n```bash\ntune run lora_finetune_single_device \\\n--config llama2/7B_lora_single_device \\\nbatch_size=8 \\\nenable_activation_checkpointing=True \\\nmax_steps_per_epoch=128\n```\n\n**Update a Local Copy**\n\nYou can also copy the config to your local directory and modify the contents directly:\n\n```bash\ntune cp llama3_1/8B_full ./my_custom_config.yaml\nCopied to ./my_custom_config.yaml\n```\n\nThen, you can run your custom recipe by directing the `tune run` command to your local files:\n\n```bash\ntune run full_finetune_distributed --config ./my_custom_config.yaml\n```\n\nCheck out `tune --help` for all possible CLI commands and options. 
For more information on using and updating configs, take a look at our [config deep-dive](https://pytorch.org/torchtune/main/deep_dives/configs.html).\n\n### Custom Datasets\n\ntorchtune supports finetuning on a variety of different datasets, including [instruct-style](https://pytorch.org/torchtune/main/basics/instruct_datasets.html), [chat-style](https://pytorch.org/torchtune/main/basics/chat_datasets.html), [preference datasets](https://pytorch.org/torchtune/main/basics/preference_datasets.html), and more. If you want to learn more about how to apply these components to finetune on your own custom dataset, please check out the provided links along with our [API docs](https://pytorch.org/torchtune/main/api_ref_datasets.html).\n\n\u0026nbsp;\n\n## Community 🌍\n\ntorchtune focuses on integrating with popular tools and libraries from the ecosystem. These are just a few examples, with more under development:\n\n- [Hugging Face Hub](https://huggingface.co/docs/hub/en/index) for [accessing model weights](torchtune/_cli/download.py)\n- [EleutherAI's LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) for [evaluating](recipes/eleuther_eval.py) trained models\n- [Hugging Face Datasets](https://huggingface.co/docs/datasets/en/index) for [access](torchtune/datasets/_instruct.py) to training and evaluation datasets\n- [PyTorch FSDP2](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md) for distributed training\n- [torchao](https://github.com/pytorch-labs/ao) for lower precision dtypes and [post-training quantization](recipes/quantize.py) techniques\n- [Weights \u0026 Biases](https://wandb.ai/site) for [logging](https://pytorch.org/torchtune/main/deep_dives/wandb_logging.html) metrics and checkpoints, and tracking training progress\n- [Comet](https://www.comet.com/site/) as another option for [logging](https://pytorch.org/torchtune/main/deep_dives/comet_logging.html)\n- [ExecuTorch](https://pytorch.org/executorch-overview) for [on-device 
inference](https://github.com/pytorch/executorch/tree/main/examples/models/llama2#optional-finetuning) using finetuned models\n- [bitsandbytes](https://huggingface.co/docs/bitsandbytes/main/en/index) for low memory optimizers for our [single-device recipes](recipes/configs/llama2/7B_full_low_memory.yaml)\n- [PEFT](https://github.com/huggingface/peft) for continued finetuning or inference with torchtune models in the Hugging Face ecosystem\n\n\u0026nbsp;\n\n### Community Contributions\n\nWe really value our community and the contributions made by our wonderful users. We'll use this section to call out some of these contributions. If you'd like to help out as well, please see the [CONTRIBUTING](CONTRIBUTING.md) guide.\n\n- [@SalmanMohammadi](https://github.com/salmanmohammadi) for adding a comprehensive end-to-end recipe for [Reinforcement Learning from Human Feedback (RLHF)](recipes/ppo_full_finetune_single_device.py) finetuning with PPO to torchtune\n- [@fyabc](https://github.com/fyabc) for adding Qwen2 models, tokenizer, and recipe integration to torchtune\n- [@solitude-alive](https://github.com/solitude-alive) for adding the [Gemma 2B model](torchtune/models/gemma/) to torchtune, including recipe changes, numeric validations of the models and recipe correctness\n- [@yechenzhi](https://github.com/yechenzhi) for adding [Direct Preference Optimization (DPO)](recipes/lora_dpo_single_device.py) to torchtune, including the recipe and config along with correctness checks\n- [@Optimox](https://github.com/Optimox) for adding all the [Gemma2 variants](torchtune/models/gemma2) to torchtune!\n\n\n\u0026nbsp;\n\n## Acknowledgements 🙏\n\nThe transformer code in this repository is inspired by the original [Llama2 code](https://github.com/meta-llama/llama/blob/main/llama/model.py). We also want to give a huge shout-out to EleutherAI, Hugging Face and\nWeights \u0026 Biases for being wonderful collaborators and for working with us on some of these integrations within torchtune. 
In addition, we want to acknowledge some other awesome libraries and tools from the ecosystem:\n\n- [gpt-fast](https://github.com/pytorch-labs/gpt-fast) for performant LLM inference techniques which we've adopted out-of-the-box\n- [llama recipes](https://github.com/meta-llama/llama-recipes) for spring-boarding the llama2 community\n- [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for bringing several memory and performance based techniques to the PyTorch ecosystem\n- [@winglian](https://github.com/winglian/) and [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for early feedback and brainstorming on torchtune's design and feature set.\n- [lit-gpt](https://github.com/Lightning-AI/litgpt) for pushing the LLM finetuning community forward.\n- [HF TRL](https://github.com/huggingface/trl) for making reward modeling more accessible to the PyTorch community.\n\n\u0026nbsp;\n\n## Citing torchtune 📝\n\nIf you find the torchtune library useful, please cite it in your work as below.\n\n```bibtex\n@software{torchtune,\n  title = {torchtune: PyTorch's finetuning library},\n  author = {torchtune maintainers and contributors},\n  url = {https://github.com/pytorch/torchtune},\n  license = {BSD-3-Clause},\n  month = apr,\n  year = {2024}\n}\n```\n\n\u0026nbsp;\n\n## License\n\ntorchtune is released under the [BSD 3 license](./LICENSE). However, you may have other legal obligations that govern your use of other content, such as the terms of service for third-party models.\n","funding_links":[],"categories":["Python","LLM训练框架","A01_文本生成_文本对话","LLM Training Frameworks","NLP","Fine-Tuning \u0026 Training","微调 Fine-Tuning","Repos","Training and Fine Tuning Libraries","📋 Contents","LLM Training / Finetuning","Training","9. Fine-Tuning"],"sub_categories":["LLM 评估工具","大语言对话模型及数据","Fine-Tuning Frameworks","3. Pretraining","🛠️ 7. 
Training \u0026 Fine-tuning Ecosystem","FineTune","Training Frameworks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Ftorchtune","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpytorch%2Ftorchtune","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Ftorchtune/lists"}