{"id":14964487,"url":"https://github.com/linkedin/liger-kernel","last_synced_at":"2025-05-13T20:04:47.141Z","repository":{"id":254348676,"uuid":"838970603","full_name":"linkedin/Liger-Kernel","owner":"linkedin","description":"Efficient Triton Kernels for LLM Training","archived":false,"fork":false,"pushed_at":"2025-05-05T19:35:45.000Z","size":16643,"stargazers_count":4973,"open_issues_count":87,"forks_count":317,"subscribers_count":46,"default_branch":"main","last_synced_at":"2025-05-06T19:52:12.816Z","etag":null,"topics":["finetuning","gemma2","llama","llama3","llm-training","llms","mistral","phi3","triton","triton-kernels"],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/2410.10989","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linkedin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-08-06T17:47:52.000Z","updated_at":"2025-05-06T12:16:16.000Z","dependencies_parsed_at":null,"dependency_job_id":"b3126b86-be0a-40b1-9ccb-ab88080c3278","html_url":"https://github.com/linkedin/Liger-Kernel","commit_stats":{"total_commits":351,"total_committers":56,"mean_commits":6.25,"dds":0.6314285714285715,"last_synced_commit":"61eefe9a4429459351979dc7fe1de746fd7ca86f"},"previous_names":["linkedin/liger-kernel"],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FLiger-Kernel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FLiger-Kernel/tags","rel
eases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FLiger-Kernel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FLiger-Kernel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linkedin","download_url":"https://codeload.github.com/linkedin/Liger-Kernel/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254020477,"owners_count":22000750,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["finetuning","gemma2","llama","llama3","llm-training","llms","mistral","phi3","triton","triton-kernels"],"created_at":"2024-09-24T13:33:15.409Z","updated_at":"2025-05-13T20:04:47.133Z","avatar_url":"https://github.com/linkedin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ca name=\"readme-top\"\u003e\u003c/a\u003e\r\n\r\n# Liger Kernel: Efficient Triton Kernels for LLM Training\r\n\r\n\r\n\u003ctable style=\"width: 100%; text-align: center; border-collapse: collapse;\"\u003e\r\n    \u003ctr\u003e\r\n        \u003cth style=\"padding: 10px;\" colspan=\"2\"\u003eStable\u003c/th\u003e\r\n        \u003cth style=\"padding: 10px;\" colspan=\"2\"\u003eNightly\u003c/th\u003e\r\n        \u003cth style=\"padding: 10px;\"\u003eDiscord\u003c/th\u003e\r\n    \u003c/tr\u003e\r\n    \u003ctr\u003e\r\n        \u003ctd style=\"padding: 10px;\"\u003e\r\n            \u003ca href=\"https://pepy.tech/project/liger-kernel\"\u003e\r\n                \u003cimg 
src=\"https://static.pepy.tech/badge/liger-kernel\" alt=\"Downloads (Stable)\"\u003e\r\n            \u003c/a\u003e\r\n        \u003c/td\u003e\r\n        \u003ctd style=\"padding: 10px;\"\u003e\r\n            \u003ca href=\"https://pypi.org/project/liger-kernel\"\u003e\r\n                \u003cimg alt=\"PyPI - Version\" src=\"https://img.shields.io/pypi/v/liger-kernel?color=green\"\u003e\r\n            \u003c/a\u003e\r\n        \u003c/td\u003e\r\n        \u003ctd style=\"padding: 10px;\"\u003e\r\n            \u003ca href=\"https://pepy.tech/project/liger-kernel-nightly\"\u003e\r\n                \u003cimg src=\"https://static.pepy.tech/badge/liger-kernel-nightly\" alt=\"Downloads (Nightly)\"\u003e\r\n            \u003c/a\u003e\r\n        \u003c/td\u003e\r\n        \u003ctd style=\"padding: 10px;\"\u003e\r\n            \u003ca href=\"https://pypi.org/project/liger-kernel-nightly\"\u003e\r\n                \u003cimg alt=\"PyPI - Version\" src=\"https://img.shields.io/pypi/v/liger-kernel-nightly?color=green\"\u003e\r\n            \u003c/a\u003e\r\n        \u003c/td\u003e\r\n        \u003ctd style=\"padding: 10px;\"\u003e\r\n            \u003ca href=\"https://discord.gg/gpumode\"\u003e\r\n                \u003cimg src=\"https://dcbadge.vercel.app/api/server/gpumode?style=flat\" alt=\"Join Our Discord\"\u003e\r\n            \u003c/a\u003e\r\n        \u003c/td\u003e\r\n    \u003c/tr\u003e\r\n\u003c/table\u003e\r\n\r\n\r\n\r\n\u003cimg src=\"https://raw.githubusercontent.com/linkedin/Liger-Kernel/main/docs/images/logo-banner.png\"\u003e\r\n\r\n[Installation](#installation) | [Getting Started](#getting-started) | [Examples](#examples) | [High-level APIs](#high-level-apis) | [Low-level APIs](#low-level-apis) | [Cite our work](#cite-this-work)\r\n\r\n\u003cdetails\u003e\r\n  \u003csummary\u003eLatest News 🔥\u003c/summary\u003e\r\n\r\n  - [2025/03/06] We release a joint blog post on TorchTune × Liger - [Peak Performance, Minimized Memory: Optimizing torchtune’s performance 
with torch.compile \u0026 Liger Kernel](https://pytorch.org/blog/peak-performance-minimized-memory/)\r\n  - [2024/12/11] We release [v0.5.0](https://github.com/linkedin/Liger-Kernel/releases/tag/v0.5.0): 80% more memory-efficient post-training losses (DPO, ORPO, CPO, etc.)!\r\n  - [2024/12/5] We release a LinkedIn Engineering Blog post - [Liger-Kernel: Empowering an open source ecosystem of Triton Kernels for Efficient LLM Training](https://www.linkedin.com/blog/engineering/open-source/liger-kernel-open-source-ecosystem-for-efficient-llm-training)\r\n  - [2024/11/6] We release [v0.4.0](https://github.com/linkedin/Liger-Kernel/releases/tag/v0.4.0): Full AMD support, Tech Report, Modal CI, Llama-3.2-Vision!\r\n  - [2024/10/21] We have released the tech report of Liger Kernel on arXiv: https://arxiv.org/pdf/2410.10989\r\n  - [2024/9/6] We release v0.2.1 ([X post](https://x.com/liger_kernel/status/1832168197002510649)). 2500+ Stars, 10+ New Contributors, 50+ PRs, 50k Downloads in two weeks!\r\n  - [2024/8/31] CUDA MODE talk, [Liger-Kernel: Real-world Triton kernel for LLM Training](https://youtu.be/gWble4FreV4?si=dxPeIchhkJ36Mbns), [Slides](https://github.com/cuda-mode/lectures?tab=readme-ov-file#lecture-28-liger-kernel)\r\n  - [2024/8/23] Official release: check out our [X post](https://x.com/hsu_byron/status/1827072737673982056)\r\n\r\n\u003c/details\u003e\r\n\r\n\r\n**Liger Kernel** is a collection of Triton kernels designed specifically for LLM training. It can increase multi-GPU **training throughput by 20%** and reduce **memory usage by 60%**. We have implemented **Hugging Face-compatible** `RMSNorm`, `RoPE`, `SwiGLU`, `CrossEntropy`, `FusedLinearCrossEntropy`, and more to come. The kernels work out of the box with [Flash Attention](https://github.com/Dao-AILab/flash-attention), [PyTorch FSDP](https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html), and [Microsoft DeepSpeed](https://github.com/microsoft/DeepSpeed). 
We welcome contributions from the community to gather the best kernels for LLM training.\r\n\r\nWe've also added optimized post-training kernels that deliver **up to 80% memory savings** for alignment and distillation tasks. We support losses like DPO, CPO, ORPO, SimPO, KTO, JSD, and many more. Check out [how we optimize memory](https://x.com/hsu_byron/status/1866577403918917655).\r\n\r\n## Supercharge Your Model with Liger Kernel\r\n\r\n![Banner](https://raw.githubusercontent.com/linkedin/Liger-Kernel/main/docs/images/banner.GIF)\r\n\r\nWith one line of code, Liger Kernel can increase throughput by more than 20% and reduce memory usage by 60%, thereby enabling longer context lengths, larger batch sizes, and massive vocabularies.\r\n\r\n\r\n| Speed Up                 | Memory Reduction        |\r\n|--------------------------|-------------------------|\r\n| ![Speed up](https://raw.githubusercontent.com/linkedin/Liger-Kernel/main/docs/images/e2e-tps.png) | ![Memory](https://raw.githubusercontent.com/linkedin/Liger-Kernel/main/docs/images/e2e-memory.png) |\r\n\r\n\u003e **Note:**\r\n\u003e - Benchmark conditions: LLaMA 3-8B, Batch Size = 8, Data Type = `bf16`, Optimizer = AdamW, Gradient Checkpointing = True, Distributed Strategy = FSDP1 on 8 A100s.\r\n\u003e - Hugging Face models start to OOM at a 4K context length, whereas Hugging Face + Liger Kernel scales up to 16K.\r\n\r\n## Optimize Post Training with Liger Kernel\r\n\r\n\u003cp align=\"center\"\u003e\r\n    \u003cimg src=\"https://raw.githubusercontent.com/linkedin/Liger-Kernel/main/docs/images/post-training.png\" width=\"50%\" alt=\"Post Training\"\u003e\r\n\u003c/p\u003e\r\n\r\nWe provide optimized post-training kernels for losses such as DPO, ORPO, and SimPO, which can reduce memory usage by up to 80%. 
You can easily use them as Python modules.\r\n\r\n```python\r\nfrom liger_kernel.chunked_loss import LigerFusedLinearORPOLoss\r\n\r\norpo_loss = LigerFusedLinearORPOLoss()\r\n# Assumes lm_head (the model's output projection), x (hidden states), and target (label ids) are already defined\r\ny = orpo_loss(lm_head.weight, x, target)\r\n```\r\n\r\n\r\n## Examples\r\n\r\n| **Use Case**                                    | **Description**                                                                                   |\r\n|------------------------------------------------|---------------------------------------------------------------------------------------------------|\r\n| [**Hugging Face Trainer**](https://github.com/linkedin/Liger-Kernel/tree/main/examples/huggingface)      | Train LLaMA 3-8B ~20% faster with over 40% memory reduction on the Alpaca dataset using 4 A100s with FSDP |\r\n| [**Lightning Trainer**](https://github.com/linkedin/Liger-Kernel/tree/main/examples/lightning)         | Increase throughput by 15% and reduce memory usage by 40% with LLaMA3-8B on the MMLU dataset using 8 A100s with DeepSpeed ZeRO3 |\r\n| [**Medusa Multi-head LLM (Retraining Phase)**](https://github.com/linkedin/Liger-Kernel/tree/main/examples/medusa)        | Reduce memory usage by 80% with 5 LM heads and improve throughput by 40% using 8 A100s with FSDP |\r\n| [**Vision-Language Model SFT**](https://github.com/linkedin/Liger-Kernel/tree/main/examples/huggingface/run_qwen2_vl.sh)      | Fine-tune Qwen2-VL on image-text data using 4 A100s with FSDP |\r\n| [**Liger ORPO Trainer**](https://github.com/linkedin/Liger-Kernel/blob/main/examples/alignment/run_orpo.py)      | Align Llama 3.2 using the Liger ORPO Trainer with FSDP, reducing memory usage by 50% |\r\n\r\n## Key Features\r\n\r\n- **Ease of use:** Simply patch your Hugging Face model with one line of code, or compose your own model using our Liger Kernel modules.\r\n- **Time and memory efficient:** In the same spirit as Flash-Attn, but for layers like **RMSNorm**, **RoPE**, **SwiGLU**, and **CrossEntropy**! 
Increases multi-GPU training throughput by 20% and reduces memory usage by 60% with **kernel fusion**, **in-place replacement**, and **chunking** techniques.\r\n- **Exact:** Computation is exact, with no approximations! Both forward and backward passes are implemented with rigorous unit tests and undergo convergence testing against training runs without Liger Kernel to ensure accuracy.\r\n- **Lightweight:** Liger Kernel has minimal dependencies, requiring only PyTorch and Triton, with no extra libraries needed! Say goodbye to dependency headaches!\r\n- **Multi-GPU supported:** Compatible with multi-GPU setups (PyTorch FSDP, DeepSpeed, DDP, etc.).\r\n- **Trainer Framework Integration**: [Axolotl](https://github.com/axolotl-ai-cloud/axolotl), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [SFTTrainer](https://github.com/huggingface/trl/releases/tag/v0.10.1), [Hugging Face Trainer](https://github.com/huggingface/transformers/pull/32860), [SWIFT](https://github.com/modelscope/ms-swift), [oumi](https://github.com/oumi-ai/oumi/tree/main)\r\n\r\n## Installation\r\n\r\n### Dependencies\r\n\r\n#### CUDA\r\n\r\n- `torch \u003e= 2.1.2`\r\n- `triton \u003e= 2.3.0`\r\n\r\n#### ROCm\r\n\r\n- `torch \u003e= 2.5.0`: Install according to the instructions on the official PyTorch website.\r\n- `triton \u003e= 3.0.0`: Install from PyPI (e.g. `pip install triton==3.0.0`).\r\n\r\n```bash\r\n# The ROCm PyTorch index URL must be passed when installing\r\npip install -e \".[dev]\" --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.2\r\n```\r\n\r\n### Optional Dependencies\r\n\r\n- `transformers \u003e= 4.x`: Required if you plan to use the Transformers model-patching APIs. 
The specific model you are working with will determine the minimum version of `transformers`.\r\n\r\n\u003e **Note:**\r\n\u003e Our kernels inherit the full spectrum of hardware compatibility offered by [Triton](https://github.com/triton-lang/triton).\r\n\r\nTo install the stable version:\r\n\r\n```bash\r\n$ pip install liger-kernel\r\n```\r\n\r\nTo install the nightly version:\r\n\r\n```bash\r\n$ pip install liger-kernel-nightly\r\n```\r\n\r\nTo install from source:\r\n\r\n```bash\r\ngit clone https://github.com/linkedin/Liger-Kernel.git\r\ncd Liger-Kernel\r\n\r\n# Install default dependencies\r\n# setup.py detects whether you are using AMD or NVIDIA\r\npip install -e .\r\n\r\n# Install development dependencies\r\npip install -e \".[dev]\"\r\n```\r\n\r\n\r\n## Getting Started\r\n\r\nThere are three ways to apply Liger kernels, depending on the level of customization required.\r\n\r\n### 1. Use AutoLigerKernelForCausalLM\r\n\r\nUsing `AutoLigerKernelForCausalLM` is the simplest approach, as you don't have to import a model-specific patching API. If the model type is supported, the modeling code is automatically patched using the default settings.\r\n\r\n```python\r\nfrom liger_kernel.transformers import AutoLigerKernelForCausalLM\r\n\r\n# This AutoModel wrapper class automatically monkey-patches the\r\n# model with the optimized Liger kernels if the model is supported.\r\nmodel = AutoLigerKernelForCausalLM.from_pretrained(\"path/to/some/model\")\r\n```\r\n\r\n### 2. Apply Model-Specific Patching APIs\r\n\r\nUsing the [patching APIs](#patching), you can patch Hugging Face models with optimized Liger kernels.\r\n\r\n```python\r\nimport transformers\r\nfrom liger_kernel.transformers import apply_liger_kernel_to_llama\r\n\r\n# 1a. Adding this line automatically monkey-patches the model with the optimized Liger kernels\r\napply_liger_kernel_to_llama()\r\n\r\n# 1b. 
You could alternatively specify exactly which kernels are applied\r\napply_liger_kernel_to_llama(\r\n  rope=True,\r\n  swiglu=True,\r\n  cross_entropy=True,\r\n  fused_linear_cross_entropy=False,\r\n  rms_norm=False\r\n)\r\n\r\n# 2. Instantiate the patched model\r\nmodel = transformers.AutoModelForCausalLM.from_pretrained(\"path/to/llama/model\")\r\n```\r\n\r\n### 3. Compose Your Own Model\r\n\r\nYou can take individual [kernels](https://github.com/linkedin/Liger-Kernel?tab=readme-ov-file#model-kernels) to compose your models.\r\n\r\n```python\r\nimport torch\r\nimport torch.nn as nn\r\nfrom liger_kernel.transformers import LigerFusedLinearCrossEntropyLoss\r\n\r\nmodel = nn.Linear(128, 256).cuda()\r\n\r\n# Fuses the linear layer with the cross-entropy loss and computes it chunk by chunk to reduce memory\r\nloss_fn = LigerFusedLinearCrossEntropyLoss()\r\n\r\n# x: (batch, hidden) = (4, 128); target: class ids in [0, 256)\r\nx = torch.randn(4, 128, requires_grad=True, device=\"cuda\")\r\ntarget = torch.randint(256, (4,), device=\"cuda\")\r\n\r\nloss = loss_fn(model.weight, x, target)\r\nloss.backward()\r\n```\r\n\r\n## High-level APIs\r\n\r\n### AutoModel\r\n\r\n| **AutoModel Variant** | **API** |\r\n|-----------|---------|\r\n| AutoModelForCausalLM | `liger_kernel.transformers.AutoLigerKernelForCausalLM` |\r\n\r\n\r\n### Patching\r\n\r\n| **Model**   | **API**                                                      | **Supported Operations**                                                |\r\n|-------------|--------------------------------------------------------------|-------------------------------------------------------------------------|\r\n| LLaMA 2 \u0026 3 | `liger_kernel.transformers.apply_liger_kernel_to_llama`   | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy        |\r\n| LLaMA 3.2-Vision | `liger_kernel.transformers.apply_liger_kernel_to_mllama`   | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy        |\r\n| Mistral     | `liger_kernel.transformers.apply_liger_kernel_to_mistral`  | RoPE, 
RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy        |\r\n| Mixtral     | `liger_kernel.transformers.apply_liger_kernel_to_mixtral`  | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy        |\r\n| Gemma1      | `liger_kernel.transformers.apply_liger_kernel_to_gemma`    | RoPE, RMSNorm, GeGLU, CrossEntropyLoss, FusedLinearCrossEntropy         |\r\n| Gemma2      | `liger_kernel.transformers.apply_liger_kernel_to_gemma2`   | RoPE, RMSNorm, GeGLU, CrossEntropyLoss, FusedLinearCrossEntropy         |\r\n| Gemma3 (Text)      | `liger_kernel.transformers.apply_liger_kernel_to_gemma3_text`   | RoPE, RMSNorm, GeGLU, CrossEntropyLoss, FusedLinearCrossEntropy         |\r\n| Gemma3 (Multimodal)      | `liger_kernel.transformers.apply_liger_kernel_to_gemma3`   | LayerNorm, RoPE, RMSNorm, GeGLU, CrossEntropyLoss, FusedLinearCrossEntropy         |\r\n| Paligemma, Paligemma2, \u0026 Paligemma2 Mix      | `liger_kernel.transformers.apply_liger_kernel_to_paligemma`   | LayerNorm, RoPE, RMSNorm, GeGLU, CrossEntropyLoss, FusedLinearCrossEntropy         |\r\n| Qwen2, Qwen2.5, \u0026 QwQ      | `liger_kernel.transformers.apply_liger_kernel_to_qwen2`    | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy        |\r\n| Qwen2-VL \u0026 QVQ       | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_vl`    | RMSNorm, LayerNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy        |\r\n| Qwen2.5-VL       | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_5_vl`    | RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy        |\r\n| Qwen3   | `liger_kernel.transformers.apply_liger_kernel_to_qwen3`    |  RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy       |\r\n| Qwen3 MoE | `liger_kernel.transformers.apply_liger_kernel_to_qwen3_moe` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy       |\r\n| Phi3 \u0026 Phi3.5       | `liger_kernel.transformers.apply_liger_kernel_to_phi3`     | RoPE, RMSNorm, 
SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy         |\r\n| Granite 3.0 \u0026 3.1   | `liger_kernel.transformers.apply_liger_kernel_to_granite`     | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss |\r\n| OLMo2   | `liger_kernel.transformers.apply_liger_kernel_to_olmo2`     | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |\r\n| GLM-4   | `liger_kernel.transformers.apply_liger_kernel_to_glm4`     | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |\r\n\r\n\r\n## Low-level APIs\r\n\r\n- `Fused Linear` kernels combine linear layers with losses, reducing memory usage by up to 80% - ideal for HBM-constrained workloads.\r\n- Other kernels use fusion and in-place techniques for memory and performance optimization.\r\n\r\n### Model Kernels\r\n\r\n| **Kernel**                      | **API**                                                     |\r\n|---------------------------------|-------------------------------------------------------------|\r\n| RMSNorm                         | `liger_kernel.transformers.LigerRMSNorm`                    |\r\n| LayerNorm                       | `liger_kernel.transformers.LigerLayerNorm`                  |\r\n| RoPE                            | `liger_kernel.transformers.liger_rotary_pos_emb`            |\r\n| SwiGLU                          | `liger_kernel.transformers.LigerSwiGLUMLP`                  |\r\n| GeGLU                           | `liger_kernel.transformers.LigerGEGLUMLP`                   |\r\n| CrossEntropy                    | `liger_kernel.transformers.LigerCrossEntropyLoss`           |\r\n| Fused Linear CrossEntropy         | `liger_kernel.transformers.LigerFusedLinearCrossEntropyLoss`|\r\n\r\n\r\n### Alignment Kernels\r\n\r\n| **Kernel**                      | **API**                                                     |\r\n|---------------------------------|-------------------------------------------------------------|\r\n| Fused Linear CPO Loss           | 
`liger_kernel.chunked_loss.LigerFusedLinearCPOLoss`       |\r\n| Fused Linear DPO Loss           | `liger_kernel.chunked_loss.LigerFusedLinearDPOLoss`       |\r\n| Fused Linear ORPO Loss          | `liger_kernel.chunked_loss.LigerFusedLinearORPOLoss`      |\r\n| Fused Linear SimPO Loss         | `liger_kernel.chunked_loss.LigerFusedLinearSimPOLoss`     |\r\n| Fused Linear KTO Loss           | `liger_kernel.chunked_loss.LigerFusedLinearKTOLoss`     |\r\n\r\n### Distillation Kernels\r\n\r\n| **Kernel**                      | **API**                                                     |\r\n|---------------------------------|-------------------------------------------------------------|\r\n| KLDivergence                    | `liger_kernel.transformers.LigerKLDIVLoss`                  |\r\n| JSD                             | `liger_kernel.transformers.LigerJSD`                        |\r\n| Fused Linear JSD                  | `liger_kernel.transformers.LigerFusedLinearJSD`             |\r\n| TVD                             | `liger_kernel.transformers.LigerTVDLoss`                    |\r\n\r\n### Experimental Kernels\r\n\r\n| **Kernel**                      | **API**                                                     |\r\n|---------------------------------|-------------------------------------------------------------|\r\n| Embedding                       | `liger_kernel.transformers.experimental.LigerEmbedding`     |\r\n| Matmul int2xint8                | `liger_kernel.transformers.experimental.matmul` |\r\n\r\n\r\n## Contributing, Acknowledgements, and License\r\n\r\n- [Contributing Guidelines](https://github.com/linkedin/Liger-Kernel/blob/main/docs/contributing.md)\r\n- [Acknowledgements](https://github.com/linkedin/Liger-Kernel/blob/main/docs/acknowledgement.md)\r\n- [License Information](https://github.com/linkedin/Liger-Kernel/blob/main/docs/license.md)\r\n\r\n## Sponsorship and Collaboration\r\n\r\n- [Glows.ai](https://platform.glows.ai/): Sponsoring NVIDIA GPUs 
for our open source developers.\r\n- [AMD](https://www.amd.com/en.html): Providing AMD GPUs for our AMD CI.\r\n- [Intel](https://www.intel.com/): Providing Intel GPUs for our Intel CI.\r\n- [Modal](https://modal.com/): Free 3000 credits from GPU MODE IRL for our NVIDIA CI.\r\n- [EmbeddedLLM](https://embeddedllm.com/): Making Liger Kernel run fast and stable on AMD.\r\n- [HuggingFace](https://huggingface.co/): Integrating Liger Kernel into Hugging Face Transformers and TRL.\r\n- [Lightning AI](https://lightning.ai/): Integrating Liger Kernel into Lightning Thunder.\r\n- [Axolotl](https://axolotl.ai/): Integrating Liger Kernel into Axolotl.\r\n- [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory): Integrating Liger Kernel into Llama-Factory.\r\n\r\n\r\n## CI status\r\n\r\n\u003ctable style=\"width: 100%; text-align: center; border-collapse: collapse;\"\u003e\r\n    \u003ctr\u003e\r\n        \u003cth style=\"padding: 10px;\"\u003eBuild\u003c/th\u003e\r\n    \u003c/tr\u003e\r\n    \u003ctr\u003e\r\n        \u003ctd style=\"padding: 10px;\"\u003e\r\n            \u003cdiv style=\"display: block;\"\u003e\r\n                \u003ca href=\"https://github.com/linkedin/Liger-Kernel/actions/workflows/nvi-ci.yml\"\u003e\r\n                    \u003cimg src=\"https://github.com/linkedin/Liger-Kernel/actions/workflows/nvi-ci.yml/badge.svg?event=schedule\" alt=\"Build\"\u003e\r\n                \u003c/a\u003e\r\n            \u003c/div\u003e\r\n            \u003cdiv style=\"display: block;\"\u003e\r\n                \u003ca href=\"https://github.com/linkedin/Liger-Kernel/actions/workflows/amd-ci.yml\"\u003e\r\n                    \u003cimg src=\"https://github.com/linkedin/Liger-Kernel/actions/workflows/amd-ci.yml/badge.svg?event=schedule\" alt=\"Build\"\u003e\r\n                \u003c/a\u003e\r\n            \u003c/div\u003e\r\n            \u003cdiv style=\"display: block;\"\u003e\r\n                \u003ca 
href=\"https://github.com/linkedin/Liger-Kernel/actions/workflows/intel-ci.yml\"\u003e\r\n                    \u003cimg src=\"https://github.com/linkedin/Liger-Kernel/actions/workflows/intel-ci.yml/badge.svg?event=schedule\" alt=\"Build\"\u003e\r\n                \u003c/a\u003e\r\n            \u003c/div\u003e\r\n        \u003c/td\u003e\r\n    \u003c/tr\u003e\r\n\u003c/table\u003e\r\n\r\n\r\n\r\n## Contact\r\n\r\n- For issues, open a GitHub issue in this repository\r\n- For open discussion, join [our Discord channel on GPUMode](https://discord.com/channels/1189498204333543425/1275130785933951039)\r\n- For formal collaboration, send an email to yannchen@linkedin.com and hning@linkedin.com\r\n\r\n## Cite this work\r\n\r\nBiblatex entry:\r\n```bib\r\n@article{hsu2024ligerkernelefficienttriton,\r\n      title={Liger Kernel: Efficient Triton Kernels for LLM Training},\r\n      author={Pin-Lun Hsu and Yun Dai and Vignesh Kothapalli and Qingquan Song and Shao Tang and Siyu Zhu and Steven Shimizu and Shivam Sahni and Haowen Ning and Yanning Chen},\r\n      year={2024},\r\n      eprint={2410.10989},\r\n      archivePrefix={arXiv},\r\n      primaryClass={cs.LG},\r\n      url={https://arxiv.org/abs/2410.10989},\r\n      journal={arXiv preprint arXiv:2410.10989},\r\n}\r\n```\r\n\r\n## Star History\r\n[![Star History Chart](https://api.star-history.com/svg?repos=linkedin/Liger-Kernel\u0026type=Date)](https://www.star-history.com/#linkedin/Liger-Kernel\u0026Date)\r\n\r\n\u003cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\"\u003e\r\n    \u003ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\"\u003e\r\n        ↑ Back to Top ↑\r\n    
\u003c/a\u003e\r\n\u003c/p\u003e\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkedin%2Fliger-kernel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinkedin%2Fliger-kernel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkedin%2Fliger-kernel/lists"}