https://github.com/servicenow/fast-llm
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
- Host: GitHub
- URL: https://github.com/servicenow/fast-llm
- Owner: ServiceNow
- License: other
- Created: 2024-10-11T18:09:30.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-10-02T23:16:12.000Z (5 months ago)
- Last Synced: 2025-10-03T01:21:44.740Z (5 months ago)
- Language: Python
- Homepage: https://servicenow.github.io/Fast-LLM/
- Size: 11.8 MB
- Stars: 237
- Watchers: 19
- Forks: 36
- Open Issues: 70
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
- Security: SECURITY.md

[![Docker][ci-badge]][ci-workflow]
[![Documentation][docs-badge]][docs-workflow]
[![License][license-badge]][license]
*Accelerating your LLM training to full speed*
Made with ❤️ by [ServiceNow Research][servicenow-research]
## Overview
Fast-LLM is a cutting-edge open-source library for training large language models with exceptional speed, scalability, and flexibility. Built on [PyTorch][pytorch] and [Triton][triton], Fast-LLM empowers AI teams to push the limits of generative AI, from research to production.
Optimized for training models of all sizes, from small 1B-parameter models to massive 70B+-parameter models, Fast-LLM delivers faster training, lower costs, and seamless scalability. Its fine-tuned kernels, advanced parallelism techniques, and efficient memory management make it the go-to choice for diverse training needs.
As a truly open-source project, Fast-LLM allows full customization and extension without proprietary restrictions. Developed transparently by a community of professionals on GitHub, the library benefits from collaborative innovation, with every change discussed and reviewed in the open to ensure trust and quality. Fast-LLM combines professional-grade tools with unified support for GPT-like architectures, offering the cost efficiency and flexibility that serious AI practitioners demand.
> [!NOTE]
> Fast-LLM is not affiliated with Fast.AI, FastHTML, FastAPI, FastText, or other similarly named projects. Our library's name refers to its speed and efficiency in language model training.
## Why Fast-LLM?
1. 🚀 **Fast-LLM is Blazingly Fast**:
- ⚡️ Optimized kernel efficiency and reduced overheads.
- 🔋 Optimized memory usage for best performance.
- ⏳ Minimizes training time and cost.
2. 📈 **Fast-LLM is Highly Scalable**:
- 📡 Distributed training across multiple GPUs and nodes using 3D parallelism (Data, Tensor, and Pipeline).
- 🔗 Supports sequence length parallelism to handle longer sequences effectively.
- 🧠 ZeRO-1, ZeRO-2, and ZeRO-3 implementations for improved memory efficiency.
- 🎛️ Mixed precision training support for better performance.
- 🏋️‍♂️ Large batch training and gradient accumulation support.
- 🔄 Reproducible training with deterministic behavior.
3. 🎨 **Fast-LLM is Incredibly Flexible**:
- 🤖 Compatible with all common language model architectures in a unified class.
- ⚡ Efficient dropless Mixture-of-Experts (MoE) implementation with SoTA performance.
- 🧩 Customizable language model architectures, data loaders, loss functions, and optimizers (in progress).
- 🤗 Seamless integration with [Hugging Face Transformers][transformers].
4. 🎯 **Fast-LLM is Super Easy to Use**:
- 📦 [Pre-built Docker images](https://github.com/ServiceNow/Fast-LLM/pkgs/container/fast-llm) for quick deployment.
- 📝 Simple YAML configuration for hassle-free setup.
- 💻 Command-line interface for easy launches.
- 📊 Detailed logging and real-time monitoring features.
- 📚 Extensive [documentation][docs] and practical tutorials (in progress).
5. 🌐 **Fast-LLM is Truly Open Source**:
- ⚖️ Licensed under [Apache 2.0][license] for maximum freedom to use Fast-LLM at work, in your projects, or for research.
- 💻 Transparently developed on GitHub with public [roadmap][roadmap] and [issue tracking][issues].
- 🤝 Contributions and collaboration are always welcome!
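To give a flavor of the YAML-driven setup mentioned above, a training config might look roughly like the sketch below. The field names here are illustrative placeholders, not Fast-LLM's actual configuration schema; consult `examples/mistral-4-node-benchmark.yaml` in the repository for a real, working config.

```yaml
# Hypothetical sketch only: these keys do not reflect Fast-LLM's
# real schema. See the examples/ directory for actual configs.
model:
  type: mistral
distributed:
  tensor_parallel: 2    # 3D parallelism: tensor, ...
  pipeline_parallel: 2  # ...pipeline, ...
  data_parallel: 8      # ...and data
  zero_stage: 2         # ZeRO-2 optimizer-state + gradient sharding
training:
  train_iters: 100
  batch_size: 32
  sequence_length: 8192
```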
## Usage
We'll walk you through how to use Fast-LLM to train a large language model on a cluster with multiple nodes and GPUs. We'll show an example setup using a Slurm cluster and a Kubernetes cluster.
For this demo, we will train a Mistral-7B model from scratch for 100 steps on random data. The config file `examples/mistral-4-node-benchmark.yaml` is pre-configured for a multi-node setup with 4 DGX nodes, each with 8 A100-80GB or H100-80GB GPUs.
> [!NOTE]
> Fast-LLM scales from a single GPU to large clusters. You can start small and expand based on your resources.
Expect to see a significant speedup in training time compared to other libraries! For training Mistral-7B, Fast-LLM is expected to achieve a throughput of **9,800 tokens/s/H100** (batch size 32, sequence length 8k) on a 4-node cluster with 32 H100s.
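As a back-of-envelope check of those numbers, the quoted per-GPU throughput implies a sub-second optimizer step on the 32-GPU cluster. The sketch below assumes the batch size of 32 is the global batch (32 sequences per step across all GPUs); the figures are illustrative, not measured.

```python
# Back-of-envelope arithmetic for the quoted Mistral-7B benchmark.
# Assumption: batch size 32 is the global batch across the cluster.
TOKENS_PER_GPU_PER_S = 9_800   # quoted throughput per H100
NUM_GPUS = 32                  # 4 DGX nodes x 8 GPUs
BATCH_SIZE = 32                # sequences per optimizer step (assumed global)
SEQ_LEN = 8_192                # "8k" sequence length

tokens_per_step = BATCH_SIZE * SEQ_LEN                   # 262,144 tokens
cluster_tokens_per_s = TOKENS_PER_GPU_PER_S * NUM_GPUS   # 313,600 tokens/s
step_time_s = tokens_per_step / cluster_tokens_per_s     # ~0.84 s per step

print(f"~{step_time_s:.2f} s/step, "
      f"~{100 * step_time_s / 60:.1f} min for the 100-step demo")
```

At that rate, the 100-step demo run should finish in under two minutes of pure training time.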
### Running Fast-LLM on a Slurm Cluster
#### Prerequisites
- A [Slurm](https://slurm.schedmd.com/) cluster with at least 4 DGX nodes with 8 A100-80GB or H100-80GB GPUs each.
- CUDA 12.1 or higher.
- Dependencies: [PyTorch][pytorch], [Triton][triton], and [Apex](https://github.com/NVIDIA/apex) installed on all nodes.
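When checking the CUDA 12.1+ requirement, note that version strings should be compared numerically, not lexically (so that, e.g., "12.10" counts as newer than "12.2"). A minimal helper, not part of Fast-LLM itself:

```python
# Minimal sketch (not Fast-LLM code): check that a CUDA version string
# meets the >= 12.1 requirement. Components are compared as integer
# tuples so "12.10" correctly ranks above "12.2".
def cuda_version_ok(version: str, required: str = "12.1") -> bool:
    """Return True if `version` (e.g. "12.4") is at least `required`."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(version) >= parse(required)

print(cuda_version_ok("12.4"))  # True
print(cuda_version_ok("11.8"))  # False
```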
#### Steps
1. Deploy the [nvcr.io/nvidia/pytorch:24.07-py3](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) Docker image to all nodes (recommended, since it contains all the necessary dependencies).
2. Install Fast-LLM on all nodes:
```bash
sbatch <