https://github.com/huggingface/gpt-oss-recipes
Collection of scripts and notebooks for OpenAI's latest GPT OSS models
Last synced: 8 months ago
- Host: GitHub
- URL: https://github.com/huggingface/gpt-oss-recipes
- Owner: huggingface
- License: apache-2.0
- Created: 2025-08-01T12:35:15.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-08-25T17:06:08.000Z (10 months ago)
- Last Synced: 2025-10-12T15:05:46.443Z (8 months ago)
- Language: Jupyter Notebook
- Size: 56.6 KB
- Stars: 454
- Watchers: 7
- Forks: 48
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-gpt-oss - Collection of Hugging Face examples
README
# OpenAI GPT-OSS Recipes

Collection of scripts demonstrating different optimization and fine-tuning techniques for OpenAI's GPT-OSS models (20B and 120B parameters).
**Resources**
- [Blog - Welcome GPT-OSS: the new open-source model family from OpenAI](https://huggingface.co/blog/welcome-openai-gpt-oss)
- [Cookbook - Fine-tuning with GPT-OSS and Hugging Face](https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers)
- [OpenAI GPT-OSS 20B model](https://huggingface.co/openai/gpt-oss-20b)
- [OpenAI GPT-OSS 120B model](https://huggingface.co/openai/gpt-oss-120b)
- [Release collection on Hugging Face](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4)
## Scripts
- `generate_tp.py` - Generation with Tensor Parallelism.
- `generate_flash_attention.py` - Generation with Flash Attention + Tensor Parallelism.
- `generate_tp_continuous_batching.py` - Generation with Flash Attention + Tensor Parallelism and Continuous Batching.
- `generate_all.py` - Generation with all optimizations: Expert Parallelism, Tensor Parallelism, Flash Attention.
- `sft.py` - Script for fine-tuning the model using supervised fine-tuning (SFT). Supports both full-parameter training and LoRA training.
### Model Configuration
All generation scripts support both 20B and 120B models. To switch between model sizes, simply edit the `model_path` variable at the top of each script:
```python
# Model configuration - uncomment the model size you want to use
model_path = "openai/gpt-oss-120b" # 120B model (default)
# model_path = "openai/gpt-oss-20b" # 20B model - uncomment this line and comment the line above
```
The scripts automatically configure the appropriate device mapping and settings based on the selected model size.
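If you switch sizes often, the choice could be driven by an environment variable instead of editing each script. The sketch below is hypothetical: `GPT_OSS_SIZE` and `resolve_model_path` are illustrative names that are not part of the recipes; only the two repository IDs come from the configuration snippet above.

```python
import os

# Hub repository IDs for the two released checkpoints (from the snippet above).
MODEL_PATHS = {
    "20b": "openai/gpt-oss-20b",
    "120b": "openai/gpt-oss-120b",
}


def resolve_model_path(size=None):
    """Return the Hub ID for a model size such as "20b" or "120B".

    Hypothetical helper: falls back to the GPT_OSS_SIZE environment
    variable (an assumed name), then to the 120B default.
    """
    key = (size or os.environ.get("GPT_OSS_SIZE", "120b")).strip().lower()
    if key not in MODEL_PATHS:
        raise ValueError(f"Unknown model size {key!r}; expected one of {sorted(MODEL_PATHS)}")
    return MODEL_PATHS[key]


model_path = resolve_model_path()
```

With this in place, `GPT_OSS_SIZE=20b python <script>.py` would select the smaller model without touching the source.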
## Installation
First, create a virtual environment, e.g. with `uv`:
```sh
uv venv gpt-oss --python 3.11 && source gpt-oss/bin/activate && uv pip install --upgrade pip
```
Next, install PyTorch:
```sh
uv pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
```
If your hardware supports the MXFP4 quantization format, you can also install Triton kernels for optimized performance:
```sh
uv pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
```
Finally, install the remaining dependencies:
```sh
uv pip install -r requirements.txt
```
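After installing, a quick sanity check can confirm which packages are importable in the active environment. This is a minimal sketch using only the standard library; the module names (`torch`, `triton_kernels`) are assumptions inferred from the install commands above, and `missing_packages` is an illustrative helper, not part of the recipes.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names are assumptions based on the install commands above.
required = ["torch"]
optional = ["triton_kernels"]  # only relevant for the MXFP4 path

print("Missing required:", missing_packages(required))
print("Missing optional:", missing_packages(optional))
```

An empty list means every package in that group was found; the check does not import the packages, so it says nothing about whether they work on your hardware.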
## Usage
### Inference
> [!IMPORTANT]
> Before running any script, edit the `model_path` variable to select your desired model size (20B or 120B).
Run a generation script:
```bash
python generate_<script_name>.py
```
or, for a distributed run with `x` processes (typically one per GPU):
```bash
torchrun --nproc_per_node=x generate_<script_name>.py
```
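When a script is launched with `torchrun`, each process receives its position in the job through the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables. The sketch below shows one way a script could read them using only the standard library, falling back to single-process defaults so the same code also runs under plain `python`. The names `DistributedEnv` and `read_distributed_env` are illustrative and do not come from the recipes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistributedEnv:
    rank: int        # global index of this process
    local_rank: int  # index of this process on its node (often the GPU index)
    world_size: int  # total number of processes in the job


def read_distributed_env(environ=None):
    """Read the variables torchrun sets, defaulting to a single process."""
    env = os.environ if environ is None else environ
    return DistributedEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


dist_env = read_distributed_env()
if dist_env.rank == 0:
    # Log once per job rather than once per process.
    print(f"Running with {dist_env.world_size} process(es)")
```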
### Training
For full-parameter training on one node of 8 GPUs, run:
```bash
# Eager attention
accelerate launch --config_file configs/zero3.yaml sft.py --config configs/sft_full.yaml
# FlashAttention3
accelerate launch --config_file configs/zero3.yaml sft.py --config configs/sft_full.yaml --attn_implementation kernels-community/vllm-flash-attn3
```
For LoRA training on one GPU, run:
```bash
python sft.py --config configs/sft_lora.yaml
```
To change the dataset or training hyperparameters, either edit `configs/sft_lora.yaml` or `configs/sft_full.yaml`, or pass the new values as command-line arguments, e.g.:
```bash
accelerate launch --config_file configs/zero3.yaml \
sft.py --config configs/sft_full.yaml \
--dataset_name DATASET_NAME
```
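The interplay between the config file and the command line can be illustrated with a small standard-library sketch, assuming that a value passed on the command line takes precedence over the one in the file. This is not how `sft.py` parses its arguments; `merge_config` and the example keys and values are made up for illustration, with a plain dict standing in for the parsed YAML file.

```python
import argparse


def merge_config(file_config, argv):
    """Overlay command-line overrides on top of values loaded from a config file."""
    parser = argparse.ArgumentParser()
    for key, value in file_config.items():
        # A default of None distinguishes "not passed" from an explicit value.
        parser.add_argument(f"--{key}", type=type(value), default=None)
    parsed = vars(parser.parse_args(argv))
    overrides = {k: v for k, v in parsed.items() if v is not None}
    return {**file_config, **overrides}


# Stand-in for the contents of a YAML config (illustrative keys and values).
file_config = {"dataset_name": "DEFAULT_DATASET", "learning_rate": 2e-4, "num_train_epochs": 1}

merged = merge_config(file_config, ["--dataset_name", "DATASET_NAME"])
print(merged["dataset_name"])   # DATASET_NAME (overridden on the command line)
print(merged["learning_rate"])  # 0.0002 (kept from the file)
```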