
# Supervised Fine-Tuning of GPT-OSS-20B on OpenAI's GSM8K Reasoning with LoRA

## 📌 Overview
This project demonstrates **efficient supervised fine-tuning** of the [GPT-OSS-20B](https://huggingface.co/unsloth/gpt-oss-20b) model using the [OpenAI GSM8K dataset](https://huggingface.co/datasets/openai/gsm8k).
We leverage **LoRA (Low-Rank Adaptation)** with [Unsloth](https://github.com/unslothai/unsloth) to make fine-tuning large models practical on limited GPU resources.

The goal is to enhance **mathematical reasoning and step-by-step problem solving** in large language models, without requiring full-scale retraining.

---

## ✨ Features
- 🧮 **Dataset**: GSM8K – ~7.5k high-quality grade-school math word problems (train split)
- ⚡ **Model**: GPT-OSS-20B with 4-bit quantization for memory efficiency
- 🔧 **Fine-tuning**: LoRA adapters applied to key transformer layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- 🛠️ **Training Framework**: TRL SFTTrainer + Hugging Face Datasets
- 🔋 **Memory Optimization**: Gradient checkpointing (`unsloth` mode) and 8-bit optimizer
- 🎯 **Objective**: Supervised Fine-Tuning (SFT) for improved reasoning and problem-solving

---

## 📖 Notebook Walkthrough

### 1. Importing Libraries & Model Setup

We import **PyTorch** and **Unsloth**, then load **GPT-OSS-20B** with 4-bit quantization (to save memory), a maximum sequence length of 1024 tokens, LoRA adapters on the key transformer projection layers, and Unsloth gradient checkpointing for efficient training.

```python
import torch
from unsloth import FastLanguageModel

max_seq_length = 1024
dtype = None

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    dtype=dtype,
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    full_finetuning=False,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth's memory-efficient checkpointing
    random_state=3407,
)
```
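
As a quick sanity check that only a small fraction of weights will be trained, the standard PEFT helper can report the adapter size (a minimal sketch, assuming the PEFT-wrapped model returned above):

```python
# Report trainable (LoRA) vs. total parameters; with r=8 this should be
# a tiny fraction of the 20B total.
model.print_trainable_parameters()
```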

---

### 2. Loading the GSM8K Dataset

We load the **GSM8K train split** (~7.5k math word problems), convert each example into a ShareGPT-style conversation (`user` → question, `assistant` → answer), and apply the model's chat template to produce the final training text.

```python
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

# Load dataset
ds = load_dataset("openai/gsm8k", "main")["train"]

# Convert format
def convert_to_sharegpt(example):
    return {
        "conversations": [
            {"from": "user", "value": example["question"]},
            {"from": "assistant", "value": example["answer"]},
        ]
    }

ds = ds.map(convert_to_sharegpt)
ds = standardize_sharegpt(ds)

# Apply the model's chat template to each conversation
def formatting_prompts_func(examples):
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize=False, add_generation_prompt=False
        )
        for convo in examples["conversations"]
    ]
    return {"text": texts}

ds = ds.map(formatting_prompts_func, batched=True)
```
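
For reference, each GSM8K answer walks through the solution step by step and ends with the final numeric answer after a `####` marker. Inspecting the first record makes the format concrete (the output shown is illustrative and abbreviated):

```python
print(ds[0]["conversations"])
# [{'role': 'user', 'content': 'Natalia sold clips to 48 of her friends in April, ...'},
#  {'role': 'assistant', 'content': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May. ... #### 72'}]
```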

---

### 3. Training Setup with TRL

We configure supervised fine-tuning with **TRL's SFTTrainer**: batch size 1 with 4 gradient-accumulation steps (effective batch size 4), the 8-bit AdamW optimizer, a 2e-4 learning rate with linear decay, and 30 training steps for this demo run.

```python
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    args=SFTConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,  # effective batch size = 1 * 4 = 4
        warmup_steps=5,
        max_steps=30,  # short demo run; increase for a full fine-tune
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)

# Start training
train_result = trainer.train()
```
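
To verify that 4-bit quantization plus LoRA actually keeps the run within budget, peak GPU usage can be checked after training (a minimal sketch using standard PyTorch CUDA statistics):

```python
import torch

# Peak GPU memory reserved during the run, in GB.
peak_gb = torch.cuda.max_memory_reserved() / 1024**3
print(f"Peak reserved GPU memory: {peak_gb:.2f} GB")
```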

---

### 4. Saving the Model

We save the fine-tuned model and tokenizer into the `outputs/` directory and print the training logs (loss and metrics).

```python
# Save model and tokenizer
trainer.save_model("outputs")
tokenizer.save_pretrained("outputs")

# Optional: print metrics
metrics = trainer.state.log_history
print(metrics)
```
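
Note that with LoRA, `save_model` stores only the small adapter weights. If a standalone checkpoint is preferred, the adapters can be merged into the base weights; a sketch, assuming Unsloth's `save_pretrained_merged` helper:

```python
# Merge LoRA adapters into the base model and save full 16-bit weights
# (requires enough disk space for the complete 20B checkpoint).
model.save_pretrained_merged("outputs_merged", tokenizer, save_method="merged_16bit")
```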

---

### 5. Running Inference

We test the fine-tuned model on a sample math question: format the query with `apply_chat_template`, generate with sampling (`temperature=0.7`, `top_p=0.9`), then decode and print the model's response.

```python
messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]

# add_generation_prompt=True appends the assistant turn marker so the
# model knows to generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,  # required for temperature/top_p sampling to take effect
    temperature=0.7,
    top_p=0.9,
)

# Decodes the full conversation, prompt included
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
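
For interactive use, tokens can also be printed as they are generated instead of decoding at the end; a minimal sketch using the `TextStreamer` utility from `transformers`:

```python
from transformers import TextStreamer

# Stream the response to stdout token by token, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    input_ids,
    streamer=streamer,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
```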

---

✅ **Summary:**

1. Load GPT-OSS-20B + LoRA
2. Prepare GSM8K dataset in chat format
3. Fine-tune using SFTTrainer
4. Save model & tokenizer
5. Run inference on math questions

## 🤖 Inference

Example usage after training:

```python
from unsloth import FastLanguageModel

# Reload the fine-tuned model (LoRA adapters) from the outputs/ directory
model, tokenizer = FastLanguageModel.from_pretrained("outputs", load_in_4bit=True)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode

messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]

input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,  # required for temperature/top_p sampling to take effect
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Model:", response)
```

---

## 🧩 Key Takeaways

* **LoRA + 4-bit quantization** makes training a 20B-parameter model feasible on a single GPU
* **GSM8K is a strong benchmark** for reasoning-focused fine-tuning
* **Unsloth greatly simplifies efficient fine-tuning** with minimal code changes

---

## 🙌 Acknowledgements

* [Unsloth](https://github.com/unslothai/unsloth) for efficient LLM training
* [Hugging Face](https://huggingface.co/) ecosystem for datasets + transformers
* [OpenAI GSM8K dataset](https://huggingface.co/datasets/openai/gsm8k) for high-quality math reasoning tasks

## 📊 Results & Next Steps

✅ Successfully fine-tuned GPT-OSS-20B on GSM8K using LoRA.
📈 Expected improvements in **step-by-step reasoning** for math problems.
🔜 Future Work:

* Train for full epochs instead of demo steps
* Evaluate on the GSM8K test set (see the sketch below)
* Experiment with higher LoRA ranks (`r=16` or `32`)
* Compare with baseline models (GPT-3.5, LLaMA, etc.)
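
A minimal evaluation sketch for the test split, assuming greedy decoding and that the fine-tuned model emits answers in GSM8K's `#### <number>` format (the `extract_final_answer` helper below is hypothetical, not part of the notebook):

```python
import re
from datasets import load_dataset

def extract_final_answer(text):
    """Return the number after the last '####' marker, or None (hypothetical helper)."""
    matches = re.findall(r"####\s*([-+]?[\d,\.]+)", text)
    return matches[-1].replace(",", "") if matches else None

test_ds = load_dataset("openai/gsm8k", "main")["test"]

correct, n = 0, 100  # score a small subset for speed
for example in test_ds.select(range(n)):
    messages = [{"role": "user", "content": example["question"]}]
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
    prediction = tokenizer.decode(
        outputs[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    if extract_final_answer(prediction) == extract_final_answer(example["answer"]):
        correct += 1

print(f"Accuracy on {n} GSM8K test problems: {correct / n:.1%}")
```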

---