# Supervised-Fine-Tuning-of-GPT-OSS-20B-on-OpenAI-s-gsm8k-reasoning-with-LoRA
## Overview
This project demonstrates **efficient supervised fine-tuning** of the [GPT-OSS-20B](https://huggingface.co/unsloth/gpt-oss-20b) model on the [OpenAI GSM8K dataset](https://huggingface.co/datasets/openai/gsm8k).
We leverage **LoRA (Low-Rank Adaptation)** with [Unsloth](https://github.com/unslothai/unsloth) to make fine-tuning large models practical on limited GPU resources. The goal is to enhance **mathematical reasoning and step-by-step problem solving** in large language models without requiring full-scale retraining.
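To see what LoRA does conceptually: it freezes the pretrained weight matrix and learns a small low-rank additive update. Below is a minimal PyTorch sketch of the idea (illustrative only; this is not Unsloth's actual implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = base(x) + (alpha / r) * B(A(x))."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r                        # with r=8, alpha=16 (as below): scale = 2

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

# For a 512x512 layer, LoRA adds only 2 * 8 * 512 = 8,192 trainable
# parameters on top of 262,144 frozen ones.
layer = LoRALinear(nn.Linear(512, 512))
```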
---
## Features
- **Dataset**: GSM8K, 7.4k high-quality grade-school math problems
- **Model**: GPT-OSS-20B with 4-bit quantization for memory efficiency (a back-of-envelope memory estimate follows this list)
- **Fine-tuning**: LoRA adapters applied to key transformer layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- **Training Framework**: TRL SFTTrainer + Hugging Face Datasets
- **Memory Optimization**: Gradient checkpointing (`unsloth` mode) and 8-bit optimizer
- **Objective**: Supervised Fine-Tuning (SFT) for improved reasoning and problem solving
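Why this setup fits on a single GPU: a rough, hedged estimate (actual usage also depends on activations, optimizer state, and sequence length):

```python
# Back-of-envelope memory estimate (illustrative only; real usage varies)
params = 20e9                            # ~20B parameters
weights_4bit_gb = params * 0.5 / 1e9     # 4-bit quantization = 0.5 bytes per weight
print(f"4-bit base weights: ~{weights_4bit_gb:.0f} GB")  # ~10 GB

# LoRA trains only small rank-8 matrices on the targeted projections,
# so the trainable parameters are a tiny fraction of the 20B total.
```

---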
## Notebook Walkthrough
### 1. Importing Libraries & Model Setup
We import **PyTorch** and **Unsloth**, load **GPT-OSS-20B** in 4-bit with a 1024-token context window, and attach LoRA adapters to the key projection layers.

```python
import torch
from unsloth import FastLanguageModel

max_seq_length = 1024  # maximum context length for training
dtype = None           # let Unsloth pick an appropriate dtype

# Load the base model with 4-bit quantization to save GPU memory
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    dtype=dtype,
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    full_finetuning=False,
)

# Wrap the model with LoRA adapters on the attention and MLP projections
model = FastLanguageModel.get_peft_model(
    model,
    r=8,                       # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth's memory-efficient checkpointing
    random_state=3407,
)
```
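After attaching the adapters, it is worth confirming how little of the model is actually trainable. Since `get_peft_model` returns a PEFT-wrapped model, the standard PEFT helper should be available:

```python
# Print trainable vs. total parameter counts (standard PEFT helper)
model.print_trainable_parameters()
```

---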
### 2. Loading the GSM8K Dataset
We load the **GSM8K train split** (7.4k math word problems), convert it into ShareGPT conversation format (`user` → question, `assistant` → answer), and apply the model's chat template.
```python
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

# Load the GSM8K train split (~7.4k question/answer pairs)
ds = load_dataset("openai/gsm8k", "main")["train"]

# Convert each example into a ShareGPT-style conversation
def convert_to_sharegpt(example):
    return {
        "conversations": [
            {"from": "user", "value": example["question"]},
            {"from": "assistant", "value": example["answer"]},
        ]
    }

ds = ds.map(convert_to_sharegpt)
ds = standardize_sharegpt(ds)

# Render each conversation into plain training text via the model's chat template
def formatting_prompts_func(examples):
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize=False, add_generation_prompt=False
        )
        for convo in examples["conversations"]
    ]
    return {"text": texts}

ds = ds.map(formatting_prompts_func, batched=True)
```
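Before training, a quick sanity check that the chat template rendered the conversations as expected:

```python
# Inspect the first formatted training example (truncated for readability)
print(ds[0]["text"][:500])
```

---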
### 3. Training Setup with TRL
We configure supervised fine-tuning with **TRL's SFTTrainer**: batch size 1 with 4-step gradient accumulation, an 8-bit AdamW optimizer, learning rate 2e-4, and a short 30-step demo run.
```python
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    args=SFTConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,   # effective batch size of 4
        warmup_steps=5,
        max_steps=30,                    # short demo run; raise for real training
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",              # 8-bit optimizer to save memory
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)

# Start training
train_result = trainer.train()
```
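For context on how much data this demo run actually sees, simple arithmetic on the settings above:

```python
# Effective batch size and total examples consumed by the 30-step demo
effective_batch = 1 * 4           # per_device_train_batch_size * gradient_accumulation_steps
examples_seen = effective_batch * 30
print(effective_batch, examples_seen)   # 4 per optimizer step, 120 of ~7.4k examples
```

---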
### 4. Saving the Model
We save the fine-tuned model and tokenizer into the `outputs/` directory.
```python
# Save model and tokenizer
trainer.save_model("outputs")
tokenizer.save_pretrained("outputs")

# Optional: print training logs (loss per step, final metrics)
metrics = trainer.state.log_history
print(metrics)
```
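`trainer.save_model` on a PEFT model writes the LoRA adapter weights (the 4-bit base stays frozen). If you only want the lightweight adapter files for sharing, the standard PEFT save also works (a sketch; `outputs/lora_adapters` is an arbitrary path chosen here):

```python
# Save just the LoRA adapters (typically a few tens of MB) plus the tokenizer
model.save_pretrained("outputs/lora_adapters")
tokenizer.save_pretrained("outputs/lora_adapters")
```

---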
### 5. Running Inference
We test the fine-tuned model with a sample math reasoning question.
```python
messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]

# Render the chat template; add_generation_prompt=True cues the assistant's turn
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
)

# Note: this decodes the full sequence, prompt included
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

---
**Summary:**
1. Load GPT-OSS-20B + LoRA
2. Prepare GSM8K dataset in chat format
3. Fine-tune using SFTTrainer
4. Save model & tokenizer
5. Run inference on math questions

## Inference
Example usage after training:
```python
from unsloth import FastLanguageModel

# Reload the fine-tuned model and tokenizer from the outputs/ directory
model, tokenizer = FastLanguageModel.from_pretrained("outputs", load_in_4bit=True)

messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]

input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
)

# Slice off the prompt tokens so only the generated answer is decoded
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Model:", response)
```
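Sampling (`temperature=0.7`, `top_p=0.9`) can introduce arithmetic slips on math problems; for reproducible answers, greedy decoding is a common alternative (a minor variation of the call above):

```python
# Deterministic decoding, often preferable for exact-answer math tasks
outputs = model.generate(input_ids, max_new_tokens=100, do_sample=False)
```

---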
## Key Takeaways
* **LoRA + quantization** make training 20B-parameter models feasible on a single GPU
* **GSM8K is a strong benchmark** for reasoning-focused fine-tuning
* **Unsloth greatly simplifies efficient fine-tuning** with minimal code changes

---
## Acknowledgements
* [Unsloth](https://github.com/unslothai/unsloth) for efficient LLM training
* [Hugging Face](https://huggingface.co/) ecosystem for datasets + transformers
* [OpenAI GSM8K dataset](https://huggingface.co/datasets/openai/gsm8k) for high-quality math reasoning tasks

## Results & Next Steps
Successfully fine-tuned GPT-OSS-20B on GSM8K using LoRA.
Expected improvements in **step-by-step reasoning** for math problems.

Future work:
* Train for full epochs instead of demo steps
* Evaluate on the GSM8K test set (see the sketch below)
* Experiment with higher LoRA ranks (`r=16` or `32`)
* Compare with baseline models (GPT-3.5, LLaMA, etc.)
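As a starting point for the evaluation item above: GSM8K reference answers end with a `#### <number>` marker, so exact-match accuracy can be scored by comparing final numbers. A minimal sketch, assuming the fine-tuned `model` and `tokenizer` from the inference section are already loaded (`extract_answer` is an illustrative helper defined here, not part of any library):

```python
import re
from datasets import load_dataset

def extract_answer(text):
    """Pull the last number from a completion; GSM8K references end with '#### <number>'."""
    nums = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return nums[-1] if nums else None

test = load_dataset("openai/gsm8k", "main")["test"]

correct = 0
n = 50  # small subset for a quick check; the full test set has 1,319 items
for ex in test.select(range(n)):
    msgs = [{"role": "user", "content": ex["question"]}]
    ids = tokenizer.apply_chat_template(
        msgs, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(ids, max_new_tokens=256, do_sample=False)
    pred = tokenizer.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)
    gold = ex["answer"].split("####")[-1].strip()
    if extract_answer(pred) == extract_answer(gold):
        correct += 1

print(f"exact match on {n} test items: {correct / n:.2%}")
```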