# Supervised-Fine-Tuning-of-GPT-OSS-20B-on-OpenAI-s-gsm8k-reasoning-with-LoRA
## Overview
This project demonstrates **efficient supervised fine-tuning** of the [GPT-OSS-20B](https://huggingface.co/unsloth/gpt-oss-20b) model on the [OpenAI GSM8K dataset](https://huggingface.co/datasets/openai/gsm8k).
We leverage **LoRA (Low-Rank Adaptation)** with [Unsloth](https://github.com/unslothai/unsloth) to make fine-tuning large models practical on limited GPU resources. The goal is to enhance **mathematical reasoning and step-by-step problem solving** in large language models without requiring full-scale retraining.
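To see what LoRA does conceptually: it freezes the pretrained weight matrix and learns a small low-rank additive update. Below is a minimal PyTorch sketch of the idea (illustrative only; this is not Unsloth's actual implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = base(x) + (alpha / r) * B(A(x))."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r                        # with r=8, alpha=16 (as below): scale = 2

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

# For a 512x512 layer, LoRA adds only 2 * 8 * 512 = 8,192 trainable
# parameters on top of 262,144 frozen ones.
layer = LoRALinear(nn.Linear(512, 512))
```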
---
## Features
- **Dataset**: GSM8K, 7.4k high-quality grade-school math problems
- **Model**: GPT-OSS-20B with 4-bit quantization for memory efficiency (a back-of-envelope memory estimate follows this list)
- **Fine-tuning**: LoRA adapters applied to key transformer layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- **Training Framework**: TRL SFTTrainer + Hugging Face Datasets
- **Memory Optimization**: Gradient checkpointing (`unsloth` mode) and 8-bit optimizer
- **Objective**: Supervised Fine-Tuning (SFT) for improved reasoning and problem solving
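Why this setup fits on a single GPU: a rough, hedged estimate (actual usage also depends on activations, optimizer state, and sequence length):

```python
# Back-of-envelope memory estimate (illustrative only; real usage varies)
params = 20e9                            # ~20B parameters
weights_4bit_gb = params * 0.5 / 1e9     # 4-bit quantization = 0.5 bytes per weight
print(f"4-bit base weights: ~{weights_4bit_gb:.0f} GB")  # ~10 GB

# LoRA trains only small rank-8 matrices on the targeted projections,
# so the trainable parameters are a tiny fraction of the 20B total.
```

---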
## Notebook Walkthrough
### 1. Importing Libraries & Model Setup
We import **PyTorch** and **Unsloth**, load **GPT-OSS-20B** in 4-bit with a 1024-token context window, and attach LoRA adapters to the key projection layers.

```python
import torch
from unsloth import FastLanguageModel

max_seq_length = 1024  # maximum context length for training
dtype = None           # let Unsloth pick an appropriate dtype

# Load the base model with 4-bit quantization to save GPU memory
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    dtype=dtype,
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    full_finetuning=False,
)

# Wrap the model with LoRA adapters on the attention and MLP projections
model = FastLanguageModel.get_peft_model(
    model,
    r=8,                       # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth's memory-efficient checkpointing
    random_state=3407,
)
```
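After attaching the adapters, it is worth confirming how little of the model is actually trainable. Since `get_peft_model` returns a PEFT-wrapped model, the standard PEFT helper should be available:

```python
# Print trainable vs. total parameter counts (standard PEFT helper)
model.print_trainable_parameters()
```

---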
### 2. Loading the GSM8K Dataset
We load the **GSM8K train split** (7.4k math word problems), convert it into ShareGPT conversation format (`user` → question, `assistant` → answer), and apply the model's chat template.
```python
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

# Load the GSM8K train split (~7.4k question/answer pairs)
ds = load_dataset("openai/gsm8k", "main")["train"]

# Convert each example into a ShareGPT-style conversation
def convert_to_sharegpt(example):
    return {
        "conversations": [
            {"from": "user", "value": example["question"]},
            {"from": "assistant", "value": example["answer"]},
        ]
    }

ds = ds.map(convert_to_sharegpt)
ds = standardize_sharegpt(ds)

# Render each conversation into plain training text via the model's chat template
def formatting_prompts_func(examples):
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize=False, add_generation_prompt=False
        )
        for convo in examples["conversations"]
    ]
    return {"text": texts}

ds = ds.map(formatting_prompts_func, batched=True)
```
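Before training, a quick sanity check that the chat template rendered the conversations as expected:

```python
# Inspect the first formatted training example (truncated for readability)
print(ds[0]["text"][:500])
```

---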
### 3. Training Setup with TRL
We configure supervised fine-tuning with **TRL's SFTTrainer**: batch size 1 with 4-step gradient accumulation, an 8-bit AdamW optimizer, learning rate 2e-4, and a short 30-step demo run.
```python
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    args=SFTConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,   # effective batch size of 4
        warmup_steps=5,
        max_steps=30,                    # short demo run; raise for real training
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",              # 8-bit optimizer to save memory
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)

# Start training
train_result = trainer.train()
```
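For context on how much data this demo run actually sees, simple arithmetic on the settings above:

```python
# Effective batch size and total examples consumed by the 30-step demo
effective_batch = 1 * 4           # per_device_train_batch_size * gradient_accumulation_steps
examples_seen = effective_batch * 30
print(effective_batch, examples_seen)   # 4 per optimizer step, 120 of ~7.4k examples
```

---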
### 4. Saving the Model
We save the fine-tuned model and tokenizer into the `outputs/` directory.
```python
# Save model and tokenizer
trainer.save_model("outputs")
tokenizer.save_pretrained("outputs")

# Optional: print training logs (loss per step, final metrics)
metrics = trainer.state.log_history
print(metrics)
```
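`trainer.save_model` on a PEFT model writes the LoRA adapter weights (the 4-bit base stays frozen). If you only want the lightweight adapter files for sharing, the standard PEFT save also works (a sketch; `outputs/lora_adapters` is an arbitrary path chosen here):

```python
# Save just the LoRA adapters (typically a few tens of MB) plus the tokenizer
model.save_pretrained("outputs/lora_adapters")
tokenizer.save_pretrained("outputs/lora_adapters")
```

---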
### 5. Running Inference
We test the fine-tuned model with a sample math reasoning question.
```python
messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]

# Render the chat template; add_generation_prompt=True cues the assistant's turn
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
)

# Note: this decodes the full sequence, prompt included
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

---
**Summary:**
1. Load GPT-OSS-20B + LoRA
2. Prepare GSM8K dataset in chat format
3. Fine-tune using SFTTrainer
4. Save model & tokenizer
5. Run inference on math questions

## Inference
Example usage after training:
```python
from unsloth import FastLanguageModel

# Reload the fine-tuned model and tokenizer from the outputs/ directory
model, tokenizer = FastLanguageModel.from_pretrained("outputs", load_in_4bit=True)

messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]

input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
)

# Slice off the prompt tokens so only the generated answer is decoded
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Model:", response)
```
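Sampling (`temperature=0.7`, `top_p=0.9`) can introduce arithmetic slips on math problems; for reproducible answers, greedy decoding is a common alternative (a minor variation of the call above):

```python
# Deterministic decoding, often preferable for exact-answer math tasks
outputs = model.generate(input_ids, max_new_tokens=100, do_sample=False)
```

---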
## Key Takeaways
* **LoRA + quantization** make training 20B-parameter models feasible on a single GPU
* **GSM8K is a strong benchmark** for reasoning-focused fine-tuning
* **Unsloth greatly simplifies efficient fine-tuning** with minimal code changes

---
## Acknowledgements
* [Unsloth](https://github.com/unslothai/unsloth) for efficient LLM training
* [Hugging Face](https://huggingface.co/) ecosystem for datasets + transformers
* [OpenAI GSM8K dataset](https://huggingface.co/datasets/openai/gsm8k) for high-quality math reasoning tasks

## Results & Next Steps
Successfully fine-tuned GPT-OSS-20B on GSM8K using LoRA.
Expected improvements in **step-by-step reasoning** for math problems.

Future work:
* Train for full epochs instead of demo steps
* Evaluate on the GSM8K test set (see the sketch below)
* Experiment with higher LoRA ranks (`r=16` or `32`)
* Compare with baseline models (GPT-3.5, LLaMA, etc.)
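As a starting point for the evaluation item above: GSM8K reference answers end with a `#### <number>` marker, so exact-match accuracy can be scored by comparing final numbers. A minimal sketch, assuming the fine-tuned `model` and `tokenizer` from the inference section are already loaded (`extract_answer` is an illustrative helper defined here, not part of any library):

```python
import re
from datasets import load_dataset

def extract_answer(text):
    """Pull the last number from a completion; GSM8K references end with '#### <number>'."""
    nums = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return nums[-1] if nums else None

test = load_dataset("openai/gsm8k", "main")["test"]

correct = 0
n = 50  # small subset for a quick check; the full test set has 1,319 items
for ex in test.select(range(n)):
    msgs = [{"role": "user", "content": ex["question"]}]
    ids = tokenizer.apply_chat_template(
        msgs, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(ids, max_new_tokens=256, do_sample=False)
    pred = tokenizer.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)
    gold = ex["answer"].split("####")[-1].strip()
    if extract_answer(pred) == extract_answer(gold):
        correct += 1

print(f"exact match on {n} test items: {correct / n:.2%}")
```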