Training the new OpenAI GPT-OSS-20B on a B200 server (180 GB VRAM) with the Arabic Dua collection Hisnul Muslim
- Host: GitHub
- URL: https://github.com/sbeierle/gpt-oss-20b-lora-dua
- Owner: sbeierle
- Created: 2025-08-24T15:04:42.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-08-24T16:21:54.000Z (about 1 month ago)
- Topics: arabic, arabicllm, chatgpt, dua, gpt, gptoss, hisnulmuslim, lora, openai, oss20b
- Language: Python
- Size: 41.4 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# LoRA Training Showcase – GPT-OSS-20B + Custom Dua Dataset
## Overview
This project documents a **showcase fine-tuning experiment** of the **GPT-OSS-20B model** with **LoRA** on a **custom Duʿāʾ dataset (inspired by Ḥiṣn al-Muslim)**.

- Focus: **Arabic language, authentic Islamic supplications**
- Uniqueness: built entirely from scratch – dataset prep, debugging, training, inference, and visualization
- Goal: provide a **transparent research-style workflow** that others can replicate or extend

**Note:** Final model weights are not released. This repo serves as **technical documentation & showcase**.

---
## Project Facts
- **Total time spent:** ~12–14h
  - Debugging: ~4h (dataset fixes, rsync sync issues, initial CPU-only runs)
  - Training + inference: ~6–8h
  - Misc (setup, cleanup, monitoring): ~2h
- **Hardware Environment:**
  - **RunPod B200 Instance**
  - 28 vCPU, 180 GB RAM, 50 GB Disk, 150 GB Pod Volume
  - NVIDIA GPU (CUDA Capability `sm_100`) – shown as **B200**
  - **PyTorch CUDA 12.1** (Torch available: `True`)
  - Container: `runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel`
- **Frameworks:** HuggingFace Transformers, PEFT, PyTorch (CUDA), custom Python scripts
- **Specialty:** OSS-20B with LoRA – **rarely documented on B200 hardware**
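
For anyone reproducing the setup, a minimal environment report along these lines (a sketch, not one of the repo's scripts) confirms that the pod matches the specs above:

```python
# env_check.py – hypothetical sanity-check sketch, not one of the repo's scripts.
# Prints the PyTorch/CUDA details listed above (B200, sm_100) for the active pod.
import torch

print("Torch version:     ", torch.__version__)
print("CUDA available:    ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA (torch build):", torch.version.cuda)
    print("Device name:       ", torch.cuda.get_device_name(0))
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Compute capability: sm_{major}{minor}")  # expected sm_100 on the B200
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"Total VRAM:         {vram_gb:.1f} GB")
```
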
---
## Repository Structure
```tree
gpt-oss-20b-lora-dua/
├── datasets/            # Training data (JSONL, CSV, tokenizer)
├── results/             # Inference results & comparisons
├── images/              # Screenshots & debug visuals
├── videos/              # Training & inference demos (via Git LFS)
├── scripts/             # Organized experiment scripts
│   ├── training/        # Training pipelines
│   ├── inference/       # Inference tests
│   ├── dataset_tools/   # Dataset checks & fixes
│   ├── compare/         # Compare runs & Gradio UI
│   └── tools/           # Utilities & helpers
└── utils/               # Environment configs & scanners
```

## Workflow
### 1. Dataset Preparation
- Base dataset curated from **Ḥiṣn al-Muslim Duʿāʾ**
- Fixes applied using:
  - `fix_training_entry.py`
  - `check_dataset.py`
  - `convert_json_to_jsonl.py` (see the sketch below)
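
The real tooling lives in `scripts/dataset_tools/`; purely as an illustration, a JSON → JSONL conversion in the spirit of `convert_json_to_jsonl.py` could look like the sketch below (the input file name and record layout are assumptions, not the repo's exact code):

```python
# Hypothetical JSON -> JSONL conversion sketch (input path and schema are assumptions).
import json
from pathlib import Path

def convert_json_to_jsonl(src: Path, dst: Path) -> int:
    """Read a JSON array of dua records and write one JSON object per line."""
    records = json.loads(src.read_text(encoding="utf-8"))
    with dst.open("w", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps the Arabic text human-readable in the output
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return len(records)

if __name__ == "__main__":
    n = convert_json_to_jsonl(Path("datasets/dua_raw.json"),        # hypothetical input file
                              Path("datasets/knigge_dua_dataset.jsonl"))
    print(f"Wrote {n} records")
```
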

### 2. Debugging Phase
- **Issue:** GPU not used (ran on CPU by mistake)
- **Fix:** Verified the CUDA setup, ensured `torch.cuda.is_available()` returns `True` (see the guard sketch below)
- Extra ~1h wasted on **rsync retries** – included here to show real-world overhead
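
A guard like the following sketch (not taken from the repo's scripts) catches that kind of silent CPU fallback before a long run starts:

```python
# Minimal device guard – a sketch to fail fast instead of silently training on CPU.
import torch

if not torch.cuda.is_available():
    raise RuntimeError(
        "CUDA is not available – training would silently fall back to CPU. "
        "Check the container image and driver before starting a long run."
    )
device = torch.device("cuda")
print("Training on:", torch.cuda.get_device_name(device))
```
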

### 3. Mini-Test
- Ran LoRA on ~100 samples (see the sketch below)
- Verified that adapters trained & merged properly
- Confirmed the inference pipeline works
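
The real pipelines live in `scripts/training/`; the mini-test roughly corresponds to a PEFT setup like the sketch below (model ID, LoRA hyperparameters, and target modules are assumptions, not the repo's exact configuration):

```python
# Hypothetical mini-test sketch with HF Transformers + PEFT (not the repo's exact script).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"                        # assumed Hub ID for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="bfloat16")

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],               # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()                     # LoRA leaves only a tiny fraction trainable

# ~100-sample subset used for the mini-test
mini = load_dataset("json", data_files="datasets/knigge_dua_dataset.jsonl", split="train[:100]")
print(mini)
```
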

### 4. Full Training
- Trained on the full dua dataset (`datasets/knigge_dua_dataset.jsonl`) – see the training sketch below
- Saved the LoRA adapters & merged them back into the base model
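
Only as a hedged sketch (hyperparameters, the `text` field name, and output paths are assumptions), the full run could be driven by the standard HF `Trainer` on top of a PEFT-wrapped model:

```python
# Hypothetical full-training sketch with HF Trainer + PEFT (not the repo's exact script).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "openai/gpt-oss-20b"                        # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token          # the collator needs a padding token
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="bfloat16")
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                                         target_modules=["q_proj", "v_proj"]))

full = load_dataset("json", data_files="datasets/knigge_dua_dataset.jsonl", split="train")
tokenized = full.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                     remove_columns=full.column_names)  # "text" field name is an assumption

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="outputs/lora-dua-full",            # placeholder output directory
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("outputs/lora-dua-adapters")     # saves the LoRA adapters only
```
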

### 5. Merge & Export
- Used `merge_lora.py` to combine base + adapters (see the sketch below)
- Exported in multiple quantized formats (**Q4, Q5, Q8**) locally
- Files intentionally **not pushed to GitHub** (too large)
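
The merge step presumably follows the standard PEFT `merge_and_unload()` pattern; a sketch with placeholder paths (not the repo's `merge_lora.py` itself):

```python
# Hypothetical merge sketch in the spirit of merge_lora.py (IDs and paths are placeholders).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "openai/gpt-oss-20b"                         # assumed base model ID
adapter_dir = "outputs/lora-dua-adapters"              # adapters saved after training

base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

merged.save_pretrained("outputs/gpt-oss-20b-dua-merged")   # full merged weights (large!)
AutoTokenizer.from_pretrained(base_id).save_pretrained("outputs/gpt-oss-20b-dua-merged")
# The Q4/Q5/Q8 quantized exports were produced locally from this merged checkpoint.
```
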

### 6. Inference Showcase
- Tested with authentic Duʿāʾ prompts (a minimal generation sketch follows below)
- Model produced **Arabic text, transliteration, and a partial English gloss**
- Outputs documented in `results/` and via screenshots in `images/`
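
A minimal generation sketch against the merged checkpoint (prompt, paths, and decoding settings are illustrative, not the documented test prompts):

```python
# Hypothetical inference sketch against the merged checkpoint (prompt/settings illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "outputs/gpt-oss-20b-dua-merged"                # placeholder path from the merge step
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto", torch_dtype="bfloat16")

prompt = "ما هو دعاء الخروج من المنزل؟"                 # "What is the dua for leaving the house?"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tok.decode(out[0], skip_special_tokens=True))
```
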

---
## Results
### Successes
- First documented LoRA fine-tune of **OSS-20B** on **RunPod B200 (CUDA 12)**
- Dataset correction pipeline works robustly
- Training reproducible (mini + full runs)
- Model improved on Arabic + Islamic contexts

### Limitations
- Dataset small (~100–200 examples)
- Religious accuracy still requires scholar review
- Cloud quirks – some wasted time (initial CPU-only runs, rsync overhead)
---
## Media
### Screenshots
- **GPU Setup**
- **Mini-Test Success**
- **Inference Example**
- **Dataset Fix**

(Screenshots are available in `images/`.)

### Videos (via LFS in `/videos/`)
- **Training runs (mini + full):**
  - [Mini-Test](videos/b200_training_MINITEST.mp4)
  - [Full Training](videos/b200_training_FULL.mp4)
- **Debugging sessions:**
  - [Dataset Fix](videos/b200_FIX_data_entry-2025-08-20_13.41.22.mp4)
  - [Merge Inference](videos/b200_MERGE_INFERENCE_LAST.mp4)
- **Inference showcases:**
  - [5 Prompts Before/After](videos/b200_5promptstraining_before_after.mp4)
  - [Inference Run](videos/b200_INFERENCE_Test.mp4)
  - [Quantization Showcase](videos/B200_MXFP4_quantization_6prompts_after.mp4)

---
## Lessons Learned
- **RunPod B200 (CUDA 12)** works reliably once set up correctly
- **LoRA is efficient** even on 20B parameter models
- Debugging + real-world overhead (CPU fallback, rsync) matter just as much as training itself
- Transparency (keeping even "mistakes") helps others learn

---
## Conclusion
This repo demonstrates:
- How to structure a **real LoRA fine-tune project** end-to-end
- How to handle **dataset debugging, training, merging, inference**
- How to use **cloud GPU instances (RunPod B200)** for large-scale experiments

A **hands-on showcase**, not a polished product – built for **education, research, and reproducibility**.