https://github.com/samarth2001/llm-fine-tuning
Parameter-efficient fine-tuning experiments for 7B LLMs on consumer hardware. QLoRA implementations, memory optimization strategies, and reproducible benchmarks for Mistral, Llama-2, and other models on Google Colab T4 GPUs.
- Host: GitHub
- URL: https://github.com/samarth2001/llm-fine-tuning
- Owner: Samarth2001
- License: mit
- Created: 2025-07-02T13:19:19.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-07-02T14:17:15.000Z (3 months ago)
- Last Synced: 2025-07-02T15:29:58.963Z (3 months ago)
- Topics: fine-tuning, huggingface-transformers, llama, llm, lora, mistral, peft, qlora
- Language: Jupyter Notebook
- Homepage:
- Size: 11.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# LLM Fine-tuning Experiments
Systematic evaluation of parameter-efficient fine-tuning methods for 7B language models on consumer hardware.
## Overview
This repository contains experiments and benchmarks for fine-tuning large language models (7B parameters) using QLoRA on limited compute resources (Google Colab T4 GPU, 15GB VRAM). The focus is on practical implementations that balance memory efficiency with model performance.
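As a rough illustration of why a 7B model fits in 15 GB, the budget can be sketched with back-of-envelope arithmetic. The figures below (hidden size, number of targeted layers, training overhead) are illustrative assumptions, not measurements from this repository:

```python
def estimate_qlora_memory_gb(n_params_b=7.0, lora_rank=16, n_layers=32,
                             hidden=4096, overhead_gb=8.0):
    """Back-of-envelope GPU memory estimate for 4-bit QLoRA fine-tuning.

    Real usage depends on sequence length, optimizer choice, and
    activation checkpointing; treat this as a sanity check only.
    """
    # 4-bit quantized base weights: 0.5 bytes per parameter
    base_gb = n_params_b * 1e9 * 0.5 / 1e9
    # Two adapter matrices (A: hidden x r, B: r x hidden) per targeted
    # projection; assume 4 attention projections (q, k, v, o) per layer.
    lora_params = 2 * hidden * lora_rank * n_layers * 4
    # fp16 adapter weights (2 bytes) plus Adam optimizer states (~8 bytes)
    lora_gb = lora_params * (2 + 8) / 1e9
    return base_gb + lora_gb + overhead_gb

print(round(estimate_qlora_memory_gb(), 1))  # lands in the 11-14 GB range
```

With these assumptions the adapter itself is tiny (well under 1 GB); almost all headroom goes to activations and training overhead, which is why batch size and sequence length dominate the memory picture below.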
## Key Results
| Model | Dataset | Samples | LoRA Config | Memory Usage | Training Time | Final Loss |
|-------|---------|---------|-------------|--------------|---------------|------------|
| microsoft/phi-2 | openassistant-guanaco | 1000 | r=16, α=32 | 6.2 GB | 15 min | 1.42 |
| mistralai/Mistral-7B-v0.1 | openassistant-guanaco | 300 | r=8, α=16 | 11.3 GB | 20 min | 1.89 |
| mistralai/Mistral-7B-v0.1 | openassistant-guanaco | 500 | r=16, α=32 | 13.1 GB | 30 min | 1.65 |
| meta-llama/Llama-2-7b-hf | openassistant-guanaco | 500 | r=8, α=16 | 12.8 GB | 28 min | 1.72 |

## Technical Stack
- **Framework**: Transformers 4.41.2, PEFT 0.11.1, TRL 0.8.6
- **Quantization**: bitsandbytes 4-bit QLoRA
- **Hardware**: NVIDIA T4 GPU (15GB VRAM)
- **Models**: Mistral-7B, Llama-2-7B, Gemma-7B, Phi-2
- **Datasets**: OpenAssistant, CodeAlpaca, GSM8K
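With this stack, a typical 4-bit QLoRA setup looks roughly like the following sketch. The dropout value and target modules are common defaults rather than settings confirmed by this repository, and the target module names assume a Mistral/Llama-style architecture:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization, as in the QLoRA paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # T4 has no bfloat16 support
)

# LoRA adapter config matching the r=16, alpha=32 rows in the results table
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,  # assumed default, not taken from the repo
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```

`bnb_config` is passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`, and `lora_config` is handed to the trainer (or applied via `peft.get_peft_model`).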
## Memory Requirements
Estimated GPU memory usage for 7B models with 4-bit quantization:
```
Base model (4-bit): 3.5 GB
LoRA parameters (r=16): 0.5 GB
Training overhead: 7-10 GB
Total: 11-14 GB
```

## Configuration Guidelines
Recommended settings for T4 GPU (15GB VRAM):
- **Batch size**: 1
- **Sequence length**: 512
- **LoRA rank**: 8-16
- **Dataset size**: 300-1000 samples
- **Gradient accumulation**: 8-16 steps
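With TRL 0.8.x, the settings above map onto a trainer roughly as follows. This is a sketch, not the repository's actual training script: model and dataset loading are omitted, `model`, `train_dataset`, and `lora_config` are placeholders, and the learning rate and optimizer are common QLoRA defaults rather than confirmed values:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,   # batch size 1, per the guidelines
    gradient_accumulation_steps=16,  # effective batch size of 16
    num_train_epochs=1,
    learning_rate=2e-4,              # common QLoRA default; adjust as needed
    fp16=True,                       # T4 lacks bfloat16 support
    optim="paged_adamw_8bit",        # paged optimizer to curb memory spikes
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,                  # placeholder: a 4-bit quantized base model
    args=args,
    train_dataset=train_dataset,  # placeholder: e.g. an openassistant-guanaco split
    peft_config=lora_config,      # placeholder: a peft LoraConfig
    dataset_text_field="text",
    max_seq_length=512,           # sequence length, per the guidelines
)
trainer.train()
```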
## License

MIT License. See the LICENSE file for details.
## References
- [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
- [PEFT: Parameter-Efficient Fine-Tuning](https://github.com/huggingface/peft)
- [OpenAssistant Conversations Dataset](https://huggingface.co/datasets/OpenAssistant/oasst1)