https://github.com/helpingai/llm-trainer
A complete framework for training Large Language Models from scratch
- Host: GitHub
- URL: https://github.com/helpingai/llm-trainer
- Owner: HelpingAI
- License: apache-2.0
- Created: 2025-07-22T04:35:46.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-08-28T08:00:29.000Z (10 months ago)
- Last Synced: 2025-08-28T14:44:11.929Z (10 months ago)
- Language: Python
- Size: 223 KB
- Stars: 7
- Watchers: 0
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# LLM Trainer
*A production-ready framework for training Large Language Models from scratch with modern PyTorch*
## What's New in v0.2.4
- **Memory Optimizations**: Fused kernels, gradient checkpointing, and low-VRAM layers for memory-efficient training
- **SafeTensors Support**: Secure model serialization with automatic sharding for large models
- **HuggingFace Integration**: Use any pretrained tokenizer via `HFTokenizerWrapper`
- **Accelerate Support**: Distributed training with `use_accelerate=true`
- **LoRA/PEFT**: Parameter-efficient fine-tuning with `use_peft=true`
- **Backward Compatible**: Existing PyTorch models continue to work
- **Patching System**: `patch_transformers()` and `patch_trl()` add memory-efficient helper methods to Hugging Face Transformers and TRL trainers
## Features
### Core Architecture
- **Custom Transformer Implementation**: Multi-head attention, feed-forward networks, positional encodings
- **SafeTensors Integration**: Secure model serialization with automatic sharding
- **Modular Design**: Easy to extend and customize for research and production
### Tokenization
- **BPE Tokenizer**: From-scratch BPE with Unicode and emoji support
- **HuggingFace Integration**: Use any pretrained tokenizer (Mistral, Llama, GPT-2, etc.)
- **WordPiece Support**: Alternative tokenization strategies
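
To make the BPE bullet concrete, here is a minimal sketch of the merge-learning loop in plain Python. It is illustrative only and is not the repository's `BPETokenizer`; the function names `most_frequent_pair`, `merge_pair`, and `learn_merges` are hypothetical, and words are split into Unicode code points rather than bytes.

```python
from collections import Counter
from typing import List, Optional, Tuple

Pair = Tuple[str, str]


def most_frequent_pair(words: List[List[str]]) -> Optional[Pair]:
    """Return the most frequent adjacent symbol pair, or None if no pair exists."""
    counts: Counter = Counter()
    for symbols in words:
        counts.update(zip(symbols, symbols[1:]))
    if not counts:
        return None
    # Deterministic tie-break: highest count first, then lexicographic order.
    return min(counts, key=lambda pair: (-counts[pair], pair))


def merge_pair(symbols: List[str], pair: Pair) -> List[str]:
    """Replace every non-overlapping occurrence of `pair` with the merged symbol."""
    merged: List[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_merges(corpus: List[str], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge rules from a list of words."""
    words = [list(word) for word in corpus]  # start from single code points
    merges: List[Pair] = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = [merge_pair(symbols, pair) for symbols in words]
    return merges
```

Calling `learn_merges(["low", "lower", "lowest"], 2)` first merges `l` + `o`, then `lo` + `w`, because those pairs occur in all three words.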
### Data Pipeline
- **HuggingFace Datasets**: Efficient loading with preprocessing and batching
- **Memory Optimization**: Smart sequence packing and data streaming
- **Multi-Processing**: Parallel data preprocessing for faster training
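
Sequence packing can be sketched as follows: tokenized documents are concatenated with an end-of-sequence token between them and cut into fixed-length blocks, so no positions are wasted on padding. This is a simplified illustration, not the code in `data/`; `pack_sequences` and `eos_id` are hypothetical names, and the trailing partial block is simply dropped.

```python
from typing import Iterable, List


def pack_sequences(
    docs: Iterable[List[int]], block_size: int, eos_id: int
) -> List[List[int]]:
    """Concatenate token lists (EOS-separated) and slice into full blocks."""
    buffer: List[int] = []
    blocks: List[List[int]] = []
    for tokens in docs:
        buffer.extend(tokens)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    # Tokens left in `buffer` do not fill a block and are discarded here.
    return blocks
```

Because `docs` is consumed one item at a time, the same function works with a streaming iterator as well as an in-memory list.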
### Training & Inference
- **CPU/GPU Support**: Optimized configurations for both CPU and GPU training
- **Distributed Training**: Multi-GPU support via Accelerate and DeepSpeed
- **Parameter-Efficient**: LoRA/PEFT adapters for memory-efficient fine-tuning
- **Mixed Precision**: FP16/BF16 automatic mixed precision
- **Multiple Decoding Strategies**: Greedy, beam search, nucleus (top-p), top-k sampling
- **Enhanced Trainer**: TRL-style training methods with familiar APIs
- **Memory-Efficient Optimizers**: Optimizers built through `create_optimizer` (for example `adamw`)
- **Kernel Optimizations**: Fused operations such as `FusedLinear`, `FusedRMSNorm`, and `fused_cross_entropy`
- **Low VRAM Training**: Gradient checkpointing and memory-efficient techniques
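
As an illustration of the top-k and nucleus (top-p) decoding strategies listed above, the sketch below filters a next-token distribution in plain Python. It is not the implementation in `utils/generation.py`; `softmax` and `top_k_top_p_filter` are hypothetical helpers written for this example.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def top_k_top_p_filter(
    probs: List[float], top_k: int = 0, top_p: float = 1.0
) -> Dict[int, float]:
    """Keep the top-k tokens, then the smallest prefix whose mass reaches top_p.

    Returns a mapping from token id to renormalized probability.
    """
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    kept = []
    cumulative = 0.0
    for token_id, prob in ranked:
        kept.append((token_id, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    return {token_id: prob / total for token_id, prob in kept}
```

Sampling then draws from the returned distribution, for example with `random.choices(list(d), weights=list(d.values()))`. Greedy decoding is the special case `top_k=1`.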
### Monitoring & Evaluation
- **TensorBoard Integration**: Real-time training metrics and visualizations
- **Weights & Biases**: Experiment tracking and hyperparameter optimization
- **Comprehensive Metrics**: Perplexity, cross-entropy loss, generation quality
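
Perplexity and cross-entropy loss are directly related: perplexity is the exponential of the mean per-token cross-entropy measured in nats. The short sketch below shows that relationship; the helper names are hypothetical and this is not the code in `utils/metrics.py`.

```python
import math
from typing import List


def cross_entropy(token_log_probs: List[float]) -> float:
    """Mean negative log-likelihood (nats) of the target tokens."""
    return -sum(token_log_probs) / len(token_log_probs)


def perplexity(token_log_probs: List[float]) -> float:
    """Perplexity is exp(cross-entropy)."""
    return math.exp(cross_entropy(token_log_probs))
```

A model that assigns probability 0.25 to every target token has a cross-entropy of `ln 4` (about 1.386) and a perplexity of 4, as if it were choosing uniformly among four tokens.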
## Requirements
- Python 3.8 or higher
- PyTorch 2.0 or higher
- GPU: CUDA-compatible GPU (recommended) or CPU-only mode
- Memory: 8GB RAM minimum (16GB+ recommended)
## Installation
### Basic Installation
```bash
git clone https://github.com/HelpingAI/llm-trainer.git
cd llm-trainer
pip install -e .
```
### Optional Dependencies
```bash
# Development tools
pip install -e ".[dev]"
# SafeTensors support (recommended)
pip install -e ".[safetensors]"
# Distributed training
pip install -e ".[distributed]"
# All features
pip install -e ".[full]"
```
## Quick Start
### Python API - Enhanced Training
```python
from llm_trainer import Trainer, TrainingConfig
from llm_trainer.models import TransformerLM
from llm_trainer.config import ModelConfig
from llm_trainer.tokenizer import BPETokenizer

# Create model and tokenizer
model_config = ModelConfig(
    vocab_size=32000,
    d_model=512,
    n_heads=8,
    n_layers=6,
    max_seq_len=1024
)
model = TransformerLM(model_config)
tokenizer = BPETokenizer()

# Configure training with TRL-style parameters
training_config = TrainingConfig(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    num_train_epochs=3,
    logging_steps=10,
    save_steps=100,
    optim="adamw"  # TRL-style parameter
)

# Create trainer and train
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    config=training_config
)

# TRL-style training methods
trainer.train()      # Standard training
trainer.sft_train()  # Supervised fine-tuning
trainer.dpo_train()  # Direct preference optimization
```
### HuggingFace Integration with PEFT
```python
from llm_trainer import Trainer, TrainingConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType
# Load pretrained model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Configure LoRA (PEFT)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM
)

# Create trainer with PEFT
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    config=TrainingConfig(),
    peft_config=lora_config  # Pass PEFT config directly
)
# Show parameter efficiency
trainer.print_trainable_parameters()
# Train with familiar API
trainer.train()
```
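
To see why LoRA is parameter-efficient, it helps to work out the numbers for the settings above. LoRA freezes a weight matrix of shape `out_features x in_features` and trains two low-rank factors of rank `r`, scaling their product by `lora_alpha / r`. The sketch below does that arithmetic for a single 768 x 768 projection (768 is the hidden size of the base `gpt2` checkpoint); the helper names are hypothetical, and which layers actually receive adapters depends on PEFT's `target_modules` setting.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters added by one LoRA adapter.

    Factor A has shape (r, in_features); factor B has shape (out_features, r).
    """
    return r * (in_features + out_features)


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Multiplier applied to the low-rank update B @ A."""
    return lora_alpha / r


full = 768 * 768                             # frozen weights in the projection
added = lora_param_count(768, 768, r=8)      # trainable adapter weights
print(full, added, f"{added / full:.2%}")    # 589824 12288 2.08%
print(lora_scaling(lora_alpha=16, r=8))      # 2.0
```

With `r=8`, the adapter trains roughly 2% as many parameters as the projection it wraps, which is the kind of ratio `print_trainable_parameters()` is meant to surface.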
### Memory-Efficient Optimizers
```python
from llm_trainer.training import create_optimizer
# Create memory-efficient optimizer
# (`model` is assumed to be an existing model, e.g. one built in the examples above)
optimizer = create_optimizer(
    model,
    optimizer_name="adamw",
    learning_rate=5e-5,
    weight_decay=0.01
)
```
### Patching for Transformers/TRL
```python
from llm_trainer import patch_transformers, patch_trl
# Patch Hugging Face Transformers with memory-efficient optimizations
patch_transformers()
# Patch TRL with memory-efficient optimizations
patch_trl()
# Now you can use enhanced Transformers/TRL classes with memory-efficient methods
from transformers import Trainer, TrainingArguments
from trl import SFTTrainer
# These trainers now have enhanced methods
trainer = SFTTrainer(...)
trainer.print_trainable_parameters() # Added by patching
trainer.prepare_model_for_kbit_training() # Added by patching
```
### Kernel Optimizations for Fast Training
```python
from llm_trainer.kernels import (
    FusedLinear, FusedRMSNorm, fused_cross_entropy,
    gradient_checkpointing, LowVRAMLinear, empty_cache
)

# Use fused operations for better performance
fused_linear = FusedLinear(in_features=512, out_features=512)
fused_norm = FusedRMSNorm(dim=512)

# Use gradient checkpointing to reduce memory usage
def forward_pass_with_checkpointing(model, inputs):
    return gradient_checkpointing(model, inputs)

# Use low VRAM linear layers for memory-efficient training
low_vram_linear = LowVRAMLinear(in_features=512, out_features=512)

# Clear cache to free up memory
empty_cache()
```
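
For reference, the computation that an RMSNorm layer performs can be written unfused in a few lines of plain Python. This sketch assumes `FusedRMSNorm` follows the standard RMSNorm definition (scale each element by the reciprocal root-mean-square of the vector, then by a learned weight), which this README does not spell out; `rms_norm` is a hypothetical reference helper, not part of `llm_trainer.kernels`.

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Standard RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_square = sum(value * value for value in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [value * inv_rms * w for value, w in zip(x, weight)]
```

A fused kernel produces the same result but avoids materializing the intermediate values in separate passes over memory, which is where the speed and memory savings come from.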
### Command Line
```bash
# GPU Training
python scripts/train.py --config configs/small_model.yaml --output_dir ./output
# CPU Training (no GPU required)
python scripts/train.py --config configs/cpu_small_model.yaml --output_dir ./output
# Text Generation
python scripts/generate.py --model_path ./output --prompts "The quick brown fox" --interactive
# Model Evaluation
python scripts/evaluate.py --model_path ./output --dataset_config configs/eval_config.json
```
## Configuration
The framework uses YAML/JSON configuration files for reproducible experiments:
### Small Model (Quick Start)
```yaml
model:
  d_model: 512
  n_heads: 8
  n_layers: 6
  vocab_size: 32000
  max_seq_len: 1024

training:
  batch_size: 16
  learning_rate: 1e-4
  num_epochs: 3
  use_amp: true
  gradient_accumulation_steps: 4
```
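
A rough parameter count for this configuration can be estimated with a common rule of thumb. The sketch below assumes a standard decoder-only block (four `d_model x d_model` attention projections plus a feed-forward network with 4x expansion), tied input and output embeddings, and ignores biases, normalization weights, and positional embeddings. None of these assumptions are confirmed for `TransformerLM`, so treat the result as an order-of-magnitude estimate; `estimate_params` is a hypothetical helper.

```python
def estimate_params(d_model: int, n_layers: int, vocab_size: int) -> int:
    """Approximate parameter count: 12 * L * d^2 for the blocks + V * d for embeddings."""
    attention = 4 * d_model * d_model          # Q, K, V, and output projections
    feed_forward = 2 * d_model * (4 * d_model)  # up- and down-projection
    blocks = n_layers * (attention + feed_forward)
    embeddings = vocab_size * d_model           # shared with the output layer
    return blocks + embeddings


print(estimate_params(d_model=512, n_layers=6, vocab_size=32000))  # 35258368
```

Under these assumptions the small model has roughly 35 million parameters, almost half of which sit in the embedding table.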
### CPU-Optimized Training
```yaml
device: "cpu"

model:
  d_model: 256
  n_heads: 4
  n_layers: 4
  max_seq_len: 512

training:
  batch_size: 2
  use_amp: false
  gradient_accumulation_steps: 8
  dataloader_num_workers: 2
```
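
The small per-step batch in the CPU configuration is offset by more gradient accumulation. The sketch below computes the effective batch size for the two configurations shown above, assuming `batch_size` is the per-device batch and a single device is used; the token count is an upper bound that assumes every sequence is `max_seq_len` long. The helper names are hypothetical.

```python
def effective_batch_size(
    batch_size: int, gradient_accumulation_steps: int, num_devices: int = 1
) -> int:
    """Sequences contributing to each optimizer step."""
    return batch_size * gradient_accumulation_steps * num_devices


def max_tokens_per_step(effective_batch: int, max_seq_len: int) -> int:
    """Upper bound on tokens per optimizer step (all sequences full length)."""
    return effective_batch * max_seq_len


gpu_small = effective_batch_size(batch_size=16, gradient_accumulation_steps=4)
cpu_small = effective_batch_size(batch_size=2, gradient_accumulation_steps=8)
print(gpu_small, max_tokens_per_step(gpu_small, 1024))  # 64 65536
print(cpu_small, max_tokens_per_step(cpu_small, 512))   # 16 8192
```

The CPU setup therefore takes optimizer steps on batches a quarter the size of the small GPU setup, with at most one eighth as many tokens per step.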
### Advanced Configuration
```yaml
model:
  d_model: 768
  n_heads: 12
  n_layers: 12

training:
  use_accelerate: true
  accelerate_mixed_precision: "fp16"
  use_peft: true
  peft_type: "lora"
  peft_r: 8
  peft_alpha: 16

  # SafeTensors settings
  save_format: "safetensors"
  max_shard_size: "2GB"
```
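
To illustrate what a `max_shard_size` limit implies, the sketch below plans shards greedily: tensors are assigned in order, and a new shard starts whenever the next tensor would push the current one over the limit. This is one plausible strategy and not necessarily what `safetensors_utils.py` does; `parse_size` and `plan_shards` are hypothetical helpers, and size suffixes are assumed to be decimal (1 GB = 10^9 bytes).

```python
from typing import Dict, List

_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(text: str) -> int:
    """Convert a string such as '2GB' to a byte count (decimal units assumed)."""
    text = text.strip().upper()
    for suffix, factor in _UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)  # plain number of bytes


def plan_shards(tensor_bytes: Dict[str, int], max_shard_size: str) -> List[List[str]]:
    """Group tensor names into shards that stay within the size limit."""
    limit = parse_size(max_shard_size)
    shards: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in tensor_bytes.items():
        if current and current_size + size > limit:
            shards.append(current)
            current, current_size = [], 0
        # A single tensor larger than the limit still gets its own shard.
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards
```

For example, tensors of 1.5 GB, 1 GB, and 0.5 GB with a `"2GB"` limit end up in two shards: the first tensor alone, then the other two together.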
## Project Structure
```
llm-trainer/
├── src/llm_trainer/ # Main package
│ ├── models/ # Model architectures
│ │ ├── base_model.py # Base model interface
│ │ ├── transformer.py # Custom Transformer implementation
│ │ ├── safetensors_utils.py # SafeTensors utilities
│ │ └── attention.py # Attention mechanisms
│ ├── tokenizer/ # Tokenization
│ │ ├── bpe_tokenizer.py # BPE implementation
│ │ ├── hf_tokenizer.py # HuggingFace wrapper
│ │ └── wordpiece_tokenizer.py # WordPiece implementation
│ ├── data/ # Data pipeline
│ │ ├── dataset.py # Dataset classes
│ │ ├── dataloader.py # Data loading
│ │ └── preprocessing.py # Data preprocessing
│ ├── training/ # Training infrastructure
│ │ ├── trainer.py # Enhanced trainer with TRL-style APIs
│ │ ├── optimizer.py # Standard optimizers
│ │ └── scheduler.py # Learning rate schedulers
│ ├── kernels/ # Kernel optimizations
│ │ ├── fused_ops.py # Fused operations
│ │ └── memory_efficient.py # Memory-efficient operations
│ ├── patching/ # Patching system
│ │ ├── patch_transformers.py # Transformers patching
│ │ └── patch_trl.py # TRL patching
│ ├── utils/ # Utilities
│ │ ├── generation.py # Text generation
│ │ ├── inference.py # Inference utilities
│ │ └── metrics.py # Evaluation metrics
│ └── config/ # Configuration
│ ├── model_config.py # Model configuration
│ └── training_config.py # Training configuration
├── scripts/ # CLI tools
│ ├── train.py # Training script
│ ├── generate.py # Text generation
│ └── evaluate.py # Model evaluation
├── configs/ # Pre-configured setups
│ ├── small_model.yaml # Small GPU model
│ ├── medium_model.yaml # Medium GPU model
│ ├── cpu_small_model.yaml # CPU-optimized small
│ └── cpu_medium_model.yaml # CPU-optimized medium
├── examples/ # Usage examples
│ ├── complete_pipeline.py # End-to-end example
│ ├── safetensors_example.py # SafeTensors demo
│ └── train_small_model.py # Quick start example
└── docs/ # Documentation
```
## Documentation
- [Getting Started Guide](docs/getting_started.md) — Complete setup and first steps
- [Model Architecture](docs/architecture.md) — Transformer implementation details
- [Training Guide](docs/training.md) — Comprehensive training tutorial
- [CPU Training Guide](docs/cpu_training.md) — Dedicated CPU training documentation
- [Tokenizer Details](docs/tokenizer.md) — BPE tokenizer documentation
- [API Reference](docs/api.md) — Complete API documentation
## Development
### Running Tests
```bash
pip install -e ".[dev]"
pytest tests/
```
### Code Quality
```bash
black src/ scripts/ examples/
flake8 src/ scripts/ examples/
mypy src/
```
## Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.
## Support
- **Bug Reports**: [GitHub Issues](https://github.com/HelpingAI/llm-trainer/issues)
- **Discussions**: [GitHub Discussions](https://github.com/HelpingAI/llm-trainer/discussions)
- **Documentation**: [docs/ directory](https://github.com/HelpingAI/llm-trainer/tree/main/docs)