https://github.com/helpingai/llm-trainer

A complete framework for training Large Language Models from scratch
Host: GitHub
URL: https://github.com/helpingai/llm-trainer
Owner: HelpingAI
License: apache-2.0
Created: 2025-07-22T04:35:46.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-08-28T08:00:29.000Z (10 months ago)
Last Synced: 2025-08-28T14:44:11.929Z (10 months ago)
Language: Python
Size: 223 KB
Stars: 7
Watchers: 0
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README

# LLM Trainer



[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org/)

[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

[![GitHub](https://img.shields.io/badge/GitHub-HelpingAI/llm--trainer-black.svg)](https://github.com/HelpingAI/llm-trainer)

[![SafeTensors](https://img.shields.io/badge/SafeTensors-Supported-brightgreen.svg)](https://github.com/huggingface/safetensors)

[![Version](https://img.shields.io/badge/version-0.2.4-blue.svg)](https://github.com/HelpingAI/llm-trainer/releases)

*A production-ready framework for training Large Language Models from scratch with modern PyTorch*



## What's New in v0.2.4

- **Memory Optimizations**: Gradient checkpointing and low-VRAM layers to reduce memory use during training

- **SafeTensors Support**: Secure model serialization with automatic sharding for large models

- **HuggingFace Integration**: Use any pretrained tokenizer via `HFTokenizerWrapper`

- **Accelerate Support**: Distributed training with `use_accelerate=true`

- **LoRA/PEFT**: Parameter-efficient fine-tuning with `use_peft=true`

- **Backward Compatible**: Existing PyTorch models continue to work

- **Patching System**: `patch_transformers()` and `patch_trl()` add memory-efficient helper methods to Hugging Face Transformers and TRL trainers

## Features

### Core Architecture

- **Custom Transformer Implementation**: Multi-head attention, feed-forward networks, positional encodings

- **SafeTensors Integration**: Secure model serialization with automatic sharding

- **Modular Design**: Easy to extend and customize for research and production

### Tokenization

- **BPE Tokenizer**: From-scratch BPE with Unicode and emoji support

- **HuggingFace Integration**: Use any pretrained tokenizer (Mistral, Llama, GPT-2, etc.)

- **WordPiece Support**: Alternative tokenization strategies
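
To make the BPE bullet concrete, here is a minimal sketch of how byte-pair-encoding merge rules are learned. It is not the code in `bpe_tokenizer.py`; the function name `train_bpe` and the toy corpus are assumptions made for illustration, and the sketch works on Unicode code points rather than bytes.

```python
from collections import Counter


def train_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (illustrative only)."""
    # Represent each distinct word as a tuple of symbols with its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair; ties are broken by comparing the pairs themselves.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Rewrite every word, replacing the chosen pair with one merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


merges = train_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)  # [('o', 'w'), ('l', 'ow')]
```

Each learned merge becomes a new vocabulary entry, so the number of merges controls the final vocabulary size.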

### Data Pipeline

- **HuggingFace Datasets**: Efficient loading with preprocessing and batching

- **Memory Optimization**: Smart sequence packing and data streaming

- **Multi-Processing**: Parallel data preprocessing for faster training

### Training & Inference

- **CPU/GPU Support**: Optimized configurations for both CPU and GPU training

- **Distributed Training**: Multi-GPU support via Accelerate and DeepSpeed

- **Parameter-Efficient**: LoRA/PEFT adapters for memory-efficient fine-tuning

- **Mixed Precision**: FP16/BF16 automatic mixed precision

- **Multiple Decoding Strategies**: Greedy, beam search, nucleus (top-p), top-k sampling

- **Enhanced Trainer**: TRL-style training methods with familiar APIs

- **Memory-Efficient Optimizers**: Optimizers built through `create_optimizer` (for example AdamW)

- **Kernel Optimizations**: Fused operations (`FusedLinear`, `FusedRMSNorm`, `fused_cross_entropy`) for faster training

- **Low VRAM Training**: Gradient checkpointing and memory-efficient techniques
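
The decoding strategies listed above differ only in how the next token is picked from the model's probability distribution. The following is a minimal pure-Python sketch of greedy, top-k, and nucleus (top-p) selection; the helper names `softmax`, `filter_top_k_top_p`, and `sample` are hypothetical and are not taken from `utils/generation.py`.

```python
import math
import random


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Return {token_id: prob} after top-k, then nucleus filtering, renormalized."""
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
            break
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}


def sample(probs, rng, top_k=0, top_p=1.0):
    """Draw one token id from the filtered distribution."""
    candidates = filter_top_k_top_p(probs, top_k, top_p)
    ids = list(candidates)
    return rng.choices(ids, weights=[candidates[i] for i in ids], k=1)[0]


probs = softmax([2.0, 1.0, 0.5, -1.0])
greedy = max(range(len(probs)), key=probs.__getitem__)  # greedy decoding: argmax
nucleus = filter_top_k_top_p(probs, top_p=0.8)          # keeps tokens 0 and 1
token = sample(probs, random.Random(0), top_k=3, top_p=0.9)
```

Beam search is omitted here because it tracks several partial sequences at once rather than filtering a single distribution.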

### Monitoring & Evaluation

- **TensorBoard Integration**: Real-time training metrics and visualizations

- **Weights & Biases**: Experiment tracking and hyperparameter optimization

- **Comprehensive Metrics**: Perplexity, cross-entropy loss, generation quality
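
Perplexity and cross-entropy loss are two views of the same quantity: perplexity is the exponential of the mean per-token cross-entropy (measured in nats). A short sketch, with hypothetical helper names that are not part of `utils/metrics.py`:

```python
import math


def cross_entropy(token_log_probs):
    """Mean negative log-likelihood (in nats) over the target tokens."""
    return -sum(token_log_probs) / len(token_log_probs)


def perplexity(token_log_probs):
    """Perplexity is exp(cross-entropy)."""
    return math.exp(cross_entropy(token_log_probs))


# A model that assigns probability 0.25 to every target token
# is as uncertain as a uniform choice among 4 options.
log_probs = [math.log(0.25)] * 8
print(perplexity(log_probs))  # about 4.0
```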

## Requirements

- Python 3.8 or higher

- PyTorch 2.0 or higher

- GPU: CUDA-compatible GPU (recommended) or CPU-only mode

- Memory: 8GB RAM minimum (16GB+ recommended)

## Installation

### Basic Installation

```bash
git clone https://github.com/HelpingAI/llm-trainer.git
cd llm-trainer
pip install -e .
```

### Optional Dependencies

```bash
# Development tools
pip install -e ".[dev]"

# SafeTensors support (recommended)
pip install -e ".[safetensors]"

# Distributed training
pip install -e ".[distributed]"

# All features
pip install -e ".[full]"
```

## Quick Start

### Python API - Enhanced Training

```python
from llm_trainer import Trainer, TrainingConfig
from llm_trainer.models import TransformerLM
from llm_trainer.config import ModelConfig
from llm_trainer.tokenizer import BPETokenizer

# Create model and tokenizer
model_config = ModelConfig(
    vocab_size=32000,
    d_model=512,
    n_heads=8,
    n_layers=6,
    max_seq_len=1024,
)
model = TransformerLM(model_config)
tokenizer = BPETokenizer()

# Configure training with TRL-style parameters
training_config = TrainingConfig(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    num_train_epochs=3,
    logging_steps=10,
    save_steps=100,
    optim="adamw",  # TRL-style parameter
)

# Create trainer and train
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    config=training_config,
)

# TRL-style training methods (pick the one that matches your task)
trainer.train()        # Standard training
# trainer.sft_train()  # Supervised fine-tuning
# trainer.dpo_train()  # Direct preference optimization
```

### HuggingFace Integration with PEFT

```python
from llm_trainer import Trainer, TrainingConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType

# Load pretrained model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Configure LoRA (PEFT)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM,
)

# Create trainer with PEFT
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    config=TrainingConfig(),
    peft_config=lora_config,  # Pass PEFT config directly
)

# Show parameter efficiency
trainer.print_trainable_parameters()

# Train with familiar API
trainer.train()
```
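
To see why LoRA is parameter-efficient, it helps to count parameters for one weight matrix. LoRA freezes a `d_in x d_out` weight and trains two low-rank factors with `r * (d_in + d_out)` parameters in total, scaled by `lora_alpha / r`. The sketch below uses the `r=8`, `lora_alpha=16` values from the example; the 768 x 768 matrix size is an illustrative assumption (768 is the hidden size of the smallest GPT-2), and `lora_stats` is a hypothetical helper, not part of `llm_trainer` or `peft`.

```python
def lora_stats(d_in, d_out, r, lora_alpha):
    """Compare the size of one frozen weight matrix with its LoRA adapter."""
    full = d_in * d_out          # parameters in the frozen weight
    adapter = r * (d_in + d_out)  # parameters in the two low-rank factors A and B
    return {
        "full": full,
        "adapter": adapter,
        "ratio": adapter / full,
        "scaling": lora_alpha / r,  # factor applied to the low-rank update
    }


stats = lora_stats(768, 768, r=8, lora_alpha=16)
print(stats["full"], stats["adapter"])  # 589824 12288
print(f"{stats['ratio']:.2%}")          # 2.08%
print(stats["scaling"])                 # 2.0
```

The actual trainable fraction reported by `print_trainable_parameters()` depends on which modules the adapter is attached to, so it will differ from this single-matrix figure.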

### Memory-Efficient Optimizers

```python
from llm_trainer.training import create_optimizer

# `model` is a model instance such as the one created in the examples above

# Create memory-efficient optimizer
optimizer = create_optimizer(
    model,
    optimizer_name="adamw",
    learning_rate=5e-5,
    weight_decay=0.01,
)
```
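
For readers unfamiliar with what an `"adamw"` optimizer does per step, here is a scalar sketch of the standard AdamW update with decoupled weight decay. It is a textbook illustration, not the implementation behind `create_optimizer`; the function name `adamw_step` and the default betas and epsilon are assumptions, while the learning rate and weight decay match the example above.

```python
import math


def adamw_step(param, grad, m, v, step, lr=5e-5, weight_decay=0.01,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update for a single scalar parameter; returns (param, m, v)."""
    # Decoupled weight decay: shrink the parameter directly, not via the gradient.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1)
```

Note that `m` and `v` are stored for every parameter, so a plain AdamW keeps two extra values per weight. That state is the memory cost that memory-efficient optimizer variants try to reduce.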

### Patching for Transformers/TRL

```python
from llm_trainer import patch_transformers, patch_trl

# Patch Hugging Face Transformers with memory-efficient optimizations
patch_transformers()

# Patch TRL with memory-efficient optimizations
patch_trl()

# Now you can use enhanced Transformers/TRL classes with memory-efficient methods
from transformers import Trainer, TrainingArguments
from trl import SFTTrainer

# These trainers now have enhanced methods
trainer = SFTTrainer(...)
trainer.print_trainable_parameters()  # Added by patching
trainer.prepare_model_for_kbit_training()  # Added by patching
```

### Kernel Optimizations for Fast Training

```python
from llm_trainer.kernels import (
    FusedLinear, FusedRMSNorm, fused_cross_entropy,
    gradient_checkpointing, LowVRAMLinear, empty_cache
)

# Use fused operations for better performance
fused_linear = FusedLinear(in_features=512, out_features=512)
fused_norm = FusedRMSNorm(dim=512)

# Use gradient checkpointing to reduce memory usage
def forward_pass_with_checkpointing(model, inputs):
    return gradient_checkpointing(model, inputs)

# Use low VRAM linear layers for memory-efficient training
low_vram_linear = LowVRAMLinear(in_features=512, out_features=512)

# Clear cache to free up memory
empty_cache()
```
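
As a reference for what a layer like `FusedRMSNorm` is expected to compute, the following pure-Python sketch implements the standard RMSNorm formula: each element is divided by the root mean square of the vector and multiplied by a learned weight. The function `rms_norm` and its `eps` default are assumptions for illustration; the fused kernel in `fused_ops.py` may differ in details.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm: x / sqrt(mean(x^2) + eps) * weight, element-wise."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


# With unit weights and eps=0, the output has a mean square of exactly 1.
out = rms_norm([3.0, 4.0], weight=[1.0, 1.0], eps=0.0)
mean_square = sum(v * v for v in out) / len(out)
```

A fused implementation computes the same result but combines the reduction and the scaling in one kernel to avoid extra memory reads and writes.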

### Command Line

```bash
# GPU Training
python scripts/train.py --config configs/small_model.yaml --output_dir ./output

# CPU Training (no GPU required)
python scripts/train.py --config configs/cpu_small_model.yaml --output_dir ./output

# Text Generation
python scripts/generate.py --model_path ./output --prompts "The quick brown fox" --interactive

# Model Evaluation
python scripts/evaluate.py --model_path ./output --dataset_config configs/eval_config.json
```

## Configuration

The framework uses YAML/JSON configuration files for reproducible experiments:

### Small Model (Quick Start)

```yaml
model:
  d_model: 512
  n_heads: 8
  n_layers: 6
  vocab_size: 32000
  max_seq_len: 1024

training:
  batch_size: 16
  learning_rate: 1e-4
  num_epochs: 3
  use_amp: true
  gradient_accumulation_steps: 4
```
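
A quick back-of-the-envelope check shows roughly how large this configuration is. The estimate below uses the common approximation of `12 * n_layers * d_model^2` parameters for the Transformer blocks plus `vocab_size * d_model` for the embeddings. It assumes a 4x feed-forward expansion, tied input and output embeddings, and ignores biases, normalization, and positional parameters; none of these details are confirmed for `TransformerLM`, so treat the numbers as an order-of-magnitude guide.

```python
def estimate_params(d_model, n_layers, vocab_size):
    """Rough parameter count under the assumptions stated above."""
    attention = 4 * d_model * d_model      # Q, K, V, and output projections
    feed_forward = 8 * d_model * d_model   # two matrices with a 4x hidden size
    blocks = n_layers * (attention + feed_forward)
    embeddings = vocab_size * d_model      # shared by input and output layers
    return blocks + embeddings


params = estimate_params(d_model=512, n_layers=6, vocab_size=32000)
print(f"{params:,}")  # 35,258,368

# Sequences consumed per optimizer step (per device) and the token upper bound.
effective_batch = 16 * 4                   # batch_size * gradient_accumulation_steps
max_tokens_per_step = effective_batch * 1024
print(effective_batch, max_tokens_per_step)  # 64 65536
```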

### CPU-Optimized Training

```yaml
device: "cpu"

model:
  d_model: 256
  n_heads: 4
  n_layers: 4
  max_seq_len: 512

training:
  batch_size: 2
  use_amp: false
  gradient_accumulation_steps: 8
  dataloader_num_workers: 2
```
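
The CPU configuration pairs a small `batch_size` with a larger `gradient_accumulation_steps` so that each optimizer step still sees 2 x 8 = 16 examples. The sketch below checks that idea on a toy one-parameter model: averaging the gradients of 8 micro-batches of size 2 gives the same gradient as one batch of 16. The model and data are made up for illustration and do not involve `llm_trainer`.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [float(i) for i in range(1, 17)]  # 16 training examples
ys = [3.0 * x for x in xs]             # targets from the true weight 3.0
w = 0.5
batch_size, accumulation_steps = 2, 8

# Gradient computed on the full batch of 16 examples.
full = grad_mse(w, xs, ys)

# Same gradient accumulated over 8 micro-batches of 2 examples each.
accumulated = 0.0
for step in range(accumulation_steps):
    lo = step * batch_size
    micro = grad_mse(w, xs[lo:lo + batch_size], ys[lo:lo + batch_size])
    accumulated += micro / accumulation_steps  # scale so the sum is an average

print(full, accumulated)  # both are -467.5
```

The trade-off is time: peak memory follows the micro-batch size, but each optimizer step needs 8 forward and backward passes.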

### Advanced Configuration

```yaml
model:
  d_model: 768
  n_heads: 12
  n_layers: 12

training:
  use_accelerate: true
  accelerate_mixed_precision: "fp16"
  use_peft: true
  peft_type: "lora"
  peft_r: 8
  peft_alpha: 16

# SafeTensors settings
save_format: "safetensors"
max_shard_size: "2GB"
```
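
The `max_shard_size` setting caps how large each saved file may grow. One plausible way to honor such a cap is to walk the tensors in order and start a new shard whenever the next tensor would push the current one over the limit. The sketch below illustrates that greedy strategy only; `plan_shards` is a hypothetical helper, the sizes are toy values, and the actual logic in `safetensors_utils.py` may differ.

```python
def plan_shards(tensor_sizes, max_shard_bytes):
    """Greedily group tensors (name -> size in bytes) into shards, keeping order."""
    shards, current, current_bytes = [], [], 0
    for name, size in tensor_sizes.items():
        # Start a new shard when adding this tensor would exceed the limit.
        if current and current_bytes + size > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


sizes = {"embed": 900, "layer0": 600, "layer1": 600, "head": 900}
shards = plan_shards(sizes, max_shard_bytes=1500)
print(shards)  # [['embed', 'layer0'], ['layer1', 'head']]
```

In this sketch a single tensor larger than the limit still gets a shard of its own, since tensors are never split.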

## Project Structure

```
llm-trainer/
├── src/llm_trainer/              # Main package
│   ├── models/                   # Model architectures
│   │   ├── base_model.py         # Base model interface
│   │   ├── transformer.py        # Custom Transformer implementation
│   │   ├── safetensors_utils.py  # SafeTensors utilities
│   │   └── attention.py          # Attention mechanisms
│   ├── tokenizer/                # Tokenization
│   │   ├── bpe_tokenizer.py      # BPE implementation
│   │   ├── hf_tokenizer.py       # HuggingFace wrapper
│   │   └── wordpiece_tokenizer.py # WordPiece implementation
│   ├── data/                     # Data pipeline
│   │   ├── dataset.py            # Dataset classes
│   │   ├── dataloader.py         # Data loading
│   │   └── preprocessing.py      # Data preprocessing
│   ├── training/                 # Training infrastructure
│   │   ├── trainer.py            # Enhanced trainer with TRL-style APIs
│   │   ├── optimizer.py          # Standard optimizers
│   │   └── scheduler.py          # Learning rate schedulers
│   ├── kernels/                  # Kernel optimizations
│   │   ├── fused_ops.py          # Fused operations
│   │   └── memory_efficient.py   # Memory-efficient operations
│   ├── patching/                 # Patching system
│   │   ├── patch_transformers.py # Transformers patching
│   │   └── patch_trl.py          # TRL patching
│   ├── utils/                    # Utilities
│   │   ├── generation.py         # Text generation
│   │   ├── inference.py          # Inference utilities
│   │   └── metrics.py            # Evaluation metrics
│   └── config/                   # Configuration
│       ├── model_config.py       # Model configuration
│       └── training_config.py    # Training configuration
├── scripts/                      # CLI tools
│   ├── train.py                  # Training script
│   ├── generate.py               # Text generation
│   └── evaluate.py               # Model evaluation
├── configs/                      # Pre-configured setups
│   ├── small_model.yaml          # Small GPU model
│   ├── medium_model.yaml         # Medium GPU model
│   ├── cpu_small_model.yaml      # CPU-optimized small
│   └── cpu_medium_model.yaml     # CPU-optimized medium
├── examples/                     # Usage examples
│   ├── complete_pipeline.py      # End-to-end example
│   ├── safetensors_example.py    # SafeTensors demo
│   └── train_small_model.py      # Quick start example
└── docs/                         # Documentation
```

## Documentation

- [Getting Started Guide](docs/getting_started.md) — Complete setup and first steps

- [Model Architecture](docs/architecture.md) — Transformer implementation details

- [Training Guide](docs/training.md) — Comprehensive training tutorial

- [CPU Training Guide](docs/cpu_training.md) — Dedicated CPU training documentation

- [Tokenizer Details](docs/tokenizer.md) — BPE tokenizer documentation

- [API Reference](docs/api.md) — Complete API documentation

## Development

### Running Tests

```bash
pip install -e ".[dev]"
pytest tests/
```

### Code Quality

```bash
black src/ scripts/ examples/
flake8 src/ scripts/ examples/
mypy src/
```

## Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.

1. Fork the repository

2. Create a feature branch (`git checkout -b feature/amazing-feature`)

3. Commit your changes (`git commit -m 'Add amazing feature'`)

4. Push to the branch (`git push origin feature/amazing-feature`)

5. Open a Pull Request

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## Support

- **Bug Reports**: [GitHub Issues](https://github.com/HelpingAI/llm-trainer/issues)

- **Discussions**: [GitHub Discussions](https://github.com/HelpingAI/llm-trainer/discussions)

- **Documentation**: [docs/ directory](https://github.com/HelpingAI/llm-trainer/tree/main/docs)