https://github.com/anthony-maio/fitcheck
Know before you train — VRAM estimation for LLM fine-tuning.
fine-tuning-llm gpu training vram-optimization
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/anthony-maio/fitcheck
- Owner: anthony-maio
- Created: 2026-02-12T19:45:57.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-02-14T09:30:12.000Z (4 months ago)
- Last Synced: 2026-02-14T15:45:59.685Z (4 months ago)
- Topics: fine-tuning-llm, gpu, training, vram-optimization
- Language: Python
- Homepage: https://making-minds.ai
- Size: 157 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# fitcheck
> **Know before you train** — VRAM estimation for LLM fine-tuning.
fitcheck predicts GPU memory usage from first principles. Given a model, GPU, and training method, it tells you whether your config will fit — before you spend an hour discovering it won't.
## Why fitcheck?
Fine-tuning LLMs means guessing at batch sizes and hoping you don't OOM. The feedback loop is brutal: pick a config, wait for the run to start, crash 2 minutes in, adjust, repeat.
fitcheck collapses that loop. It computes each VRAM component — model weights, optimizer states, gradients, activations, the logits buffer, eval KV-cache spikes — and produces a breakdown with confidence bounds.
## What it computes
| Component | What it is | Why it matters |
|-----------|-----------|----------------|
| **Model weights** | Base params in training dtype (bf16/NF4) | 4.2 GB for QLoRA 8B, 16 GB for full bf16 |
| **Optimizer states** | AdamW momentum + variance per trainable param | Dominates full fine-tune (~60 GB for 8B) |
| **Gradients** | One gradient per trainable param | Small for LoRA, huge for full FT |
| **Activations** | Per-layer stored tensors for backward pass | Flash-attention-aware, scales with batch × seq |
| **Logits buffer** | batch × seq × vocab × 4 bytes (float32) | The surprise OOM: about 2 GB at bs=4, seq=1024 with a 128k vocab |
| **Eval KV-cache** | Spike during evaluation steps | Can exceed training steady-state |
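
The figures in the table can be roughly reproduced with plain arithmetic. The sketch below is not fitcheck's implementation; the function names are hypothetical, and it assumes decimal gigabytes, exactly 8e9 parameters, fp32 AdamW states, and a sequence length of 1024 (the value used in the Quick Start) for the logits buffer. It ignores quantization metadata, activations, and the eval KV-cache, so expect the results to land near the table's numbers rather than on them.

```python
# Back-of-the-envelope VRAM arithmetic. Illustrative only: these helpers are
# hypothetical and are not part of the fitcheck package.
GB = 1e9  # assumption: decimal gigabytes


def weights_gb(n_params: float, bytes_per_param: float) -> float:
    """Base weights at a given storage precision (2.0 for bf16, 0.5 for NF4)."""
    return n_params * bytes_per_param / GB


def adamw_states_gb(n_trainable: float, bytes_per_state: float = 4.0) -> float:
    """AdamW keeps two states (momentum and variance) per trainable parameter."""
    return 2 * n_trainable * bytes_per_state / GB


def gradients_gb(n_trainable: float, bytes_per_grad: float = 2.0) -> float:
    """One gradient per trainable parameter (bf16 assumed here)."""
    return n_trainable * bytes_per_grad / GB


def logits_gb(batch_size: int, seq_len: int, vocab_size: int,
              bytes_per_elem: int = 4) -> float:
    """batch x seq x vocab x 4 bytes for a float32 logits tensor."""
    return batch_size * seq_len * vocab_size * bytes_per_elem / GB


n_params = 8e9  # assumption: a round 8B parameter count

print(f"bf16 weights:     {weights_gb(n_params, 2.0):.1f} GB")
print(f"NF4 weights:      {weights_gb(n_params, 0.5):.1f} GB (before quantization metadata)")
print(f"AdamW, full FT:   {adamw_states_gb(n_params):.1f} GB")
print(f"Gradients, full:  {gradients_gb(n_params):.1f} GB")
print(f"Logits, bs=4:     {logits_gb(4, 1024, 128_000):.1f} GB")
```

For LoRA and QLoRA, only the adapter parameters are trainable, so `n_trainable` is a small fraction of the base parameter count and the optimizer and gradient terms shrink accordingly.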
## Quick Start
```bash
pip install -r requirements.txt
# Run tests
pytest
```
```python
import json

from fitcheck.hub.resolver import resolve_from_config
from fitcheck.hardware.registry import get_hardware
from fitcheck.profilers.vram.engine import VRAMEstimator
from fitcheck.models.profiles import TrainingMethod, LoRAConfig

# `config` is assumed to be the model's Hugging Face config.json loaded as a
# dict; check fitcheck.hub.resolver for the type it actually expects.
with open("config.json") as f:
    config = json.load(f)

# QLoRA Llama 8B on an RTX 3090
hardware = get_hardware("3090")
estimator = VRAMEstimator()
breakdown = estimator.estimate(
    model=resolve_from_config("meta-llama/Llama-3.1-8B", config),
    hardware=hardware,
    method=TrainingMethod.QLORA,
    batch_size=4,
    seq_len=1024,
    lora_config=LoRAConfig(rank=16),
)

print(f"Steady-state: {breakdown.steady_state_gb:.1f} GB")
print(f"Usable VRAM: {hardware.usable_vram_gb} GB")
# Steady-state: 16.6 GB
# Usable VRAM: 22.8 GB ← fits with about 6 GB headroom
```
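
The fit decision implied by that output is a simple subtraction. The helper below is a hypothetical, standalone sketch (it is not a fitcheck API) that uses the two numbers printed above:

```python
def headroom_gb(steady_state_gb: float, usable_vram_gb: float) -> float:
    """Positive headroom means the config is expected to fit; negative means OOM."""
    return usable_vram_gb - steady_state_gb


# Values taken from the Quick Start output above.
headroom = headroom_gb(steady_state_gb=16.6, usable_vram_gb=22.8)
verdict = "fits" if headroom >= 0 else "does not fit"
print(f"{verdict}: {headroom:.1f} GB headroom")
# fits: 6.2 GB headroom
```

Because the eval KV-cache spike can exceed the training steady state, a stricter check would compare usable VRAM against the peak figure rather than the steady-state one.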
## Development
```bash
pytest # Run all tests (51)
pytest tests/fitcheck/profilers/test_estimator.py -v # End-to-end estimator tests
ruff format fitcheck tests # Format code
```
## License
MIT