{"id":50883980,"url":"https://github.com/headless-start/peft-lora-vit","last_synced_at":"2026-06-15T15:01:49.413Z","repository":{"id":361596896,"uuid":"1254408901","full_name":"headless-start/peft-lora-vit","owner":"headless-start","description":"This repository contains LoRA fine-tuning of a Vision Transformer on Oxford-IIIT Pets and Flowers-102.","archived":false,"fork":false,"pushed_at":"2026-06-10T11:41:42.000Z","size":1270,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-10T13:18:23.167Z","etag":null,"topics":["computer-vision","deep-learning","fine-tuning","hydra","image-classification","lora","peft","python","pytorch","timm","transfer-learning","vision-transformer","vit","weights-and-biases"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/headless-start.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-30T14:27:01.000Z","updated_at":"2026-06-10T11:41:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/headless-start/peft-lora-vit","commit_stats":null,"previous_names":["headless-start/peft-vit-demo","headless-start/peft-lora-vit"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/headless-start/peft-lora-vit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/headless-start%2Fpeft-lora-vit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/headless-start%2Fpeft-lora-vit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/headless-start%2Fpeft-lora-vit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/headless-start%2Fpeft-lora-vit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/headless-start","download_url":"https://codeload.github.com/headless-start/peft-lora-vit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/headless-start%2Fpeft-lora-vit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34367696,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-15T02:00:07.085Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","fine-tuning","hydra","image-classification","lora","peft","python","pytorch","timm","transfer-learning","vision-transformer","vit","weights-and-biases"],"created_at":"2026-06-15T15:01:48.359Z","updated_at":"2026-06-15T15:01:49.404Z","avatar_url":"https://github.com/headless-start.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Parameter-Efficient Fine-Tuning of a Vision Transformer (LoRA)\n\n## 📌 Project Overview\nThis project demonstrates **parameter-efficient fine-tuning** of a **Vision Transformer (ViT-B/16)** for image classification using **LoRA** — strictly low-rank updates, no other PEFT method. An ImageNet-pretrained backbone is adapted to a new dataset by learning small low-rank deltas on the attention **query/value** projections, while the backbone itself stays frozen. This reaches near full fine-tuning accuracy while updating only a tiny fraction of the weights.\n\n**Datasets**: Oxford-IIIT Pets (37 cat and dog breeds) and Oxford Flowers-102.  \n**Backbone**: `vit_base_patch16_224`, pretrained on ImageNet via `timm`.  \n**Goal**: Strong top-1 accuracy while training well under 5% of the model's parameters.\n\nI built this as hands-on preparation for the PEFT/LoRA side of my thesis; everything here is a standalone prototype on public data and public weights.\n\n![Dataset Samples](results/pet_samples.png)\n\n---\n\n## 🚀 Key Features\n1. **Hand-Written LoRA**:\n   - Low-rank matrices injected into the fused q/v attention projections (`B · A · x · α/r`, with `α = 2r` and `B` zero-initialised so training starts exactly from the pretrained model).\n   - Placement follows the original [LoRA paper (Hu et al., 2022)](https://arxiv.org/abs/2106.09685), whose placement study (§7.1) found adapting **q and v** the best use of a fixed parameter budget — k contributes least.\n   - Only the LoRA matrices and the classifier head are trainable; the backbone is fully frozen.\n2. **Rank Ablation**:\n   - One command sweeps the LoRA rank over {4, 8, 16, 32} and plots accuracy and cost against rank.\n3. **Tiny Checkpoints**:\n   - Only the LoRA weights and head are saved — a few MB instead of the full 344 MB backbone. Inference rebuilds the model from public pretrained weights and loads the LoRA weights on top.\n4. **Solid Training Recipe**:\n   - AdamW with a 2-epoch linear warmup into cosine decay, drop-path 0.1, mixed precision.\n5. **Configurable with Hydra**:\n   - Data, model, and training settings live in `configs/` and can be overridden straight from the command line.\n6. **Experiment Tracking**:\n   - Metrics are logged to Weights \u0026 Biases in **offline** mode by default, so it runs without an account.\n\n---\n\n## 🔍 Findings\n- **Top-1 Accuracy**: **95.2%** on the Pets validation set (weighted average recall, WAR), best run with rank 8 on q/v.\n- **Trainable Parameters**: 323K out of 86.1M — just **0.38%** of the model.\n- **Setup**: LoRA rank 8 on q/v, 25 epochs, AdamW with warmup + cosine decay, mixed precision.\n- **Takeaway**: LoRA matches — here slightly beats — full fine-tuning while training under half a percent of the weights.\n\n![Training Curves](results/training_curve.png)\n\n### Baselines: how much does LoRA actually buy?\nThe comparison that matters: LoRA against a frozen-backbone **linear probe** (lower bound) and **full fine-tuning** (upper bound), all under the same protocol:\n\n| method | top-1 acc (WAR) | trainable params | checkpoint | s/epoch | peak VRAM |\n|------------------|-----------------|------------------|------------|---------|-----------|\n| linear probe     | 93.5%           | 28K (0.03%)      | 0.1 MB     | 20      | 0.7 GB    |\n| LoRA r=8 (ours)  | **94.9%**       | 323K (0.38%)     | 1.2 MB     | 31      | 3.7 GB    |\n| full fine-tuning | 93.9%           | 85.8M (100%)     | 327 MB     | 41      | 2.5 GB*   |\n\n\\* full fine-tuning runs at batch 16 (the others at 64) to fit optimizer states for all 86M parameters into 8 GB — its per-sample memory is far higher.\n\nLoRA beats the linear probe by **+1.4 points**, so the frozen features alone are not enough — and it even edges out full fine-tuning by **+1.0** while training **265× fewer parameters** with a **270× smaller checkpoint**. On a 3.7K-image dataset, updating all 86M weights overfits where the low-rank update acts as a regulariser; this mirrors the LoRA paper, which reports LoRA matching or outperforming full fine-tuning on most benchmarks.\n\n![Baselines](results/baselines.png)\n\n### Placement Ablation\nWhich projections should carry the LoRA update? Sweeping every q/k/v subset at rank 8:\n\n| placement | top-1 acc (WAR) | trainable params | % of total |\n|-----------|-----------------|------------------|------------|\n| q         | 94.3%           | 176K             | 0.21%      |\n| k         | 94.1%           | 176K             | 0.21%      |\n| v         | 94.7%           | 176K             | 0.21%      |\n| q + k     | 94.3%           | 323K             | 0.38%      |\n| q + v     | **94.9%**       | 323K             | 0.38%      |\n| q + k + v | 94.7%           | 471K             | 0.55%      |\n\n**q + v wins.** k is the weakest single placement and adding it to q+v helps nothing — q and k only shape the attention pattern through their inner product, so adapting q already covers it, while v changes the content being mixed and is complementary. This reproduces the placement study in [the LoRA paper](https://arxiv.org/abs/2106.09685) (§7.1, Table 5).\n\n![Placement Ablation](results/placement.png)\n\n### Rank Ablation\nWith placement fixed at q+v, sweeping the rank shows accuracy saturates almost immediately — rank 4 is already within 0.1 points of the best, and rank 32 buys nothing for 7× the parameters:\n\n| rank | top-1 acc (WAR) | trainable params | % of total |\n|------|-----------------|------------------|------------|\n| 4    | 94.8%           | 176K             | 0.21%      |\n| 8    | 94.9%           | 323K             | 0.38%      |\n| 16   | 94.6%           | 618K             | 0.72%      |\n| 32   | 94.9%           | 1.21M            | 1.39%      |\n\n![Rank Ablation](results/ablation.png)\n\nAblation numbers are single runs with the default recipe; reruns move individual cells by ±0.3 points. The repo default (rank 8 on q/v) is the configuration both sweeps select.\n\n---\n\n## ⚙️ How to Run\nWorks on Linux, macOS and Windows.\n\n```bash\ngit clone https://github.com/headless-start/peft-lora-vit.git\ncd peft-lora-vit\n\npython -m venv .venv\nsource .venv/bin/activate          # linux / macos\n# .venv\\Scripts\\activate           # windows\n\npip install -r requirements.txt\n```\n\nFor GPU training install the CUDA build of PyTorch from [pytorch.org](https://pytorch.org/get-started/locally/) first; the plain `pip install` gives you a CPU build on some platforms.\n\n```bash\n# full run on Oxford-IIIT Pets (downloads on first use)\npython train.py\n\n# or train on Flowers-102 instead\npython train.py data=flowers\n\n# override anything from the command line\npython train.py train.epochs=40 data.batch_size=32 model.lora.r=16\n```\n\nSweep the LoRA rank (writes `results/ablation.json` and `results/ablation.png`):\n\n```bash\npython ablate.py                    # ranks 4, 8, 16, 32\npython ablate.py --ranks 4,8 data=flowers\n```\n\nCompare LoRA against the baselines — linear probe and full fine-tuning (writes `results/baselines.json` and `results/baselines.png`):\n\n```bash\npython baselines.py\n```\n\nClassify your own images with a trained checkpoint:\n\n```bash\npython predict.py path/to/cat.jpg path/to/dog.jpg\n# path/to/cat.jpg: Abyssinian (100.0%), Russian Blue (0.0%), Shiba Inu (0.0%)\n```\n\nQuick smoke test (CPU, small backbone, no downloads):\n\n```bash\npython train.py +experiment=smoke\n```\n\nRuns are logged to Weights \u0026 Biases offline by default; to sync to the cloud:\n\n```bash\nwandb login\npython train.py wandb.mode=online\n```\n\nTraining curves and `metrics.json` are written to `results/`; checkpoints go to `outputs/`.\n\n---\n\n## 🛠 System Requirements\n### Dependencies\n- Python 3.10+\n- Libraries: `torch`, `torchvision`, `timm`, `hydra-core`, `wandb`, `matplotlib`\n- Hardware: CUDA GPU recommended (a CPU smoke run is supported)\n\n### Reproducibility\n- Runs on Linux, macOS and Windows; all paths and commands are OS-agnostic.\n- Seeds are fixed (`seed: 42`). Reported numbers came from Python 3.13, `torch` 2.12, `torchvision` 0.27, `timm` 1.0.27 on a single RTX 4060; expect individual cells to move by ±0.3 points across reruns and library versions due to GPU non-determinism.\n- On machines with little RAM, add `data.num_workers=0` to any command.\n\n---\n\n## 📄 License\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheadless-start%2Fpeft-lora-vit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fheadless-start%2Fpeft-lora-vit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheadless-start%2Fpeft-lora-vit/lists"}