{"id":50097145,"url":"https://github.com/viftode4/scalable-learning","last_synced_at":"2026-05-23T04:11:50.928Z","repository":{"id":359556265,"uuid":"1238817115","full_name":"viftode4/scalable-learning","owner":"viftode4","description":null,"archived":false,"fork":false,"pushed_at":"2026-05-22T11:29:38.000Z","size":36746,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-22T17:35:54.373Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/viftode4.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-05-14T13:35:44.000Z","updated_at":"2026-05-22T11:29:42.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/viftode4/scalable-learning","commit_stats":null,"previous_names":["viftode4/scalable-learning"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/viftode4/scalable-learning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viftode4%2Fscalable-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viftode4%2Fscalable-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viftode4%2Fscalable-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viftode4%2Fscalable-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/viftode4","download_url":"https://codeload.github.com/viftode4/scalable-learning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viftode4%2Fscalable-learning/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33382143,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T01:21:08.577Z","status":"online","status_checked_at":"2026-05-23T02:00:05.530Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-23T04:11:49.084Z","updated_at":"2026-05-23T04:11:50.920Z","avatar_url":"https://github.com/viftode4.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Scalable Learning Systems — RoLoRA Reproduction \u0026 Extension\n\nTU Delft **CS 4725** research seminar (Spring 2026). 9 course weeks, currently in **week 3**. Hard end: week 9 final report + presentation (the only graded deliverables).\n\n## Team\n- Vlad Iftode\n- Daniel Popovici\n- Sorin Zele\n\n## Course staff\n- **Dr. Kubilay Atasu** — Associate Professor, coordinator (lectures, projects, presentations, homeworks).\n- **Dr. Rui Wang** — Postdoctoral researcher (projects, guest lecture).\n- **Dennis Heijmans** — MSc thesis student (homeworks, presentations).\n\n## Paper\nChen, Guo, Ju, Dalal, Zhu, Khisti. *Robust Federated Finetuning of LLMs via Alternating Optimization of LoRA.* NeurIPS 2025.\nLocal copy: [`docs/research/paper-rolora.pdf`](docs/research/paper-rolora.pdf) · OpenReview: `u4mobiHTJl`.\n\n## Assessment\n| Component | Weight | When |\n|---|---|---|\n| Paper presentation (on RoLoRA itself) | 20% | weeks 7–8; 10–12 min present + 5–6 min Q\u0026A; rubric 20/40/25/15 |\n| Research project (reproduction + improvement) | 60% | weeks 4–9; deliverables below |\n| Individual homeworks | 20% | due weeks 4 and 5 |\n\n## Deadlines\n| Week | Deliverable | Status |\n|---|---|---|\n| W4 | Project proposal (mandatory, ungraded) | ✅ submitted 12 May 2026 |\n| W6 | Midterm review meeting (mandatory, ungraded) | pending |\n| W8 | Draft project report (mandatory, ungraded) | pending |\n| W9 | Final project report + final presentation (mandatory, **graded**) | pending |\n\n## Committed improvement directions (per submitted proposal)\n1. **Improved initialization** — orthogonal / SVD-based init for the down-projection matrix A.\n2. **Separate learning rates for A and B** — LoRA+-style asymmetric LRs, enabled by RoLoRA's per-round factor isolation.\n3. **Adaptive server-side optimization** — lightweight federated optimizer in place of plain averaging.\n\nAll three preserve RoLoRA's alternating structure.\n\n## Source documents\n- [`docs/research/paper-rolora.pdf`](docs/research/paper-rolora.pdf) — the paper.\n- [`docs/research/project-proposal.pdf`](docs/research/project-proposal.pdf) — our submitted proposal.\n- [`docs/research/lecture-01-introduction.pdf`](docs/research/lecture-01-introduction.pdf) — CS 4725 lecture-1 slides.\n- [`docs/research/deep-research-plan.md`](docs/research/deep-research-plan.md) — independent technical-decision document (compute budget, roadmap, risks).\n\n## Layout\n```\ndocs/        Source documents, kickoff agenda, decision log, setup guides, templates\ncode/        Our code + harness checkouts (FedSA-LoRA submodule, RoLoRA supplement)\nexperiments/ YAML configs that map to runs\nnotebooks/   MNIST Figure-2 sanity check and exploration\nscripts/     Setup and run utilities (dataset prep, supplement extraction/smoke/summary)\nslurm/       DelftBlue / DAIC job templates\nresults/     Output artifacts (gitignored)\nreport/      LaTeX writeup\ntests/       pytest suite (aggregation math, invariants)\n```\n\n## Quickstart\n```bash\ngit clone \u003cthis-repo\u003e\ncd scalable-learning\ngit submodule update --init --recursive\nmake sync\nmake check\nmake mnist-smoke\n```\n\nThe authors' OpenReview supplement is vendored in this repo. To enable its isolated runtime:\n```bash\nmake install-supplement\nmake supplement-smoke-all\n```\n\n`make supplement` remains available only to refresh the vendored copy from the original OpenReview zip.\n\nSee [`docs/setup/environment.md`](docs/setup/environment.md) for the full setup, [`docs/setup/openreview-supplement.md`](docs/setup/openreview-supplement.md) for fetching the authors' code, [`docs/setup/delftblue.md`](docs/setup/delftblue.md) for cluster access (TA-driven), and [`experiments/ledger/README.md`](experiments/ledger/README.md) for run evidence.\n\n\n## Project control docs\n\nUse these docs to keep the final project execution visible to humans and agents:\n\n- [`docs/progress.md`](docs/progress.md) — live owner/status/next-action board.\n- [`docs/experiment-matrix.md`](docs/experiment-matrix.md) — reproduction and improvement run matrix.\n- [`docs/plans/12-10-paper-track-rolora.md`](docs/plans/12-10-paper-track-rolora.md) — 12/10 + paper-track execution plan and critique.\n- [`docs/research/literature-snapshot-2026-05-20.md`](docs/research/literature-snapshot-2026-05-20.md) — external literature positioning for the improvement story.\n- [`docs/decisions/0005-unified-phase-dynamics-thesis.md`](docs/decisions/0005-unified-phase-dynamics-thesis.md) — ADR pinning the unified phase-specific thesis.\n\n## Tracking snapshot — 2026-05-20\n\n### Done / visible now\n\n- Paper-track strategy is locked: reproduce first, diagnose phase-specific A/B dynamics, then test proposal-compatible improvements.\n- `docs/progress.md` is the live dashboard and claim ledger.\n- `docs/experiment-matrix.md` defines dataset rules, compute gates, reproduction rows, improvement rows, and stop/fallback criteria.\n- `report/README.md` is the report skeleton with figure/table placeholders mapped to claim IDs.\n- `experiments/configs/roberta_large_feasibility.yaml` defines the GPU-only RoBERTa-Large feasibility gate.\n- `scripts/summarize_supplement.py --diagnostics` and `make diagnostics-summary` parse manifest, per-result metrics, and phase markers from logs.\n\n### Left / next evidence gates\n\n1. Run `make table1-medium-all`, then `make table1-medium-summary` and `make diagnostics-summary PREFIX=table1_medium`.\n2. Ledger the medium all-mode result or failure in `experiments/ledger/README.md`.\n3. Run `make roberta-large-feasibility MODE=rolora` on a GPU-capable machine.\n4. Add stronger supplement instrumentation for update norms, frozen-factor equality markers, wall time, and memory.\n5. Implement the improvement knobs in order: orthogonal A init, A/B LR split, active-factor server momentum.\n6. Fill the report skeleton continuously; no claim should enter final prose without a claim-ledger evidence path.\n\n## Local commands\n| Command | Purpose |\n|---|---|\n| `make check` | Run first-party tests + lint. |\n| `make mnist-smoke` | Fast MNIST sanity check. |\n| `make mnist` | Default local MNIST Figure-2 run. |\n| `make mnist-paper` | Stronger 200-round MNIST Figure-2 run used as the local paper-sanity check. |\n| `make supplement-smoke-all` | Run the tiny supplement smoke config in `rolora`, `lora`, and `ffa_lora` modes. |\n| `make table1-pilot MODE=rolora` | Run a 3-client QNLI RoBERTa-base local pilot for one mode. |\n| `make table1-pilot-all` | Run the local Table-1-shaped pilot for all three modes. |\n| `make table1-pilot-summary` | Parse `results/table1_pilot_*.log` into a metrics table. |\n| `make table1-medium MODE=rolora` | Stronger local pilot: 3-client QNLI RoBERTa-base, 10 rounds, 5 local batches. |\n| `make table1-medium-all` | Run the stronger local pilot for all three modes. |\n| `make table1-medium-summary` | Parse `results/table1_medium_*.log` into a metrics table. |\n| `make roberta-large-feasibility MODE=rolora` | Run the tiny GPU-only RoBERTa-Large feasibility gate before cluster reproduction. |\n| `make roberta-large-feasibility-summary` | Parse feasibility logs into a metrics table. |\n| `make diagnostics-summary PREFIX=table1_medium` | Parse manifest, per-round metrics, and phase markers from supplement logs. |\n| `make cluster-dry-run` | Print the current cluster gate and feasibility config without submitting Slurm jobs. |\n| `make local-smoke` | Full fast local evidence chain: checks, MNIST smoke, supplement smoke-all. |\n| `make full-local` | Strongest laptop-feasible evidence chain: checks, 200-round MNIST, supplement smoke-all. |\n| `make clean` | Remove local outputs/caches while preserving tracked placeholders. |\n\n## What works locally now\n\n- `make check` — first-party tests and lint.\n- `make mnist-paper` — 200-round MNIST paper-sanity run; latest local result: RoLoRA `0.4794` \u003e LoRA `0.4631` \u003e FFA-LoRA `0.3767`.\n- `make supplement-smoke-all` — authors' FederatedScope supplement runs locally in `rolora`, `lora`, and `ffa_lora` modes.\n- `make table1-pilot-all` — Table-1-shaped local QNLI pilot: RoBERTa-base, 3 clients, 3 rounds, 3 local batches.\n- `make table1-pilot-summary` — parses local pilot logs into a metrics table.\n- `make table1-medium MODE=rolora` — stronger local pilot; verified for `rolora` on 2026-05-14. Run `make table1-medium-all` next to close the local all-mode rung.\n- `make diagnostics-summary PREFIX=table1_medium` — parses existing medium logs into the diagnostic table shape; richer update-norm/frozen-factor fields still need supplement instrumentation.\n- `make cluster-dry-run` — documents that cluster execution remains TA-gated rather than silently pretending Slurm is ready.\n\n## What is not local yet\n\nFull paper Table 1 is RoBERTa-Large across MNLI/QQP/QNLI, 3/20/50 clients, three methods, and multiple seeds. The plan estimates hundreds of GPU-hours, so local runs are pipeline and mechanism evidence, not paper-comparable Table 1 numbers. The explicit next gate is `make roberta-large-feasibility MODE=rolora` on a GPU-capable machine.\n\n## Status\n**Week 3 — pre-launch.** Main env is pinned, MNIST sanity checks run locally, the authors' supplement is installed in an isolated Python 3.9 env, and local Table-1-shaped pilots are runnable. Next local step: `make table1-medium-all` if runtime is acceptable; after that, run the explicit `make roberta-large-feasibility MODE=rolora` gate on a GPU-capable machine. Full RoBERTa-Large reproduction starts once DelftBlue/DAIC access is available. See [`docs/kickoff.md`](docs/kickoff.md) for the remaining team/process items.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviftode4%2Fscalable-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fviftode4%2Fscalable-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviftode4%2Fscalable-learning/lists"}