{"id":13992585,"url":"https://github.com/Guitaricet/relora","last_synced_at":"2025-07-22T16:30:50.685Z","repository":{"id":163213084,"uuid":"633579562","full_name":"Guitaricet/relora","owner":"Guitaricet","description":"Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates","archived":false,"fork":false,"pushed_at":"2024-04-21T20:09:58.000Z","size":1984,"stargazers_count":458,"open_issues_count":5,"forks_count":39,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-07-09T22:08:03.321Z","etag":null,"topics":["deep-learning","distributed-training","llama","nlp","peft","transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2307.05695","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Guitaricet.png","metadata":{"files":{"readme":"README.dev.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-04-27T20:26:35.000Z","updated_at":"2025-07-06T05:43:24.000Z","dependencies_parsed_at":"2024-01-15T17:33:17.871Z","dependency_job_id":"184f29b0-9970-4f3b-bbb0-bec90ad724f3","html_url":"https://github.com/Guitaricet/relora","commit_stats":{"total_commits":214,"total_committers":3,"mean_commits":71.33333333333333,"dds":"0.19626168224299068","last_synced_commit":"176f37633fe02019835387258ddabcf6d91e328d"},"previous_names":["guitaricet/relora","guitaricet/peft_pretraining"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/Guitaricet/relora","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/Guitaricet%2Frelora","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Guitaricet%2Frelora/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Guitaricet%2Frelora/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Guitaricet%2Frelora/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Guitaricet","download_url":"https://codeload.github.com/Guitaricet/relora/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Guitaricet%2Frelora/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264664041,"owners_count":23646317,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","distributed-training","llama","nlp","peft","transformer"],"created_at":"2024-08-09T14:02:02.954Z","updated_at":"2025-07-22T16:30:50.088Z","avatar_url":"https://github.com/Guitaricet.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"Some scripts to check that the most common training regimes work.\n\n```\ntorchrun --nproc-per-node 2 torchrun_main.py \\\n    --dataset_path preprocessed_data/wikitext_wikitext-2-v1_EleutherAI_pythia-1.4b_512 \\\n    --model_name_or_path EleutherAI/pythia-1.4b \\\n    --use_peft \\\n    --relora 10 \\\n    --model_revision step1000 \\\n    --batch_size 4 \\\n    --total_batch_size 96 \\\n    --lr 5e-4 \\\n    --max_length 512 \\\n    --eval_every 20 \\\n    --save_every 20 \\\n    --num_training_steps 40 \\\n    
--distributed_type ddp \\\n    --optimizer adam_zero \\\n    --tags debug\n\n\ntorchrun --nproc-per-node 2 torchrun_main.py \\\n    --dataset_path preprocessed_data/wikitext_wikitext-2-v1_EleutherAI_pythia-1.4b_512 \\\n    --model_name_or_path EleutherAI/pythia-1.4b \\\n    --model_revision step1000 \\\n    --batch_size 6 \\\n    --total_batch_size 96 \\\n    --lr 5e-4 \\\n    --max_length 512 \\\n    --eval_every 2 \\\n    --save_every 10 \\\n    --num_training_steps 20 \\\n    --distributed_type ddp \\\n    --tags debug,fsdp_debug\n\n\ntorchrun --nproc-per-node 2 torchrun_main.py \\\n    --dataset_path preprocessed_data/wikitext_wikitext-2-v1_t5-base_512 \\\n    --model_config configs/llama_250m.json \\\n    --batch_size 24 \\\n    --total_batch_size 96 \\\n    --lr 5e-4 \\\n    --max_length 512 \\\n    --eval_every 2 \\\n    --save_every 10 \\\n    --num_training_steps 20 \\\n    --distributed_type ddp \\\n    --tags debug,fsdp_debug\n\n\ntorchrun --nproc-per-node 2 torchrun_main.py \\\n    --dataset_path preprocessed_data/wikitext_wikitext-2-v1_t5-base_512 \\\n    --model_config configs/llama_250m.json \\\n    --batch_size 24 \\\n    --total_batch_size 96 \\\n    --lr 5e-4 \\\n    --max_length 512 \\\n    --eval_every 2 \\\n    --save_every 10 \\\n    --num_training_steps 20 \\\n    --distributed_type fsdp \\\n    --tags debug,fsdp_debug\n\n\ntorchrun --nproc-per-node 2 torchrun_main.py \\\n    --dataset_path preprocessed_data/wikitext_wikitext-2-v1_gpt2_512 \\\n    --model_config configs/llama_250m_50K.json \\\n    --batch_size 24 \\\n    --total_batch_size 96 \\\n    --lr 5e-4 \\\n    --max_length 512 \\\n    --eval_every 2 \\\n    --save_every 10 \\\n    --num_training_steps 20 \\\n    --distributed_type ddp \\\n    --dtype float32 \\\n    --tags debug,fsdp_debug\n\n\ntorchrun --nproc-per-node 2 torchrun_main.py \\\n    --model_config configs/llama_250m.json \\\n    --batch_size 24 \\\n    --total_batch_size 96 \\\n    --lr 5e-4 \\\n    --max_length 512 \\\n 
   --eval_every 2 \\\n    --save_every 10 \\\n    --num_training_steps 20000 \\\n    --distributed_type fsdp \\\n    --tags debug,fsdp_debug\n\n\ntorchrun --nproc-per-node 2 torchrun_main.py \\\n    --model_config configs/llama_250m.json \\\n    --batch_size 24 \\\n    --total_batch_size 96 \\\n    --lr 1e-3 \\\n    --max_length 512 \\\n    --use_peft \\\n    --relora 10 \\\n    --cycle_length 10 \\\n    --restart_warmup_steps 5 \\\n    --scheduler cosine_restarts \\\n    --warmup_steps 5 \\\n    --reset_optimizer_on_relora False \\\n    --optimizer_magnitude_pruning 0.9 \\\n    --num_training_steps 20000 \\\n    --save_every 5000 \\\n    --eval_every 5000 \\\n    --warmed_up_model checkpoints/llama_250m-2023-06-09-11-29-56/model_5000 \\\n    --distributed_type fsdp \\\n    --tags debug,fsdp_debug\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGuitaricet%2Frelora","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FGuitaricet%2Frelora","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGuitaricet%2Frelora/lists"}