Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Guitaricet/relora
Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
deep-learning distributed-training llama nlp peft transformer
Last synced: 29 days ago
JSON representation
Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
- Host: GitHub
- URL: https://github.com/Guitaricet/relora
- Owner: Guitaricet
- License: apache-2.0
- Created: 2023-04-27T20:26:35.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-21T20:09:58.000Z (8 months ago)
- Last Synced: 2024-11-29T12:06:48.034Z (30 days ago)
- Topics: deep-learning, distributed-training, llama, nlp, peft, transformer
- Language: Jupyter Notebook
- Homepage: https://arxiv.org/abs/2307.05695
- Size: 1.89 MB
- Stars: 436
- Watchers: 7
- Forks: 39
- Open Issues: 5
Metadata Files:
- Readme: README.dev.md
- License: LICENSE
- Citation: CITATION.cff
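The record above is also available as JSON (see the "JSON representation" link). As a quick illustration, not part of the ReLoRA project itself, the same repository metadata can be pulled programmatically; the sketch below uses the public GitHub REST API (field names follow GitHub's API, not the ecosyste.ms schema).

```
# Minimal sketch: fetch the repository metadata listed above via the GitHub REST API.
# Standard library only; subject to GitHub's unauthenticated rate limits.
import json
import urllib.request

with urllib.request.urlopen("https://api.github.com/repos/Guitaricet/relora") as resp:
    repo = json.load(resp)

print("stars:", repo["stargazers_count"])
print("forks:", repo["forks_count"])
print("open issues:", repo["open_issues_count"])
print("default branch:", repo["default_branch"])
print("last pushed:", repo["pushed_at"])
print("topics:", repo.get("topics", []))
```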
Awesome Lists containing this project
README
Some scripts to check that the most common training regimes work.
```
# Debug run: ReLoRA on Pythia-1.4B (--use_peft, reset every 10 steps), DDP, adam_zero optimizer
torchrun --nproc-per-node 2 torchrun_main.py \
--dataset_path preprocessed_data/wikitext_wikitext-2-v1_EleutherAI_pythia-1.4b_512 \
--model_name_or_path EleutherAI/pythia-1.4b \
--use_peft \
--relora 10 \
--model_revision step1000 \
--batch_size 4 \
--total_batch_size 96 \
--lr 5e-4 \
--max_length 512 \
--eval_every 20 \
--save_every 20 \
--num_training_steps 40 \
--distributed_type ddp \
--optimizer adam_zero \
--tags debug

# Debug run: full-model training of Pythia-1.4B (no PEFT), DDP
torchrun --nproc-per-node 2 torchrun_main.py \
--dataset_path preprocessed_data/wikitext_wikitext-2-v1_EleutherAI_pythia-1.4b_512 \
--model_name_or_path EleutherAI/pythia-1.4b \
--model_revision step1000 \
--batch_size 6 \
--total_batch_size 96 \
--lr 5e-4 \
--max_length 512 \
--eval_every 2 \
--save_every 10 \
--num_training_steps 20 \
--distributed_type ddp \
--tags debug,fsdp_debug

# Debug run: 250M LLaMA config trained from scratch, DDP
torchrun --nproc-per-node 2 torchrun_main.py \
--dataset_path preprocessed_data/wikitext_wikitext-2-v1_t5-base_512 \
--model_config configs/llama_250m.json \
--batch_size 24 \
--total_batch_size 96 \
--lr 5e-4 \
--max_length 512 \
--eval_every 2 \
--save_every 10 \
--num_training_steps 20 \
--distributed_type ddp \
--tags debug,fsdp_debug

# Debug run: 250M LLaMA config trained from scratch, FSDP
torchrun --nproc-per-node 2 torchrun_main.py \
--dataset_path preprocessed_data/wikitext_wikitext-2-v1_t5-base_512 \
--model_config configs/llama_250m.json \
--batch_size 24 \
--total_batch_size 96 \
--lr 5e-4 \
--max_length 512 \
--eval_every 2 \
--save_every 10 \
--num_training_steps 20 \
--distributed_type fsdp \
--tags debug,fsdp_debug

# Debug run: 250M LLaMA config with 50K vocab, DDP, float32
torchrun --nproc-per-node 2 torchrun_main.py \
--dataset_path preprocessed_data/wikitext_wikitext-2-v1_gpt2_512 \
--model_config configs/llama_250m_50K.json \
--batch_size 24 \
--total_batch_size 96 \
--lr 5e-4 \
--max_length 512 \
--eval_every 2 \
--save_every 10 \
--num_training_steps 20 \
--distributed_type ddp \
--dtype float32 \
--tags debug,fsdp_debug

# Longer debug run: 250M LLaMA config, 20K steps, FSDP
torchrun --nproc-per-node 2 torchrun_main.py \
--model_config configs/llama_250m.json \
--batch_size 24 \
--total_batch_size 96 \
--lr 5e-4 \
--max_length 512 \
--eval_every 2 \
--save_every 10 \
--num_training_steps 20000 \
--distributed_type fsdp \
--tags debug,fsdp_debug

# ReLoRA debug run from a warmed-up checkpoint: cosine_restarts schedule, optimizer magnitude pruning, FSDP
torchrun --nproc-per-node 2 torchrun_main.py \
--model_config configs/llama_250m.json \
--batch_size 24 \
--total_batch_size 96 \
--lr 1e-3 \
--max_length 512 \
--use_peft \
--relora 10 \
--cycle_length 10 \
--restart_warmup_steps 5 \
--scheduler cosine_restarts \
--warmup_steps 5 \
--reset_optimizer_on_relora False \
--optimizer_magnitude_pruning 0.9 \
--num_training_steps 20000 \
--save_every 5000 \
--eval_every 5000 \
--warmed_up_model checkpoints/llama_250m-2023-06-09-11-29-56/model_5000 \
--distributed_type fsdp \
--tags debug,fsdp_debug
```
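The ReLoRA-specific flags in the commands above (--relora, --cycle_length, --reset_optimizer_on_relora, --optimizer_magnitude_pruning, --scheduler cosine_restarts) control the periodic merge-and-restart described in the paper: at the end of each cycle the low-rank update is merged into the frozen full-rank weights, the LoRA factors are re-initialized, and the optimizer state for those parameters is partially reset. The sketch below illustrates one such reset under assumed attribute names (lora_A, lora_B, scaling); it is not the repository's actual implementation.

```
# Illustrative sketch of a ReLoRA reset step (assumed module attributes, not the repo's code).
import torch

@torch.no_grad()
def relora_reset(module, optimizer, keep_fraction=0.1):
    """Merge the low-rank update into the base weight, restart LoRA, prune optimizer state.

    Assumes `module` has `weight` (out x in, frozen), `lora_A` (r x in),
    `lora_B` (out x r) and a `scaling` factor, as in common LoRA layers.
    """
    # 1) Merge: W <- W + scaling * B @ A, so the function the model computes is unchanged.
    module.weight.add_(module.scaling * (module.lora_B @ module.lora_A))

    # 2) Restart the low-rank factors; B = 0 keeps the merged model's output identical.
    torch.nn.init.kaiming_uniform_(module.lora_A, a=5 ** 0.5)
    torch.nn.init.zeros_(module.lora_B)

    # 3) Partial optimizer reset via magnitude pruning of the Adam moments
    #    (e.g. --optimizer_magnitude_pruning 0.9 would keep roughly the largest 10%).
    for p in (module.lora_A, module.lora_B):
        state = optimizer.state.get(p, {})
        for key in ("exp_avg", "exp_avg_sq"):
            if key in state:
                moment = state[key]
                k = max(1, int(keep_fraction * moment.numel()))
                cutoff = moment.abs().flatten().kthvalue(moment.numel() - k + 1).values
                moment.mul_((moment.abs() >= cutoff).to(moment.dtype))
```

With --scheduler cosine_restarts, each reset is paired with a short re-warmup (--restart_warmup_steps) so the learning rate dips and recovers at every cycle boundary, giving the jagged schedule described in the paper.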