https://github.com/epfml/gradientalignment
- Host: GitHub
- URL: https://github.com/epfml/gradientalignment
- Owner: epfml
- Created: 2021-05-27T15:01:40.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2022-03-31T06:58:59.000Z (almost 4 years ago)
- Last Synced: 2025-01-02T14:28:37.071Z (about 1 year ago)
- Language: Python
- Size: 5.87 MB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Implicit Gradient Alignment in Distributed and Federated Learning
**Requirements:**
Python 3 is required, with torch and torchvision installed; wandb is recommended for experiment logging. We recommend installing them with pip:

```
pip install torch
pip install torchvision
```
**Weights and Biases:**
To log experiments with wandb, you need a free Weights & Biases account. Go to https://www.wandb.com to create one.
Once your account is created, install wandb:

```
pip install wandb
```

Then log in to your account by running:

```
wandb login WANDB_API_KEY
```

where WANDB_API_KEY can be found at https://app.wandb.ai/settings after logging in.
You are now ready to run experiments.
**Code samples:**
For FedAvg, run:

```
python interface.py --batch_size=100 --gpu_ids=[0] --learning_rate=0.2 --model=cnn_cifar --nsubsets=10 --nprocesses=10 --nrounds=2001 --nlogging_steps=10 --port=28 --nsteps=20 --task=CIFAR10 --wandb_exp_name=test_cifar --wandb --weight_decay=0.001 --percentage_iid=1.0
```

For MiniBatch SGD (largeBatchSGD), run:

```
python interface.py --batch_size=2400 --gpu_ids=[0,1,3,4] --learning_rate=0.2 --model=cnn_mnist --nsubsets=47 --nprocesses=10 --nrounds=2001 --nlogging_steps=5 --port=83 --task=EMNIST --wandb_exp_name=test_emnist --wandb --weight_decay=0.001 --percentage_iid=0.0 --largeBatchSGD
```

For SCAFFOLD, run:

```
python interface.py --batch_size=2400 --gpu_ids=[0] --learning_rate=0.05 --model=cnn_mnist --nsubsets=47 --nprocesses=10 --nrounds=2001 --nlogging_steps=5 --port=49 --server_momentum=0.0 --nsteps=10 --task=EMNIST --wandb_exp_name=test_emnist --wandb --weight_decay=0.001 --percentage_iid=0.0 --scaffold
```

For FedGA, run:

```
python interface.py --batch_size=2400 --gpu_ids=[0] --learning_rate=0.05 --model=cnn_mnist --nprocesses=10 --nrounds=2001 --nsubsets=47 --nlogging_steps=5 --port=59 --server_momentum=0.0 --nsteps=10 --task=EMNIST --wandb_exp_name=test_emnist --wandb --weight_decay=0.001 --percentage_iid=0.0 --grad_align --beta=0.025
```
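The commands above all launch variants of the same round structure: each sampled client runs local steps from the current server model, then the server aggregates. As a rough illustration only (this is not the repository's code; the scalar model and quadratic client losses are invented for the sketch), a FedAvg round with `--nsteps` local SGD steps looks like:

```python
# Hypothetical FedAvg simulation: scalar model, toy quadratic client losses
# loss_i(w) = 0.5 * (w - c_i)^2, so the local gradient is simply (w - c_i).
def fedavg(client_targets, nrounds=50, nsteps=5, lr=0.1):
    w = 0.0  # server model
    for _ in range(nrounds):
        local_models = []
        for c in client_targets:
            w_local = w                    # every client starts from the server model
            for _ in range(nsteps):
                grad = w_local - c         # gradient of the local quadratic loss
                w_local -= lr * grad       # local SGD step
            local_models.append(w_local)
        w = sum(local_models) / len(local_models)  # server averaging step
    return w

print(fedavg([1.0, 3.0]))  # converges toward the mean of the client optima, 2.0
```

With identical quadratic curvatures the averaged iterate contracts toward the mean of the client optima each round; heterogeneity effects (the "client drift" that SCAFFOLD and FedGA address) only appear with more local steps and dissimilar objectives.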
**Possible variables:**
--**nsubsets**, default=100, type=int, choices=range(1, 1001), help="Total number of clients to be used"
--**nprocesses**, default=1, type=int, choices=range(1, 48), help="number of processes, i.e., number of clients sampled in each round"
--**batch_size**, default=50, type=int, choices=range(1, 5001), help="batch size of every process"
--**nsteps**, default=5, type=int, choices=range(0, 2001), help="number of local steps per round"
--**nrounds**, default=1001, type=int, choices=range(1, 400002), help="number of rounds"
--**nlogging_steps**, default=100, type=int, choices=range(1, 1001), help="number of rounds for logging, i.e., there is logging whenever round % nlogging_steps == 0"
--**learning_rate**, default=0.01, type=float, help="learning rate for each of the local steps"
--**task**, default="MNIST", type=str, choices={"MNIST", "EMNIST", "CIFAR10", "CIFAR100"}, help="Dataset to be used"
--**model**, default="cnn_mnist", choices={"cnn_mnist", "cnn_cifar", "resnet18"}, type=str, help="model to be used"
--**gradient_clipping**, boolean, help="Clip gradients of workers"
--**clipping_norm**, default=1.0, type=float, help="max magnitude of the L_2 norm when using gradient clipping with the full gradient"
--**target_acc**, default=100.0, type=float, help="Target accuracy of this run; if set, training stops when this accuracy is reached"
--**wandb**, boolean, help="Use weights and biases logging system. You need an account and to be logged in for this to work."
--**wandb_exp_name**, default='random_experiments', type=str, help="Experiment name in wandb"
--**load_seed**, default=-1, type=int, help="Used to set the random seed of the algorithm."
--**scheduler**, default="[1000000000]", type=str, help="scheduler decreases the learning rate when the round is in the given list"
--**lr_decay**, default=0.1, type=float, help="Learning rate decay"
--**port**, default=21, type=int, choices=range(0, 100), help="final digit of port number for the distributed process. No two processes with the same port can run simultaneously. "
--**gpu_ids**, default="[0,1]", type=str, help="a list of the GPU indices to run the experiment on. Processes will be distributed evenly among them."
--**percentage_iid**, default=1.0, type=float, help="percentage of iid data in workers' partition. 1.0 for fully IID data and 0.0 for completely heterogeneous data."
--**weight_decay**, default=0.001, type=float, help="Optimizer's weight decay"
--**server_momentum**, default=0.0, type=float, help="server momentum"
--**momentum**, default=0.0, type=float, help="local momentum"
--**server_learning_rate**, default=1.0, type=float, help="server step learning rate"
--**server_nesterov**, boolean, help="Use Nesterov's server momentum"
--**batchnorm**, boolean, help="Use batch normalization (only available in Resnet Models)"
--**beta**, default=0.1, type=float, help="Corresponds to the beta constant in the FedGA algorithm"
--**fg_batch_size**, default=-1, type=int, help="Full gradient Batch size"
--**plot_grad_alignment**, boolean, help="Set to true to plot the gradient variance after each round."
--**fedavg**, boolean, help="Use the FedAvg algorithm; this is the default if no other algorithm flag is set"
--**grad_align**, boolean, help="Use the FedGA algorithm"
--**scaffold**, boolean, help="Use SCAFFOLD algorithm."
--**largeBatchSGD**, boolean, help="Use large-batch (MiniBatch) SGD"
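The `--percentage_iid` flag interpolates between uniform and heterogeneous client data. As a hedged sketch of how such a split *could* work (the actual partitioning logic lives in the repository's code; the function below and its round-robin/label-sorted scheme are assumptions for illustration):

```python
import random

# Hypothetical percentage_iid-style partition: a fraction of the data is
# dealt to clients uniformly at random (IID), the remainder is handed out
# in contiguous label-sorted blocks so each client sees few labels (non-IID).
def partition(labels, nsubsets, percentage_iid, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    n_iid = int(len(idx) * percentage_iid)
    iid_part = idx[:n_iid]
    sorted_part = sorted(idx[n_iid:], key=lambda i: labels[i])
    shards = [[] for _ in range(nsubsets)]
    for k, i in enumerate(iid_part):            # deal IID samples round-robin
        shards[k % nsubsets].append(i)
    per = len(sorted_part) // nsubsets
    for k in range(nsubsets):                   # contiguous label-sorted blocks
        shards[k].extend(sorted_part[k * per:(k + 1) * per])
    return shards
```

With `percentage_iid=1.0` every shard is a uniform sample of the dataset; with `0.0` each client holds a narrow range of labels, which is the regime where SCAFFOLD and FedGA differ most from plain FedAvg.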