https://github.com/xtra-computing/fedsim

A coupled vertical federated learning framework that boosts the model performance with record similarities (NeurIPS 2022)
federated-learning pytorch vertical-federated-learning


# FedSim
[![GitHub license](https://img.shields.io/github/license/Xtra-Computing/FedSim)](https://github.com/Xtra-Computing/FedSim/blob/main/LICENSE)
![PyTorch](https://img.shields.io/badge/torch-1.8.2-orange)

FedSim is a **coupled vertical federated learning framework** that boosts the training with record similarities.

## Requirements
1. Install conda 4.14 following https://www.anaconda.com/products/distribution
2. Clone this repo:
```bash
git clone https://github.com/JerryLife/FedSim.git
```
3. Create environment (named `fedsim`) and install required basic modules.
```bash
conda env create -f environment.yml
conda activate fedsim
```
4. Install `torch` and `torchvision` with `pip`, choosing the builds that match your CUDA version. For an RTX 3090, we installed `torch==1.8.2` and `torchvision==0.9.2` as shown below.
```bash
pip3 install torch==1.8.2 torchvision==0.9.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
```
5. Ensure that all the required folders exist (they should already be present after `git clone`).
```bash
mkdir -p runs ckp log cache
```
## Datasets
Due to the size limit, this repo includes only two datasets, `house` and `hdb`, in the `data/` folder.
```
data
├── beijing                (house)
│   ├── airbnb_clean.csv   (Secondary)
│   └── house_clean.csv    (Primary)
└── hdb                    (hdb)
    ├── hdb_clean.csv      (Primary)
    └── school_clean.csv   (Secondary)
```
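Before launching a run, it can help to confirm that the files listed above are actually present. The following is a minimal sketch and not part of the FedSim code base: the helper name `missing_dataset_files` and the `EXPECTED_FILES` table are illustrative, and only the file paths are taken from the tree above.

```python
from pathlib import Path

# Paths relative to the repository root, copied from the directory tree above.
# The dictionary itself is a hypothetical helper structure, not FedSim code.
EXPECTED_FILES = {
    "house": ["data/beijing/house_clean.csv", "data/beijing/airbnb_clean.csv"],
    "hdb": ["data/hdb/hdb_clean.csv", "data/hdb/school_clean.csv"],
}


def missing_dataset_files(repo_root="."):
    """Return the expected dataset files that do not exist under repo_root."""
    root = Path(repo_root)
    return [
        rel
        for files in EXPECTED_FILES.values()
        for rel in files
        if not (root / rel).is_file()
    ]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing dataset files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All bundled dataset files found.")
```

Run it from the repository root; an empty result means both the primary and the secondary file of each bundled dataset were found.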
## Linkage and Training
The linkage and training of each dataset are combined in a single script.
### FedSim without adding noise
The scripts without added noise are located under `src/` and are named in the format `src/train_<dataset>_<algorithm>.py` (for example, `src/train_beijing_fedsim.py`). You can run each script by

> `python src/train_<dataset>_<algorithm>.py [-g gpu_index] [-p perturbed_noise_on_similarity] [-k number_of_neighbors] [--mlp-merge] [-ds] [-dw]`

* `-g/--gpu`: Index of the GPU on which to run the script. If no GPU with this index is available, the CPU is used instead.
* `-k/--top-k`: Number of neighbors to extract from the possible matches, which should be less than the value of `knn_k` ($K$ in the paper).
* `-p/--leak-p`: Leakage probability of the Bloom filters ($\tau$ in the paper).
* `--mlp-merge`: Replace the CNN merge model with an MLP merge model.
* `-ds/--disable-sort`: Disable the sort gate.
* `-dw/--disable-weight`: Disable the weight gate.
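
To make the option list concrete, here is a minimal `argparse` sketch that accepts the same flags. It is an illustration only and may differ from the parser in the actual training scripts: the default values, the help strings, and the helper names `build_parser` and `pick_device` are assumptions. The CPU fallback is modeled with a plain GPU count; in a real script that number would typically come from `torch.cuda.device_count()`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options listed above."""
    parser = argparse.ArgumentParser(description="Illustrative FedSim-style CLI")
    # Defaults below are assumptions made for this sketch.
    parser.add_argument("-g", "--gpu", type=int, default=0,
                        help="GPU index; the CPU is used if it is unavailable")
    parser.add_argument("-k", "--top-k", type=int, default=None,
                        help="number of neighbors K (should be less than knn_k)")
    parser.add_argument("-p", "--leak-p", type=float, default=1.0,
                        help="leakage probability tau of the Bloom filters")
    parser.add_argument("--mlp-merge", action="store_true",
                        help="use an MLP merge model instead of the CNN one")
    parser.add_argument("-ds", "--disable-sort", action="store_true",
                        help="disable the sort gate")
    parser.add_argument("-dw", "--disable-weight", action="store_true",
                        help="disable the weight gate")
    return parser


def pick_device(gpu_index, n_gpus):
    """Return 'cuda:<index>' if that GPU exists, otherwise fall back to 'cpu'."""
    if 0 <= gpu_index < n_gpus:
        return f"cuda:{gpu_index}"
    return "cpu"


if __name__ == "__main__":
    args = build_parser().parse_args(["-g", "1", "-p", "1e0", "-k", "5", "-ds"])
    print(args)
    # With a single visible GPU, index 1 does not exist, so the CPU is chosen.
    print(pick_device(args.gpu, n_gpus=1))
```

Because the gate flags are `store_true` switches, both gates stay enabled unless `-ds` or `-dw` is passed explicitly.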

Taking the house dataset as an example:
```bash
python src/train_beijing_fedsim.py -g 1 -p 1e0 -k 5 -ds
```
runs FedSim on the house dataset with $\tau=1$ (no added noise) and $K=5$, merging with the CNN, with the sort gate disabled and the weight gate enabled.

### FedSim with noise added
The scripts with added noise are located in `src/priv_scripts` and follow the same naming format as the scripts without noise. The only differences are some hyperparameter settings. You can run these scripts with similar commands. For example,
```bash
python src/priv_scripts/train_beijing_fedsim.py -g 1 -p 1e-2 -k 5 -ds
```
runs FedSim on the house dataset with added noise satisfying $\tau=0.01$ and $K=5$, merging with the CNN, with the sort gate disabled and the weight gate enabled.
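
When comparing several noise levels, the commands above can be generated programmatically. The sketch below only builds and prints the command lines; the helper `build_command` and the list of $\tau$ values are hypothetical, and the script path is passed in as an argument so that either the noise-free or the noise-added variant can be used.

```python
import shlex


def build_command(script, gpu, leak_p, top_k, disable_sort=False,
                  disable_weight=False, mlp_merge=False):
    """Assemble an argument list using the flags documented above."""
    cmd = ["python", script,
           "-g", str(gpu),
           "-p", f"{leak_p:g}",   # e.g. 0.01 is rendered as "0.01"
           "-k", str(top_k)]
    if mlp_merge:
        cmd.append("--mlp-merge")
    if disable_sort:
        cmd.append("-ds")
    if disable_weight:
        cmd.append("-dw")
    return cmd


if __name__ == "__main__":
    # The tau values here are illustrative, not settings from the paper.
    for tau in (1.0, 0.1, 0.01):
        cmd = build_command("src/train_beijing_fedsim.py", gpu=1,
                            leak_p=tau, top_k=5, disable_sort=True)
        print(shlex.join(cmd))
```

To execute a command instead of printing it, the returned list can be passed to `subprocess.run(cmd, check=True)`.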

## Citation
```bib
@inproceedings{NEURIPS2022_84b74416,
author = {Wu, Zhaomin and Li, Qinbin and He, Bingsheng},
booktitle = {Advances in Neural Information Processing Systems},
editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
pages = {21087--21100},
publisher = {Curran Associates, Inc.},
title = {A Coupled Design of Exploiting Record Similarity for Practical Vertical Federated Learning},
url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/84b744165a0597360caad96b06e69313-Paper-Conference.pdf},
volume = {35},
year = {2022}
}
```