https://github.com/koukyosyumei/eliminating-label-leakage-in-tree-based-vertical-federated-learning
Code for "Eliminating Label Leakage in Tree-based Vertical Federated Learning"
https://github.com/koukyosyumei/eliminating-label-leakage-in-tree-based-vertical-federated-learning
Last synced: 4 months ago
JSON representation
Code for "Eliminating Label Leakage in Tree-based Vertical Federated Learning"
- Host: GitHub
- URL: https://github.com/koukyosyumei/eliminating-label-leakage-in-tree-based-vertical-federated-learning
- Owner: Koukyosyumei
- Created: 2023-09-27T05:22:23.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-09-27T05:26:21.000Z (about 2 years ago)
- Last Synced: 2025-02-28T22:51:51.093Z (8 months ago)
- Language: C++
- Homepage:
- Size: 122 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Eliminating Label Leakage in Tree-based Vertical Federated Learning
## 1. Dependencies
- C++11 compatible compiler
- Boost 1.65 or later
- Python 3.7 or later
We use the following version for the experiments.
```
- Python 3.10.11
- g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
- Boost 1.71
```
## 2. Usage
### 2.1. Build from source
```shell
pip install -e .
./script/build.sh
```
### 2.2. Download datasets
```shell
./script/download.sh
```
### 2.3. Run experiments
- Example
```shell
for VAL_D in breastcancer parkinson phishing obesity avila
do
sudo ./script/run.sh -d ${VAL_D} -m r -r 5 -a 1.0 -c 0 -h 6 -e 1.0 -f 0.5 -n -1 -p 5 -z 5 -u ${OUTPUT_DIR} -t result/temp
sudo ./script/run.sh -d ${VAL_D} -m x -r 5 -a 0.3 -c 0 -h 6 -e 0.6 -f 0.5 -n -1 -p 5 -z 5 -u ${OUTPUT_DIR} -t result/temp
done
for VAL_D in drive fars pucrio fmnist
do
sudo ./script/run.sh -d ${VAL_D} -m r -r 5 -a 1.0 -c 0 -h 6 -e 1.0 -f 0.5 -n -1 -p 1 -z 5 -w 1000 -y 100 -u ${OUTPUT_DIR} -t result/temp
sudo ./script/run.sh -d ${VAL_D} -m x -r 5 -a 0.3 -c 0 -h 6 -e 0.6 -f 0.5 -n -1 -p 1 -z 5 -w 1000 -y 100 -u ${OUTPUT_DIR} -t result/temp
done
```
- Basic Arguments
```
-u : (str) path to the folder to save the final results.
-d : (str) name of dataset.
-m : (str) type of training algorithm. `r`: Random Forest, `x`: XGBoost, `g`: GraftingForest
```
- Advanced Arguments
```
-t : (str) path to the folder to save the temporary results.
-z : (int) number of trials.
-p : (int) number of parallelly executed experiments.
-n : (int) number of data records sampled for training.
-f : (float) ratio of features owned by the active party.
-i : (int) setting of feature importance. -1: normal, 1: unbalance
-r : (int) total number of rounds for training.
-j : (int) minimum number of samples within a leaf.
-h : (int) maximum depth.
-a : (float) learning rate of XGBoost.
-e : (float) coefficient of edge weight (tau in our paper).
-k : (float) weight for community variables.
-v : (str) clustering type. `kmeans` or `xmeans`
-l : (int) maximum number of iterations of Louvain
-x : (optional) baseline union attack
-w : chunk size for memory efficient adjacency matrix
-y : intra-chunk edge weight for memory efficient adjacency matrix
-b : (float) epsilon of ID-LMID.
-c : (int) number of completely secure rounds.
-o : (float) epsilon of LP-MST.
-g : (optional) draw the extracted clusters.
-q : (optional) draw trees as html files.
```
### 3. Note
The above simulations utilize computation on the plaintext. Experiments using Paiilier Encryption can also be performed with `-m sr` and `-m sx`, but they are time-consuming. The results are identical to those obtained with `-m r` and `-m x`, respectively. We also provide MPI backend as a reference implementation in `mpiscipt` folder, but it does not support some features.