An open API service indexing awesome lists of open source software.

https://github.com/xuefeng-xu/fedps

Federated data Preprocessing via aggregated Statistics
https://github.com/xuefeng-xu/fedps

data-preprocessing federated-learning python scikit-learn statistics

Last synced: 12 months ago
JSON representation

Federated data Preprocessing via aggregated Statistics

Awesome Lists containing this project

README

          

# FedPS

FedPS is a Python module designed for data preprocessing in Federated Learning, primarily leveraging aggregated statistics. The preprocessing workflow involves the following five steps:

1. Local Statistics Estimation: Clients estimate local statistics from their local data.
2. Aggregation: The server receives the local statistics and performs aggregation.
3. Global Parameter Calculation: The server calculates the global preprocessing parameters.
4. Parameter Distribution: The global parameters are then sent back to the clients.
5. Data Preprocessing: Clients apply the preprocessing to their local data.


Overview

## Installation

### Dependencies

- Python (>= 3.9)
- Scikit-learn (~= 1.6)
- NumPy (>= 1.20)
- DataSketches
- PyZMQ

### Building from source

1. Create a Python env

```bash
conda create --name fedps python=3.9
conda activate fedps
```

2. Clone this project

```bash
git clone https://github.com/xuefeng-xu/fedps.git
```

3. Build the project

```bash
cd fedps
pip install .
```

## Usage

1. Set up communication channels

```python
# Client1 channel
from fedps.channel import ClientChannel

channel = ClientChannel(
local_ip="127.0.0.1", local_port=5556,
remote_ip="127.0.0.1", remote_port=5555,
)
```

```python
# Client2 channel
from fedps.channel import ClientChannel

channel = ClientChannel(
local_ip="127.0.0.1", local_port=5557,
remote_ip="127.0.0.1", remote_port=5555,
)
```

```python
# Server channel
from fedps.channel import ServerChannel

channel = ServerChannel(
local_ip="127.0.0.1", local_port=5555,
remote_ip=["127.0.0.1", "127.0.0.1"],
remote_port=[5556, 5557],
)
```

2. Specify `FL_type` and `role` in the preprocessor

- `FL_type`: "H" (Horizontal) or "V" (Vertical)

- `role`: "client" or "server"

```python
# Client1 code example
from fedps.preprocessing import MinMaxScaler

X = [[-1, 2], [-0.5, 6]]
est = MinMaxScaler(FL_type="H", role="client", channel=channel)
Xt = est.fit_transform(X)
print(Xt)
```

```python
# Client2 code example
from fedps.preprocessing import MinMaxScaler

X = [[0, 10], [1, 18]]
est = MinMaxScaler(FL_type="H", role="client", channel=channel)
Xt = est.fit_transform(X)
print(Xt)
```

```python
# Server code example
from fedps.preprocessing import MinMaxScaler

est = MinMaxScaler(FL_type="H", role="server", channel=channel)
est.fit()
```

3. Run the script

```bash
# Run in three terminals
python client1.py
python client2.py
python server.py
```

PS: See more cases in the [example](example) folder.

## Available preprocessing modules

- Discretization
- [`KBinsDiscretizer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html)

- Encoding
- [`LabelEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)
- [`LabelBinarizer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html)
- [`MultiLabelBinarizer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html)
- [`OneHotEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html)
- [`OrdinalEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html)
- [`TargetEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.TargetEncoder.html)

- Scaling
- [`MaxAbsScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html)
- [`MinMaxScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)
- [`Normalizer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html)
- [`RobustScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html)
- [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)

- Transformation
- [`PowerTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html)
- [`QuantileTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html)
- [`SplineTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.SplineTransformer.html)

- Imputation
- [`IterativeImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html) (experimental)
- [`KNNImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html)
- [`SimpleImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html)

## Differences from Scikit-learn

- Currently, this library does not support sparse data.
- [`KBinsDiscretizer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html), [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html), and [`SplineTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.SplineTransformer.html) cannot set the `sample_weight` parameter in their fit methods.
- [`IterativeImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html) does not support the `sample_posterior` and `n_nearest_features` parameters.
- [`KNNImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html) does not support custom weight funtion and distance metric.

## Acknowledgement

This project is build on [Scikit-learn](https://github.com/scikit-learn/scikit-learn).