# LEAF: A Benchmark for Federated Settings

## Resources

* **Homepage:** [leaf.cmu.edu](https://leaf.cmu.edu)
* **Paper:** ["LEAF: A Benchmark for Federated Settings"](https://arxiv.org/abs/1812.01097)

## Datasets

1. FEMNIST

* **Overview:** Image Dataset
* **Details:** 62 different classes (10 digits, 26 lowercase letters, 26 uppercase letters); images are 28 by 28 pixels (with an option to make them 128 by 128 pixels); 3500 users
* **Task:** Image Classification

2. Sentiment140

* **Overview:** Text Dataset of Tweets
* **Details:** 660,120 users
* **Task:** Sentiment Analysis

3. Shakespeare

* **Overview:** Text Dataset of Shakespeare Dialogues
* **Details:** 1129 users (reduced to 660 with our choice of sequence length. See [bug](https://github.com/TalwalkarLab/leaf/issues/19).)
* **Task:** Next-Character Prediction

4. Celeba

* **Overview:** Image Dataset based on the [Large-scale CelebFaces Attributes Dataset](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)
* **Details:** 9343 users (we exclude celebrities with fewer than 5 images)
* **Task:** Image Classification (Smiling vs. Not smiling)

5. Synthetic Dataset

* **Overview:** We propose a process to generate synthetic, challenging federated datasets. The high-level goal is to create devices whose true models are device-dependent. For a description of the whole generative process, please refer to the paper; a toy sketch of the idea appears after this dataset list
* **Details:** The user can customize the number of devices, the number of classes, and the number of dimensions, among other parameters
* **Task:** Classification

6. Reddit

* **Overview:** We preprocess the Reddit data released by [pushshift.io](https://files.pushshift.io/reddit/) corresponding to December 2017.
* **Details:** 1,660,820 users with a total of 56,587,343 comments.
* **Task:** Next-word Prediction.
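
As referenced in the Synthetic Dataset entry above, here is a minimal, hypothetical sketch of a device-dependent generative process in Python. This is not the repository's implementation (refer to the paper for the actual process); the function name and all parameters below are illustrative only.

```python
import numpy as np

def generate_synthetic_devices(num_devices=30, num_classes=5, num_dim=60,
                               samples_per_device=100, seed=0):
    """Illustrative only: each device draws its own 'true' linear model,
    so data and label distributions are device-dependent (non-IID)."""
    rng = np.random.default_rng(seed)
    devices = []
    for _ in range(num_devices):
        # Device-specific true model: weights and bias drawn per device.
        W = rng.normal(size=(num_dim, num_classes))
        b = rng.normal(size=num_classes)
        # Device-specific feature distribution: per-device mean shift.
        mean = rng.normal(size=num_dim)
        x = rng.normal(loc=mean, scale=1.0, size=(samples_per_device, num_dim))
        # Labels come from the device's own model.
        y = np.argmax(x @ W + b, axis=1)
        devices.append({"x": x, "y": y})
    return devices

if __name__ == "__main__":
    devices = generate_synthetic_devices()
    print(f"{len(devices)} devices, device 0 holds {devices[0]['x'].shape[0]} samples")
```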

## Notes

- Install the libraries listed in ```requirements.txt```
- For example, with pip, run ```pip3 install -r requirements.txt```
- Go to the directory of the respective dataset for instructions on generating data (a short sketch of reading the generated splits follows these notes)
- On macOS, check that ```wget``` is installed and working
- The ```models``` directory contains instructions on running baseline reference implementations
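
As an illustration of how the generated data might be consumed, the following Python sketch reads a LEAF-style JSON split. It assumes a JSON layout with ```users``` and ```user_data``` keys; check the per-dataset instructions for the exact format, and treat the directory path and helper name below as hypothetical.

```python
import json
from pathlib import Path

def load_leaf_split(split_dir):
    """Hypothetical helper: merge all JSON files in a generated train/ or test/
    directory into a single {user_id: samples} mapping."""
    data = {}
    for json_path in sorted(Path(split_dir).glob("*.json")):
        with open(json_path) as f:
            blob = json.load(f)
        # Assumed keys: "users" lists user ids, "user_data" maps each id to its samples.
        for user in blob["users"]:
            data[user] = blob["user_data"][user]
    return data

if __name__ == "__main__":
    # Hypothetical path; the actual location depends on the dataset and how it was generated.
    train = load_leaf_split("data/femnist/data/train")
    print(f"loaded {len(train)} users")
```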