https://github.com/brunocampos01/federated-learning-for-text-generation
Machine learning project using federated learning for text generation
federated-learning gutenberg-project machine-learning natural-language-processing next-word-prediction python shakespeare text-generation tff
- Host: GitHub
- URL: https://github.com/brunocampos01/federated-learning-for-text-generation
- Owner: brunocampos01
- License: mit
- Created: 2021-12-05T17:50:16.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2024-05-05T03:31:23.000Z (8 months ago)
- Last Synced: 2024-05-07T18:20:16.955Z (8 months ago)
- Topics: federated-learning, gutenberg-project, machine-learning, natural-language-processing, next-word-prediction, python, shakespeare, text-generation, tff
- Language: Python
- Homepage:
- Size: 9.42 MB
- Stars: 10
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# (WIP) Federated Learning for Text Generation
![Python 3](https://img.shields.io/badge/Python-3-red.svg)
![License](https://img.shields.io/badge/Code%20License-MIT-red.svg)

## Project Description
When a user types text on a mobile device, suggesting the next word can reduce typing time and help avoid errors. However, the typed data contains private information, which limits moving it to a centralized environment. This work demonstrates how to predict the next word while preserving user privacy, without moving the data off the device.

## Objectives
The goal of this work is to predict the next word while ensuring data privacy. Shakespeare's plays will be used as input data. They will be obtained, described, pre-processed and explored for a better understanding. Next, the federation environment will be created, in which each character in Shakespeare's works is a participating user and that character's speeches are the input dataset for the prediction model. In this scenario, where data exists only on mobile devices, Federated Learning will be used to train the model collaboratively without moving the data to a centralized environment. To do this, each user's device loads a pre-trained global model from a central server, pre-processes its local data and trains the model locally. The model parameters are then sent to the central server, which computes the federated average and updates the global model.

## Data Source
Data from [Project Gutenberg](https://www.gutenberg.org/): [Shakespeare play](http://www.gutenberg.org/files/100/old/1994-01-100.zip)

## Algorithms
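The Objectives section describes a central server that combines locally trained parameters into a federated average. As a rough illustration only, the sketch below shows that averaging step in plain Python, weighting each client by its number of training examples. It is not the repository's implementation (the project topics mention TFF). The function name `federated_average`, the flat list representation of weights, and the example clients and numbers are all assumptions made up for this sketch.

```python
from typing import List, Tuple

# Hypothetical representation: a model is a flat list of float parameters.
Weights = List[float]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine client weights into global weights.

    Each update is (client_weights, num_examples). Clients with more
    local examples contribute proportionally more to the average.
    """
    if not updates:
        raise ValueError("at least one client update is required")
    total_examples = sum(n for _, n in updates)
    if total_examples <= 0:
        raise ValueError("total number of examples must be positive")

    size = len(updates[0][0])
    averaged = [0.0] * size
    for weights, n in updates:
        if len(weights) != size:
            raise ValueError("all clients must send weights of the same size")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total_examples
    return averaged


# Made-up example: two characters act as clients, each reporting its
# locally trained weights and how many examples it trained on.
client_updates = {
    "HAMLET": ([0.2, 0.4], 300),
    "OPHELIA": ([0.6, 0.0], 100),
}
global_weights = federated_average(list(client_updates.values()))
print(global_weights)
```

Only the parameters and example counts reach the server in this sketch; the speeches themselves stay with each client, which is the privacy property the project relies on.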
TODO

## Quickstart
- [Data understanding, data cleansing, data exploration](https://github.com/brunocampos01/federated-Learning-for-text-generation/tree/master/notebooks)
- [Prepare federation, preprocess data, model, fed-avg, retraining and evaluate](https://github.com/brunocampos01/federated-Learning-for-text-generation/tree/master/notebooks)

## Requirements
This project is tested with:

| Requisite       | Version |
|-----------------|---------|
| Python          | 3.8.10  |
| Pip             | 21.2.4  |
| CUDA (optional) | 11.0    |

- [Install CUDA](https://www.tensorflow.org/install/gpu#install_cuda_with_apt)

## Image Display
#### WordCloud
#### N-grams
---