https://github.com/mar-muel/artificial-self-amld-2020

Workshop material for the AMLD 2020 workshop on "Meet your Artificial Self: Generate text that sounds like you"
datasets language-model textgeneration transfer-learning workshop

Host: GitHub
URL: https://github.com/mar-muel/artificial-self-amld-2020
Owner: mar-muel
License: mit
Created: 2019-11-26T15:52:23.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-06-12T21:27:28.000Z (almost 2 years ago)
Last Synced: 2025-02-27T21:48:03.913Z (3 months ago)
Topics: datasets, language-model, textgeneration, transfer-learning, workshop
Language: Jupyter Notebook
Size: 42.2 MB
Stars: 81
Watchers: 4
Forks: 16
Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Meet your Artificial Self: Generate text that sounds like you

This repository contains all resources for the [Applied Machine Learning Days](https://appliedmldays.org/) workshop [Meet your Artificial Self: Generate text that sounds like you](https://appliedmldays.org/events/amld-epfl-2020/workshops/meet-your-artificial-self-generate-text-that-sounds-like-you).

In this workshop, participants download their own chat logs and build a chatbot that generates text similar to their own writing. As an alternative to chat logs, this repository provides a number of other conversational and non-conversational datasets.

## Gitter

Feel free to join our Gitter during the workshop:

[![Gitter](https://badges.gitter.im/artificial-self-AMLD-2020/community.svg)](https://gitter.im/artificial-self-AMLD-2020/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)

## Slides

Find the workshop slides [here](https://docs.google.com/presentation/d/1-aU5fSWyQN4GwP3KFDy5KorM7c-FJJFjiRp3bJ2sqIY/edit?usp=sharing).

# Usage

The workshop is split into three tasks. You can run each task locally (by cloning this repository) or in the corresponding Colab notebook (see links below). If you run locally, make sure you have access to one or more GPUs, are running Python 3.6+, and have sufficient storage space. More detailed instructions are provided in the task subfolders.
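
Before running locally, a quick sanity check of the requirements above can save time. The following is a minimal sketch that uses only the standard library; `check_environment` and the 10 GB storage threshold are assumptions for illustration and are not part of this repository. Finding `nvidia-smi` on the `PATH` is only a hint that an NVIDIA driver is installed, not proof that a GPU is usable.

```python
import shutil
import sys


def check_environment(min_python=(3, 6), min_free_gb=10.0, path="."):
    """Return a dict of basic checks for running the tasks locally.

    min_free_gb is an assumed threshold, not a documented requirement.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python_ok": sys.version_info >= min_python,
        # nvidia-smi on PATH only hints that an NVIDIA driver is installed.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        "free_gb": round(free_gb, 1),
        "storage_ok": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print("{}: {}".format(name, value))
```

If `nvidia_smi_found` is `False`, running the Colab notebooks is probably the easier option.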

## Task 1

Fine-tune GPT-2 on various [datasets](datasets) (including tweets, poetry, programming code, chess, music and more!). Thanks to [@manueth](https://github.com/manueth) for compiling the datasets! 

:arrow_right: [Read more](1) 

[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1lk9iZnD5mkAf29FCN3QmcSssFDrWjE8W)
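
If you want to bring your own dataset, one common way to prepare text for GPT-2 fine-tuning is to join the individual samples (tweets, poems, ...) with GPT-2's `<|endoftext|>` separator and hold out a small validation split. The sketch below illustrates that idea with the standard library only; `build_corpus` and its parameters are hypothetical, and the notebooks in this repository may prepare data differently.

```python
import random

END_OF_TEXT = "<|endoftext|>"  # GPT-2's document separator token


def _join(samples):
    """Concatenate samples, ending each one with the separator token."""
    return "".join(sample + "\n" + END_OF_TEXT + "\n" for sample in samples)


def build_corpus(samples, val_fraction=0.1, seed=42):
    """Shuffle samples, split off a validation set, and join each split."""
    cleaned = [s.strip() for s in samples if s.strip()]  # drop empty samples
    random.Random(seed).shuffle(cleaned)
    # Hold out at least one sample whenever there is more than one.
    n_val = max(1, int(len(cleaned) * val_fraction)) if len(cleaned) > 1 else 0
    return _join(cleaned[n_val:]), _join(cleaned[:n_val])


if __name__ == "__main__":
    train_text, val_text = build_corpus(["first tweet", "second tweet", "third tweet"])
    print(train_text)
```

The two returned strings can be written to plain text files and used as training and validation input.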

## Task 2

We use the same approach of style transfer to train a conversational model from our chat logs. You can either use [Chatistics](https://github.com/MasterScrat/Chatistics) to parse your own chat logs or you can use some of the provided resources. Thanks to [@MasterScrat](https://github.com/MasterScrat) for compiling the conversational datasets!

:arrow_right: [Read more](2) 

[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1iHcQ8_K0cfRE3v8QX6FMKAzdSSGtf5IX)
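
To give an idea of what "training a conversational model from chat logs" involves on the data side, here is a minimal sketch that turns a chronological chat log into (history, reply) training examples for one speaker. The `(sender, text)` tuple format and the `build_examples` helper are assumptions made for illustration; they are not the Chatistics output format or the code used in the notebook.

```python
def build_examples(messages, me, max_history=3):
    """Pair each message written by `me` with the messages preceding it.

    messages: chronological list of (sender, text) tuples (assumed format).
    me: name of the speaker whose style the model should learn.
    """
    examples = []
    for i, (sender, text) in enumerate(messages):
        if sender != me or i == 0:
            continue  # only replies by `me` that have some context
        history = messages[max(0, i - max_history):i]
        examples.append({
            "history": ["{}: {}".format(s, t) for s, t in history],
            "reply": text,
        })
    return examples


if __name__ == "__main__":
    chat = [
        ("alice", "hi"),
        ("me", "hey"),
        ("alice", "how are you?"),
        ("me", "good"),
    ]
    for example in build_examples(chat, me="me", max_history=2):
        print(example)
```

Each example pairs a short window of preceding messages with the reply the model should learn to produce.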

## Task 3

We extend the approach in task 2 by introducing multi-task learning, improving data preprocessing, and adding token types.

:arrow_right: [Read more](3) 

[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1XYNef9zcHhTjt6kM6ydL9oXTshoRknIV)
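
Token types tell the model which speaker each token belongs to. The sketch below shows the general idea on already-tokenized utterances: every token gets a parallel speaker label, and the speakers alternate backwards from the reply. The special token names (`<bos>`, `<eos>`, `<speaker1>`, `<speaker2>`) and the `build_input` helper are assumptions for illustration, loosely modelled on the approach in huggingface/transfer-learning-conv-ai; the Task 3 notebook may differ in its details.

```python
BOS, EOS = "<bos>", "<eos>"
SPEAKER_1, SPEAKER_2 = "<speaker1>", "<speaker2>"  # SPEAKER_2 writes the reply


def build_input(history, reply):
    """Build parallel token and token-type sequences for one example.

    history: list of utterances (each a list of tokens), assumed to
             alternate strictly between the two speakers.
    reply:   list of tokens written by the speaker being modelled.
    """
    utterances = history + [reply]
    tokens, token_types = [], []
    for i, utterance in enumerate(utterances):
        # Walk backwards from the reply: reply is SPEAKER_2, then alternate.
        distance = len(utterances) - 1 - i
        speaker = SPEAKER_2 if distance % 2 == 0 else SPEAKER_1
        segment = [speaker] + utterance
        tokens.extend(segment)
        token_types.extend([speaker] * len(segment))
    # The sequence delimiters take the type of their neighbouring segment.
    tokens = [BOS] + tokens + [EOS]
    token_types = [token_types[0]] + token_types + [SPEAKER_2]
    return tokens, token_types


if __name__ == "__main__":
    toks, types = build_input([["hi"], ["hey", "there"]], ["how", "are", "you"])
    for tok, typ in zip(toks, types):
        print("{:12s} {}".format(tok, typ))
```

In a real model both sequences would be mapped to integer ids, and the token-type embeddings would be added to the word embeddings.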

# Credits

* [@manueth](https://github.com/manueth) and [@MasterScrat](https://github.com/MasterScrat)

* [minimaxir/gpt-2-simple](https://github.com/minimaxir/gpt-2-simple)

* [huggingface/transformers](https://github.com/huggingface/transformers)

* [huggingface/transfer-learning-conv-ai](https://github.com/huggingface/transfer-learning-conv-ai)

* [MasterScrat/Chatistics](https://github.com/MasterScrat/Chatistics)
