https://github.com/mar-muel/artificial-self-amld-2020
Workshop material for the AMLD 2020 workshop on "Meet your Artificial Self: Generate text that sounds like you"
https://github.com/mar-muel/artificial-self-amld-2020
datasets language-model textgeneration transfer-learning workshop
Last synced: 2 months ago
JSON representation
Workshop material for the AMLD 2020 workshop on "Meet your Artificial Self: Generate text that sounds like you"
- Host: GitHub
- URL: https://github.com/mar-muel/artificial-self-amld-2020
- Owner: mar-muel
- License: mit
- Created: 2019-11-26T15:52:23.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-06-12T21:27:28.000Z (almost 2 years ago)
- Last Synced: 2025-02-27T21:48:03.913Z (3 months ago)
- Topics: datasets, language-model, textgeneration, transfer-learning, workshop
- Language: Jupyter Notebook
- Size: 42.2 MB
- Stars: 81
- Watchers: 4
- Forks: 16
- Open Issues: 8
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Meet your Artificial Self: Generate text that sounds like you
This repository contains all resources for the [Applied Machine Learning Days](https://appliedmldays.org/) workshop [Meet your Artificial Self: Generate text that sounds like you](https://appliedmldays.org/events/amld-epfl-2020/workshops/meet-your-artificial-self-generate-text-that-sounds-like-you).In this workshop, participants are tasked to download their own chat logs and build a chat bot that generates text similar to their writing. As an alternative to using chat logs, we provide a number of other conversational (and non-conversational datasets) datasets in this repository.
## Gitter
Feel free to join our Gitter during the workshop:[](https://gitter.im/artificial-self-AMLD-2020/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)
## Slides
Find the workshop slides [here](https://docs.google.com/presentation/d/1-aU5fSWyQN4GwP3KFDy5KorM7c-FJJFjiRp3bJ2sqIY/edit?usp=sharing).# Usage
The workshop is split in 3 tasks. You can run each task locally (by cloning this repository) or by running the Colab notebook (see links below). If you run locally, make sure you have access to GPU(s) and you are running Python 3.6+ (also make sure you have sufficient storage space). More detailed instructions are provided in the different subfolders.## Task 1
Fine-tune GPT-2 on various [datasets](datasets) (including tweets, poetry, programming code, chess, music and more!). Thanks to [@manueth](https://github.com/manueth) for compiling the datasets!:arrow_right: [Read more](1)
[](https://colab.research.google.com/drive/1lk9iZnD5mkAf29FCN3QmcSssFDrWjE8W)
## Task 2
We use the same approach of style transfer to train a conversational model from our chat logs. You can either use [Chatistics](https://github.com/MasterScrat/Chatistics) to parse your own chat logs or you can use some of the provided resources. Thanks to [@MasterScrat](https://github.com/MasterScrat) for compiling the conversational datasets!:arrow_right: [Read more](2)
[](https://colab.research.google.com/drive/1iHcQ8_K0cfRE3v8QX6FMKAzdSSGtf5IX)
## Task 3
We extend the approach in task 2 by introducing multi-task learning, improving data preprocessing, and adding token types.:arrow_right: [Read more](3)
[](https://colab.research.google.com/drive/1XYNef9zcHhTjt6kM6ydL9oXTshoRknIV)
# Credits
* [@manueth](https://github.com/manueth) and [@MasterScrat](https://github.com/MasterScrat)
* [minimaxir/gpt-2-simple](https://github.com/minimaxir/gpt-2-simple)
* [hunggingface/transformers](https://github.com/huggingface/transformers)
* [huggingface/transfer-learning-conv-ai](https://github.com/huggingface/transfer-learning-conv-ai)
* [MasterScrat/Chatistics](https://github.com/MasterScrat/Chatistics)