https://github.com/losttech/billionsongs

Gradient + TensorFlow + GPT-2-powered "AI" song lyrics generator, built as an ASP.NET Core website.

Host: GitHub
URL: https://github.com/losttech/billionsongs
Owner: losttech
License: mit
Created: 2019-04-26T17:34:34.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2023-04-07T20:56:49.000Z (about 3 years ago)
Last Synced: 2025-04-19T06:03:28.686Z (about 1 year ago)
Language: C#
Size: 1.03 MB
Stars: 66
Watchers: 5
Forks: 12
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

# Billion Songs, AI-powered song lyrics generator

[Generate a song](https://billionsongs.azurewebsites.net/)

> NOTE: this repository uses git submodules, so clone it with `--recurse-submodules`.
> Learn about them [here](https://git-scm.com/book/en/v2/Git-Tools-Submodules).

See the blog post
[Writing billion songs with C# and Deep Learning](https://lostmsu.github.io/BillionSongs/)
for a detailed explanation of how it works.

This project mainly serves as a demonstration of
[Gradient](https://losttech.software/gradient.html),
our TensorFlow binding for C# and other .NET languages.

[Other Gradient samples](https://github.com/losttech/Gradient-Samples).

# What is it, and how does it work?

This is a deep learning-powered song lyrics generator, based on
[GPT-2](https://github.com/openai/gpt-2), wrapped as an ASP.NET Core website.

It generates songs word by word (or rather token by token), using
the statistical relationships learned by a deep learning model called
[GPT-2](https://github.com/openai/gpt-2).
The actual generator code is in the
[GradientTextGenerator class](Web/GradientTextGenerator.cs).
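
The token-by-token loop can be pictured with a toy sketch. This is not the code from `GradientTextGenerator`: the `ToySampler` class, the `nextTokenScores` delegate, and the `endToken`/`maxTokens` parameters are hypothetical stand-ins, and in the real project the scores would come from the GPT-2 model rather than from a hand-written function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: a stand-in "model" returns one score (logit)
// per vocabulary entry, and the loop samples a single token at a time.
public static class ToySampler
{
    // Turns raw scores into softmax weights, then draws one index at random.
    public static int SampleIndex(double[] logits, Random random)
    {
        double max = logits.Max(); // subtract the max for numerical stability
        double[] weights = logits.Select(l => Math.Exp(l - max)).ToArray();
        double threshold = random.NextDouble() * weights.Sum();
        double cumulative = 0;
        for (int i = 0; i < weights.Length; i++)
        {
            cumulative += weights[i];
            if (threshold < cumulative) return i;
        }
        return weights.Length - 1; // guard against rounding error
    }

    // Appends sampled tokens until the end token is drawn or maxTokens is reached.
    public static List<int> Generate(
        Func<IReadOnlyList<int>, double[]> nextTokenScores,
        int endToken, int maxTokens, Random random)
    {
        var tokens = new List<int>();
        while (tokens.Count < maxTokens)
        {
            // The stand-in model sees everything generated so far.
            int next = SampleIndex(nextTokenScores(tokens), random);
            if (next == endToken) break;
            tokens.Add(next);
        }
        return tokens;
    }
}
```

Each step feeds all previously generated tokens back into the scoring function, which is why generation cost grows with the length of the song.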

Text generation is pretty slow even with a powerful GPU,
so we have a bunch of caches in `/Web` to provide a better user experience.
There is also [PregeneratedSongProvider](Web/PregeneratedSongProvider.cs),
which continuously creates new texts in the background to ensure that clicking
the "Make Random" button gives an instant result.
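
The background pregeneration idea can be sketched as a small producer/consumer buffer. This is a minimal sketch under assumptions, not the repository's `PregeneratedSongProvider`: the `PregeneratedBuffer` class, its `generate` delegate, and the `capacity` parameter are hypothetical names, and `System.Threading.Channels` is simply one convenient way to get a bounded queue.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical sketch: a background loop keeps a small bounded buffer of songs
// filled, so a request can usually take a finished song without waiting.
public sealed class PregeneratedBuffer
{
    private readonly Channel<string> songs;
    private readonly Func<CancellationToken, Task<string>> generate;

    public PregeneratedBuffer(Func<CancellationToken, Task<string>> generate, int capacity)
    {
        this.generate = generate;
        // A bounded channel makes the producer wait once the buffer is full.
        this.songs = Channel.CreateBounded<string>(capacity);
    }

    // Runs until cancelled; intended to be started once in the background.
    public async Task FillAsync(CancellationToken cancellation)
    {
        try
        {
            while (!cancellation.IsCancellationRequested)
            {
                string song = await this.generate(cancellation);
                await this.songs.Writer.WriteAsync(song, cancellation);
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown: cancellation was requested.
        }
    }

    // Completes immediately when a song is buffered; otherwise waits for the next one.
    public ValueTask<string> TakeAsync(CancellationToken cancellation = default)
        => this.songs.Reader.ReadAsync(cancellation);
}
```

The bounded capacity is a deliberate choice in this sketch: it stops the slow generator from running without limit when nobody is requesting songs, while still keeping a few results ready.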

A detailed explanation is available in the [blog post](https://lostmsu.github.io/BillionSongs/).

# Prerequisites

1. Download and install Python and TensorFlow as described in the
[Gradient documentation](https://github.com/losttech/Gradient/#install-python-tensorflow)
2. Install the `regex` Python package (`python -m pip install regex --user`)
3. Install the latest .NET Core SDK

# Run instructions

1. Clone the repository and enter the `Web` folder
2. .NET Core 3+ only: ensure you have the Entity Framework tool installed:
`dotnet tool install --global dotnet-ef`
3. From the `Web` folder, run `dotnet ef database update`.
That should create a `songs.db` file in the same directory.
4. Edit `appsettings.json` (see `appsettings.Development.json` for an example):
   - add `"DB": "sqlite"`
   - modify `DefaultConnection` to `"DefaultConnection": "Data Source=songs.db"`
   - ensure that `Generator` is **not** `dummy`, if you want lyrics to actually be generated
5. Run `dotnet run web`. This should print some logs.
Wait for `Now listening on: http://`, then open that URL in the browser.
It will take up to 4 minutes to generate the first song.

> NOTE: if you see "Can't choose between the following Python environments, as they are equally matching",
> set `PYTHON_CONDA_ENV_NAME` to the name of the Conda environment where you installed TensorFlow and the `regex` module.

# Train instructions

> NOTE: training requires a lot of RAM (>16GB), and will be slow without a GPU

1. Download the original 117M GPT-2 model by running
one of the `download_model.*` scripts in `External/Gradient-Samples/GPT-2`
from that same directory.
2. Download any lyrics dataset (I used
[Every song you have heard (almost)!](https://www.kaggle.com/artimous/every-song-you-have-heard-almost)),
and unpack it if needed.
3. From the command line in the same directory (GPT-2), run
`dotnet run train --include *.csv --column Lyrics path/to/lyrics/folder --run Lyrics`
(change the `column` parameter to the name of the lyrics column in your dataset)
> NOTE: dev instance was trained with `train -i "*.csv" --column=Lyrics Downloads\every-song-you-have-heard-almost -r Lyrics --checkpoint=fresh --save-every=100 -n 3`.
> If training from IDE, set working directory to GPT-2
> (which should contain `models` subfolder downloaded previously).
4. Interrupt the training process when the samples start looking good.
5. Try the trained model by running `dotnet run --run Lyrics`
