https://github.com/losttech/billionsongs
Gradient + TensorFlow + GPT-2-powered "AI" song lyrics generator, built as an ASP.NET Core website.
Last synced: 9 months ago
- Host: GitHub
- URL: https://github.com/losttech/billionsongs
- Owner: losttech
- License: mit
- Created: 2019-04-26T17:34:34.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2023-04-07T20:56:49.000Z (almost 3 years ago)
- Last Synced: 2025-04-19T06:03:28.686Z (10 months ago)
- Language: C#
- Size: 1.03 MB
- Stars: 66
- Watchers: 5
- Forks: 12
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Billion Songs, AI-powered song lyrics generator
[Generate a song](https://billionsongs.azurewebsites.net/)
> NOTE: this repository uses git submodules, so clone it with `--recurse-submodules`.
> Learn about them [here](https://git-scm.com/book/en/v2/Git-Tools-Submodules).
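
If the repository was already cloned without that flag, `git submodule update --init --recursive` (run inside the repository) fetches the submodules afterwards. As a rough sketch, the helper below reads the output of `git submodule status` and prints the submodules that still need initializing; git marks those lines with a leading `-`. The function name `uninitialized_submodules` is illustrative only, and the parsing assumes submodule paths without spaces.

```shell
# Read `git submodule status` output on stdin and print the paths of
# submodules that are not initialized (git prefixes those lines with "-").
# Assumes submodule paths contain no spaces.
uninitialized_submodules() {
    sed -n 's/^-[0-9a-f]* \([^ ]*\).*/\1/p'
}
```

Typical use from the repository root: `git submodule status --recursive | uninitialized_submodules`. Empty output means every submodule has been initialized.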
See the blog post
[Writing billion songs with C# and Deep Learning](https://lostmsu.github.io/BillionSongs/)
for a detailed explanation of how it works.
This project mainly serves as a demonstration of
[Gradient](https://losttech.software/gradient.html),
our TensorFlow binding for C# and other .NET languages.
[Other Gradient samples](https://github.com/losttech/Gradient-Samples).
# What is it, and how does it work?
This is a deep learning-powered song lyrics generator, based on
[GPT-2](https://github.com/openai/gpt-2) and wrapped as an ASP.NET Core website.
It generates songs word by word (or rather token by token), using
the statistical relationships learned by the GPT-2 model.
The actual generator code is in the
[GradientTextGenerator class](Web/GradientTextGenerator.cs).
Text generation is pretty slow even with a powerful GPU,
so we have a bunch of caches in /Web to provide a better user experience.
There is also [PregeneratedSongProvider](Web/PregeneratedSongProvider.cs),
which continuously creates new texts in the background, so that clicking
the "Make Random" button gives an instant result.
A detailed explanation is in the [blog post](https://lostmsu.github.io/BillionSongs/).
# Prerequisites
1. Download and install Python and TensorFlow as described in the
[Gradient documentation](https://github.com/losttech/Gradient/#install-python-tensorflow).
2. Install the Python package `regex` (`python -m pip install regex --user`).
3. Install the latest .NET Core SDK.
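
A quick way to confirm the three prerequisites before going further is a small check script. This is only a sketch: the function names (`have`, `py_has_module`, `check_prerequisites`) are made up for illustration, the check uses whichever `python` comes first on `PATH` (which may not be the environment Gradient ends up selecting), and it does not verify that the installed versions match what the Gradient documentation asks for.

```shell
# Succeed if a command is available on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Succeed if the default `python` can import the given module.
py_has_module() {
    python -c "import $1" >/dev/null 2>&1
}

# Report every missing tool or Python module; return non-zero if any is missing.
check_prerequisites() {
    status=0
    for tool in python dotnet; do
        have "$tool" || { echo "missing tool: $tool"; status=1; }
    done
    for module in tensorflow regex; do
        py_has_module "$module" || { echo "missing Python module: $module"; status=1; }
    done
    return "$status"
}
```

After sourcing the snippet, `check_prerequisites && echo "all prerequisites found"` prints the confirmation only when nothing is missing.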
# Run instructions
1. Clone the repository and enter the `Web` folder
2. .NET Core 3+ only: ensure you have the Entity Framework tool installed:
`dotnet tool install --global dotnet-ef`
3. From the `Web` folder, run `dotnet ef database update`.
That should create a `songs.db` file in the same directory.
4. Edit `appsettings.json` (see `appsettings.Development.json` for an example):
- add `"DB": "sqlite"`
- modify `DefaultConnection` to `"DefaultConnection": "Data Source=songs.db"`
- ensure that `Generator` is **not** `dummy`, if you want lyrics to actually be generated
5. Run `dotnet run web`. This should print some logs.
Wait for `Now listening on: http://`, then open that URL in the browser.
It will take up to 4 minutes to generate the first song.
> NOTE: if you see "Can't choose between the following Python environments, as they are equally matching",
set `PYTHON_CONDA_ENV_NAME` to the name of the Conda environment where you installed the TensorFlow and `regex` modules.
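
The three edits from step 4 are easy to get partly wrong, so a rough sanity check of `appsettings.json` before starting the site can save a slow first run. The sketch below is hypothetical (the name `check_appsettings` is not part of the project) and assumes each setting appears as a plain `"Key": "value"` pair on a single line; it does not check where in the file the settings are nested.

```shell
# Check a settings file (default: appsettings.json) against step 4.
# Assumes each setting is written as "Key": "value" on one line;
# nesting inside the JSON document is not validated.
check_appsettings() {
    file="${1:-appsettings.json}"
    status=0
    grep -Eq '"DB"[[:space:]]*:[[:space:]]*"sqlite"' "$file" \
        || { echo "DB is not set to sqlite"; status=1; }
    grep -Fq 'Data Source=songs.db' "$file" \
        || { echo "DefaultConnection does not point at songs.db"; status=1; }
    if grep -Eiq '"Generator"[[:space:]]*:[[:space:]]*"dummy"' "$file"; then
        echo "Generator is dummy: no lyrics will be generated"
        status=1
    fi
    return "$status"
}
```

Run it from the `Web` folder before step 5, for example `check_appsettings appsettings.json && dotnet run web`.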
# Train instructions
> NOTE: training requires a lot of RAM (>16GB), and will be slow without a GPU.
1. Download the original 117M GPT-2 model by running
one of the `download_model.*` scripts in `External/Gradient-Samples/GPT-2`
from that same directory.
2. Download any lyrics dataset (I used
[Every song you have heard (almost)!](https://www.kaggle.com/artimous/every-song-you-have-heard-almost))
and unpack it if needed.
3. From the command line in the same directory (GPT-2), run
`dotnet run train --include *.csv --column Lyrics path/to/lyrics/folder --run Lyrics`
(change the `column` parameter to the name of the lyrics column in your dataset)
> NOTE: dev instance was trained with `train -i "*.csv" --column=Lyrics Downloads\every-song-you-have-heard-almost -r Lyrics --checkpoint=fresh --save-every=100 -n 3`.
> If training from an IDE, set the working directory to GPT-2
> (which should contain the `models` subfolder downloaded previously).
4. Interrupt the training process when samples start looking good.
5. Try the trained model by running `dotnet run --run Lyrics`
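
Since step 3 depends on the `--column` value matching a real column in the dataset, it can help to look at the CSV headers first. The helpers below are a sketch with made-up names (`csv_columns`, `csv_has_column`); they assume a simple comma-separated header line without embedded commas and ignore quoting beyond stripping double quotes.

```shell
# Print the header columns of a CSV file, one per line.
# Assumes a plain comma-separated header; double quotes and CR are stripped.
csv_columns() {
    head -n 1 "$1" | tr -d '"\r' | tr ',' '\n'
}

# Succeed if the CSV file has a column with exactly the given name.
csv_has_column() {
    csv_columns "$1" | grep -Fxq -- "$2"
}
```

For example, `csv_columns path/to/lyrics/folder/songs.csv` lists the available column names (the file name `songs.csv` is a placeholder), and `csv_has_column path/to/lyrics/folder/songs.csv Lyrics` succeeds only if the `Lyrics` column exists. The match is case-sensitive.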