https://github.com/ivanlmh/murgan
Simple singing style transfer project using GANs
- Host: GitHub
- URL: https://github.com/ivanlmh/murgan
- Owner: ivanlmh
- License: apache-2.0
- Created: 2023-10-30T11:41:39.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-01T10:26:45.000Z (almost 2 years ago)
- Last Synced: 2024-01-29T06:17:28.019Z (over 1 year ago)
- Topics: generative-adversarial-network, murga-singing, singing-voice-conversion, style-transfer
- Language: Python
- Homepage:
- Size: 22.5 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# murGAN: Singing Voice to Murga Singing Style Transfer
This repository contains a simple approach to transferring the singing style of classical or pop singers to the distinctive style of Uruguayan Murga (though, of course, it can be trained to transfer to any style).
*Murga is a modern musical expression native to Uruguay. Its singing has distinctive characteristics, especially when compared to pop singing or classical European choral singing: there are significant differences in spectral centroid and spectral flatness, and murga singing often deviates from the fundamental frequency and uses little vibrato, among other particularities of intonation and vocal expression.*
I use a Generative Adversarial Network, borrowing from the StarGAN architecture the idea that the discriminator not only learns to distinguish real from fake audio, but also whether the generator's output belongs to the desired domain.
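To make that concrete, a StarGAN-style discriminator exposes two heads: one that scores real vs. fake, and one that classifies which domain the input belongs to. A minimal PyTorch sketch of that layout (illustrative only, assuming mel-spectrogram inputs; it is not the network defined in `src/`):
```python
# Sketch of a StarGAN-style discriminator with an auxiliary domain classifier.
# Assumes mel-spectrogram inputs of shape (batch, 1, n_mels, frames); illustrative only.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, num_domains=2):
        super().__init__()
        # Shared convolutional trunk
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
        )
        # Head 1: real vs. fake (patch-wise logits)
        self.adv_head = nn.Conv2d(128, 1, kernel_size=3, padding=1)
        # Head 2: which domain (murga / non-murga) the input belongs to
        self.cls_head = nn.Conv2d(128, num_domains, kernel_size=3, padding=1)

    def forward(self, x):
        h = self.trunk(x)
        real_fake = self.adv_head(h)                       # (batch, 1, H, W)
        domain_logits = self.cls_head(h).mean(dim=(2, 3))  # (batch, num_domains)
        return real_fake, domain_logits
```
During training, the domain head is fit on real audio with known labels, and the generator is pushed to produce audio that this head assigns to the target domain, as in StarGAN.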
## Getting Started
1. Clone the repository:
```bash
git clone git@github.com:ivanlmh/murGAN.git
cd murGAN
```
2. Set up the environment:
```bash
conda env create -f environment.yml
conda activate murGAN
```
3. Install the project for development:
```bash
pip install -e .
```

## Dataset Preparation
The datasets I use consist of two folders: one with "Murga"-style singing and another with non-murga (classical, pop, etc.) singing. Each track in the dataset should be at least 10 seconds long to ensure consistent input size for the model. For the non-murga side I use the vocals from MUSDB18 and the stems from the Choral Singing Dataset (see references).
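As a quick sanity check of that 10-second requirement, something like the following can flag clips that are too short (a sketch, assuming `soundfile` is installed and the hypothetical folders `data/murga` and `data/non_murga`):
```python
# Sketch: flag clips shorter than 10 s. Assumes soundfile and the hypothetical
# folders data/murga and data/non_murga; adjust the paths to your setup.
from pathlib import Path
import soundfile as sf

MIN_SECONDS = 10.0

for folder in [Path("data/murga"), Path("data/non_murga")]:
    for wav in sorted(folder.glob("*.wav")):
        info = sf.info(wav)
        duration = info.frames / info.samplerate
        if duration < MIN_SECONDS:
            print(f"Too short ({duration:.1f} s): {wav}")
```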
In the `scripts` folder you will find a script that creates symlinks to audio files, so you can pull files from other projects into a local `data` folder and keep things tidy without using extra disk space.
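The idea behind that script is roughly the following (a sketch of the approach with placeholder paths, not the exact script in `scripts/`):
```python
# Sketch of the symlink idea: point files from an external dataset into a local
# data/ folder without copying them. Source and target paths below are placeholders.
from pathlib import Path

def link_audios(source_dir: str, target_dir: str, pattern: str = "*.wav") -> None:
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(source_dir).glob(pattern)):
        dst = target / src.name
        if not dst.exists():
            dst.symlink_to(src.resolve())  # symlink instead of copying

# Example: expose MUSDB18 vocal stems under data/non_murga (hypothetical paths)
link_audios("/datasets/musdb18/vocals", "data/non_murga")
```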
## Model Architecture
The model leverages the StarGAN paradigm. For more details on the architecture, refer to the original [StarGAN paper](https://arxiv.org/abs/1711.09020).
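One StarGAN ingredient worth spelling out is how the generator is told which domain to produce: a one-hot target-domain label is broadcast and concatenated to the input features. A minimal sketch of that conditioning step (illustrative, not the layer definitions in `src/`):
```python
# Sketch of StarGAN-style domain conditioning: a one-hot target-domain label is
# broadcast over the time/frequency axes and concatenated to the input spectrogram.
import torch

def condition_on_domain(spec: torch.Tensor, domain: torch.Tensor) -> torch.Tensor:
    # spec:   (batch, 1, n_mels, frames)
    # domain: (batch, num_domains) one-hot target labels
    b, _, n_mels, frames = spec.shape
    label_map = domain[:, :, None, None].expand(b, domain.size(1), n_mels, frames)
    return torch.cat([spec, label_map], dim=1)  # (batch, 1 + num_domains, n_mels, frames)

spec = torch.randn(4, 1, 80, 256)
target = torch.eye(2)[torch.tensor([1, 1, 0, 1])]  # e.g. index 1 = murga
x = condition_on_domain(spec, target)
print(x.shape)  # torch.Size([4, 3, 80, 256])
```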
## Training
To train the model, run:
```bash
python src/train.py
```
Ensure that both the murga and non-murga datasets are prepared and placed in their respective directories.
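Conceptually, each generator update combines an adversarial term, a domain-classification term, and a cycle-consistency term, as in StarGAN. A rough sketch of that objective (the names `G`, `D` and the loss weights are placeholders, not taken from `src/train.py`):
```python
# Rough sketch of one StarGAN-style generator update. G, D, real, src_label,
# trg_label and the lambda weights are placeholders, not names from src/train.py.
import torch
import torch.nn.functional as F

def generator_step(G, D, real, src_label, trg_label, lambda_cls=1.0, lambda_cyc=10.0):
    fake = G(real, trg_label)                      # translate to the target style
    adv_logits, dom_logits = D(fake)
    loss_adv = -adv_logits.mean()                  # fool the discriminator
    loss_cls = F.cross_entropy(dom_logits, trg_label.argmax(dim=1))  # land in the target domain
    recon = G(fake, src_label)                     # translate back to the source style
    loss_cyc = F.l1_loss(recon, real)              # cycle-consistency
    return loss_adv + lambda_cls * loss_cls + lambda_cyc * loss_cyc
```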
## Inference
Once the model is trained, convert classical singing to the "Murga" style using:
```bash
python src/inference.py --input "path_to_input_classical_audio.wav" --output "path_to_output_murga_audio.wav"
```
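Under the hood, inference amounts to loading the trained generator, turning the input audio into the model's feature representation, translating it to the murga domain, and synthesizing a waveform. A heavily simplified sketch, assuming mel-spectrogram features and Griffin-Lim reconstruction (both assumptions; `src/inference.py` may work differently):
```python
# Simplified inference sketch. Assumes librosa, torch, a loaded generator and
# mel-spectrogram features with Griffin-Lim reconstruction; src/inference.py may differ.
import librosa
import numpy as np
import soundfile as sf
import torch

def convert_to_murga(generator, in_path, out_path, sr=22050, n_mels=80):
    wav, _ = librosa.load(in_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=n_mels)
    x = torch.from_numpy(np.log1p(mel)).float()[None, None]        # (1, 1, n_mels, frames)
    trg = torch.tensor([[0.0, 1.0]])                                # one-hot: murga domain
    with torch.no_grad():
        y = generator(x, trg).squeeze().numpy()
    mel_out = np.expm1(y)
    wav_out = librosa.feature.inverse.mel_to_audio(mel_out, sr=sr)  # Griffin-Lim
    sf.write(out_path, wav_out, sr)
```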
## Tests
To run unit tests:
```bash
python -m unittest discover tests
```

## Acknowledgments
- [Choral Singing Dataset](https://zenodo.org/records/2649950)
- [MUSDB18](https://sigsep.github.io/datasets/musdb.html#musdb18-compressed-stems)
- [StarGAN](https://arxiv.org/abs/1711.09020)