https://github.com/luickk/gan-audio-generator

Generating audio using a Generative Adversarial Network

deep-learning gan general-adversarial-network keras python

Last synced: about 1 year ago

Host: GitHub
URL: https://github.com/luickk/gan-audio-generator
Owner: luickk
Created: 2018-04-28T10:52:52.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2020-08-29T10:13:28.000Z (almost 6 years ago)
Last Synced: 2025-04-12T07:12:02.781Z (about 1 year ago)
Topics: deep-learning, gan, general-adversarial-network, keras, python
Language: Python
Homepage:
Size: 1.2 MB
Stars: 14
Watchers: 3
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.md

README

Generative Adversarial Network Audio generator
===================

The aim is to generate audio based on the [Common Voice](https://voice.mozilla.org/en/data) dataset using a
[Generative adversarial network](https://en.wikipedia.org/wiki/Generative_adversarial_network).

----------

#### Summary

The project as a whole works reasonably well: both the generator and the discriminator train and compete
against each other. To achieve acceptable results, however, the generator has to be better than the discriminator, which is not the case.
Even after 12 GB of data the discriminator is still far better than the generator, which means that the generator cannot
imitate the sound samples well enough. The resulting output is a monotonous hiss.

The generator's inability to overtake the discriminator can be traced back to the data. A GAN that imitates grayscale images, for example,
works with one scalar from 0-10 per pixel. One tone (the audio counterpart of a pixel) has 256 16-bit values, so at a sample rate of
roughly 44 kHz there are 44000 * 256 * 5 values to change in a 5-second sound sample, whereas an image generator only has 400x400 values to adapt.
The complexity of the data would therefore have to be reduced in either frequency or quality, both of which lead to an unauthentic imitation.
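
The size comparison above can be checked with a few lines of Python. This is only a sketch of the estimate as stated in the text: the constant names are introduced here for illustration, and the rounded 44000 samples per second and the 256 values per tone are taken from the paragraph as given, not measured from the code base.

```python
# Figures taken from the paragraph above (rounded, as stated there).
SAMPLES_PER_SECOND = 44_000   # rounded sample rate
VALUES_PER_TONE = 256         # values per tone, as stated in the text
CLIP_SECONDS = 5              # length of one sound sample
IMAGE_SIDE = 400              # side length of the comparison image

# Number of values the generator has to produce in each case.
audio_values = SAMPLES_PER_SECOND * VALUES_PER_TONE * CLIP_SECONDS
image_values = IMAGE_SIDE * IMAGE_SIDE
ratio = audio_values / image_values

print(f"audio: {audio_values:,} values")   # audio: 56,320,000 values
print(f"image: {image_values:,} values")   # image: 160,000 values
print(f"ratio: {ratio:.0f}x")              # ratio: 352x
```

Under these assumptions the audio generator has to shape about 350 times as many values as the image generator.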

----------

#### Installation

> - Clone Repository
> - Install Dependencies
> - Train

#### Training
> - Convert . files to .wav files using tools/reformat.py

> - Run `python main.py -m train`

----------

Dependencies
-------------------

> - numpy
> - Keras
> - matplotlib
> - librosa
> - OptionParser (from `optparse`, part of the Python standard library)
> - uuid (part of the Python standard library)
> - tqdm
> - tensorflow
> - scipy
> - sklearn
> - h5py
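
Before training, it can help to confirm that every dependency listed above is importable. The following is a minimal sketch, not part of the repository: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the import names are assumptions derived from the list (`Keras` is imported as `keras`, `OptionParser` lives in the standard-library module `optparse`).

```python
from importlib.util import find_spec

# Import names assumed from the dependency list above.
REQUIRED_MODULES = [
    "numpy", "keras", "matplotlib", "librosa", "optparse", "uuid",
    "tqdm", "tensorflow", "scipy", "sklearn", "h5py",
]


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be located."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as tensorflow.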

Audio Data
-------------------

Sample rate: 44.1 kHz

Audio type: Mono

Recommended dataset: Common Voice by Mozilla

File format: .wav
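
A converted file can be checked against these requirements before training. The sketch below uses Python's standard `wave` module and is not the repository's own loading code (which depends on librosa); `matches_expected_format` and `write_silence` are hypothetical helpers introduced for illustration.

```python
import os
import tempfile
import wave

EXPECTED_RATE_HZ = 44_100   # 44.1 kHz, as listed above
EXPECTED_CHANNELS = 1       # mono


def matches_expected_format(path):
    """Return True if the WAV file at `path` is 44.1 kHz mono."""
    with wave.open(path, "rb") as wav_file:
        return (wav_file.getframerate() == EXPECTED_RATE_HZ
                and wav_file.getnchannels() == EXPECTED_CHANNELS)


def write_silence(path, rate_hz, channels, seconds=1):
    """Write a silent 16-bit WAV file, used here only to demo the check."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav_file.setframerate(rate_hz)
        wav_file.writeframes(b"\x00\x00" * rate_hz * channels * seconds)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    write_silence(demo_path, EXPECTED_RATE_HZ, EXPECTED_CHANNELS)
    print(matches_expected_format(demo_path))
```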

Paper That Inspired This Project
-------------------

[Continuous recurrent neural networks with adversarial training](https://arxiv.org/pdf/1611.09904.pdf) by
Olof Mogren, *Chalmers University of Technology, Sweden*

Dependency specific issues
-------------------

- librosa

`raise NoBackendError()`

`audioread.NoBackendError`: install [ffmpeg](https://ffmpeg.zeranoe.com/builds/) and
add an environment variable for ffmpeg (for example, add the folder containing the ffmpeg executable to `PATH`) so that it can be found.
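
Whether ffmpeg is reachable can be verified from Python before loading any audio. This is a small sketch using the standard library only; `find_backend` is a hypothetical helper, and it assumes that the executable is named `ffmpeg` and is meant to be found through `PATH`.

```python
import shutil


def find_backend(executable="ffmpeg"):
    """Return the full path of `executable` if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    location = find_backend()
    if location is None:
        print("ffmpeg was not found on PATH; NoBackendError is likely.")
    else:
        print("ffmpeg found at", location)
```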
