Generative Adversarial Network Audio Generator
===================

The aim is to generate audio based on the [Common Voice](https://voice.mozilla.org/en/data) dataset using a
[Generative adversarial network](https://en.wikipedia.org/wiki/Generative_adversarial_network).

----------

#### Résumé

The project as a whole works quite well: the generator and the discriminator both train and compete
against each other. To achieve acceptable results, however, the generator has to catch up with the discriminator
and eventually fool it, which is not the case. Even after 12 GB of training data the discriminator is still far
better than the generator, which means that the generator could not imitate the sound samples closely enough.
The result is a monotonous sough (a low, rushing noise).

The generator's inability to catch up with the discriminator can be traced back to the data. A GAN that imitates
grayscale images, for example, only has to produce one small-range scalar per pixel. An audio sample (the counterpart
of a pixel) is a 16-bit value that can take 65,536 different levels, and at a 44.1 kHz sample rate a 5-second clip
contains 44,100 × 5 = 220,500 of them, which also have to fit together over time. An image generator producing
400×400 images, in comparison, only has to set 160,000 values.
The complexity of the data would therefore have to be reduced in either frequency (sample rate) or quality
(bit depth), both of which lead to a less authentic imitation.

----------

#### Installation

> - Clone the repository (https://github.com/luickk/gan-audio-generator)
> - Install the dependencies
> - Train

#### Training

> - Convert `.<format>` files to `.wav` files using `tools/reformat.py`
> - Run `python main.py -m train`

----------

Dependencies
-------------------

> - numpy
> - Keras
> - matplotlib
> - librosa
> - OptionParser (`optparse`, Python standard library)
> - uuid (Python standard library)
> - tqdm
> - tensorflow
> - scipy
> - scikit-learn (`sklearn`)
> - h5py

Audio Data
-------------------

Sample rate: 44.1 kHz <br>
Channels: mono <br>
Recommended dataset: Common Voice by Mozilla <br>
File format: .wav

Inspired by
-------------------

[Continuous recurrent neural networks with adversarial training](https://arxiv.org/pdf/1611.09904.pdf) by <br>
Olof Mogren, *Chalmers University of Technology, Sweden*

Dependency-specific issues
-------------------

 - librosa

    `raise NoBackendError()` <br>
    `audioread.NoBackendError`:
    install [ffmpeg](https://ffmpeg.zeranoe.com/builds/) and <br>
    add the directory containing the ffmpeg executable to the `PATH` environment variable
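Whether ffmpeg is actually reachable after changing the environment can be checked from Python before calling librosa. This is a minimal sketch, assuming that audioread's ffmpeg backend runs the `ffmpeg` command and therefore needs it to be resolvable through `PATH`; the helper name `ffmpeg_available` is hypothetical and not part of this project or of librosa.

```python
import shutil
from typing import Optional


def ffmpeg_available(search_path: Optional[str] = None) -> bool:
    """Return True if an `ffmpeg` executable can be found.

    With search_path=None, shutil.which searches the directories listed in
    the PATH environment variable.
    """
    return shutil.which("ffmpeg", path=search_path) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found on PATH; audioread should be able to decode audio")
    else:
        print("ffmpeg not found: install it and add its directory to PATH")
```

If the check fails in a new terminal after editing `PATH`, the shell or IDE usually has to be restarted to pick up the change.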
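The size comparison in the Résumé can be reproduced with a few lines of arithmetic. This is an illustrative sketch: the sample rate, clip length, 16-bit depth and 400×400 image size come from this README, while the 8-bit depth of the grayscale image is an assumption.

```python
SAMPLE_RATE_HZ = 44_100   # "Sample rate: 44.1 kHz" (Audio Data section)
CLIP_SECONDS = 5          # clip length used in the Résumé
AUDIO_BITS = 16           # bits per audio sample
IMAGE_SIDE = 400          # 400x400 image from the Résumé
IMAGE_BITS = 8            # assumption: 8-bit grayscale pixels

audio_values = SAMPLE_RATE_HZ * CLIP_SECONDS   # 220,500 samples per clip
image_values = IMAGE_SIDE * IMAGE_SIDE         # 160,000 pixels per image
audio_levels = 2 ** AUDIO_BITS                 # 65,536 levels per sample
image_levels = 2 ** IMAGE_BITS                 # 256 levels per pixel

# Raw size of one generated example, in bits.
audio_bits_total = audio_values * AUDIO_BITS
image_bits_total = image_values * IMAGE_BITS

print(f"audio: {audio_values:,} values x {audio_levels:,} levels")
print(f"image: {image_values:,} values x {image_levels:,} levels")
print(f"raw output size ratio (audio/image): {audio_bits_total / image_bits_total:.2f}")
```

Under these assumptions one 5-second clip carries roughly 2.8 times as many raw bits as one 400×400 image, on top of the requirement that consecutive samples stay coherent in time.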
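The imbalance described in the Résumé, a discriminator that stays far better than the generator, usually shows up in the training losses. The sketch below computes the standard GAN discriminator loss (binary cross-entropy) and the common non-saturating generator loss from the discriminator's output probabilities. It is a generic illustration in plain Python, not the loss code of this repository; the function names are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-7  # clamp so that log() never receives exactly 0


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Binary cross-entropy with real samples labelled 1 and generated samples labelled 0.

    d_real / d_fake are the discriminator's output probabilities D(x) and D(G(z)).
    """
    real_term = sum(-math.log(max(p, EPS)) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: mean of -log D(G(z))."""
    return sum(-math.log(max(p, EPS)) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # Balanced game: the discriminator cannot tell real from generated (D = 0.5).
    print(discriminator_loss([0.5, 0.5], [0.5, 0.5]), generator_loss([0.5, 0.5]))
    # Dominant discriminator, as described in the Résumé.
    print(discriminator_loss([0.99, 0.98], [0.01, 0.02]), generator_loss([0.01, 0.02]))
```

When the discriminator dominates, its outputs on generated samples approach 0, so the discriminator loss shrinks toward 0 while the generator loss grows: `generator_loss([0.01])` is about 4.6, compared with about 0.69 for `generator_loss([0.5])`.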
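The two ways of reducing complexity named at the end of the Résumé, lowering the sample rate (frequency) or the bit depth (quality), can be sketched on a list of signed 16-bit samples. This is a hypothetical and deliberately naive illustration: block averaging is only a crude low-pass filter and dropping low-order bits is plain linear requantization; a real pipeline would more likely use a proper resampler such as `librosa.resample`.

```python
from typing import List, Sequence


def downsample(samples: Sequence[int], factor: int) -> List[int]:
    """Lower the sample rate by an integer `factor` by averaging blocks of samples.

    Averaging acts as a crude low-pass filter; trailing samples that do not
    fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(samples) - len(samples) % factor
    return [
        round(sum(samples[i:i + factor]) / factor)
        for i in range(0, usable, factor)
    ]


def requantize(samples: Sequence[int], from_bits: int = 16, to_bits: int = 8) -> List[int]:
    """Lower the bit depth of signed PCM samples by dropping low-order bits."""
    if not 0 < to_bits <= from_bits:
        raise ValueError("to_bits must be between 1 and from_bits")
    shift = from_bits - to_bits
    return [s >> shift for s in samples]  # arithmetic shift keeps the sign


if __name__ == "__main__":
    clip = [0, 100, 200, 300, -32768, 32767]
    print(downsample(clip, 2))   # half the sample rate
    print(requantize(clip))      # 16-bit -> 8-bit
```

Halving the rate of a 5-second, 44.1 kHz clip leaves 110,250 samples, and 8-bit samples take 256 levels instead of 65,536; as the Résumé notes, both reductions cost fidelity.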
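The Audio Data section can be turned into a quick sanity check for a folder of converted files, for example after running `tools/reformat.py`. This sketch uses only Python's standard `wave` module; the 16-bit sample width is taken from the Résumé rather than the Audio Data section, and the helper name `check_wav` is hypothetical. Note that `wave` reads integer PCM files only, so other encodings (such as 32-bit float WAV) raise `wave.Error`.

```python
import sys
import wave
from pathlib import Path
from typing import List

EXPECTED_RATE_HZ = 44_100   # "Sample rate: 44.1 kHz"
EXPECTED_CHANNELS = 1       # "Channels: mono"
EXPECTED_SAMPWIDTH = 2      # assumption: 16-bit PCM = 2 bytes per sample


def check_wav(path: str) -> List[str]:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate, channels, width = wav.getframerate(), wav.getnchannels(), wav.getsampwidth()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if width != EXPECTED_SAMPWIDTH:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    return problems


if __name__ == "__main__":
    # Usage: python check_wavs.py <folder>  (defaults to the current directory)
    folder = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    for wav_path in sorted(folder.glob("*.wav")):
        issues = check_wav(str(wav_path))
        if issues:
            print(f"{wav_path.name}: " + "; ".join(issues))
```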