{"id":20046625,"url":"https://github.com/ncsoft/avocodo","last_synced_at":"2025-07-23T23:05:24.447Z","repository":{"id":64480237,"uuid":"572379580","full_name":"ncsoft/avocodo","owner":"ncsoft","description":"Official implementation of \"Avocodo: Generative Adversarial Network for Artifact-Free Vocoder\" (AAAI2023)","archived":false,"fork":false,"pushed_at":"2023-02-01T13:35:55.000Z","size":18,"stargazers_count":152,"open_issues_count":6,"forks_count":19,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-05T09:35:39.766Z","etag":null,"topics":["avocodo","gan","pytorch","vocoder"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ncsoft.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-11-30T06:24:18.000Z","updated_at":"2025-04-30T14:54:43.000Z","dependencies_parsed_at":"2023-02-17T05:50:18.995Z","dependency_job_id":null,"html_url":"https://github.com/ncsoft/avocodo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ncsoft/avocodo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2Favocodo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2Favocodo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2Favocodo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2Favocodo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ncsoft","download_url":"https://codeload.github.com/ncsoft/avocodo/tar.gz/refs/heads/main"
,"sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2Favocodo/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266764876,"owners_count":23980649,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["avocodo","gan","pytorch","vocoder"],"created_at":"2024-11-13T11:25:18.301Z","updated_at":"2025-07-23T23:05:24.379Z","avatar_url":"https://github.com/ncsoft.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"---------\n🥑 Avocodo: Generative Adversarial Network for Artifact-Free Vocoder\n---------\n\n**Accepted for publication in the 37th AAAI conference on artificial intelligence.**\n\n.. image:: https://img.shields.io/badge/arXiv-2211.04610-red.svg?style=plastic\n   :target: https://arxiv.org/abs/2206.13404\n\n.. image:: https://img.shields.io/badge/Sample_Page-Avocodo-blue.svg?style=plastic\n   :target: https://nc-ai.github.io/speech/publications/Avocodo/index.html\n\n.. 
image:: https://img.shields.io/badge/NC_SpeechAI-publications-brightgreen.svg?style=plastic\n   :target: https://nc-ai.github.io/speech/\n\n\nIn our `paper \u003chttps://arxiv.org/abs/2206.13404\u003e`_, we proposed ``Avocodo``.\nWe provide our implementation as open source in this repository.\n\n**Abstract :** Neural vocoders based on the generative adversarial neural network (GAN) have been widely used due to their fast inference speed and lightweight networks while generating high-quality speech waveforms. Since the perceptually important speech components are primarily concentrated in the low-frequency bands, most GAN-based vocoders perform multi-scale analysis that evaluates downsampled speech waveforms. This multi-scale analysis helps the generator improve speech intelligibility. However, in preliminary experiments, we discovered that the multi-scale analysis which focuses on the low-frequency bands causes unintended artifacts, e.g., aliasing and imaging artifacts, which degrade the synthesized speech waveform quality. Therefore, in this paper, we investigate the relationship between these artifacts and GAN-based vocoders and propose a GAN-based vocoder, called Avocodo, that allows the synthesis of high-fidelity speech with reduced artifacts. We introduce two kinds of discriminators to evaluate speech waveforms in various perspectives: a collaborative multi-band discriminator and a sub-band discriminator. We also utilize a pseudo quadrature mirror filter bank to obtain downsampled multi-band speech waveforms while avoiding aliasing. According to experimental results, Avocodo outperforms baseline GAN-based vocoders, both objectively and subjectively, while reproducing speech with fewer artifacts.\n\nPre-requisites\n===============\n\n1. Install pyenv\n  - `pyenv \u003chttps://github.com/pyenv/pyenv\u003e`_\n  - `pyenv automatic installer \u003chttps://github.com/pyenv/pyenv-installer\u003e`_ (recommended)\n2. Clone this repository\n3. 
Set up a virtual environment and install the Python requirements. Please refer to pyproject.toml\n  .. code-block::\n\n    pyenv install 3.8.11\n    pyenv virtualenv 3.8.11 avocodo\n    pyenv local avocodo\n\n    pip install wheel\n    pip install poetry\n\n    poetry install\n4. Download and extract the `LJ Speech dataset \u003chttps://keithito.com/LJ-Speech-Dataset\u003e`_.\n  - Move all wav files to LJSpeech-1.1/wavs\n  - Split the dataset into a training set and a validation set.\n  .. code-block::\n\n    cat LJSpeech-1.1/metadata.csv | tail -n 13000 \u003e training.txt\n    cat LJSpeech-1.1/metadata.csv | head -n 100 \u003e validation.txt\n\nTraining\n===============\n  .. code-block::\n\n    python avocodo/train.py --config avocodo/configs/avocodo_v1.json\n\nInference\n===============\n  .. code-block::\n\n    python avocodo/inference.py --version ${version} --checkpoint_file_id ${checkpoint_file_id}\n\nReference\n===============\nWe referred to the repositories below when building this project.\n\n  `HiFi-GAN \u003chttps://github.com/jik876/hifi-gan\u003e`_\n\n  `Parallel-WaveGAN \u003chttps://github.com/kan-bayashi/ParallelWaveGAN\u003e`_","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncsoft%2Favocodo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fncsoft%2Favocodo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncsoft%2Favocodo/lists"}