{"id":13826163,"url":"https://github.com/haoheliu/voicefixer","last_synced_at":"2025-05-14T04:04:59.449Z","repository":{"id":42937371,"uuid":"403475244","full_name":"haoheliu/voicefixer","owner":"haoheliu","description":"General Speech Restoration","archived":false,"fork":false,"pushed_at":"2025-02-17T14:13:03.000Z","size":3946,"stargazers_count":1121,"open_issues_count":38,"forks_count":133,"subscribers_count":17,"default_branch":"main","last_synced_at":"2025-04-10T03:46:07.322Z","etag":null,"topics":["declipping","denoise","dereverberation","mel","speech","speech-analysis","speech-enhancement","speech-processing","speech-synthesis","super-resolution","tts","vocoder"],"latest_commit_sha":null,"homepage":"https://haoheliu.github.io/demopage-voicefixer/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/haoheliu.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-09-06T03:38:22.000Z","updated_at":"2025-04-07T21:19:04.000Z","dependencies_parsed_at":"2024-01-15T16:21:27.549Z","dependency_job_id":"851e9747-96b9-4f08-8eef-de0a535e16a8","html_url":"https://github.com/haoheliu/voicefixer","commit_stats":{"total_commits":69,"total_committers":6,"mean_commits":11.5,"dds":"0.26086956521739135","last_synced_commit":"15d695949766a42a2175f1f0b9c432d01a0a2d9a"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haoheliu%2Fvoicefixer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haoheliu%2F
voicefixer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haoheliu%2Fvoicefixer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haoheliu%2Fvoicefixer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/haoheliu","download_url":"https://codeload.github.com/haoheliu/voicefixer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254067254,"owners_count":22009119,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["declipping","denoise","dereverberation","mel","speech","speech-analysis","speech-enhancement","speech-processing","speech-synthesis","super-resolution","tts","vocoder"],"created_at":"2024-08-04T09:01:33.178Z","updated_at":"2025-05-14T04:04:59.400Z","avatar_url":"https://github.com/haoheliu.png","language":"Python","funding_links":["https://www.buymeacoffee.com/haoheliuP"],"categories":["Python","Speech Enhancement \u0026 Audio Processing"],"sub_categories":["Noise Suppression \u0026 Enhancement"],"readme":"[![arXiv](https://img.shields.io/badge/arXiv-2109.13731-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2109.13731) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1HYYUepIsl2aXsdET6P_AmNVXuWP1MCMf?usp=sharing) [![PyPI version](https://badge.fury.io/py/voicefixer.svg)](https://badge.fury.io/py/voicefixer) 
[![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github\u0026style=flat-square)](https://haoheliu.github.io/demopage-voicefixer)[![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97-Models%20on%20Hub-yellow)](https://huggingface.co/spaces/akhaliq/VoiceFixer)\n\n- [:speaking_head: :wrench: VoiceFixer](#speaking_head-wrench-voicefixer)\n  - [Demo](#demo)\n  - [Usage](#usage)\n    - [Run Modes](#run-modes)\n    - [Command line](#command-line)\n    - [Desktop App](#desktop-app)\n    - [Python Examples](#python-examples)\n    - [Docker](#docker)\n    - [Others Features](#others-features)\n  - [Materials](#materials)\n  - [Change log](#change-log)\n  \n# :speaking_head: :wrench: VoiceFixer \n\n*Voicefixer* aims to restore human speech regardless of how severely it is degraded. A single model handles noise, reverberation, low resolution (2 kHz to 44.1 kHz), and clipping (threshold 0.1 to 1.0).\n\nThis package provides: \n- A pretrained *Voicefixer*, which is built on a neural vocoder.\n- A pretrained 44.1 kHz universal, speaker-independent neural vocoder.\n\n![main](test/figure.png)\n\n- If you find this repo helpful, please consider citing it or [![\"Buy Me A Coffee\"](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://www.buymeacoffee.com/haoheliuP)\n\n```bib\n @misc{liu2021voicefixer,   \n     title={VoiceFixer: Toward General Speech Restoration With Neural Vocoder},   \n     author={Haohe Liu and Qiuqiang Kong and Qiao Tian and Yan Zhao and DeLiang Wang and Chuanzeng Huang and Yuxuan Wang},  \n     year={2021},  \n     eprint={2109.13731},  \n     archivePrefix={arXiv},  \n     primaryClass={cs.SD}  \n }\n```\n\n## Demo\n\nPlease visit the [demo page](https://haoheliu.github.io/demopage-voicefixer/) to hear what voicefixer can do.\n\n## Usage\n\n### Run Modes\n\n| Mode | Description |\n| ---- | ----------- |\n| `0`    | Original model (recommended default) |\n| `1`    | Adds a preprocessing module (removes higher frequencies) |\n| `2`    | Train mode 
(may work on some severely degraded real-world speech) |\n| `all`  | Runs all modes and writes one .wav file per supported mode. |\n\n### Command line\n\nFirst, install voicefixer via pip:\n```shell\npip install git+https://github.com/haoheliu/voicefixer.git\n```\n\nProcess a file:\n```shell\n# Specify the input .wav file. By default, the output is written to outfile.wav.\nvoicefixer --infile test/utterance/original/original.wav\n# Or specify an output path\nvoicefixer --infile test/utterance/original/original.wav --outfile test/utterance/original/original_processed.wav\n```\n\nProcess files in a folder:\n```shell\nvoicefixer --infolder /path/to/input --outfolder /path/to/output\n```\n\nChange the mode (the default is 0):\n```shell\nvoicefixer --infile /path/to/input.wav --outfile /path/to/output.wav --mode 1\n```\n\nRun all modes:\n```shell\n# Output files are saved to `/path/to/output-modeX.wav`, where X is the mode.\nvoicefixer --infile /path/to/input.wav --outfile /path/to/output.wav --mode all\n```\n\nPre-load the weights without processing any audio:\n```shell\nvoicefixer --weight_prepare\n```\n\nFor more help, run:\n\n```shell\nvoicefixer -h\n```\n\n### Desktop App\n\n[Demo on YouTube](https://www.youtube.com/watch?v=d_j8UKTZ7J8) (Thanks @Justin John)\n\nInstall voicefixer via pip:\n```shell script\npip install voicefixer\n```\n\nYou can test audio samples on your desktop by running a local web page (powered by [streamlit](https://streamlit.io/)).\n\n1. Clone the repo first.\n```shell script\ngit clone https://github.com/haoheliu/voicefixer.git\ncd voicefixer\n```\n:warning: **For Windows users**, please make sure you have installed [WGET](https://eternallybored.org/misc/wget) and added the wget command to the system PATH (thanks @justinjohn0306).\n\n\n2. Start the web page.\n```shell script\n# Run streamlit \nstreamlit run test/streamlit.py\n```\n\n- On the first run, the web page may stay blank for several minutes while the models download. 
You can follow the download progress in the terminal.  \n\n- You can use [this low-quality speech file](https://github.com/haoheliu/voicefixer/blob/main/test/utterance/original/original.wav) that we provide for a test run. After processing, the page will look like this:\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"test/streamlit.png\" alt=\"figure\" width=\"400\"/\u003e\u003c/p\u003e\n\n- For users in mainland China who have difficulty downloading the checkpoints: they are also available on [Baidu Netdisk (百度网盘)](https://pan.baidu.com/s/194ufkUR_PYf1nE1KqkEZjQ) (extraction code: qis6). Download the two checkpoints and place them in the following folders:\n  - Place **vf.ckpt** inside *~/.cache/voicefixer/analysis_module/checkpoints*. (The \"~\" represents your home directory)\n  - Place **model.ckpt-1490000_trimed.pt** inside *~/.cache/voicefixer/synthesis_module/44100*. (The \"~\" represents your home directory)\n\n### Python Examples \n\nFirst, install voicefixer via pip:\n```shell script\npip install voicefixer\n```\n\nThen run the test script:\n\n```shell script\ngit clone https://github.com/haoheliu/voicefixer.git; cd voicefixer\npython3 test/test.py # test script\n```\nIt should print output similar to the following:\n```shell script\nInitializing VoiceFixer...\nTest voicefixer mode 0, Pass\nTest voicefixer mode 1, Pass\nTest voicefixer mode 2, Pass\nInitializing 44.1kHz speech vocoder...\nTest vocoder using groundtruth mel spectrogram...\nPass\n```\n*test/test.py* mainly tests the following two APIs:\n- `voicefixer.restore`\n- `vocoder.oracle`\n\n```python\n...\n\n# TEST VOICEFIXER\n## Initialize a voicefixer\nprint(\"Initializing VoiceFixer...\")\nvoicefixer = VoiceFixer()\n# Mode 0: Original Model (suggested by default)\n# Mode 1: Add preprocessing module (remove higher frequency)\n# Mode 2: Train mode (might work sometimes on seriously degraded real speech)\nfor mode in [0,1,2]:\n    print(\"Testing 
mode\",mode)\n    voicefixer.restore(input=os.path.join(git_root,\"test/utterance/original/original.flac\"), # low quality .wav/.flac file\n                       output=os.path.join(git_root,\"test/utterance/output/output_mode_\"+str(mode)+\".flac\"), # save file path\n                       cuda=False, # GPU acceleration\n                       mode=mode)\n    if(mode != 2):\n        check(\"output_mode_\"+str(mode)+\".flac\")\n    print(\"Pass\")\n\n# TEST VOCODER\n## Initialize a vocoder\nprint(\"Initializing 44.1kHz speech vocoder...\")\nvocoder = Vocoder(sample_rate=44100)\n\n### read wave (fpath) -\u003e mel spectrogram -\u003e vocoder -\u003e wave -\u003e save wave (out_path)\nprint(\"Test vocoder using groundtruth mel spectrogram...\")\nvocoder.oracle(fpath=os.path.join(git_root,\"test/utterance/original/p360_001_mic1.flac\"),\n               out_path=os.path.join(git_root,\"test/utterance/output/oracle.flac\"),\n               cuda=False) # GPU acceleration\n\n...\n```\n\nYou can clone this repo and try to run test.py inside the *test* folder.\n\n### Docker\n\n\u003e The Docker image is currently not published and must be built locally; building it yourself ensures that it runs with the expected configuration.\n\u003e The resulting image is about 10 GB, mostly because the dependencies alone take around 9.8 GB.\n\n\u003e However, the layer containing `voicefixer` is added last, so a rebuild after changing the sources is relatively small (~200 MB each time, since the weights are refreshed on every image build).\n\nThe `Dockerfile` can be viewed [here](Dockerfile).\n\nAfter cloning the repo:\n\n#### OS Agnostic\n\n```shell\n# To build the image\ncd voicefixer\ndocker build -t voicefixer:cpu .\n\n# To run the image\ndocker run --rm -v \"$(pwd)/data:/opt/voicefixer/data\" voicefixer:cpu \u003call_other_cli_args_here\u003e\n\n## Example: docker run --rm -v \"$(pwd)/data:/opt/voicefixer/data\" voicefixer:cpu --infile 
data/my-input.wav --outfile data/my-output.mode-all.wav --mode all\n```\n\n#### Wrapper script: Linux and macOS\n```bash\n# To build the image\ncd voicefixer\n./docker-build-local.sh\n\n# To run the image\n./run.sh \u003call_other_cli_args_here\u003e\n\n## Example: ./run.sh --infile data/my-input.wav --outfile data/my-output.mode-all.wav --mode all\n```\n\n### Others Features\n\n- How to use your own vocoder, such as a pre-trained HiFi-GAN?\n\nFirst, write a helper function like the one below that wraps your model, similar to the helper function in this repo: https://github.com/haoheliu/voicefixer/blob/main/voicefixer/vocoder/base.py#L35\n\n```python\ndef convert_mel_to_wav(mel):\n    \"\"\"\n    :param mel: non-normalized mel spectrogram, shape [batchsize, 1, t-steps, n_mel]\n    :return: waveform, shape [batchsize, 1, samples]\n    \"\"\"\n    wav = ...  # placeholder: run your own vocoder (e.g. a pre-trained HiFi-GAN) on `mel` here\n    return wav\n```\n\nThen pass this function to *voicefixer.restore* through the `your_vocoder_func` argument, for example:\n```python\nvoicefixer.restore(input=\"/path/to/input.wav\", # input wav file path\n                   output=\"/path/to/output.wav\", # output wav file path\n                   cuda=False, # whether to use gpu acceleration\n                   mode=0,\n                   your_vocoder_func=convert_mel_to_wav)\n```\n\nNote: \n- For compatibility, your vocoder should operate on 44.1 kHz audio with 128 mel frequency bins. \n- The mel spectrogram passed to the helper function is not normalized by the width of each mel filter. 
\n\n## Materials\n- Voicefixer training: https://github.com/haoheliu/voicefixer_main.git\n- Demo page: https://haoheliu.github.io/demopage-voicefixer/ \n\n[![46dnPO.png](https://z3.ax1x.com/2021/09/26/46dnPO.png)](https://imgtu.com/i/46dnPO)\n[![46dMxH.png](https://z3.ax1x.com/2021/09/26/46dMxH.png)](https://imgtu.com/i/46dMxH)\n\n\n## Change log\n\nSee [CHANGELOG.md](CHANGELOG.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaoheliu%2Fvoicefixer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhaoheliu%2Fvoicefixer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaoheliu%2Fvoicefixer/lists"}