{"id":13456797,"url":"https://github.com/AIGC-Audio/AudioGPT","last_synced_at":"2025-03-24T11:31:26.982Z","repository":{"id":155811739,"uuid":"614719201","full_name":"AIGC-Audio/AudioGPT","owner":"AIGC-Audio","description":"AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head","archived":false,"fork":false,"pushed_at":"2024-07-06T21:35:18.000Z","size":24122,"stargazers_count":10111,"open_issues_count":52,"forks_count":863,"subscribers_count":134,"default_branch":"main","last_synced_at":"2025-03-18T23:41:39.338Z","etag":null,"topics":["audio","gpt","music","sound","speech","talking-head"],"latest_commit_sha":null,"homepage":"https://huggingface.co/spaces/AIGC-Audio/AudioGPT","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AIGC-Audio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-16T07:12:18.000Z","updated_at":"2025-03-16T22:44:13.000Z","dependencies_parsed_at":"2024-07-31T08:24:55.945Z","dependency_job_id":null,"html_url":"https://github.com/AIGC-Audio/AudioGPT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIGC-Audio%2FAudioGPT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIGC-Audio%2FAudioGPT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIGC-Audio%2FAudioGPT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIGC-Audio%2FAudioGPT/manifests","owner_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AIGC-Audio","download_url":"https://codeload.github.com/AIGC-Audio/AudioGPT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245260872,"owners_count":20586489,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","gpt","music","sound","speech","talking-head"],"created_at":"2024-07-31T08:01:27.931Z","updated_at":"2025-03-24T11:31:26.976Z","avatar_url":"https://github.com/AIGC-Audio.png","language":"Python","funding_links":[],"categories":["HarmonyOS","Python","Repos","Application","\u003cspan id=\"audio\"\u003eAudio\u003c/span\u003e","GitHub projects","Summary","语音识别与合成_其他","精选开源项目合集","Uncategorized","开源项目","Developing","Open Source Projects"],"sub_categories":["Windows Manager","Taxonomy","\u003cspan id=\"tool\"\u003eLLM (LLM \u0026 Tool)\u003c/span\u003e","网络服务_其他","GPT工具","Uncategorized","其他聊天机器人","Tools","Other / Chatbots"],"readme":"# AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head\n\n[![arXiv](https://img.shields.io/badge/arXiv-Paper-\u003cCOLOR\u003e.svg)](https://arxiv.org/abs/2304.12995)\n[![GitHub Stars](https://img.shields.io/github/stars/AIGC-Audio/AudioGPT?style=social)](https://github.com/AIGC-Audio/AudioGPT)\n![visitors](https://visitor-badge.glitch.me/badge?page_id=AIGC-Audio.AudioGPT)\n[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/AIGC-Audio/AudioGPT)\n\n\nWe provide our implementation and pretrained models as open source in this 
repository.\n\n\n## Get Started\n\nPlease refer to [run.md](run.md)\n\n\n## Capabilities\n\nThe tables below list the current capabilities of AudioGPT. More supported models and tasks are coming soon. For prompt examples, refer to [asset](assets/README.md).\n\nNot every model currently has a public repository.\n### Speech\n|            Task            |   Supported Foundation Models   | Status |\n|:--------------------------:|:-------------------------------:|:------:|\n|       Text-to-Speech       | [FastSpeech](https://github.com/ming024/FastSpeech2), [SyntaSpeech](https://github.com/yerfor/SyntaSpeech), [VITS](https://github.com/jaywalnut310/vits) |  Yes (WIP)   |\n|       Style Transfer       |         [GenerSpeech](https://github.com/Rongjiehuang/GenerSpeech)         |  Yes   |\n|     Speech Recognition     |           [whisper](https://github.com/openai/whisper), [Conformer](https://github.com/sooftware/conformer)           |  Yes   |\n|     Speech Enhancement     |          [ConvTasNet]()         |  Yes (WIP)   |\n|     Speech Separation      |          [TF-GridNet](https://arxiv.org/pdf/2211.12433.pdf)         |  Yes (WIP)   |\n|     Speech Translation     |          [Multi-decoder](https://arxiv.org/pdf/2109.12804.pdf)      |  WIP   |\n|      Mono-to-Binaural      |          [NeuralWarp](https://github.com/fdarmon/NeuralWarp)         |  Yes   |\n\n### Sing\n\n|           Task            |   Supported Foundation Models   | Status |\n|:-------------------------:|:-------------------------------:|:------:|\n|       Text-to-Sing        |         [DiffSinger](https://github.com/MoonInTheRiver/DiffSinger), [VISinger](https://github.com/jerryuhoo/VISinger)          |  Yes (WIP)   |\n\n### Audio\n|          Task          | Supported Foundation Models | Status |\n|:----------------------:|:---------------------------:|:------:|\n|     Text-to-Audio      |      [Make-An-Audio]()      |  Yes   |\n|    Audio Inpainting    |      [Make-An-Audio]()      |  Yes   |\n|     
Image-to-Audio     |      [Make-An-Audio]()      |  Yes   |\n|    Sound Detection     |    [Audio-transformer](https://github.com/RetroCirce/HTS-Audio-Transformer)    | Yes    |\n| Target Sound Detection |    [TSDNet](https://github.com/gy65896/TSDNet)    |  Yes   |\n|    Sound Extraction    |    [LASSNet](https://github.com/liuxubo717/LASS)    |  Yes   |\n\n\n### Talking Head\n\n|           Task            |   Supported Foundation Models   |   Status   |\n|:-------------------------:|:-------------------------------:|:----------:|\n|  Talking Head Synthesis   |          [GeneFace](https://github.com/yerfor/GeneFace)           | Yes (WIP)  |\n\n\n## Acknowledgement\nWe thank the following open-source projects:\n\n[ESPNet](https://github.com/espnet/espnet) \u0026#8194;\n[NATSpeech](https://github.com/NATSpeech/NATSpeech) \u0026#8194;\n[Visual ChatGPT](https://github.com/microsoft/visual-chatgpt) \u0026#8194;\n[Hugging Face](https://github.com/huggingface) \u0026#8194;\n[LangChain](https://github.com/hwchase17/langchain) \u0026#8194;\n[Stable Diffusion](https://github.com/CompVis/stable-diffusion) \u0026#8194;\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAIGC-Audio%2FAudioGPT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAIGC-Audio%2FAudioGPT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAIGC-Audio%2FAudioGPT/lists"}