https://github.com/AIGC-Audio/AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
audio gpt music sound speech talking-head

Host: GitHub
URL: https://github.com/AIGC-Audio/AudioGPT
Owner: AIGC-Audio
License: other
Created: 2023-03-16T07:12:18.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-07-06T21:35:18.000Z (over 1 year ago)
Last Synced: 2025-03-18T23:41:39.338Z (8 months ago)
Topics: audio, gpt, music, sound, speech, talking-head
Language: Python
Homepage: https://huggingface.co/spaces/AIGC-Audio/AudioGPT
Size: 23 MB
Stars: 10,111
Watchers: 134
Forks: 863
Open Issues: 52
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

AiTreasureBox - AIGC-Audio/AudioGPT - AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (Repos)
awesome - AIGC-Audio/AudioGPT - AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (Python)
ai-game-devtools - AudioGPT
awesome-langchain - AudioGPT (Open Source Projects / Other / Chatbots)
awesome-ai-talking-heads - AudioGPT
awesome-llm-and-aigc - AudioGPT - AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head. (Summary)
StarryDivineSky - AIGC-Audio/AudioGPT
awesome-open-gpt - AudioGPT🔥
allinchatgpt - AudioGPT - AudioGPT GitHub project page (Uncategorized / Uncategorized)
awesome-langchain-zh - AudioGPT - Understanding and generating speech, music, sound, and talking heads (Open Source Projects / Other Chatbots)

README

# AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

[![arXiv](https://img.shields.io/badge/arXiv-Paper-.svg)](https://arxiv.org/abs/2304.12995)

[![GitHub Stars](https://img.shields.io/github/stars/AIGC-Audio/AudioGPT?style=social)](https://github.com/AIGC-Audio/AudioGPT)

![visitors](https://visitor-badge.glitch.me/badge?page_id=AIGC-Audio.AudioGPT)

[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/AIGC-Audio/AudioGPT)

We provide our implementation and pretrained models as open source in this repository.

## Get Started

Please refer to [run.md](run.md).

## Capabilities

The tables below list AudioGPT's current capabilities. More supported models and tasks are coming soon. For prompt examples, refer to [assets](assets/README.md).

Not every model currently has a public repository.

### Speech

| Task | Supported Foundation Models | Status |
|:--------------------------:|:-------------------------------:|:------:|
| Text-to-Speech | [FastSpeech](https://github.com/ming024/FastSpeech2), [SyntaSpeech](https://github.com/yerfor/SyntaSpeech), [VITS](https://github.com/jaywalnut310/vits) | Yes (WIP) |
| Style Transfer | [GenerSpeech](https://github.com/Rongjiehuang/GenerSpeech) | Yes |
| Speech Recognition | [whisper](https://github.com/openai/whisper), [Conformer](https://github.com/sooftware/conformer) | Yes |
| Speech Enhancement | [ConvTasNet]() | Yes (WIP) |
| Speech Separation | [TF-GridNet](https://arxiv.org/pdf/2211.12433.pdf) | Yes (WIP) |
| Speech Translation | [Multi-decoder](https://arxiv.org/pdf/2109.12804.pdf) | WIP |
| Mono-to-Binaural | [NeuralWarp](https://github.com/fdarmon/NeuralWarp) | Yes |

### Sing

| Task | Supported Foundation Models | Status |
|:-------------------------:|:-------------------------------:|:------:|
| Text-to-Sing | [DiffSinger](https://github.com/MoonInTheRiver/DiffSinger), [VISinger](https://github.com/jerryuhoo/VISinger) | Yes (WIP) |

### Audio

| Task | Supported Foundation Models | Status |
|:----------------------:|:---------------------------:|:------:|
| Text-to-Audio | [Make-An-Audio]() | Yes |
| Audio Inpainting | [Make-An-Audio]() | Yes |
| Image-to-Audio | [Make-An-Audio]() | Yes |
| Sound Detection | [Audio-transformer](https://github.com/RetroCirce/HTS-Audio-Transformer) | Yes |
| Target Sound Detection | [TSDNet](https://github.com/gy65896/TSDNet) | Yes |
| Sound Extraction | [LASSNet](https://github.com/liuxubo717/LASS) | Yes |

### Talking Head

| Task | Supported Foundation Models | Status |
|:-------------------------:|:-------------------------------:|:----------:|
| Talking Head Synthesis | [GeneFace](https://github.com/yerfor/GeneFace) | Yes (WIP) |
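
For scripts that need to check which tasks are ready, the capability tables above can be mirrored as a small lookup structure. The following is only an illustrative sketch and is not part of the AudioGPT codebase: `CAPABILITIES`, `tasks_with_status`, and `models_for` are hypothetical names, and only a subset of the rows is copied in.

```python
from collections import defaultdict

# Hypothetical mirror of a few rows from the capability tables above.
# Each entry: (category, task, foundation models, status as listed in the README).
CAPABILITIES = [
    ("Speech", "Text-to-Speech", ["FastSpeech", "SyntaSpeech", "VITS"], "Yes (WIP)"),
    ("Speech", "Style Transfer", ["GenerSpeech"], "Yes"),
    ("Speech", "Speech Recognition", ["whisper", "Conformer"], "Yes"),
    ("Speech", "Speech Translation", ["Multi-decoder"], "WIP"),
    ("Sing", "Text-to-Sing", ["DiffSinger", "VISinger"], "Yes (WIP)"),
    ("Audio", "Text-to-Audio", ["Make-An-Audio"], "Yes"),
    ("Audio", "Sound Extraction", ["LASSNet"], "Yes"),
    ("Talking Head", "Talking Head Synthesis", ["GeneFace"], "Yes (WIP)"),
]


def tasks_with_status(status):
    """Return task names grouped by category for an exact status string."""
    grouped = defaultdict(list)
    for category, task, _models, row_status in CAPABILITIES:
        if row_status == status:
            grouped[category].append(task)
    return dict(grouped)


def models_for(task):
    """Return the foundation models listed for a task, or raise KeyError."""
    for _category, name, models, _status in CAPABILITIES:
        if name == task:
            return list(models)
    raise KeyError(task)


if __name__ == "__main__":
    print(tasks_with_status("Yes"))
    print(models_for("Speech Recognition"))
```

Matching on the exact status string keeps "Yes", "Yes (WIP)", and "WIP" distinct, as they are in the tables.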

## Acknowledgement

We thank the following open-source projects:

[ESPNet](https://github.com/espnet/espnet)  

[NATSpeech](https://github.com/NATSpeech/NATSpeech)  

[Visual ChatGPT](https://github.com/microsoft/visual-chatgpt)  

[Hugging Face](https://github.com/huggingface)  

[LangChain](https://github.com/hwchase17/langchain)  

[Stable Diffusion](https://github.com/CompVis/stable-diffusion)
