SpeechGPT Series: Speech Large Language Models
https://github.com/0nutation/SpeechGPT
- Host: GitHub
- URL: https://github.com/0nutation/SpeechGPT
- Owner: 0nutation
- License: apache-2.0
- Created: 2023-05-16T15:59:40.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-22T10:08:09.000Z (10 months ago)
- Last Synced: 2025-04-08T12:12:31.563Z (26 days ago)
- Language: Python
- Homepage: https://0nutation.github.io/SpeechGPT.github.io/
- Size: 3.44 MB
- Stars: 1,364
- Watchers: 46
- Forks: 91
- Open Issues: 45
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- ai-game-devtools - SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. [arXiv](https://arxiv.org/abs/2305.11000) (Speech / Tool (AI LLM))
- StarryDivineSky - 0nutation/SpeechGPT
- awesome-llm-and-aigc - "SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities". ([arXiv 2023](https://arxiv.org/abs/2305.11000))
README
# SpeechGPT: Speech Large Language Models
- [**SpeechGPT**](speechgpt) (2023/05) - Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
- [**SpeechGPT-Gen**](speechgpt-gen) (2024/01) - Scaling Chain-of-Information Speech Generation
## News
- **[2024/2/20]** We proposed **AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling**. Check out the [paper](https://arxiv.org/abs/2402.12226) and [GitHub](https://github.com/OpenMOSS/AnyGPT).
- **[2024/1/25]** We released **SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation**. Check out the [paper](https://arxiv.org/abs/2401.13527) and [GitHub](https://github.com/0nutation/SpeechGPT/tree/main/speechgpt-gen).
- **[2024/1/9]** We proposed **SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems**. Check out the [paper](https://arxiv.org/abs/2401.03945) and [GitHub](https://github.com/0nutation/SpeechAgents).
- **[2023/9/15]** We released the SpeechGPT code, checkpoints, and the SpeechInstruct dataset.
- **[2023/9/1]** We proposed **SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models** and released its code and checkpoints. Check out the [paper](https://arxiv.org/abs/2308.16692), [demo](https://0nutation.github.io/SpeechTokenizer.github.io/) and [GitHub](https://github.com/ZhangXInFD/SpeechTokenizer).
- **[2023/5/18]** We released **SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities**, the first multi-modal LLM capable of perceiving and generating multi-modal content following multi-modal human instructions. Check out the [paper](https://arxiv.org/abs/2305.11000) and [demo](https://0nutation.github.io/SpeechGPT.github.io/). An illustrative sketch of the discrete speech-token interface these models build on follows this list.
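As an illustration of the discrete speech-token interface that underpins this model family, below is a minimal sketch of turning a waveform into discrete tokens with the SpeechTokenizer package mentioned above. The checkpoint paths are placeholders, and the exact API (`load_from_checkpoint`, `encode`, `sample_rate`) is assumed from that project's documentation, so treat this as a sketch rather than the official usage.

```python
# Minimal, illustrative sketch (not taken from this repository): discretize a
# waveform into speech tokens with SpeechTokenizer. Paths are placeholders and
# API details are assumptions; see the SpeechTokenizer repo for authoritative usage.
import torch
import torchaudio
from speechtokenizer import SpeechTokenizer

config_path = "speechtokenizer/config.json"        # placeholder path
ckpt_path = "speechtokenizer/SpeechTokenizer.pt"   # placeholder path

model = SpeechTokenizer.load_from_checkpoint(config_path, ckpt_path)
model.eval()

wav, sr = torchaudio.load("example.wav")           # (channels, samples)
wav = wav[:1, :]                                   # keep a single channel
if sr != model.sample_rate:                        # resample to the model's rate
    wav = torchaudio.functional.resample(wav, sr, model.sample_rate)
wav = wav.unsqueeze(0)                             # (batch=1, 1, samples)

with torch.no_grad():
    codes = model.encode(wav)                      # (n_quantizers, batch, frames)

semantic_tokens = codes[0]                         # first RVQ layer: semantic content
print(semantic_tokens.shape)
```

In SpeechGPT-style training, such discrete tokens extend the LLM vocabulary so that speech and text are modeled as one token sequence; generated speech tokens are then converted back to audio by a unit vocoder.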
## Acknowledgements

- We express our appreciation to Fuliang Weng and Rong Ye for their valuable suggestions and guidance.

## Citation
If you find our work useful for your research and applications, please cite it using the following BibTeX:

```
@misc{zhang2023speechgpt,
    title={SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities},
    author={Dong Zhang and Shimin Li and Xin Zhang and Jun Zhan and Pengyu Wang and Yaqian Zhou and Xipeng Qiu},
    year={2023},
    eprint={2305.11000},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```