SpeechGPT Series: Speech Large Language Models
https://github.com/0nutation/SpeechGPT
- Host: GitHub
- URL: https://github.com/0nutation/SpeechGPT
- Owner: 0nutation
- License: apache-2.0
- Created: 2023-05-16T15:59:40.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-22T10:08:09.000Z (10 months ago)
- Last Synced: 2025-04-08T12:12:31.563Z (26 days ago)
- Language: Python
- Homepage: https://0nutation.github.io/SpeechGPT.github.io/
- Size: 3.44 MB
- Stars: 1,364
- Watchers: 46
- Forks: 91
- Open Issues: 45
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- ai-game-devtools - SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. [arXiv](https://arxiv.org/abs/2305.11000) (Speech / Tool (AI LLM))
- StarryDivineSky - 0nutation/SpeechGPT
- awesome-llm-and-aigc - "SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities". ([arXiv 2023](https://arxiv.org/abs/2305.11000))
README
# SpeechGPT: Speech Large Language Models
- [**SpeechGPT**](speechgpt) (2023/05) - Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
- [**SpeechGPT-Gen**](speechgpt-gen) (2024/01) - Scaling Chain-of-Information Speech Generation
## News
- **[2024/2/20]** We proposed **AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling**. Check out the [paper](https://arxiv.org/abs/2402.12226) and [GitHub](https://github.com/OpenMOSS/AnyGPT).
- **[2024/1/25]** We released **SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation**. Check out the [paper](https://arxiv.org/abs/2401.13527) and [GitHub](https://github.com/0nutation/SpeechGPT/tree/main/speechgpt-gen).
- **[2024/1/9]** We proposed **SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems**. Check out the [paper](https://arxiv.org/abs/2401.03945) and [GitHub](https://github.com/0nutation/SpeechAgents).
- **[2023/9/15]** We released the SpeechGPT code, checkpoints, and the SpeechInstruct dataset.
- **[2023/9/1]** We proposed **SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models** and released its code and checkpoints. Check out the [paper](https://arxiv.org/abs/2308.16692), [demo](https://0nutation.github.io/SpeechTokenizer.github.io/) and [GitHub](https://github.com/ZhangXInFD/SpeechTokenizer).
- **[2023/5/18]** We released **SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities**, the first multi-modal LLM capable of perceiving and generating multi-modal content following multi-modal human instructions. Check out the [paper](https://arxiv.org/abs/2305.11000) and [demo](https://0nutation.github.io/SpeechGPT.github.io/). An illustrative sketch of the discrete speech-token interface these models build on follows this list.
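As an illustration of the discrete speech-token interface that underpins this model family, below is a minimal sketch of turning a waveform into discrete tokens with the SpeechTokenizer package mentioned above. The checkpoint paths are placeholders, and the exact API (`load_from_checkpoint`, `encode`, `sample_rate`) is assumed from that project's documentation, so treat this as a sketch rather than the official usage.

```python
# Minimal, illustrative sketch (not taken from this repository): discretize a
# waveform into speech tokens with SpeechTokenizer. Paths are placeholders and
# API details are assumptions; see the SpeechTokenizer repo for authoritative usage.
import torch
import torchaudio
from speechtokenizer import SpeechTokenizer

config_path = "speechtokenizer/config.json"        # placeholder path
ckpt_path = "speechtokenizer/SpeechTokenizer.pt"   # placeholder path

model = SpeechTokenizer.load_from_checkpoint(config_path, ckpt_path)
model.eval()

wav, sr = torchaudio.load("example.wav")           # (channels, samples)
wav = wav[:1, :]                                   # keep a single channel
if sr != model.sample_rate:                        # resample to the model's rate
    wav = torchaudio.functional.resample(wav, sr, model.sample_rate)
wav = wav.unsqueeze(0)                             # (batch=1, 1, samples)

with torch.no_grad():
    codes = model.encode(wav)                      # (n_quantizers, batch, frames)

semantic_tokens = codes[0]                         # first RVQ layer: semantic content
print(semantic_tokens.shape)
```

In SpeechGPT-style training, such discrete tokens extend the LLM vocabulary so that speech and text are modeled as one token sequence; generated speech tokens are then converted back to audio by a unit vocoder.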
## Acknowledgements

- We express our appreciation to Fuliang Weng and Rong Ye for their valuable suggestions and guidance.

## Citation
If you find our work useful for your research and applications, please cite it using the following BibTeX:

```
@misc{zhang2023speechgpt,
    title={SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities},
    author={Dong Zhang and Shimin Li and Xin Zhang and Jun Zhan and Pengyu Wang and Yaqian Zhou and Xipeng Qiu},
    year={2023},
    eprint={2305.11000},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```