{"id":13603209,"url":"https://github.com/0nutation/SpeechGPT","last_synced_at":"2025-04-11T14:30:39.278Z","repository":{"id":166907982,"uuid":"641506918","full_name":"0nutation/SpeechGPT","owner":"0nutation","description":"SpeechGPT Series: Speech Large Language Models","archived":false,"fork":false,"pushed_at":"2024-07-22T10:08:09.000Z","size":3611,"stargazers_count":1364,"open_issues_count":45,"forks_count":91,"subscribers_count":46,"default_branch":"main","last_synced_at":"2025-04-08T12:12:31.563Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://0nutation.github.io/SpeechGPT.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/0nutation.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-16T15:59:40.000Z","updated_at":"2025-04-07T19:41:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"a7199a60-049f-44bd-8cce-e7800023589f","html_url":"https://github.com/0nutation/SpeechGPT","commit_stats":{"total_commits":59,"total_committers":3,"mean_commits":"19.666666666666668","dds":0.03389830508474578,"last_synced_commit":"b290011570431cc559546f841335170a9619fa72"},"previous_names":["0nutation/speechgpt"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0nutation%2FSpeechGPT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0nutation%2FSpeechGPT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0nutation%2FSpeechGPT/releases","manifests_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0nutation%2FSpeechGPT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/0nutation","download_url":"https://codeload.github.com/0nutation/SpeechGPT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248419648,"owners_count":21100213,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T18:01:57.537Z","updated_at":"2025-04-11T14:30:39.273Z","avatar_url":"https://github.com/0nutation.png","language":"Python","funding_links":[],"categories":["Python","\u003cspan id=\"speech\"\u003eSpeech\u003c/span\u003e","多模态大模型","Summary"],"sub_categories":["\u003cspan id=\"tool\"\u003eLLM (LLM \u0026 Tool)\u003c/span\u003e","网络服务_其他"],"readme":"# SpeechGPT: Speech Large Language Models\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"speechgpt/imgs/logo.png\" width=\"20%\"\u003e \u003cbr\u003e\n\u003c/p\u003e\n\n\n- [**SpeechGPT**](speechgpt) (2023/05) - Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities\n\n- [**SpeechGPT-Gen**](speechgpt-gen) (2024/01) - Scaling Chain-of-Information Speech Generation\n\n\n## News\n- **[2024/2/20]** We proposed **AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling**. Check out the [paper](https://arxiv.org/abs/2402.12226) and [github](https://github.com/OpenMOSS/AnyGPT).\n- **[2024/1/25]** We released **SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation**. 
Check out the [paper](https://arxiv.org/abs/2401.13527) and [github](https://github.com/0nutation/SpeechGPT/tree/main/speechgpt-gen).\n- **[2024/1/9]** We proposed **SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems**. Check out the [paper](https://arxiv.org/abs/2401.03945) and [github](https://github.com/0nutation/SpeechAgents).\n- **[2023/9/15]** We released the SpeechGPT code and checkpoints and the SpeechInstruct dataset.\n- **[2023/9/1]** We proposed **SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models**. We released the code and checkpoints of SpeechTokenizer. Check out the [paper](https://arxiv.org/abs/2308.16692), [demo](https://0nutation.github.io/SpeechTokenizer.github.io/) and [github](https://github.com/ZhangXInFD/SpeechTokenizer).\n- **[2023/5/18]** We released **SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities**. We propose SpeechGPT, the first multi-modal LLM capable of perceiving and generating multi-modal content following multi-modal human instructions.  
Check out the [paper](https://arxiv.org/abs/2305.11000) and [demo](https://0nutation.github.io/SpeechGPT.github.io/).\n\n\n\n## Acknowledgements\n- We express our appreciation to Fuliang Weng and Rong Ye for their valuable suggestions and guidance.\n\n\n\n## Citation\nIf you find our work useful for your research and applications, please cite it using the following BibTeX entry:\n\n```\n@misc{zhang2023speechgpt,\n      title={SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities}, \n      author={Dong Zhang and Shimin Li and Xin Zhang and Jun Zhan and Pengyu Wang and Yaqian Zhou and Xipeng Qiu},\n      year={2023},\n      eprint={2305.11000},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0nutation%2FSpeechGPT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F0nutation%2FSpeechGPT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0nutation%2FSpeechGPT/lists"}