{"id":50273966,"url":"https://github.com/FunAudioLLM/Fun-ASR","last_synced_at":"2026-06-13T10:01:18.475Z","repository":{"id":329050178,"uuid":"1116516142","full_name":"FunAudioLLM/Fun-ASR","owner":"FunAudioLLM","description":"End-to-end speech recognition large model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.","archived":false,"fork":false,"pushed_at":"2026-06-08T11:17:14.000Z","size":2202,"stargazers_count":1234,"open_issues_count":6,"forks_count":123,"subscribers_count":9,"default_branch":"main","last_synced_at":"2026-06-08T13:14:43.189Z","etag":null,"topics":["31-languages","asr","audio-language-model","chinese-dialects","fun-asr","llm-asr","multilingual-asr","pytorch","real-time-asr","speaker-diarization","speech-recognition","speech-to-text","transcription","whisper-alternative"],"latest_commit_sha":null,"homepage":"https://www.funasr.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FunAudioLLM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-15T01:46:48.000Z","updated_at":"2026-06-08T12:26:41.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/FunAudioLLM/Fun-ASR","commit_stats":null,"previous_names":["funaudiollm/fun-asr"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/FunAudioLLM/Fun-ASR","repository_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/FunAudioLLM%2FFun-ASR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FunAudioLLM%2FFun-ASR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FunAudioLLM%2FFun-ASR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FunAudioLLM%2FFun-ASR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FunAudioLLM","download_url":"https://codeload.github.com/FunAudioLLM/Fun-ASR/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FunAudioLLM%2FFun-ASR/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34279898,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-13T02:00:06.617Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["31-languages","asr","audio-language-model","chinese-dialects","fun-asr","llm-asr","multilingual-asr","pytorch","real-time-asr","speaker-diarization","speech-recognition","speech-to-text","transcription","whisper-alternative"],"created_at":"2026-05-27T19:00:21.384Z","updated_at":"2026-06-13T10:01:18.460Z","avatar_url":"https://github.com/FunAudioLLM.png","language":"Python","funding_links":[],"categories":["Industry Strength Natural Language Processing"],"sub_categories":[],"readme":"# 
Fun-ASR\n\n「[简体中文](README_zh.md)」|「English」|「[日本語](README_ja.md)」|「[한국어](README_ko.md)」\n\nFun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab. It is trained on tens of millions of hours of real speech data, possessing powerful contextual understanding capabilities and industry adaptability. It supports low-latency real-time transcription and covers 31 languages. It excels in vertical domains such as education and finance, accurately recognizing professional terminology and industry expressions, effectively addressing challenges like \"hallucination\" generation and language confusion, achieving \"clear hearing, understanding meaning, and accurate writing.\"\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"images/funasr-v2.png\"\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\u003ch4\u003e\n\u003ca href=\"https://funaudiollm.github.io/funasr\"\u003e Homepage \u003c/a\u003e\n｜\u003ca href=\"#core-features\"\u003e Core Features \u003c/a\u003e\n｜\u003ca href=\"#performance-evaluation\"\u003e Performance Evaluation \u003c/a\u003e\n｜\u003ca href=\"#environment-setup\"\u003e Environment Setup \u003c/a\u003e\n｜\u003ca href=\"#usage-tutorial\"\u003e Usage Tutorial \u003c/a\u003e\n\n\u003c/h4\u003e\n\nModel Repository: [modelscope](https://www.modelscope.cn/models/FunAudioLLM/Fun-ASR-Nano-2512), [huggingface](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512)\n\nOnline Experience:\n[ModelScope Community Space](https://modelscope.cn/studios/FunAudioLLM/Fun-ASR-Nano), [huggingface space](https://huggingface.co/spaces/FunAudioLLM/Fun-ASR-Nano)\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FunAudioLLM/Fun-ASR/blob/main/examples/colab/fun_asr_nano_quickstart.ipynb)\n\n\u003c/div\u003e\n\n|                                                                           Model Name                                                                            |           
                                                                                                                                                                                            Task Details                                                                                                                                                                                                       |         Training Data          | Parameters |\n| :-------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------: | :--------: |\n|       Fun-ASR-Nano \u003cbr\u003e ([⭐](https://www.modelscope.cn/models/FunAudioLLM/Fun-ASR-Nano-2512) [🤗](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512))       | Speech recognition supports Chinese, English, and Japanese. Chinese includes support for 7 dialects (Wu, Cantonese, Min, Hakka, Gan, Xiang, Jin) and 26 regional accents (Henan, Shanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi and more than 20 other regions). English and Japanese cover multiple regional accents. Additional features include lyric recognition and rap speech recognition. 
|   Tens of millions of hours    |    800M    |\n| Fun-ASR-MLT-Nano \u003cbr\u003e ([⭐](https://www.modelscope.cn/models/FunAudioLLM/Fun-ASR-MLT-Nano-2512) [🤗](https://huggingface.co/FunAudioLLM/Fun-ASR-MLT-Nano-2512)) |                                    Speech recognition supports Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Swedish, and more; 31 languages in total.                                    | Hundreds of thousands of hours |    800M    |\n\n\u003ca name=\"What's News\"\u003e\u003c/a\u003e\n\n# What's New 🔥\n\n- 2026/05: **vLLM Inference Engine** — native high-throughput batch (3-5x faster) + WebSocket real-time streaming service. See [vLLM Guide](docs/vllm_guide.md).\n- 2026/05: Fun-ASR-Nano now supports speaker diarization. Use with `vad_model` + `spk_model` + `punc_model` to get per-sentence speaker labels. Requires installing FunASR from source: `pip install git+https://github.com/modelscope/FunASR.git`\n- 2025/12: [Fun-ASR-Nano-2512](https://modelscope.cn/models/FunAudioLLM/Fun-ASR-Nano-2512) is an end-to-end speech recognition large model trained on tens of millions of hours of real speech data. 
It supports low-latency real-time transcription and covers 31 languages.\n- 2024/7: [FunASR](https://github.com/modelscope/FunASR) is a fundamental speech recognition toolkit that offers a variety of features, including speech recognition (ASR), Voice Activity Detection (VAD), Punctuation Restoration, Language Models, Speaker Verification, Speaker Diarization and multi-talker ASR.\n\n# Core Features 🎯\n\n**Fun-ASR** focuses on high-precision speech recognition, multi-language support, and industry customization capabilities\n\n- **Far-field High-noise Recognition:** Deeply optimized for far-distance sound pickup and high-noise scenarios (such as conference rooms, in-vehicle environments, industrial sites, etc.), improving recognition accuracy to **93%**.\n- **Chinese Dialects and Regional Accents:**\n  - Supports **7 major dialects**: Wu, Cantonese, Min, Hakka, Gan, Xiang, Jin\n  - Covers **26 regional accents**: including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi and more than 20 other regions\n- **Multi-language Free Speech:** Supports recognition of **31 languages**, with focused optimization on East and Southeast Asian languages, supporting free language switching and mixed recognition.\n- **Music Background Lyric Recognition:** Enhanced speech recognition performance under music background interference, supporting accurate recognition of lyric content in songs.\n\n# Environment Setup 🐍\n\n```shell\ngit clone https://github.com/FunAudioLLM/Fun-ASR.git\ncd Fun-ASR\npip install -r requirements.txt\n```\n\n\u003ca name=\"usage-tutorial\"\u003e\u003c/a\u003e\n\n# TODO\n\n- [x] Support returning timestamps\n  \u003e **Known limitation:** In the current open-source release, the released Fun-ASR-Nano `model.pt` checkpoint does not include trained `ctc_decoder.*` / `ctc.*` weights, so timestamp output may be returned but is not reliable. 
For accurate character-level timestamps, use Paraformer instead, for example `AutoModel(model=\"paraformer-zh\", vad_model=\"fsmn-vad\", ...)`. See [issue #106](https://github.com/FunAudioLLM/Fun-ASR/issues/106).\n- [x] Support speaker diarization\n- [x] Support model training\n\n# Usage 🛠️\n\n## Inference\n\n### Using funasr for inference\n\n```python\nfrom funasr import AutoModel\n\n\ndef main():\n    model_dir = \"FunAudioLLM/Fun-ASR-Nano-2512\"\n    model = AutoModel(\n        model=model_dir,\n        trust_remote_code=True,\n        remote_code=\"./model.py\",\n        device=\"cuda:0\",\n        # hub：download models from ms (for ModelScope) or hf (for Hugging Face).\n        hub=\"hf\"\n    )\n\n    wav_path = f\"{model.model_path}/example/zh.mp3\"\n    res = model.generate(\n        input=[wav_path],\n        cache={},\n        batch_size=1,\n        hotwords=[\"开放时间\"],\n        # 中文、英文、日文 for Fun-ASR-Nano-2512\n        # 韩文、越南语、印尼语、泰语、马来语、菲律宾语、阿拉伯语、\n        # 印地语、保加利亚语、克罗地亚语、捷克语、丹麦语、荷兰语、爱沙尼亚语、芬兰语、希腊语、\n        # 匈牙利语、爱尔兰语、拉脱维亚语、立陶宛语、马耳他语、波兰语、葡萄牙语、罗马尼亚语、\n        # 斯洛伐克语、斯洛文尼亚语、瑞典语 for Fun-ASR-MLT-Nano-2512\n        language=\"中文\",\n        itn=True, # or False\n    )\n    text = res[0][\"text\"]\n    print(text)\n\n    model = AutoModel(\n        model=model_dir,\n        trust_remote_code=True,\n        vad_model=\"fsmn-vad\",\n        vad_kwargs={\"max_single_segment_time\": 30000},\n        remote_code=\"./model.py\",\n        device=\"cuda:0\",\n    )\n    res = model.generate(input=[wav_path], cache={}, batch_size=1)\n    text = res[0][\"text\"]\n    print(text)\n\n\nif __name__ == \"__main__\":\n    main()\n```\n\n### Speaker Diarization\n\n```python\nfrom funasr import AutoModel\n\n\ndef main():\n    model_dir = \"FunAudioLLM/Fun-ASR-Nano-2512\"\n    model = AutoModel(\n        model=model_dir,\n        trust_remote_code=True,\n        remote_code=\"./model.py\",\n        vad_model=\"fsmn-vad\",\n        vad_kwargs={\"max_single_segment_time\": 
30000},\n        spk_model=\"cam++\",\n        punc_model=\"ct-punc\",\n        device=\"cuda:0\",\n        hub=\"hf\",\n    )\n\n    wav_path = f\"{model.model_path}/example/zh.mp3\"\n    res = model.generate(input=[wav_path], cache={}, batch_size=1, language=\"中文\")\n\n    # Per-sentence results with speaker labels\n    for sent in res[0][\"sentence_info\"]:\n        print(f\"Speaker {sent['spk']}: [{sent['start']}ms - {sent['end']}ms] {sent['text']}\")\n\n\nif __name__ == \"__main__\":\n    main()\n```\n\n### Direct Inference\n\n```python\nfrom model import FunASRNano\n\n\ndef main():\n    model_dir = \"FunAudioLLM/Fun-ASR-Nano-2512\"\n    m, kwargs = FunASRNano.from_pretrained(model=model_dir, device=\"cuda:0\")\n    m.eval()\n\n    wav_path = f\"{kwargs['model_path']}/example/zh.mp3\"\n    res = m.inference(data_in=[wav_path], **kwargs)\n    text = res[0][0][\"text\"]\n    print(text)\n\n\nif __name__ == \"__main__\":\n    main()\n```\n\n\u003cdetails\u003e\u003csummary\u003e Parameter Description (click to expand) \u003c/summary\u003e\n\n- `model_dir`: Model name or local disk model path.\n- `trust_remote_code`: Whether to trust remote code for loading custom model implementations.\n- `remote_code`: Specify the location of specific model code (e.g., `model.py` in the current directory), supporting both absolute and relative paths.\n- `device`: Specify the device to use, such as \"cuda:0\" or \"cpu\".\n\n\u003c/details\u003e\n\n\n# vLLM High-Throughput Inference 🚀\n\nFun-ASR natively integrates the [vLLM](https://github.com/vllm-project/vllm) engine for high-throughput batch inference and production-grade real-time streaming service.\n\n\u003e Full guide: [docs/vllm_guide.md](docs/vllm_guide.md) | API docs: [modelscope.github.io/FunASR/vllm.html](https://modelscope.github.io/FunASR/vllm.html)\n\n### Three Modes\n\n| Mode | Use Case | Entry |\n|------|----------|-------|\n| **Offline Batch** | Large-scale transcription | `AutoModelVLLM` |\n| **Streaming SDK** | 
Real-time subtitles | `FunASRNanoStreamingVLLM` |\n| **WebSocket Service** | Production deployment | `serve_realtime_ws.py` |\n\n### Offline Batch Inference (3-5x faster)\n\n```python\nfrom funasr.auto.auto_model_vllm import AutoModelVLLM\n\nmodel = AutoModelVLLM(\n    model=\"FunAudioLLM/Fun-ASR-Nano-2512\",\n    tensor_parallel_size=2,      # Multi-GPU\n    gpu_memory_utilization=0.8,\n)\n\nresults = model.generate(\n    [\"audio1.wav\", \"audio2.wav\", \"audio3.wav\"],\n    language=\"中文\",\n    hotwords=[\"张三\", \"北京\"],\n)\nfor r in results:\n    print(f\"[{r['key']}] {r['text']}\")\n```\n\n### Real-time WebSocket Service\n\n```bash\n# Start server (with dynamic VAD + speaker diarization)\npython serve_realtime_ws.py --port 10095 --language 中文 --tensor-parallel-size 2\n\n# Browser client\nopen client_mic.html\n\n# Python client\npython client_python.py --server ws://localhost:10095 --mic\n```\n\n**WebSocket Protocol:**\n```\nClient: \"START\" → Server: {\"event\":\"started\"}\nClient: [audio bytes] → Server: {\"sentences\":[...], \"partial\":\"...\"}\nClient: \"STOP\" → Server: {\"sentences\":[...], \"is_final\":true}\n```\n\n### Streaming SDK\n\n```python\nfrom funasr.models.fun_asr_nano.inference_vllm_streaming import FunASRNanoStreamingVLLM\n\nengine = FunASRNanoStreamingVLLM.from_pretrained(\n    model=\"FunAudioLLM/Fun-ASR-Nano-2512\", chunk_ms=720\n)\n\nfor result in engine.streaming_generate(\"audio.wav\", language=\"中文\"):\n    print(f\"[{result['audio_duration_ms']:.0f}ms] {result['fixed_text']}\")\n```\n\n### Performance\n\n| Method | Time (192min audio) | RTFx | CER |\n|--------|---------------------|------|-----|\n| PyTorch native | 589s | 19.6x | 8.94% |\n| **vLLM (ours)** | **29.3s** | **393.9x** | **8.91%** |\n| yuekaizhang vLLM | 42.7s | 273.0x | 17.07% |\n\n\u003e **20.7x faster** than PyTorch with identical accuracy (CER diff \u003c 0.05%)\n\n### Install\n\n```bash\n# Quote the version specifiers so the shell does not treat \u003e as a redirect\npip install \"funasr\u003e=1.3.3\" \"vllm\u003e=0.12.0\"\n```\n\n# Finetune\n\nPlease 
refer to [docs/finetune.md](docs/finetune.md)\n\n# Performance 📝\n\nWe evaluated Fun-ASR against other state-of-the-art models on open-source benchmarks, Chinese dialect datasets, and industry-specific test sets. The results demonstrate that Fun-ASR achieves superior performance across various scenarios.\n\n### 1. Open-Source Dataset Performance (WER %)\n\n| Test set            | GLM-ASR-nano | GLM-ASR-nano\\* | Whisper-large-v3 | Seed-ASR | Seed-ASR\\* | Kimi-Audio | Step-Audio2 | FireRed-ASR | Fun-ASR-nano | Fun-ASR |\n| :------------------ | :----------: | :------------: | :--------------: | :------: | :--------: | :--------: | :---------: | :---------: | :----------: | :-----: |\n| **Model Size**      |     1.5B     |      1.5B      |       1.6B       |    -     |     -      |     -      |      -      |    1.1B     |     0.8B     |  7.7B   |\n| **OpenSource**      |      ✅      |       ✅       |        ✅        |    ❌    |     ❌     |     ✅     |     ✅      |     ✅      |      ✅      |   ❌    |\n| AIShell1            |     1.81     |      2.17      |       4.72       |   0.68   |    1.63    |    0.71    |    0.63     |    0.54     |     1.80     |  1.22   |\n| AIShell2            |      -       |      3.47      |       4.68       |   2.27   |    2.76    |    2.86    |    2.10     |    2.58     |     2.75     |  2.39   |\n| Fleurs-zh           |      -       |      3.65      |       5.18       |   3.43   |    3.23    |    3.11    |    2.68     |    4.81     |     2.56     |  2.53   |\n| Fleurs-en           |     5.78     |      6.95      |       6.23       |   9.39   |    9.39    |    6.99    |    3.03     |    10.79    |     5.96     |  4.74   |\n| Librispeech-clean   |     2.00     |      2.17      |       1.86       |   1.58   |    2.8     |    1.32    |    1.17     |    1.84     |     1.76     |  1.51   |\n| Librispeech-other   |     4.19     |      4.43      |       3.43       |   2.84   |    5.69    |    2.63    |    2.42     |    4.52     |     4.33     | 
 3.03   |\n| WenetSpeech Meeting |     6.73     |      8.21      |      18.39       |   5.69   |    7.07    |    6.24    |    4.75     |    4.95     |     6.60     |  6.17   |\n| WenetSpeech Net     |      -       |      6.33      |      11.89       |   4.66   |    4.84    |    6.45    |    4.67     |    4.94     |     6.01     |  5.46   |\n\n\u003e _Note: Seed-ASR\\* results are evaluated using the official API on volcengine; GLM-ASR-nano\\* results are evaluated using the open-source checkpoint._\n\n### 2. Industry Dataset Performance (WER %)\n\n| Test set           | GLM-ASR-Nano | Whisper-large-v3 | Seed-ASR  | FireRed-ASR | Kimi-Audio | Paraformer v2 | Fun-ASR-nano |  Fun-ASR  |\n| :----------------- | :----------: | :--------------: | :-------: | :---------: | :--------: | :-----------: | :----------: | :-------: |\n| **Model Size**     |     1.5B     |       1.6B       |     -     |    1.1B     |     8B     |     0.2B      |     0.8B     |   7.7B    |\n| **OpenSource**     |      ✅      |        ✅        |    ❌     |     ✅      |     ✅     |      ✅       |      ✅      |    ❌     |\n| Nearfield          |    16.95     |      16.58       |   7.20    |    10.10    |    9.02    |     8.11      |     7.79     |   6.31    |\n| Farfield           |     9.44     |      22.21       |   4.59    |    7.49     |   10.95    |     9.55      |     5.79     |   4.34    |\n| Complex Background |    23.79     |      32.57       |   12.90   |    15.56    |   15.56    |     15.19     |    14.59     |   11.45   |\n| English General    |    16.47     |      18.56       |   15.65   |    21.62    |   18.12    |     19.48     |    15.28     |   13.73   |\n| Opensource         |     4.67     |       7.05       |   3.83    |    5.31     |    3.79    |     6.23      |     4.22     |   3.38    |\n| Dialect            |    54.21     |      66.14       |   29.45   |    52.82    |   71.94    |     41.16     |    28.18     |   15.21   |\n| Accent             |    19.78     |      36.03      
 |   10.23   |    14.05    |   27.20    |     17.80     |    12.90     |   10.31   |\n| Lyrics             |    46.56     |      54.82       |   30.26   |    42.87    |   65.18    |     50.14     |    30.85     |   21.00   |\n| Hiphop             |    43.32     |      46.56       |   29.46   |    33.88    |   57.25    |     43.79     |    30.87     |   28.58   |\n| **Average**        |  **26.13**   |    **33.39**     | **15.95** |  **22.63**  | **31.00**  |   **23.49**   |  **16.72**   | **12.70** |\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"images/compare_en.png\" width=\"800\" /\u003e\n\u003c/div\u003e\n\n## Remarkable Third-Party Work\n\n- **vLLM Inference Engine (Native)**: Fun-ASR now has built-in vLLM support for high-throughput batch inference and real-time streaming. [Guide](docs/vllm_guide.md) | [Demo](demo_vllm.py)\n  ```python\n  # Quick start with vLLM\n  from funasr import AutoModelVLLM\n  model = AutoModelVLLM(model=\"FunAudioLLM/Fun-ASR-Nano-2512\", device=\"cuda\", dtype=\"bf16\")\n  result = model.generate(input=\"audio.wav\", batch_size=32)\n  ```\n\n## Ecosystem\n\nFun-ASR-Nano is part of the **FunAudioLLM** family:\n\n| Project | Description | Stars |\n|---------|-------------|-------|\n| [FunASR](https://github.com/modelscope/FunASR) | Industrial speech recognition toolkit — VAD, ASR, punctuation, diarization | [![](https://img.shields.io/github/stars/modelscope/FunASR?style=social)](https://github.com/modelscope/FunASR) |\n| [SenseVoice](https://github.com/FunAudioLLM/SenseVoice) | Multilingual speech understanding — ASR + emotion + audio events | [![](https://img.shields.io/github/stars/FunAudioLLM/SenseVoice?style=social)](https://github.com/FunAudioLLM/SenseVoice) |\n| [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) | Natural speech generation — multi-language, zero-shot cloning | [![](https://img.shields.io/github/stars/FunAudioLLM/CosyVoice?style=social)](https://github.com/FunAudioLLM/CosyVoice) |\n| 
[FunClip](https://github.com/modelscope/FunClip) | AI-powered video clipping with speech recognition | [![](https://img.shields.io/github/stars/modelscope/FunClip?style=social)](https://github.com/modelscope/FunClip) |\n\n\u003ca href=\"https://star-history.com/#FunAudioLLM/Fun-ASR\u0026modelscope/FunASR\u0026FunAudioLLM/SenseVoice\u0026Date\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://api.star-history.com/svg?repos=FunAudioLLM/Fun-ASR,modelscope/FunASR,FunAudioLLM/SenseVoice\u0026type=Date\u0026theme=dark\" /\u003e\n    \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://api.star-history.com/svg?repos=FunAudioLLM/Fun-ASR,modelscope/FunASR,FunAudioLLM/SenseVoice\u0026type=Date\" /\u003e\n    \u003cimg alt=\"Star History Chart\" src=\"https://api.star-history.com/svg?repos=FunAudioLLM/Fun-ASR,modelscope/FunASR,FunAudioLLM/SenseVoice\u0026type=Date\" /\u003e\n  \u003c/picture\u003e\n\u003c/a\u003e\n\n## Citations\n\n```bibtex\n@misc{an2025funasrtechnicalreport,\n      title={Fun-ASR Technical Report},\n      author={Keyu An and Yanni Chen and Zhigao Chen and Chong Deng and Zhihao Du and Changfeng Gao and Zhifu Gao and Bo Gong and Xiangang Li and Yabin Li and Ying Liu and Xiang Lv and Yunjie Ji and Yiheng Jiang and Bin Ma and Haoneng Luo and Chongjia Ni and Zexu Pan and Yiping Peng and Zhendong Peng and Peiyao Wang and Hao Wang and Haoxu Wang and Wen Wang and Wupeng Wang and Yuzhong Wu and Biao Tian and Zhentao Tan and Nan Yang and Bin Yuan and Jieping Ye and Jixing Yu and Qinglin Zhang and Kun Zou and Han Zhao and Shengkui Zhao and Jingren Zhou and Yanqiao Zhu},\n      year={2025},\n      eprint={2509.12508},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      
url={https://arxiv.org/abs/2509.12508},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFunAudioLLM%2FFun-ASR","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FFunAudioLLM%2FFun-ASR","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFunAudioLLM%2FFun-ASR/lists"}