{"id":13543476,"url":"https://github.com/wenet-e2e/wenet","last_synced_at":"2025-05-13T21:06:39.193Z","repository":{"id":36967689,"uuid":"313501668","full_name":"wenet-e2e/wenet","owner":"wenet-e2e","description":"Production First and Production Ready End-to-End Speech Recognition Toolkit","archived":false,"fork":false,"pushed_at":"2024-10-18T07:14:31.000Z","size":25338,"stargazers_count":4146,"open_issues_count":45,"forks_count":1074,"subscribers_count":90,"default_branch":"main","last_synced_at":"2024-10-29T10:44:20.380Z","etag":null,"topics":["asr","automatic-speech-recognition","conformer","e2e-models","production-ready","pytorch","speech-recognition","transformer","whisper"],"latest_commit_sha":null,"homepage":"https://wenet-e2e.github.io/wenet/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wenet-e2e.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-17T03:57:23.000Z","updated_at":"2024-10-29T07:07:13.000Z","dependencies_parsed_at":"2023-12-21T10:56:08.359Z","dependency_job_id":"092e71c7-c80a-439e-a682-24423c60e21d","html_url":"https://github.com/wenet-e2e/wenet","commit_stats":{"total_commits":1420,"total_committers":157,"mean_commits":9.044585987261147,"dds":0.8507042253521127,"last_synced_commit":"2d0da71db4023174027b39d8716804d039b74a67"},"previous_names":["mobvoi/wenet"],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wenet-e2e%2Fwenet","tags_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/wenet-e2e%2Fwenet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wenet-e2e%2Fwenet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wenet-e2e%2Fwenet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wenet-e2e","download_url":"https://codeload.github.com/wenet-e2e/wenet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247918921,"owners_count":21018044,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","automatic-speech-recognition","conformer","e2e-models","production-ready","pytorch","speech-recognition","transformer","whisper"],"created_at":"2024-08-01T11:00:32.125Z","updated_at":"2025-04-08T20:09:37.903Z","avatar_url":"https://github.com/wenet-e2e.png","language":"Python","funding_links":[],"categories":["C++","Python","语音识别"],"sub_categories":["网络服务_其他"],"readme":"# WeNet\n\n[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Python-Version](https://img.shields.io/badge/Python-3.7%7C3.8-brightgreen)](https://github.com/wenet-e2e/wenet)\n\n[**Roadmap**](https://github.com/wenet-e2e/wenet/issues/1683)\n| [**Docs**](https://wenet-e2e.github.io/wenet)\n| [**Papers**](https://wenet-e2e.github.io/wenet/papers.html)\n| [**Runtime**](https://github.com/wenet-e2e/wenet/tree/main/runtime)\n| [**Pretrained Models**](docs/pretrained_models.md)\n| 
[**HuggingFace**](https://huggingface.co/spaces/wenet/wenet_demo)\n| [**Ask WeNet Guru**](https://gurubase.io/g/wenet)\n\n**We** share **Net** together.\n\n## Highlights\n\n* **Production first and production ready**: As its core design principle, WeNet provides full-stack production solutions for speech recognition.\n* **Accurate**: WeNet achieves SOTA results on many public speech datasets.\n* **Lightweight**: WeNet is easy to install, easy to use, well designed, and well documented.\n\n\n## Install\n\n### Install python package\n\n``` sh\npip install git+https://github.com/wenet-e2e/wenet.git\n```\n\n**Command-line usage** (use `-h` for parameters):\n\n``` sh\nwenet --language chinese audio.wav\n```\n\n**Python programming usage**:\n\n``` python\nimport wenet\n\nmodel = wenet.load_model('chinese')\nresult = model.transcribe('audio.wav')\nprint(result['text'])\n```\n\nPlease refer to [python usage](docs/python_package.md) for more command-line and Python programming usage.\n\n### Install for training \u0026 deployment\n\n- Clone the repo\n``` sh\ngit clone https://github.com/wenet-e2e/wenet.git\n```\n\n- Install Conda: please see https://docs.conda.io/en/latest/miniconda.html\n- Create Conda env:\n\n``` sh\nconda create -n wenet python=3.10\nconda activate wenet\nconda install conda-forge::sox\n```\n\n- Install CUDA: please follow this [link](https://icefall.readthedocs.io/en/latest/installation/index.html#id1). CUDA 12.1 is recommended.\n- Install torch and torchaudio; version 2.2.2+cu121 is recommended:\n\n``` sh\npip install torch==2.2.2+cu121 torchaudio==2.2.2+cu121 -f https://download.pytorch.org/whl/torch_stable.html\n```\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eFor Ascend NPU users:\u003c/b\u003e\u003c/summary\u003e\n\n- Install CANN: please follow this [link](https://ascend.github.io/docs/sources/ascend/quick_install.html) to install CANN toolkit and kernels.\n\n- Install WeNet with torch-npu dependencies:\n\n``` sh\npip install -e 
.[torch-npu]\n```\n\n- Version compatibility table:\n\n| Requirement  |      Minimum     | Recommended |\n| ------------ | ---------------- | ----------- |\n| CANN         | 8.0.RC2.alpha003 | latest      |\n| torch        | 2.1.0            | 2.2.0       |\n| torch-npu    | 2.1.0            | 2.2.0       |\n| torchaudio   | 2.1.0            | 2.2.0       |\n| deepspeed    | 0.13.2           | latest      |\n\n\u003c/details\u003e\n\n- Install other Python packages\n\n``` sh\npip install -r requirements.txt\npre-commit install  # for clean and tidy code\n```\n\n- Frequently Asked Questions (FAQs)\n\n``` sh\n# If you encounter sox compatibility issues such as:\n#   RuntimeError: set_buffer_size requires sox extension which is not available.\n# ubuntu\nsudo apt-get install sox libsox-dev\n# centos\nsudo yum install sox sox-devel\n# conda env\nconda install conda-forge::sox\n```\n\n**Build for deployment**\n\nOptionally, if you want to use the x86 runtime or a language model (LM),\nyou have to build the runtime as follows. Otherwise, you can skip this step.\n\n``` sh\n# runtime build requires cmake 3.14 or above\ncd runtime/libtorch\nmkdir build \u0026\u0026 cd build \u0026\u0026 cmake -DGRAPH_TOOLS=ON .. 
\u0026\u0026 cmake --build .\n```\n\nPlease see the [doc](https://github.com/wenet-e2e/wenet/tree/main/runtime) for building\nthe runtime on more platforms and operating systems.\n\n\n## Discussion \u0026 Communication\n\nYou can discuss directly on [GitHub Issues](https://github.com/wenet-e2e/wenet/issues).\n\nFor Chinese users, you can also scan the QR code on the left to follow the official WeNet account.\nWe created a WeChat group for better discussion and quicker response.\nPlease scan the personal QR code on the right; its owner will invite you to the chat group.\n\n| \u003cimg src=\"https://github.com/robin1001/qr/blob/master/wenet.jpeg\" width=\"250px\"\u003e | \u003cimg src=\"https://github.com/robin1001/qr/blob/master/binbin.jpeg\" width=\"250px\"\u003e |\n| ---- | ---- |\n\n\n## Acknowledge\n\n1. We borrowed a lot of code from [ESPnet](https://github.com/espnet/espnet) for transformer-based modeling.\n2. We borrowed a lot of code from [Kaldi](http://kaldi-asr.org/) for WFST-based decoding for LM integration.\n3. We referred to [EESEN](https://github.com/srvk/eesen) for building the TLG-based graph for LM integration.\n4. We referred to [OpenTransformer](https://github.com/ZhengkunTian/OpenTransformer/) for Python batch inference of e2e models.\n\n## Citations\n\n``` bibtex\n@inproceedings{yao2021wenet,\n  title={WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit},\n  author={Yao, Zhuoyuan and Wu, Di and Wang, Xiong and Zhang, Binbin and Yu, Fan and Yang, Chao and Peng, Zhendong and Chen, Xiaoyu and Xie, Lei and Lei, Xin},\n  booktitle={Proc. 
Interspeech},\n  year={2021},\n  address={Brno, Czech Republic },\n  organization={IEEE}\n}\n\n@article{zhang2022wenet,\n  title={WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit},\n  author={Zhang, Binbin and Wu, Di and Peng, Zhendong and Song, Xingchen and Yao, Zhuoyuan and Lv, Hang and Xie, Lei and Yang, Chao and Pan, Fuping and Niu, Jianwei},\n  journal={arXiv preprint arXiv:2203.15455},\n  year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwenet-e2e%2Fwenet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwenet-e2e%2Fwenet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwenet-e2e%2Fwenet/lists"}