{"id":19589698,"url":"https://github.com/jackaduma/las_mandarin_pytorch","last_synced_at":"2025-04-27T12:32:48.881Z","repository":{"id":49327362,"uuid":"263612259","full_name":"jackaduma/LAS_Mandarin_PyTorch","owner":"jackaduma","description":"Listen, attend and spell Model and a Chinese Mandarin Pretrained model  (中文-普通话 ASR模型)","archived":false,"fork":false,"pushed_at":"2023-04-28T22:19:39.000Z","size":459,"stargazers_count":111,"open_issues_count":6,"forks_count":16,"subscribers_count":4,"default_branch":"master","last_synced_at":"2023-11-07T18:24:45.984Z","etag":null,"topics":["asr","chinese-speech-recognition","deep-learning","deeplearning","listen-attend-and-spell","mandarin","pytorch-implementation","speech-recognition","speech-to-text"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jackaduma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-13T11:38:50.000Z","updated_at":"2023-10-18T04:10:23.000Z","dependencies_parsed_at":"2022-08-25T17:24:06.994Z","dependency_job_id":null,"html_url":"https://github.com/jackaduma/LAS_Mandarin_PyTorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FLAS_Mandarin_PyTorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FLAS_Mandarin_PyTorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FLAS_Mandarin_PyTorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2F
LAS_Mandarin_PyTorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jackaduma","download_url":"https://codeload.github.com/jackaduma/LAS_Mandarin_PyTorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224070386,"owners_count":17250651,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","chinese-speech-recognition","deep-learning","deeplearning","listen-attend-and-spell","mandarin","pytorch-implementation","speech-recognition","speech-to-text"],"created_at":"2024-11-11T08:20:18.793Z","updated_at":"2024-11-11T08:20:19.392Z","avatar_url":"https://github.com/jackaduma.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **LAS_Mandarin_PyTorch**\n\n[![standard-readme compliant](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square)](https://github.com/jackaduma/LAS_Mandarin_PyTorch)\n\n[**中文说明**](./README.zh-CN.md) | [**English**](./README.md)\n\nThis repository is a PyTorch implementation of the paper [**Listen, Attend and Spell**](https://arxiv.org/abs/1508.01211), an end-to-end automatic speech recognition (**ASR**) model.\n\nIt also provides a pretrained **Chinese Mandarin ASR** model.\n\n- [x] Dataset\n  - [ ] [LibriSpeech]() for English Speech Recognition\n  - [x] [AISHELL-Speech](https://openslr.org/33/) for Chinese Mandarin Speech Recognition\n- [x] Usage\n  - [x] generate vocab file\n  - [x] training\n  - [x] test\n  - [ ] infer \n- [ ] Demo\n\n------\n\n## **Listen-Attend-Spell**\n\n### **Google Blog 
Page** \n\n[Improving End-to-End Models For Speech Recognition](https://ai.googleblog.com/2017/12/improving-end-to-end-models-for-speech.html)\n\nThe LAS architecture consists of 3 components. The listener encoder component, which is similar to a standard acoustic model (AM), takes a time-frequency representation of the input speech signal, x, and uses a set of neural network layers to map the input to a higher-level feature representation, henc. The output of the encoder is passed to an attender, which uses henc to learn an alignment between input features x and predicted subword units {yn, … y0}, where each subword is typically a grapheme or wordpiece. Finally, the output of the attention module is passed to the speller (i.e., decoder), similar to a language model (LM), that produces a probability distribution over a set of hypothesized words.\n\n\n![Components of the LAS End-to-End Model.](https://4.bp.blogspot.com/-D26UVY-JPh4/WjK9bo6LVtI/AAAAAAAACRk/ABz4VpV0uvUywryKqaaIXgFz4w-JukTegCLcBGAs/s640/image1.png \"Components of the LAS End-to-End Model.\")\n\nComponents of the LAS End-to-End Model.\n\n\n------\n\n**This repository contains:**\n\n1. [model code](core) that implements the paper.\n2. [generate vocab file](generate_vocab_file.py), which you can use to generate a vocab file for [your dataset](dataset).\n3. [training scripts](train_asr.py) to train the model.\n4. 
[testing scripts](test_asr.py) to test the model.\n\n------\n\n## **Table of Contents**\n\n- [**LAS\\_Mandarin\\_PyTorch**](#las_mandarin_pytorch)\n  - [**Listen-Attend-Spell**](#listen-attend-spell)\n    - [**Google Blog Page**](#google-blog-page)\n  - [**Table of Contents**](#table-of-contents)\n  - [**Requirement**](#requirement)\n  - [**Usage**](#usage)\n    - [**preprocess**](#preprocess)\n    - [**train**](#train)\n    - [**test**](#test)\n  - [**Pretrained**](#pretrained)\n    - [**English**](#english)\n    - [**Chinese Mandarin**](#chinese-mandarin)\n  - [**Demo**](#demo)\n  - [**Star-History**](#star-history)\n  - [**Reference**](#reference)\n  - [Donation](#donation)\n  - [**License**](#license)\n\n\n------\n\n\n## **Requirement** \n\n```bash\npip install -r requirements.txt\n```\n## **Usage**\n\n### **preprocess**\n\nFirst, generate a vocab file from your dataset's transcript file; see [generate_vocab_file.py](generate_vocab_file.py) for reference. To train on AISHELL data, you can use [generate_vocab_file_aishell.py](generate_vocab_file_aishell.py) directly.\n\n\n```bash\npython generate_vocab_file_aishell.py --input_file $DATA_DIR/data_aishell/transcript_v0.8.txt --output_file ./aishell_vocab.txt --mode character --vocab_size 5000\n```\n\nThis creates a vocab file named **aishell_vocab.txt** in the current folder.\n\n\n### **train** \n\nBefore training, write the loading code for your dataset in the [dataset](dataset) package.\n\nIf you use the provided AISHELL dataset code, also update the transcript file path in [dataset/aishell.py](dataset/aishell.py), line 26:\n\n```python\nsrc_file = \"/data/Speech/SLR33/data_aishell/\" + \"transcript/aishell_transcript_v0.8.txt\"\n```\n\nWhen ready. 
\n\nLet's train:\n\n```bash\npython main.py --config ./config/aishell_asr_example_lstm4atthead1.yaml\n```\n\nYou can write your own config file; see [config/aishell_asr_example_lstm4atthead1.yaml](config/aishell_asr_example_lstm4atthead1.yaml) for reference.\n\nVariables specific to your setup: the corpus path and vocab_file.\n\n### **test**\n\n```bash\npython main.py --config ./config/aishell_asr_example_lstm4atthead1.yaml --test\n```\n\n------\n\n## **Pretrained**\n\n### **English**\n\n### **Chinese Mandarin**\n\nA pretrained model trained on the AISHELL dataset.\n\nDownload it from [Google Drive](https://drive.google.com/file/d/1Lcu6aFdoChvKEHuBs5_efNSk5edVkeyR/view?usp=sharing).\n\n------\n\n## **Demo**\n\nInference:\n\n\n```bash\npython infer.py\n```\n\n------\n\n## **Star-History**\n\n![star-history](https://api.star-history.com/svg?repos=jackaduma/LAS_Mandarin_PyTorch\u0026type=Date \"star-history\")\n\n------\n\n## **Reference**\n\n1. [**Listen, Attend and Spell**](https://arxiv.org/abs/1508.01211v2), W Chan et al.\n2. [Neural Machine Translation of Rare Words with Subword Units](http://www.aclweb.org/anthology/P16-1162), R Sennrich et al.\n3. [Attention-Based Models for Speech Recognition](https://arxiv.org/abs/1506.07503), J Chorowski et al.\n4. [Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.cs.toronto.edu/~graves/icml_2006.pdf), A Graves et al.\n5. [Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning](https://arxiv.org/abs/1609.06773), S Kim et al.\n6. 
[Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM](https://arxiv.org/abs/1706.02737), T Hori et al.\n\n------\n\n## Donation\nIf this project helps you save development time, you can buy me a cup of coffee :) \n\nAliPay(支付宝)\n\u003cdiv align=\"center\"\u003e\n\t\u003cimg src=\"./misc/ali_pay.png\" alt=\"ali_pay\" width=\"400\" /\u003e\n\u003c/div\u003e\n\nWechatPay(微信)\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"./misc/wechat_pay.png\" alt=\"wechat_pay\" width=\"400\" /\u003e\n\u003c/div\u003e\n\n------\n\n## **License**\n\n[MIT](LICENSE) © Kun","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjackaduma%2Flas_mandarin_pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjackaduma%2Flas_mandarin_pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjackaduma%2Flas_mandarin_pytorch/lists"}