{"id":13935700,"url":"https://github.com/hirofumi0810/tensorflow_end2end_speech_recognition","last_synced_at":"2025-07-19T20:33:45.437Z","repository":{"id":169858394,"uuid":"92275349","full_name":"hirofumi0810/tensorflow_end2end_speech_recognition","owner":"hirofumi0810","description":"End-to-End speech recognition implementation base on TensorFlow (CTC, Attention, and MTL training)","archived":false,"fork":false,"pushed_at":"2018-01-23T02:05:10.000Z","size":4376,"stargazers_count":313,"open_issues_count":11,"forks_count":120,"subscribers_count":34,"default_branch":"master","last_synced_at":"2024-08-08T23:21:34.487Z","etag":null,"topics":["asr","attention-mechanism","automatic-speech-recognition","beam-search","csj","ctc","end-to-end","end-to-end-learning","joint-ctc-attention","librispeech","speech-recognition","speech-to-text","tensorflow","timit","timit-dataset"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hirofumi0810.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-05-24T09:35:21.000Z","updated_at":"2024-07-30T07:00:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"cbcf1b0d-7fe2-4b29-a74e-15570ea765e0","html_url":"https://github.com/hirofumi0810/tensorflow_end2end_speech_recognition","commit_stats":null,"previous_names":["hirofumi0810/tensorflow_end2end_speech_recognition"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hirofumi0810%2Ftensorflow_end2end_speech_recognition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/hirofumi0810%2Ftensorflow_end2end_speech_recognition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hirofumi0810%2Ftensorflow_end2end_speech_recognition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hirofumi0810%2Ftensorflow_end2end_speech_recognition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hirofumi0810","download_url":"https://codeload.github.com/hirofumi0810/tensorflow_end2end_speech_recognition/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226677258,"owners_count":17666021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","attention-mechanism","automatic-speech-recognition","beam-search","csj","ctc","end-to-end","end-to-end-learning","joint-ctc-attention","librispeech","speech-recognition","speech-to-text","tensorflow","timit","timit-dataset"],"created_at":"2024-08-07T23:02:00.361Z","updated_at":"2024-11-27T03:31:10.710Z","avatar_url":"https://github.com/hirofumi0810.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"## TensorFlow Implementation of End-to-End Speech Recognition\n### Requirements\n- TensorFlow \u003e= 1.3.0\n- tqdm \u003e= 4.14.0\n- python-Levenshtein \u003e= 0.12.0\n- setproctitle \u003e= 1.1.10\n- seaborn \u003e= 0.7.1\n\n\n### Corpus\n#### [TIMIT](https://catalog.ldc.upenn.edu/LDC93S1)\n- Phone (39, 48, 61 phones)\n- character\n\n#### [LibriSpeech](http://www.openslr.org/12/)\n- Phone (under implementation)\n- 
Character\n- Word\n\n#### [CSJ (Corpus of Spontaneous Japanese)](http://pj.ninjal.ac.jp/corpus_center/csj/en/)\n- Phone (under implementation)\n- Japanese kana characters (about 150 classes)\n- Japanese kanji characters (about 3000 classes)\n\nThese corpora will be added in the future.\n- Switchboard\n- WSJ\n- [AMI](http://groups.inf.ed.ac.uk/ami/corpus/)\n\nThis repository doesn't include pre-processing; pre-processing is based on [this repo](https://github.com/hirofumi0810/asr_preprocessing).\nIf you want to do pre-processing, please look at that repo.\n\n\n### Model\n#### Encoder\n- BLSTM\n- LSTM\n- BGRU\n- GRU\n- VGG-BLSTM\n- VGG-LSTM\n- Multi-task BLSTM\n  - you can attach another CTC layer to an arbitrary layer.\n- Multi-task LSTM\n- VGG\n\n\n#### Connectionist Temporal Classification (CTC) [\\[Graves+ 2006\\]](http://dl.acm.org/citation.cfm?id=1143891)\n- Greedy decoder\n- Beam Search decoder\n- Beam Search decoder w/ CharLM (under implementation)\n\n##### Options\n- Frame-stacking [\\[Sak+ 2015\\]](https://arxiv.org/abs/1507.06947)\n- Multi-GPU training (synchronous)\n- Splicing\n- Down sampling (under implementation)\n\n\n#### Attention Mechanism\n##### Decoder\n- Greedy decoder\n- Beam search decoder (under implementation)\n\n##### Attention type\n- Bahdanau's content-based attention\n- Bahdanau's normed content-based attention (under implementation)\n- Location-based attention\n- Hybrid attention\n- Luong's dot attention\n- Luong's scaled dot attention (under implementation)\n- Luong's general attention\n- Luong's concat attention\n- Baidu's attention (under implementation)\n\n###### Options\n- Sharpening\n- Temperature regularization in the softmax layer (Output posteriors)\n- Joint CTC-Attention [\\[Kim 2016\\]](https://arxiv.org/abs/1609.06773.)\n- Coverage (under implementation)\n\n\n### Usage\nPlease refer to the docs for each corpus.\n- TIMIT\n- LibriSpeech\n- CSJ\n\n\n### License\nMIT\n\n\n### 
Contact\nhiro.mhbc@gmail.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhirofumi0810%2Ftensorflow_end2end_speech_recognition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhirofumi0810%2Ftensorflow_end2end_speech_recognition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhirofumi0810%2Ftensorflow_end2end_speech_recognition/lists"}