{"id":13689885,"url":"https://github.com/srvk/eesen","last_synced_at":"2025-05-02T06:31:34.801Z","repository":{"id":44159422,"uuid":"37831007","full_name":"srvk/eesen","owner":"srvk","description":"The official repository of the Eesen project","archived":false,"fork":false,"pushed_at":"2019-05-23T03:21:22.000Z","size":6146,"stargazers_count":824,"open_issues_count":60,"forks_count":343,"subscribers_count":82,"default_branch":"master","last_synced_at":"2024-11-12T15:43:14.556Z","etag":null,"topics":["asr","ctc","ctc-loss","kaldi","speech-recognition","speech-to-text","tensorflow"],"latest_commit_sha":null,"homepage":"http://arxiv.org/abs/1507.08240","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/srvk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-06-21T23:58:33.000Z","updated_at":"2024-10-28T08:25:42.000Z","dependencies_parsed_at":"2022-07-30T10:17:57.850Z","dependency_job_id":null,"html_url":"https://github.com/srvk/eesen","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srvk%2Feesen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srvk%2Feesen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srvk%2Feesen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srvk%2Feesen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/srvk","download_url":"https://codeload.github.com/srvk/eesen/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","
kind":"github","repositories_count":251998498,"owners_count":21677989,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","ctc","ctc-loss","kaldi","speech-recognition","speech-to-text","tensorflow"],"created_at":"2024-08-02T16:00:32.134Z","updated_at":"2025-05-02T06:31:33.438Z","avatar_url":"https://github.com/srvk.png","language":"C++","funding_links":[],"categories":["\u003ca name=\"toolkits\"\u003e\u003c/a\u003e 2. Toolkits","C++"],"sub_categories":["\u003ca name=\"paperlist\"\u003e\u003c/a\u003e 1.2 All Paper List"],"readme":"### Eesen\n\n**Eesen** aims to simplify the existing complicated, expertise-intensive ASR pipeline into a straightforward sequence learning problem. Acoustic modeling in Eesen involves training a single recurrent neural network (RNN) to model the mapping from speech to text. Eesen abandons the following elements required by the existing ASR pipeline:\n\n* Hidden Markov models (HMMs)\n* Gaussian mixture models (GMMs)\n* Decision trees and phonetic questions\n* Dictionary, if characters are used as the modeling units\n* **...**\n\nEesen was created by [Yajie Miao](http://www.cs.cmu.edu/~ymiao) with inspiration from the [Kaldi](https://github.com/kaldi-asr/kaldi) toolkit. 
[Thank you, Yajie!](https://www.youcaring.com/iscainternationalspeechcommunicationassociation-815026)\n\n### Key Components\n\nEesen contains 4 key components to enable end-to-end ASR:\n* **Acoustic Model**  -- Bi-directional RNNs with LSTM units.\n* **Training**        -- [Connectionist temporal classification (CTC)](http://www.machinelearning.org/proceedings/icml2006/047_Connectionist_Tempor.pdf) as the training objective.\n* **WFST Decoding**   -- A principled decoding approach based on Weighted Finite-State Transducers (WFSTs), or \n* **RNN-LM Decoding** -- Decoding based on (character) [RNN language models](https://arxiv.org/abs/1408.2873), when using TensorFlow (currently on its own branch)\n\n### Highlights of Eesen\n\n* The WFST-based decoding approach can incorporate lexicons and language models into CTC decoding in an effective and efficient way.\n* The RNN-LM decoding approach does not require a fixed lexicon.\n* GPU implementation of LSTM model training and CTC learning, now also using [TensorFlow](https://www.tensorflow.org/).\n* Multiple utterances are processed in parallel for training speed-up.\n* Fully-fledged [example setups](asr_egs/) to demonstrate end-to-end system building, with both phonemes and characters as labels, following [Kaldi](https://github.com/kaldi-asr/kaldi) recipes and conventions.\n\n### Experimental Results\n\nRefer to RESULTS under each [example setup](asr_egs/).\n\n### References\n\nFor more information, please refer to the following paper(s):\n\nYajie Miao, Mohammad Gowayyed, and Florian Metze, \"[EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding](http://arxiv.org/abs/1507.08240),\" in Proc. Automatic Speech Recognition and Understanding Workshop (ASRU), Scottsdale, AZ; U.S.A., December 2015. 
IEEE.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrvk%2Feesen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsrvk%2Feesen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrvk%2Feesen/lists"}