https://github.com/sooftware/end-to-end-speech-recognition-models
PyTorch implementation of automatic speech recognition models.
- Host: GitHub
- URL: https://github.com/sooftware/end-to-end-speech-recognition-models
- Owner: sooftware
- License: apache-2.0
- Created: 2020-11-28T12:59:44.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-01-10T18:57:47.000Z (over 4 years ago)
- Last Synced: 2025-04-09T23:51:30.117Z (about 1 month ago)
- Topics: acoustic-model, asr, deepspeech2, e2e, end-to-end, las, listen-attend-and-spell, pytorch, transformer, vad, voice-activity-detection
- Language: Python
- Size: 84 KB
- Stars: 38
- Watchers: 2
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# End-to-End Speech Recognition Models
[PyTorch](https://pytorch.org/) | [CodeFactor](https://www.codefactor.io/repository/github/sooftware/end-to-end-speech-recognition-models)
This repository contains end-to-end automatic speech recognition models. It does not include training code or audio/text preprocessing code. For code beyond the models themselves, please refer to [here](https://github.com/sooftware/KoSpeech).
Many open-source speech recognition projects bundle all of their training-related code, which makes it hard to study the model structure on its own. So I created a repository containing only the models I have implemented and made it public.
I will continue to add speech recognition models as I implement them.
## Implementation List
- Deep Speech 2
  - *Dario Amodei et al. [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595)*
  - *SeanNaren. [deepspeech.pytorch](https://github.com/SeanNaren/deepspeech.pytorch)*
- Listen, Attend and Spell (modified version)
  - *William Chan et al. [Listen, Attend and Spell](https://arxiv.org/abs/1508.01211)*
  - *Takaaki Hori et al. [Advances in Joint CTC-Attention based E2E ASR with a Deep CNN Encoder and RNN-LM](https://arxiv.org/abs/1706.02737)*
  - *IBM. [pytorch-seq2seq](https://github.com/IBM/pytorch-seq2seq)*
  - *clovaai. [ClovaCall](https://github.com/clovaai/ClovaCall)*
- Speech Transformer
  - *Ashish Vaswani et al. [Attention Is All You Need](https://arxiv.org/abs/1706.03762)*
  - *Yuanyuan Zhao et al. [The SpeechTransformer for Large-scale Mandarin Chinese Speech Recognition](https://ieeexplore.ieee.org/document/8682586)*
  - *kaituoxu. [Speech-Transformer](https://github.com/kaituoxu/Speech-Transformer)*
- Jasper
  - *Jason Li et al. [Jasper: An End-to-End Convolutional Neural Acoustic Model](https://arxiv.org/pdf/1904.03288.pdf)*
  - *NVIDIA. [DeepLearningExamples](https://github.com/NVIDIA/DeepLearningExamples)*
- Voice Activity Detection (1-dimensional ResNet model)
  - *filippogiruzzi. [voice_activity_detection](https://github.com/filippogiruzzi/voice_activity_detection)*
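To give a feel for the first family of models above, here is a minimal sketch of a DeepSpeech2-style architecture in PyTorch: a 2D convolutional front-end over a spectrogram, a stack of bidirectional GRUs, and a per-frame projection trained with CTC. This is an illustration following the Deep Speech 2 paper, not this repository's exact code; the class name, kernel sizes, and layer counts are illustrative choices.

```python
import torch
import torch.nn as nn


class DeepSpeech2Sketch(nn.Module):
    """DeepSpeech2-style sketch: conv front-end -> BiGRU stack -> CTC head.

    Layer sizes are illustrative, not the repository's exact configuration.
    """

    def __init__(self, num_classes: int, n_mels: int = 80,
                 hidden: int = 256, num_layers: int = 3):
        super().__init__()
        # Conv front-end downsamples frequency (stride 2 twice) and time (stride 2 once).
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(20, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),  # clipped ReLU, as in the paper
            nn.Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1), padding=(10, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),
        )
        # Frequency dimension after the two strided convs.
        f = (n_mels + 2 * 20 - 41) // 2 + 1
        f = (f + 2 * 10 - 21) // 2 + 1
        self.rnn = nn.GRU(32 * f, hidden, num_layers=num_layers,
                          bidirectional=True, batch_first=True)
        self.fc = nn.Linear(hidden * 2, num_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, n_mels, time) log-mel spectrogram
        x = self.conv(spec.unsqueeze(1))                  # (B, 32, F, T')
        b, c, f, t = x.size()
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)    # (B, T', C*F)
        x, _ = self.rnn(x)                                # (B, T', 2*hidden)
        return self.fc(x).log_softmax(dim=-1)             # per-frame log-probs for CTC


model = DeepSpeech2Sketch(num_classes=29)                 # e.g. 26 letters + space + ' + blank
log_probs = model(torch.randn(2, 80, 100))                # (batch=2, frames, classes)
```

Training would feed `log_probs.transpose(0, 1)` into `nn.CTCLoss` together with the target transcripts and the frame/target lengths.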
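Similarly, a Speech-Transformer-style model can be sketched with PyTorch's built-in `nn.Transformer`: project filterbank frames to the model dimension, embed previously emitted tokens, and decode with a causal mask (teacher forcing). This sketch omits the positional encodings and convolutional subsampling that a full Speech-Transformer uses, and all names and sizes here are illustrative assumptions, not this repository's implementation.

```python
import torch
import torch.nn as nn


class SpeechTransformerSketch(nn.Module):
    """Speech-Transformer-style sketch built on nn.Transformer.

    Omits positional encoding and conv subsampling; sizes are illustrative.
    """

    def __init__(self, num_tokens: int, n_mels: int = 80, d_model: int = 256):
        super().__init__()
        self.input_proj = nn.Linear(n_mels, d_model)      # acoustic frames -> model dim
        self.token_emb = nn.Embedding(num_tokens, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=3, num_decoder_layers=3,
            dim_feedforward=512, batch_first=True,
        )
        self.out = nn.Linear(d_model, num_tokens)

    def forward(self, spec: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # spec: (B, T, n_mels); tokens: (B, U) previous tokens for teacher forcing
        src = self.input_proj(spec)
        tgt = self.token_emb(tokens)
        # Causal mask so position u only attends to tokens <= u.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        dec = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(dec)                              # (B, U, num_tokens)


model = SpeechTransformerSketch(num_tokens=100).eval()
logits = model(torch.randn(2, 50, 80), torch.randint(0, 100, (2, 7)))
```

At inference time the decoder would instead be run autoregressively, feeding back its own predictions token by token.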
## Troubleshooting and Contributing
If you have any questions, bug reports, or feature requests, please [open an issue](https://github.com/sooftware/End-to-end-Speech-Recognition/issues) on GitHub.
I appreciate any kind of feedback or contribution. Feel free to tackle small issues such as bug fixes or documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues first.
### Code Style
I follow [PEP-8](https://www.python.org/dev/peps/pep-0008/) for code style. Docstring style is especially important, since the documentation is generated from docstrings.
### License
This project is licensed under the Apache-2.0 license. See the [LICENSE](https://github.com/sooftware/End-to-End-Speech-Recognition-Models/blob/main/LICENSE) file for details.
## Author
* Soohwan Kim [@sooftware](https://github.com/sooftware)
* Contacts: [email protected]