https://github.com/sooftware/end-to-end-speech-recognition-models
PyTorch implementation of automatic speech recognition models.
- Host: GitHub
- URL: https://github.com/sooftware/end-to-end-speech-recognition-models
- Owner: sooftware
- License: apache-2.0
- Created: 2020-11-28T12:59:44.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-01-10T18:57:47.000Z (over 4 years ago)
- Last Synced: 2025-04-09T23:51:30.117Z (about 1 month ago)
- Topics: acoustic-model, asr, deepspeech2, e2e, end-to-end, las, listen-attend-and-spell, pytorch, transformer, vad, voice-activity-detection
- Language: Python
- Size: 84 KB
- Stars: 38
- Watchers: 2
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# End-to-End Speech Recognition Models
[PyTorch](https://pytorch.org/) | [CodeFactor](https://www.codefactor.io/repository/github/sooftware/end-to-end-speech-recognition-models)
This repository contains end-to-end automatic speech recognition models. It does not include training code or audio/text preprocessing code. For code beyond the models themselves, please refer to [here](https://github.com/sooftware/KoSpeech).
Many open-source speech recognition projects bundle all of their training-related code, which makes it hard to study the model structure on its own. So I created a repository containing only the models I have implemented and made it public.
I will continue to add speech recognition models as I implement them.
## Implementation List
- Deep Speech 2
  - *Dario Amodei et al. [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595)*
  - *SeanNaren. [deepspeech.pytorch](https://github.com/SeanNaren/deepspeech.pytorch)*
- Listen, Attend and Spell (modified version)
  - *William Chan et al. [Listen, Attend and Spell](https://arxiv.org/abs/1508.01211)*
  - *Takaaki Hori et al. [Advances in Joint CTC-Attention based E2E ASR with a Deep CNN Encoder and RNN-LM](https://arxiv.org/abs/1706.02737)*
  - *IBM. [pytorch-seq2seq](https://github.com/IBM/pytorch-seq2seq)*
  - *clovaai. [ClovaCall](https://github.com/clovaai/ClovaCall)*
- Speech Transformer
  - *Ashish Vaswani et al. [Attention Is All You Need](https://arxiv.org/abs/1706.03762)*
  - *Yuanyuan Zhao et al. [The SpeechTransformer for Large-scale Mandarin Chinese Speech Recognition](https://ieeexplore.ieee.org/document/8682586)*
  - *kaituoxu. [Speech-Transformer](https://github.com/kaituoxu/Speech-Transformer)*
- Jasper
  - *Jason Li et al. [Jasper: An End-to-End Convolutional Neural Acoustic Model](https://arxiv.org/pdf/1904.03288.pdf)*
  - *NVIDIA. [DeepLearningExamples](https://github.com/NVIDIA/DeepLearningExamples)*
- Voice Activity Detection (1-dimensional ResNet model)
  - *filippogiruzzi. [voice_activity_detection](https://github.com/filippogiruzzi/voice_activity_detection)*
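To give a feel for the first family of models above, here is a minimal sketch of a DeepSpeech2-style architecture in PyTorch: a 2D convolutional front-end over a spectrogram, a stack of bidirectional GRUs, and a per-frame projection trained with CTC. This is an illustration following the Deep Speech 2 paper, not this repository's exact code; the class name, kernel sizes, and layer counts are illustrative choices.

```python
import torch
import torch.nn as nn


class DeepSpeech2Sketch(nn.Module):
    """DeepSpeech2-style sketch: conv front-end -> BiGRU stack -> CTC head.

    Layer sizes are illustrative, not the repository's exact configuration.
    """

    def __init__(self, num_classes: int, n_mels: int = 80,
                 hidden: int = 256, num_layers: int = 3):
        super().__init__()
        # Conv front-end downsamples frequency (stride 2 twice) and time (stride 2 once).
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(20, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),  # clipped ReLU, as in the paper
            nn.Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1), padding=(10, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),
        )
        # Frequency dimension after the two strided convs.
        f = (n_mels + 2 * 20 - 41) // 2 + 1
        f = (f + 2 * 10 - 21) // 2 + 1
        self.rnn = nn.GRU(32 * f, hidden, num_layers=num_layers,
                          bidirectional=True, batch_first=True)
        self.fc = nn.Linear(hidden * 2, num_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, n_mels, time) log-mel spectrogram
        x = self.conv(spec.unsqueeze(1))                  # (B, 32, F, T')
        b, c, f, t = x.size()
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)    # (B, T', C*F)
        x, _ = self.rnn(x)                                # (B, T', 2*hidden)
        return self.fc(x).log_softmax(dim=-1)             # per-frame log-probs for CTC


model = DeepSpeech2Sketch(num_classes=29)                 # e.g. 26 letters + space + ' + blank
log_probs = model(torch.randn(2, 80, 100))                # (batch=2, frames, classes)
```

Training would feed `log_probs.transpose(0, 1)` into `nn.CTCLoss` together with the target transcripts and the frame/target lengths.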
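Similarly, a Speech-Transformer-style model can be sketched with PyTorch's built-in `nn.Transformer`: project filterbank frames to the model dimension, embed previously emitted tokens, and decode with a causal mask (teacher forcing). This sketch omits the positional encodings and convolutional subsampling that a full Speech-Transformer uses, and all names and sizes here are illustrative assumptions, not this repository's implementation.

```python
import torch
import torch.nn as nn


class SpeechTransformerSketch(nn.Module):
    """Speech-Transformer-style sketch built on nn.Transformer.

    Omits positional encoding and conv subsampling; sizes are illustrative.
    """

    def __init__(self, num_tokens: int, n_mels: int = 80, d_model: int = 256):
        super().__init__()
        self.input_proj = nn.Linear(n_mels, d_model)      # acoustic frames -> model dim
        self.token_emb = nn.Embedding(num_tokens, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=3, num_decoder_layers=3,
            dim_feedforward=512, batch_first=True,
        )
        self.out = nn.Linear(d_model, num_tokens)

    def forward(self, spec: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # spec: (B, T, n_mels); tokens: (B, U) previous tokens for teacher forcing
        src = self.input_proj(spec)
        tgt = self.token_emb(tokens)
        # Causal mask so position u only attends to tokens <= u.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        dec = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(dec)                              # (B, U, num_tokens)


model = SpeechTransformerSketch(num_tokens=100).eval()
logits = model(torch.randn(2, 50, 80), torch.randint(0, 100, (2, 7)))
```

At inference time the decoder would instead be run autoregressively, feeding back its own predictions token by token.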
## Troubleshooting and Contributing
If you have any questions, bug reports, or feature requests, please [open an issue](https://github.com/sooftware/End-to-end-Speech-Recognition/issues) on GitHub.
I appreciate any kind of feedback or contribution. Feel free to tackle small issues such as bug fixes or documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues first.
### Code Style
I follow [PEP-8](https://www.python.org/dev/peps/pep-0008/) for code style. Docstring style is especially important, since the documentation is generated from docstrings.
### License
This project is licensed under the Apache-2.0 license. See the [LICENSE](https://github.com/sooftware/End-to-End-Speech-Recognition-Models/blob/main/LICENSE) file for details.
## Author
* Soohwan Kim [@sooftware](https://github.com/sooftware)
* Contacts: [email protected]