https://github.com/mddct/neural-lm-deprecated

Focuses on language model fusion for speech recognition

Host: GitHub
URL: https://github.com/mddct/neural-lm-deprecated
Owner: Mddct
Archived: true
Created: 2022-06-03T08:57:21.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2023-03-27T11:17:03.000Z (about 3 years ago)
Last Synced: 2025-01-31T07:34:50.648Z (over 1 year ago)
Language: Python
Homepage:
Size: 74.2 KB
Stars: 5
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# (Deprecated, will be reimplemented in JAX) Under development; may not work until the whole pipeline is done

# neural-lm

Focuses on language model fusion for speech recognition.

# Note 

> When a language model is used wide beam searches often yield
> incomplete transcripts. With narrow beams, the problem is less
> visible due to implicit hypothesis pruning.

Check whether this problem also appears in CTC + LM fusion.
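
To make the note concrete, here is a minimal sketch of shallow-fusion scoring, assuming the usual log-linear combination of an acoustic score and a weighted LM score. The names `fused_score` and `best`, the weights, and all log-probabilities are made up for illustration and are not taken from this repo. The optional per-token bonus is one common remedy for short hypotheses, shown here only as an assumption.

```python
def fused_score(am_logp, lm_logp, num_tokens, lm_weight=0.5, length_bonus=0.0):
    """Acoustic log-prob + weighted LM log-prob + per-token length bonus."""
    return am_logp + lm_weight * lm_logp + length_bonus * num_tokens


# Hypothetical log-probabilities for two candidates of the same utterance.
# The truncated candidate has a worse acoustic score but a better LM score,
# simply because it contains fewer tokens.
hyps = {
    "full transcript": {"tokens": 6, "am": -4.0, "lm": -12.0},
    "truncated": {"tokens": 3, "am": -6.5, "lm": -5.0},
}


def best(hyps, **kwargs):
    """Return the key of the highest-scoring hypothesis."""
    return max(
        hyps,
        key=lambda k: fused_score(
            hyps[k]["am"], hyps[k]["lm"], hyps[k]["tokens"], **kwargs
        ),
    )


print(best(hyps, lm_weight=0.0))                    # acoustic score only
print(best(hyps, lm_weight=0.5))                    # with LM fusion
print(best(hyps, lm_weight=0.5, length_bonus=0.5))  # fusion plus length bonus
```

In this toy setup every extra token adds a negative LM log-probability, so fusion alone favours the shorter candidate. A wide beam is more likely to keep such a candidate alive until the end, which is one way to read the quoted observation.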

# TODO

- [x] adaptive softmax for large vocabularies (because the official PyTorch implementation does not work with TorchScript)

- [ ] ONNX and TorchScript support

- [x] GRU

- [x] RNN with tied embeddings

- [ ] GRU fusion in WeNet runtime CTC prefix beam search

- [ ] Transformer-XL with cache

- [ ] Transformer-XL with cache for fusion

- [ ] MWER training with LM fusion

- [ ] etc
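
As a rough illustration of the adaptive softmax item in the list above, the sketch below computes full-vocabulary log-probabilities from a two-level factorization: a head softmax over frequent words plus one tail-cluster slot, and a second softmax inside the tail. It is plain Python and is not this repo's implementation. The names `log_softmax` and `adaptive_log_probs` and the example logits are hypothetical, and the reduced-dimension projection that adaptive softmax normally applies to tail clusters is omitted.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def adaptive_log_probs(head_logits, tail_logits, cutoff):
    """Two-level adaptive softmax over a vocabulary sorted by frequency.

    head_logits: `cutoff` scores for frequent words, then one score for
                 the tail cluster as a whole.
    tail_logits: scores for the remaining rare words.
    Returns log-probabilities for every word in the vocabulary.
    """
    if len(head_logits) != cutoff + 1:
        raise ValueError("head must have cutoff + 1 entries")
    head = log_softmax(head_logits)
    tail = log_softmax(tail_logits)
    cluster_logp = head[cutoff]
    # log p(rare word) = log p(tail cluster) + log p(word | tail cluster)
    return head[:cutoff] + [cluster_logp + t for t in tail]


# Hypothetical logits: 2 frequent words, 4 rare words.
logp = adaptive_log_probs([2.0, 1.0, 0.5], [0.3, -0.2, 0.1, 0.0], cutoff=2)
total = sum(math.exp(x) for x in logp)
print(len(logp), round(total, 6))
```

The saving comes from training: most targets are frequent words, so only the small head softmax has to be evaluated for them, and the tail is computed only when needed.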

# References

- [Deep Speech: Scaling up end-to-end speech recognition](https://arxiv.org/pdf/1412.5567.pdf) 

- [End-to-End Attention-based Large Vocabulary Speech Recognition](https://arxiv.org/pdf/1508.04395.pdf)

- [On Using Monolingual Corpora in Neural Machine Translation](https://arxiv.org/pdf/1503.03535.pdf)

- [First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs](https://arxiv.org/pdf/1408.2873.pdf)

- [Towards better decoding and language model integration in sequence to sequence models](https://arxiv.org/pdf/1612.02695.pdf)

- [Efficient softmax approximation for GPUs](https://arxiv.org/pdf/1609.04309.pdf)

- [Using the Output Embedding to Improve Language Models](https://arxiv.org/abs/1608.05859)

- [Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition](https://arxiv.org/pdf/2106.02302.pdf)

- etc
