https://github.com/jmaczan/asr-dysarthria
Research on Automatic Speech Recognition for dysarthric speech
asr automatic-speech-recognition deep-learning dysarthria dysarthric-speech self-supervised-learning wav2vec2
- Host: GitHub
- URL: https://github.com/jmaczan/asr-dysarthria
- Owner: jmaczan
- Created: 2024-01-12T21:17:34.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-09T08:01:50.000Z (over 1 year ago)
- Last Synced: 2025-03-25T21:38:10.498Z (about 1 year ago)
- Topics: asr, automatic-speech-recognition, deep-learning, dysarthria, dysarthric-speech, self-supervised-learning, wav2vec2
- Language: Jupyter Notebook
- Homepage: https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria
- Size: 2.64 MB
- Stars: 11
- Watchers: 2
- Forks: 2
- Open Issues: 4
Metadata Files:
- Readme: README.md
README
# ASR Dysarthria
Automatic speech recognition for people with dysarthria
This repository is under active research and development, so this README may be out of date. Sorry!
I deployed a web page where you can try a model in your browser: https://asr-dysarthria-preliminary.pages.dev/
## Training
Use the Jupyter notebook [wav2vec2-large-xls-r-300m-dysarthria-big-dataset.ipynb](wav2vec2-large-xls-r-300m-dysarthria-big-dataset.ipynb) to train your own model.
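The notebook follows Patrick von Platen's XLS-R fine-tuning recipe. In that recipe, transcripts are normalized and then turned into a character vocabulary for the CTC tokenizer. The stdlib-only sketch below illustrates that preprocessing step. It is not code taken from the notebook. The punctuation set in `CHARS_TO_IGNORE` and the helper names `normalize` and `build_vocab` are assumptions, and the notebook may handle these details differently.

```python
import re

# Assumed punctuation set; the notebook may strip different characters.
CHARS_TO_IGNORE = re.compile(r"[,?.!;:\"%'-]")


def normalize(transcript: str) -> str:
    """Drop punctuation the model should not predict and lowercase the text."""
    return CHARS_TO_IGNORE.sub("", transcript).lower().strip()


def build_vocab(transcripts: list[str]) -> dict[str, int]:
    """Assign an integer id to every character in the normalized transcripts."""
    chars = sorted({ch for t in transcripts for ch in normalize(t)})
    vocab = {ch: i for i, ch in enumerate(chars)}
    # Use "|" as an explicit word delimiter so spaces become a visible token.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Special tokens get the next free ids; [PAD] also serves as the CTC blank.
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab
```

In the blog post's version of the recipe, a dictionary like this is saved as `vocab.json` and passed to `Wav2Vec2CTCTokenizer` with `[UNK]`, `[PAD]` and `|` as its special tokens.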
## Installation
Prerequisites:
- Python >= 3.10
- Anaconda
Steps:
- `conda install --file requirements.txt`
## Inference
From the `cli-app` directory:
- Run the `model.safetensors` model: `python -m run`
- Run the ONNX model: `python -m onnx_run`

Adjust these scripts as needed. By default, they transcribe a `file.wav` file in the `cli-app` folder.
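By default, the CLI scripts read `file.wav`. Checkpoints based on `wav2vec2-large-xls-r-300m` expect 16 kHz audio, so resample recordings at other sample rates before inference. The sketch below uses only the standard `wave` and `array` modules. It checks that a file is mono, 16-bit, 16 kHz PCM and then scales the samples to floats. It illustrates the idea only and is not the code in `cli-app`; the function name `load_wav` is hypothetical.

```python
import array
import sys
import wave

EXPECTED_RATE = 16_000  # XLS-R / wav2vec2 checkpoints expect 16 kHz input


def load_wav(path: str, expected_rate: int = EXPECTED_RATE) -> list[float]:
    """Read a mono 16-bit PCM WAV file and return samples scaled to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono audio, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wav.getframerate()} Hz; resample first"
            )
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in samples]
```

Rejecting mismatched files early is deliberate here. The model cannot detect a wrong sample rate on its own, so it would simply hear sped-up or slowed-down speech.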
## Deploying
Download the trained model (`model.safetensors` file) and convert it:
```sh
mkdir models
python scripts/convert_model.py --url https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset/resolve/main/model.safetensors --output models
```
Serve the web app:
```sh
cd web-app
python -m http.server
```
## Pretrained models
- [Recommended] Loss: 0.0864, WER: 0.182 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset
- Loss: 0.0615, WER: 0.1764 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria
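The WER figures above are word error rates. WER is the minimum number of word substitutions, deletions and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. Because insertions count, WER can exceed 1.0. The stdlib sketch below illustrates the metric. It is not the notebook's evaluation code; the Hugging Face recipe loads a library WER metric instead. The function names are illustrative. The sketch computes a corpus-level score: it pools errors and reference words across all utterances rather than averaging per-utterance rates.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # delete ref_word
                curr[j - 1] + 1,  # insert hyp_word
                prev[j - 1] + (ref_word != hyp_word),  # substitute, or match for free
            )
        prev = curr
    return prev[-1]


def wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words
```

For example, the reference "the cat sat on the mat" against the hypothesis "the cat sit on mat" has one substitution and one deletion, so its WER is 2/6.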
## Datasets
- UASpeech https://huggingface.co/datasets/Vinotha/uaspeechall
- TORGO https://huggingface.co/datasets/jmaczan/TORGO
## Description
The code here is based on Patrick von Platen's article and notebook https://huggingface.co/blog/fine-tune-xlsr-wav2vec2
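The recipe in that article fine-tunes XLS-R with a CTC head. As a result, inference produces one score vector over the vocabulary for each audio frame. Greedy (best-path) decoding is the simplest way to turn those scores into text: take the most likely token per frame, merge consecutive repeats, then drop the blank token. In the article, this is roughly what `processor.batch_decode(torch.argmax(logits, dim=-1))` does. The stdlib version below is an illustrative sketch. As in that recipe, it assumes that the `[PAD]` token serves as the CTC blank and that `|` marks word boundaries. The function name is hypothetical.

```python
def ctc_greedy_decode(
    logits: list[list[float]],
    id_to_token: dict[int, str],
    blank_id: int,
    word_delimiter: str = "|",
) -> str:
    """Best-path CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # Most likely token id for every frame (ties go to the lowest id).
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    tokens = []
    previous = None
    for token_id in best_path:
        # Repeats are merged before blanks are dropped, so a blank between two equal ids keeps both.
        if token_id != previous and token_id != blank_id:
            tokens.append(id_to_token[token_id])
        previous = token_id
    text = "".join(tokens).replace(word_delimiter, " ")
    # Collapse runs of delimiters and trim leading/trailing ones.
    return " ".join(text.split())
```

Merging repeats before removing blanks is what lets CTC emit double letters. The frame sequence `l`, blank, `l` decodes to `ll`, while `l`, `l` collapses to a single `l`.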
## Resources
### Papers
- https://ar5iv.labs.arxiv.org/html/2204.00770 (https://arxiv.org/abs/2204.00770)
- https://www.isca-speech.org/archive/pdfs/interspeech_2022/baskar22b_interspeech.pdf
- https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10225595
- https://www.sciencedirect.com/science/article/pii/S2405959521000874
- https://www.isca-speech.org/archive/pdfs/interspeech_2021/green21_interspeech.pdf
- https://arxiv.org/pdf/2006.11477.pdf
- https://arxiv.org/pdf/2211.00089.pdf
- https://www.sciencedirect.com/science/article/abs/pii/S0957417423002981
### Code
https://huggingface.co/blog/fine-tune-wav2vec2-english
### Data
http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html
### Dataset
#### Big
https://huggingface.co/datasets/jmaczan/TORGO
#### Small
https://huggingface.co/datasets/jmaczan/TORGO-very-small
### Others
- https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/
- https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html
- https://huggingface.co/docs/datasets/v2.16.1/audio_dataset
- https://distill.pub/2017/ctc/
- https://ai.meta.com/blog/self-supervision-and-building-more-robust-speech-recognition-systems/
## Cite
If you use this repository in your research, please use the following citation:
```bibtex
@misc{Maczan_ASR_Dysarthria_2024,
title = "Research on Automatic Speech Recognition for dysarthric speech",
author = "Maczan, Jędrzej Paweł",
howpublished = "\url{https://github.com/jmaczan/asr-dysarthria}",
year = 2024,
publisher = {GitHub}
}
```
## License
MIT License
## Author
Jędrzej Paweł Maczan
https://huggingface.co/jmaczan | jed@maczan.pl | https://github.com/jmaczan