{"id":18674034,"url":"https://github.com/jmaczan/asr-dysarthria","last_synced_at":"2025-04-12T01:32:03.685Z","repository":{"id":218087124,"uuid":"742606383","full_name":"jmaczan/asr-dysarthria","owner":"jmaczan","description":"Research on Automatic Speech Recognition for dysarthric speech","archived":false,"fork":false,"pushed_at":"2024-10-09T08:01:50.000Z","size":2768,"stargazers_count":11,"open_issues_count":4,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-25T21:38:10.498Z","etag":null,"topics":["asr","automatic-speech-recognition","deep-learning","dysarthria","dysarthric-speech","self-supervised-learning","wav2vec2"],"latest_commit_sha":null,"homepage":"https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jmaczan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-12T21:17:34.000Z","updated_at":"2025-03-21T13:54:30.000Z","dependencies_parsed_at":"2024-07-16T10:33:04.418Z","dependency_job_id":"5f02220b-491a-45c3-a08f-ea25d9bbb0ba","html_url":"https://github.com/jmaczan/asr-dysarthria","commit_stats":null,"previous_names":["jmaczan/asr-dysarthria"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmaczan%2Fasr-dysarthria","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmaczan%2Fasr-dysarthria/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmaczan%2Fasr-dysart
hria/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jmaczan%2Fasr-dysarthria/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jmaczan","download_url":"https://codeload.github.com/jmaczan/asr-dysarthria/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248504294,"owners_count":21115142,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","automatic-speech-recognition","deep-learning","dysarthria","dysarthric-speech","self-supervised-learning","wav2vec2"],"created_at":"2024-11-07T09:17:17.499Z","updated_at":"2025-04-12T01:32:03.270Z","avatar_url":"https://github.com/jmaczan.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ASR Dysarthria\n\nAutomatic speech recognition for people with dysarthria\n\nThis repo is under heavy research and development and so the README.md is outdated. 
Sorry!\n\nI deployed a web page so you can use a model in your browser: https://asr-dysarthria-preliminary.pages.dev/\n\n## Training\n\nUse this Jupyter Notebook [wav2vec2-large-xls-r-300m-dysarthria-big-dataset.ipynb](wav2vec2-large-xls-r-300m-dysarthria-big-dataset.ipynb) to train your own model\n\n## Installation\n\nPrerequisites:\n\n- Python \u003e= 3.10\n- Anaconda\n\nSteps:\n\n- `conda install --file requirements.txt`\n\n## Inference\n\nIn the `cli-app` directory:\n\nRun model.safetensors: `python -m run`\n\nRun ONNX: `python -m onnx_run`\n\nAdjust these scripts if needed (by default they transcribe a `file.wav` file in the `cli-app` folder)\n\n## Deploying\n\nDownload and convert the trained model (model.safetensors file)\n\n```sh\nmkdir models\npython scripts/convert_model.py --url https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset/resolve/main/model.safetensors --output models\n```\n\nServe it\n\n```\ncd web-app\npython -m http.server\n```\n\n## Pretrained models\n\n- [Recommended] Loss: 0.0864, WER: 0.182 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset\n- Loss: 0.0615, WER: 0.1764 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria\n\n## Datasets\n\n- UASpeech https://huggingface.co/datasets/Vinotha/uaspeechall\n- TORGO https://huggingface.co/datasets/jmaczan/TORGO\n\n## Description\n\nThe code here is based on Patrick von Platen's article and notebook https://huggingface.co/blog/fine-tune-xlsr-wav2vec2\n\n## Resources\n\n### Papers\n\nhttps://ar5iv.labs.arxiv.org/html/2204.00770 
(https://arxiv.org/abs/2204.00770)\n\nhttps://www.isca-speech.org/archive/pdfs/interspeech_2022/baskar22b_interspeech.pdf\n\nhttps://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=10225595\n\nhttps://www.sciencedirect.com/science/article/pii/S2405959521000874\n\nhttps://www.isca-speech.org/archive/pdfs/interspeech_2021/green21_interspeech.pdf\n\nhttps://arxiv.org/pdf/2006.11477.pdf\n\nhttps://arxiv.org/pdf/2211.00089.pdf\n\nhttps://www.sciencedirect.com/science/article/abs/pii/S0957417423002981\n\n### Code\n\nhttps://huggingface.co/blog/fine-tune-wav2vec2-english\n\n### Data\n\nhttp://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html\n\n### Dataset\n\n#### Big\n\nhttps://huggingface.co/datasets/jmaczan/TORGO\n\n#### Small\n\nhttps://huggingface.co/datasets/jmaczan/TORGO-very-small\n\n### Others\n\nhttps://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/\n\nhttps://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html\n\nhttps://huggingface.co/docs/datasets/v2.16.1/audio_dataset\n\nhttps://distill.pub/2017/ctc/\n\nhttps://ai.meta.com/blog/self-supervision-and-building-more-robust-speech-recognition-systems/\n\n\n## Cite\nIf you use this repository in your research, please use the following citation:\n\n```bibtex\n@misc{Maczan_ASR_Dysarthria_2024,\n  title = \"Research on Automatic Speech Recognition for dysarthric speech\",\n  author = \"{Maczan, Jędrzej Paweł}\",\n  howpublished = \"\\url{https://github.com/jmaczan/asr-dysarthria}\",\n  year = 2024,\n  publisher = {GitHub}\n}\n```\n\n## License\n\nMIT License\n\n## Author\n\nJędrzej Paweł Maczan\n\nhttps://huggingface.co/jmaczan | jed@maczan.pl | 
https://github.com/jmaczan\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjmaczan%2Fasr-dysarthria","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjmaczan%2Fasr-dysarthria","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjmaczan%2Fasr-dysarthria/lists"}