{"id":15540042,"url":"https://github.com/blaizzy/evaluating-asr-accent-robustness","last_synced_at":"2025-10-17T11:47:39.093Z","repository":{"id":178451277,"uuid":"656846843","full_name":"Blaizzy/evaluating-asr-accent-robustness","owner":"Blaizzy","description":null,"archived":false,"fork":false,"pushed_at":"2023-09-25T10:49:10.000Z","size":2895,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-28T16:46:49.729Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Blaizzy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-21T19:06:10.000Z","updated_at":"2023-06-24T01:02:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"30c277b9-842c-4246-8761-dded58209035","html_url":"https://github.com/Blaizzy/evaluating-asr-accent-robustness","commit_stats":null,"previous_names":["blaizzy/evaluating-asr-accent-robustness"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blaizzy%2Fevaluating-asr-accent-robustness","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blaizzy%2Fevaluating-asr-accent-robustness/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blaizzy%2Fevaluating-asr-accent-robustness/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blaizzy%2Fevaluating-asr-accent-robustness/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Blaizzy","download_url":"https://codeload.github.com/Blaizzy/evaluating-asr-accent-robustness/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246117761,"owners_count":20726069,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T12:12:17.486Z","updated_at":"2025-10-17T11:47:34.061Z","avatar_url":"https://github.com/Blaizzy.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Evaluating ASR Systems Accent Robustness\n[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Blaizzy/evaluating-asr-accent-robustness/blob/main/notebooks/eval.ipynb)\n\nWelcome to the repository housing the master's thesis research of Prince Canuma, a student of the MSc Big Data Analytics program at Wrocław University of Science and Technology. This body of work reflects the dedication and scientific rigour Prince has applied to his studies.\n\nAuthored by Prince Canuma.\n\nTo learn more about Prince and his work, feel free to connect with him on:\n\n* [LinkedIn](https://www.linkedin.com/in/prince-canuma/)\n* [Twitter](https://twitter.com/CanumaGdt)\n  \n## Data\nAccentsDB is a database of speech samples covering a wide array of native and non-native English accents from around the world, intended for testing the robustness and adaptability of ASR systems to various accents. 
In total, it has 23 speakers, 19:49 hours of audio, 16,984 speech samples and 9 accents, split into 4 native accents, namely American, Australian, British, and Welsh; 1 metropolitan Indian accent; and 4 non-native accents, namely Bangladeshi, Malayalam, Odiya and Telugu. AccentsDB does not provide transcriptions, so they had to be generated for all of its speech samples.\n\n\u003cimg src=\"assets/accentsDB_stats.png\" width=500\u003e\n\nFigure 1: A table containing details of the AccentsDB statistics. Of the 9 accents, the authors collected 4 themselves and generated 5 using Amazon Polly, an AWS service for generating synthetic natural-sounding human speech from text.\n\nThe transcriptions were generated with **Wav2Vec 2.0**, an open-source model. Each generated entry was then manually inspected and, where necessary, corrected to ensure the accuracy of the transcriptions.\nHowever, the Malayalam accent and a portion of the dataset posed significant challenges due to the lack of aligned speech samples, which complicated the process of generating and verifying the labels. As a result, these were excluded from the study. Consequently, this study’s final subset of AccentsDB comprised 12,614 samples, spanning 8 distinct accents.\n\nDataset link: https://huggingface.co/datasets/prince-canuma/accentsDB-with-transcripts\n\n## Metrics \nThe primary metric for evaluation will be the **Word Error Rate** (WER), which measures the share of words transcribed incorrectly: the number of word substitutions, deletions, and insertions divided by the number of words in the reference transcript. Additionally, we will use the **Character Error Rate** (CER), the same measure computed over characters, to capture character-level mistakes in the transcriptions. Together, these metrics will provide a clear picture of the transcription accuracy of the evaluated models.\n\n## Tools \nIn developing and evaluating the Whisper ASR model, several key tools were used, which not only facilitated the research process but also supported the robustness and reproducibility of the experiments. 
These tools include **[PyTorch](https://github.com/pytorch/pytorch)**, **[Huggingface Datasets](https://github.com/huggingface/datasets)**, **[Huggingface Transformers](https://github.com/huggingface/transformers)**, and **[Huggingface Hub](https://github.com/huggingface/huggingface_hub)**.\n\n## Models\nWe will compare the performance of the following ASR models:\n\n* **Whisper**: OpenAI’s ASR system trained on vast and diverse datasets.\n* **HuBERT**: Facebook AI’s ASR model pre-trained on LibriSpeech, developed to perform competitively with or better than Wav2Vec 2.0 on all fine-tuning subsets.\n* **Wav2Vec 2.0**: Facebook AI’s ASR model trained using a novel self-supervised learning approach.\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblaizzy%2Fevaluating-asr-accent-robustness","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblaizzy%2Fevaluating-asr-accent-robustness","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblaizzy%2Fevaluating-asr-accent-robustness/lists"}