{"id":13701241,"url":"https://github.com/symblai/speech-recognition-evaluation","last_synced_at":"2025-03-18T00:17:15.282Z","repository":{"id":45240464,"uuid":"265102375","full_name":"symblai/speech-recognition-evaluation","owner":"symblai","description":"Evaluate results from ASR/Speech-to-Text quickly","archived":false,"fork":false,"pushed_at":"2021-12-28T20:23:28.000Z","size":37,"stargazers_count":36,"open_issues_count":2,"forks_count":7,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-24T06:39:21.873Z","etag":null,"topics":["accuracy","asr","comparison","diff","difference","evaluation","insertions","mismatches","punctuations","speech-recognition","speech-to-text","statistics","stt","transcriptions","wer","word-error-rate","words"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/symblai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-19T00:53:32.000Z","updated_at":"2024-09-26T18:59:01.000Z","dependencies_parsed_at":"2022-08-23T19:30:26.515Z","dependency_job_id":null,"html_url":"https://github.com/symblai/speech-recognition-evaluation","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/symblai%2Fspeech-recognition-evaluation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/symblai%2Fspeech-recognition-evaluation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/symblai%2Fspeech-recognition-evaluation/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/symblai%2Fspeech-recognition-evaluation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/symblai","download_url":"https://codeload.github.com/symblai/speech-recognition-evaluation/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244130305,"owners_count":20402756,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accuracy","asr","comparison","diff","difference","evaluation","insertions","mismatches","punctuations","speech-recognition","speech-to-text","statistics","stt","transcriptions","wer","word-error-rate","words"],"created_at":"2024-08-02T20:01:23.722Z","updated_at":"2025-03-18T00:17:15.261Z","avatar_url":"https://github.com/symblai.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"# Automatic Speech Recognition (ASR) Evaluation\n\nIf you're using any Speech-to-Text or Speech Recognition system to generate transcriptions from your audio/video content, you can use this tool to compare how well it performs against a human-generated transcription. 
If you're not sure how to generate a transcription, take a look [here](https://docs.symbl.ai/#how-tos) for a list of tutorials to help you get started.\n\n## What can this utility do?\nThis is a simple utility for quickly evaluating the results generated by any Speech-to-Text (STT) or Automatic Speech Recognition (ASR) system.\n\nThis utility can calculate the following metrics:\n* [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate), a common metric for measuring the performance of a Speech Recognition or Machine Translation system.\n* Word Information Loss (WIL), a simple approximation to the proportion of word information lost. Refer to [this paper](https://www.isca-speech.org/archive_v0/archive_papers/interspeech_2004/i04_2765.pdf) for more info.\n* [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance), calculated at word level.\n* Number of word-level insertions, deletions and mismatches between the original file and the generated file.\n* Number of phrase-level insertions, deletions and mismatches between the original file and the generated file.\n* Color-highlighted text comparison to visualize the differences.\n* General statistics about the original and generated files (bytes, characters, words, new lines, etc.).\n\nThe utility also pre-processes (normalizes) the text in the provided files using the following operations:\n* Remove Speaker Name: Remove the speaker name at the beginning of each line.\n* Remove Annotations: Remove any custom annotations added during transcription.\n* Remove Whitespaces: Remove any extra whitespace.\n* Remove Quotes: Remove any double quotes.\n* Remove Dashes: Remove any dashes.\n* Remove Punctuations: Remove any punctuation marks (.,?!).\n* Lowercase: Convert the contents to lower case.\n\n## Pre-requisites\nMake sure that you have [Node.js v8+](https://nodejs.org/en/download/) installed on your system.\n\n## Installation\n```bash\nnpm install -g 
speech-recognition-evaluation\n```\nVerify the installation by running:\n```bash\nasr-eval\n```\n\n## Usage\nThe simplest way to run your first evaluation is to pass the `original` and `generated` options to the `asr-eval` command.\nHere, `original` is a plain-text file containing the original transcript to be used as the reference; usually it is produced by a human transcriber.\n`generated` is a plain-text file containing the transcript generated by the STT/ASR system.\n\n```bash\nasr-eval --original ./original-file.txt --generated ./generated-file.txt\n```\n\nThis prints the Word Error Rate (WER) between the provided files. The output should look like this:\n```\nWord Error Rate (WER): 13.61350109561817%\n```\n\nTo see all the available options:\n```bash\nasr-eval --help\n```\nThis prints the full usage guide:\n```\nSynopsis\n\n  $ asr-eval --original file --generated file           \n  $ asr-eval [options] --original file --generated file \n  $ asr-eval --help                                     \n\nOptions\n\n  -o, --original file                 Original File to be used as reference. Usually, this should be the            \n                                      transcribed file by a Human being.                                            \n  -g, --generated file                File with the output generated by Speech Recognition System.                  \n  -e, --wer [true|false]              Default: true. Print Word Error Rate (WER).                                   \n  -i, --wil [true|false]              Default: true. Print Word Information Loss (WIL).                             \n  --distance [true|false]             Default: false. Print total word distance after comparison.                   \n  --stats [true|false]                Default: false. Print statistics about original and generate files, before    \n                                      and after pre-processing. 
Also prints statistics about word level and phrase  \n                                      level differences.                                                            \n  --pairs [true|false]                Default: false. Print all the difference pairs with type of difference.       \n  --textcomparison [true|false]       Default: false. Print the text comparison between two files with              \n                                      highlighting.                                                                 \n  --removespeakers [true|false]       Default: true. Remove the speaker at the start of each line in files before   \n                                      calculations. The speaker should be separated by colon \":\" i.e. speaker_name: \n                                      text For e.g. \"John Doe: Hello, I am John.\" would get converted to simply     \n                                      \"Hello, I am John.\"                                                           \n  --removeannotations [true|false]    Default: true. Remove any custom annotations in the transcript before         \n                                      calculations. This is useful when removing custom annotations done by human   \n                                      transcribers.  Anything in square brackets [] are detected as annotations.    \n                                      For e.g. \"Hello, I am [inaudible 00:12] because of few reasons.\" would get    \n                                      converted to \"Hello, I am because of few reasons.\"                            \n  --removewhitespaces [true|false]    Default: true. Remove any extra white spaces before calculations.             \n  --removequotes [true|false]         Default: true. Remove any double quotes '\"' from the files before             \n                                      calculations.                                                                 
\n  --removedashes [true|false]         Default: true. Remove any dashes (hyphens) \"-\" from the files before          \n                                      calculations.                                                                 \n  --removepunctuations [true|false]   Default: true. Remove any punctuations \".,?!\" from the files before           \n                                      calculations.                                                                 \n  --lowercase [true|false]            Default: true. Convert both files to lower case before calculations. This is  \n                                      useful if evaluation needs to be done in case-insensitive way.                \n  --help [true|false]                 Print this usage guide.                                                                                   \n```\n\n## Getting help\nIf you need help installing or using the utility, reach out in our [Slack channel](https://symbldotai.slack.com/join/shared_invite/zt-4sic2s11-D3x496pll8UHSJ89cm78CA).\n\nIf you've found a bug or would like a new feature added, open an issue or pull request against this repo!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsymblai%2Fspeech-recognition-evaluation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsymblai%2Fspeech-recognition-evaluation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsymblai%2Fspeech-recognition-evaluation/lists"}