# Code for Speech-To-Text Online APIs - Python Scripts

## Bulk Audio Transcription using Speech Transcription Services

### Initial Steps

0. Ensure you have Python 3 installed.
1. Clone this repo and `cd` into it.
2. Go into the folder of each specific service and check its `README` file.

### Supported Services

- [Google Speech-To-Text](Google-Speech2Text/)
- [AWS Transcribe](AWS-Transcribe/)
- [Microsoft Cognitive Service](Azure-Cognitive-Service/)
- [Rev.ai (Temi Speech) API](RevAI-Temi-API/)

Please feel free to contribute support for other online speech transcription services you are aware of.

### Input format

Ensure you have a folder of audio files in `WAV` format (for example, `wav_folder`).

Example:
```
wav_folder
├───000561a49624c7c56625e6d8ccd230b15d3f129083b84c19846a9593.wav
├───0005680e7ac8826cff24f15022b67a2651acd691bf897bf3d3e44345.wav
├───000568340cb01e73daaa263d90765a5c213160a75201d642d899b4df.wav
├───00057062512f8dbc62d1691b97d0e6d997f350f41c908956fec02dbd.wav
├───00057091dd6ea751089e57358095034164067c180c4d1730254924ac.wav
├───000574e671847cbc40ef7fa325f39bfb6338a7f7781e09e773702b41.wav
...
```

### Output format

Each script will dump the transcriptions in the specified output folder in the following format:

Example:
```
output_txt_folder
├───000561a49624c7c56625e6d8ccd230b15d3f129083b84c19846a9593.txt
├───0005680e7ac8826cff24f15022b67a2651acd691bf897bf3d3e44345.txt
├───000568340cb01e73daaa263d90765a5c213160a75201d642d899b4df.txt
├───00057062512f8dbc62d1691b97d0e6d997f350f41c908956fec02dbd.txt
├───00057091dd6ea751089e57358095034164067c180c4d1730254924ac.txt
├───000574e671847cbc40ef7fa325f39bfb6338a7f7781e09e773702b41.txt
...
```
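
To make this convention concrete, here is a minimal sketch of the loop each service script roughly follows. The `transcribe_file` callable is purely hypothetical and stands in for whatever API call the particular service uses.

```python
import os

def transcribe_folder(wav_folder, output_txt_folder, transcribe_file):
    """Transcribe every WAV in `wav_folder` and write one TXT per file.

    `transcribe_file` is a hypothetical callable (wav_path -> transcript string)
    standing in for the actual service-specific API call.
    """
    os.makedirs(output_txt_folder, exist_ok=True)
    for name in sorted(os.listdir(wav_folder)):
        if not name.lower().endswith(".wav"):
            continue
        wav_path = os.path.join(wav_folder, name)
        transcript = transcribe_file(wav_path)
        # The output file keeps the same base name, with a .txt extension
        txt_path = os.path.join(output_txt_folder, os.path.splitext(name)[0] + ".txt")
        with open(txt_path, "w", encoding="utf-8") as f:
            f.write(transcript)
```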


### Measuring quality

If you have ground truth in the same format as the output folder described above, you can calculate the `Word Error Rate` (WER) as follows:

0. Prerequisite: `pip install jiwer==2.2.0`
1. Set the ground truth and prediction folders in the last line of `calc_wer.py`
2. Run `python calc_wer.py`
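
The exact contents of `calc_wer.py` are not reproduced here, but the computation it performs can be sketched with `jiwer` roughly as follows (the folder names are placeholders):

```python
import os
import jiwer  # pip install jiwer==2.2.0

def folder_wer(ground_truth_folder, prediction_folder):
    """Compute WER over all matching .txt files in the two folders."""
    references, hypotheses = [], []
    for name in sorted(os.listdir(ground_truth_folder)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(ground_truth_folder, name), encoding="utf-8") as f:
            references.append(f.read().strip())
        with open(os.path.join(prediction_folder, name), encoding="utf-8") as f:
            hypotheses.append(f.read().strip())
    return jiwer.wer(references, hypotheses)

print(folder_wer("ground_truth_folder", "output_txt_folder"))
```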


### DL Models

Suppose you want to compare the online transcription output with the output of your own deep learning models. That is easy too!

**This repo follows the format of the LibriSpeech dataset.**
So ensure your model dumps its output in the same format (i.e., the output format described above) and use the `calc_wer.py` script to compare the quality.
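
For instance, if your model produces a mapping from utterance IDs to hypothesis text, dumping it in this repo's per-file format takes only a few lines (the folder name and the `hypotheses` dict below are illustrative):

```python
import os

# Hypothetical model outputs: utterance ID -> predicted transcript
hypotheses = {
    "utt_0001": "hello world",
    "utt_0002": "speech recognition is fun",
}

output_dir = "model_txt_folder"  # placeholder folder name
os.makedirs(output_dir, exist_ok=True)
for utt_id, text in hypotheses.items():
    with open(os.path.join(output_dir, utt_id + ".txt"), "w", encoding="utf-8") as f:
        f.write(text)
```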

As an example, the following DL models are already supported:
- [ESPnet](ESPNet-Model-Inference/)

Please feel free to contribute support for other DL models you are aware of.

Pull requests and issues for bugs, fixes, or new features are warmly welcome. :-)


### Alternatives

You can also check the following Python libraries for more services:

- [SpeechRecognition](https://pypi.org/project/SpeechRecognition/)
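
As a quick illustration of the SpeechRecognition library (assuming `pip install SpeechRecognition` and a local WAV file; the path below is illustrative), a minimal usage sketch looks like this:

```python
import speech_recognition as sr  # pip install SpeechRecognition

recognizer = sr.Recognizer()
with sr.AudioFile("wav_folder/example.wav") as source:  # path is a placeholder
    audio = recognizer.record(source)

# Uses Google's free web speech API; other backends (e.g. recognize_sphinx) exist too
try:
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Speech was unintelligible")
except sr.RequestError as e:
    print(f"API request failed: {e}")
```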