https://github.com/agrover112/goodness-of-pronunciation-pipelines-for-oov-problem
Goodness of Pronunciation Pipelines for OOV Removal
asr hidden-markov-model kaldi kaldi-asr lexicon-based oov speech speech-recognition
- Host: GitHub
- URL: https://github.com/agrover112/goodness-of-pronunciation-pipelines-for-oov-problem
- Owner: Agrover112
- Created: 2022-05-25T17:29:42.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-01T09:22:45.000Z (over 2 years ago)
- Last Synced: 2025-04-19T09:28:10.654Z (about 2 months ago)
- Topics: asr, hidden-markov-model, kaldi, kaldi-asr, lexicon-based, oov, speech, speech-recognition
- Language: Perl
- Homepage:
- Size: 1.61 MB
- Stars: 9
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Citation: CITATION.cff
README
# Goodness of Pronunciation Pipelines for OOV Problem
DOI: [10.5281/zenodo.7078841](https://doi.org/10.5281/zenodo.7078841)

A proposed pipeline for Goodness of Pronunciation (GoP) computation that addresses the out-of-vocabulary (OOV) problem at test time using vocabulary/lexicon expansion techniques.
We also provide utilities for extracting phoneme posteriors and word boundaries (alignments), and for using GoP scores as vectors.
The pipelines and methods are described in detail in this [report](https://arxiv.org/abs/2209.03787).
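To make the lexicon-expansion idea concrete, here is a minimal Python sketch of detecting OOV words and adding pronunciations for them. It is an illustration, not the repository's implementation: the function names, the upper-casing of transcripts, and the hand-written pronunciation for `KALDI` are all assumptions, and in practice the new pronunciations would come from whatever lexicon-generation step the pipeline uses.

```python
def load_lexicon(lines):
    """Parse Kaldi-style lexicon lines of the form: WORD PH1 PH2 ..."""
    lexicon = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        lexicon.setdefault(parts[0], []).append(parts[1:])
    return lexicon


def find_oov(transcript_lines, lexicon):
    """Return the sorted, unique transcript words missing from the lexicon.

    Assumption: lexicon entries are upper-case, so words are upper-cased.
    """
    words = set()
    for line in transcript_lines:
        words.update(line.upper().split())
    return sorted(w for w in words if w not in lexicon)


def expand_lexicon(lexicon, new_entries):
    """Return a copy of the lexicon with the new pronunciations appended."""
    expanded = {w: [list(p) for p in prons] for w, prons in lexicon.items()}
    for word, phones in new_entries.items():
        expanded.setdefault(word, []).append(list(phones))
    return expanded


lexicon = load_lexicon(["HELLO HH AH L OW", "WORLD W ER L D"])
transcripts = ["hello kaldi world"]

oov = find_oov(transcripts, lexicon)
# Hypothetical pronunciation, written by hand for this example only.
expanded = expand_lexicon(lexicon, {"KALDI": ["K", "AA", "L", "D", "IY"]})
remaining = find_oov(transcripts, expanded)
```

After expansion, `remaining` is empty, so every word in the test transcript can be force-aligned.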
## Main Files
- `run1File.sh` : Computes Forced Alignments.
- `run1File_posterior.sh` : Computation of Acoustic model Posterior Probabilities.
- `runAllFiles.sh` : Calls the scripts needed to compute posteriors, alignments, and GoP scores.
- `online_computation.sh` : Runs the Online/Hybrid pipeline of GoP computation.
- `conf/` : Configuration files for MFCC, i-vector extractors, etc.
- `get_failed_entries.sh` : Generates a Lexicon from a text file or a directory of text files, along with a list of failed entries, if any.
- `Goodness-of-pronounciation/prop_gop_eqn.py` : Python code that calculates GoP scores from posterior and alignment inputs. (Refer to the comments in my fork for a detailed understanding.)

The entire `data`, `exp`, and `lab` folders can be found [here](https://drive.google.com/drive/folders/1-q1a-jv-dhJdn8KTRqWmxW3wF0e-V0sT?usp=sharing).
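As a rough illustration of how GoP scores can be derived from posterior and alignment inputs, the sketch below uses one commonly used simplification: the mean log posterior of the canonical phone over its aligned frames. The exact equation in `prop_gop_eqn.py` may differ; the function name, the data layout (one dict of phone posteriors per frame), and the probability floor are assumptions made for this example.

```python
import math


def gop_scores(posteriors, alignment, floor=1e-10):
    """Compute a simple per-phone GoP score.

    posteriors: list of frames, each a dict mapping phone -> posterior.
    alignment:  list of (phone, start_frame, end_frame), end exclusive.
    Returns a list of (phone, mean log posterior over the aligned frames).
    """
    scores = []
    for phone, start, end in alignment:
        frames = posteriors[start:end]
        if not frames:
            continue  # skip empty segments
        # Floor the probability so log() never sees zero.
        log_sum = sum(math.log(max(f.get(phone, 0.0), floor)) for f in frames)
        scores.append((phone, log_sum / len(frames)))
    return scores


# Toy input: three frames, two phones.
posteriors = [
    {"AH": 0.5, "B": 0.5},
    {"AH": 0.5, "B": 0.5},
    {"AH": 0.25, "B": 0.75},
]
alignment = [("AH", 0, 2), ("B", 2, 3)]
scores = gop_scores(posteriors, alignment)
```

Scores closer to zero indicate that the acoustic model assigns high probability to the expected phone; strongly negative scores suggest a likely mispronunciation.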
## Utils:
- `get_ctm.sh` : Get the phone-level time-marked conversation (CTM) files.
- `get_word_ctm.sh` : Get the word-level CTM files.
- `get_time.sh` : Get the times for ctm files.
- `collect_transcripts.sh` : Collect and place transcripts from sub-dirs to one file.
- `find_oov.sh` : Finds the OOV occurrences from 2 databases.
- `append_vocab.sh` : Append OOV lexicon entries to original Lexicon
- `temp_q.sh` : File for pre-processing text.
- `dict.sh` : A modified `utils/prepare_dict.sh` for Lexicon generation.

Entry point for running the entire pipeline:
```bash
./get_acoustic_metrics.sh wav_file_dir_path transcript_file_dir_path output_folder_path
```

## Outputs:
- `gop/`: Contains GoP scored outputs and phone level posteriors (ID_gop_phone_posteriors.txt).
- `lab/posteriors/`: Contains `ID_posterior_infile.ark` from `nnet-compute` and `ID_phone_posteriors.ark`, which stores the posteriors in a different format than the GoP posteriors.
- `lab/`: Contains Forced Alignment outputs (phone-level and word-level `.ctm` files): `ID_word_.ctm` and `ID_alignment_infile.txt`.
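For readers who want to consume the `.ctm` alignment outputs, the following Python sketch parses CTM lines into frame ranges. It assumes the standard CTM layout (`utterance channel start duration label [confidence]`) and a 10 ms frame shift; the function name and the sample lines are made up for illustration, and the frame shift should be adjusted to match the feature configuration actually in use.

```python
def parse_ctm(lines, frame_shift=0.01):
    """Convert CTM lines to (utt_id, label, start_frame, end_frame) tuples.

    Each CTM line is: utt_id channel start_sec duration_sec label [confidence]
    end_frame is exclusive. frame_shift is in seconds (assumed 10 ms).
    """
    segments = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        utt_id, _channel, start, duration, label = parts[:5]
        start_sec = float(start)
        end_sec = start_sec + float(duration)
        # round() guards against floating-point error in the division.
        start_frame = round(start_sec / frame_shift)
        end_frame = round(end_sec / frame_shift)
        segments.append((utt_id, label, start_frame, end_frame))
    return segments


# Made-up sample lines; the second one carries an optional confidence field.
ctm_lines = [
    "utt1 1 0.00 0.20 HH",
    "utt1 1 0.20 0.15 AH 0.98",
]
segments = parse_ctm(ctm_lines)
```

Frame ranges of this form can be used to slice per-frame posteriors for each aligned phone or word.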
# Citation
Please cite both sources (arXiv + Zenodo) if any or all of the code is used in your research work.
```
@software{Ankit_Goodness-of-Pronunciation-Pipelines-for-OOV-Problem_2022,
author = {Ankit, Ankit},
doi = {10.5281/zenodo.7078826},
month = {9},
title = {{Goodness-of-Pronunciation-Pipelines-for-OOV-Problem}},
version = {new},
year = {2022}
}
```
```
@misc{https://doi.org/10.48550/arxiv.2209.03787,
doi = {10.48550/ARXIV.2209.03787},
url = {https://arxiv.org/abs/2209.03787},
author = {Grover, Ankit},
keywords = {Computation and Language (cs.CL), Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering},
title = {Goodness of Pronunciation Pipelines for OOV Problem},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}
}
```