https://github.com/coqui-ai/open-bible-scripts
scipts for working with open.bible data
https://github.com/coqui-ai/open-bible-scripts
Last synced: 6 months ago
JSON representation
scipts for working with open.bible data
- Host: GitHub
- URL: https://github.com/coqui-ai/open-bible-scripts
- Owner: coqui-ai
- License: apache-2.0
- Created: 2021-11-22T06:50:11.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-01-24T06:02:26.000Z (over 3 years ago)
- Last Synced: 2025-04-02T17:11:11.527Z (7 months ago)
- Language: Shell
- Size: 71.3 KB
- Stars: 24
- Watchers: 6
- Forks: 12
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# Align [`Open.Bible`](https://open.bible/) data
|Language|Passing|Failing|Unknown|Notes|Aligned Sample|
|---------|------|-------|-------|----|-------|
|Yoruba|💚||||[Psalm 119](https://coqui-ai-public-data.s3.amazonaws.com/psalm-119-yoruba.tar.gz)|
|Ewe|💚||||[Psalm 119](https://coqui-ai-public-data.s3.amazonaws.com/ewe-psalm119-coqui-dec11.tar.gz)|
|Lingala|💚||||[Psalm 119](https://coqui-ai-public-data.s3.amazonaws.com/lingala-coqui-psalm119-dec16.tar.gz)|
|Asante Twi|💚|||||
|Akuapem Twi|💚|||||
|Chichewa|❤️🩹|||Passing with bad alignments|[Psalm 119](https://coqui-ai-public-data.s3.amazonaws.com/chichewa-coqui-PSA_119.tar.gz)|
|Hausa||💔||||
|Luo||💔||||
|Luganda||💔||||
|Kikuyu||💔||||
|Arabic|||❓|||
|Kurdi Sorani|||❓|||
|Polish|||❓|||
|Vietnamese|||❓|||### Clone this repo
```
$ git clone https://github.com/coqui-ai/open-bible-scripts.git
```## Alignment Approach 1: Use the Montreal Forced Aligner
The first alignment approach is to use MFA to align and train a new acoustic model from stratch.
### Dependencies
You need to install a couple things on your own:
[`gnu-parallel`](https://www.gnu.org/software/parallel/)
[`covo`](https://www.github.com/ftyers/commonvoice-utils)### Start with the run script for pre-processing
Use the language name as defined in `open-bible-scripts/data/*.txt`. Use the language code as expected by [covo](https://www.github.com/ftyers/commonvoice-utils).
E.g., for Yoruba use `yoruba` and `yo`, for Ewe use `ewe` and `ee`, for Luganda `luganda` and `lg`, and so on.
```
$ cd open-bible-scripts
open-bible-scripts$ ./run-pre-alignment.sh yoruba yo
```### Generate alignments with [`mfa train`](https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/workflows/train_acoustic_model.html)
```
$ docker run -it --mount "type=bind,src=/home/ubuntu/open-bible-scripts,dst=/mnt" mmcauliffe/montreal-forced-aligner
(base) root@d8095c794d5f:/# conda activate aligner
(aligner) root@d8095c794d5f:/# mfa train --clean --num_jobs `nproc` --temp_directory /mnt/yoruba/data/mfa-tmp-dir --config_path /mnt/MFA_CONFIG /mnt/yoruba/data /mnt/yoruba/dict.txt /mnt/yoruba/data/mfa-output &> /mnt/yoruba/data/LOG &# At this point, alignment will take a while,
# so you might want to detach from the docker container
# with `Ctrl-P followed by Ctrl-Q`
```### Finish with the run script for post-processing
Use the language name as defined in `open-bible-scripts/data/*.txt`.
E.g., for Yoruba use `yoruba`, for Ewe use `ewe`, for Luganda `luganda`, and so on.
```
$ cd open-bible-scripts
open-bible-scripts$ ./run-post-alignment.sh yoruba yo
```## Alignment Approach 2: Use timing files from Biblica
This works for only Lingala, Akuapem Twi, and Asante Twi.
### Split using timing file
Install sox on your OS. See linux installation below
```bash
sudo apt-get install sox
sudo apt-get install libsox-fmt-mp3
sox --version
python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install pandas
```Execute the `run-biblica-splits-*.sh` script from the root dir, for example with Lingala:
```bash
./run-biblica-splits-lingala.sh
```