https://github.com/seanghay/kfa

A fast Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus
alignment cambodia forced-alignment khmer wav2vec2

Last synced: 7 months ago
Host: GitHub
URL: https://github.com/seanghay/kfa
Owner: seanghay
Created: 2024-02-16T06:37:09.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-05-02T03:27:29.000Z (about 1 year ago)
Last Synced: 2024-11-05T17:55:12.562Z (8 months ago)
Topics: alignment, cambodia, forced-alignment, khmer, wav2vec2
Language: Python
Homepage: https://pypi.org/project/kfa
Size: 10.1 MB
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-khmer-language - kfa

README

## KFA

[[Google Colab]](https://colab.research.google.com/drive/1-aRxWOzqqsL7Qbgp95dlwN-_cvI41INf?usp=sharing)

A fast Khmer Forced Aligner powered by **Wav2Vec2CTC** and **Phonetisaurus**.

- [ ] Built-in Speech Enhancement

- [x] Word-level Alignment

```shell
pip install kfa
```

#### CLI

> [!Note]
> The input audio (`audio.wav`) must have a 16 kHz sample rate. Use ffmpeg or another tool to resample the audio before processing:
>
> `ffmpeg -i audio_orig.wav -ac 1 -ar 16000 audio.wav`
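
To confirm that a file already meets this requirement before aligning it, a minimal sketch using only Python's standard `wave` module is shown below. The helper name `check_wav` is hypothetical and not part of kfa, and the mono check is an assumption inferred from the `-ac 1` flag in the ffmpeg command above. The `wave` module reads uncompressed PCM WAV files only.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return (ok, sample_rate, channels) for a PCM WAV file.

    Hypothetical helper, not part of kfa. The expected channel count
    of 1 is an assumption based on the ffmpeg `-ac 1` flag.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    ok = rate == expected_rate and channels == expected_channels
    return ok, rate, channels
```

Calling `check_wav("audio.wav")` returns a tuple whose first element is `True` only when the file is 16 kHz mono; otherwise, resample it with the ffmpeg command from the note.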

```shell
kfa -a audio.wav -t text.txt -o alignments.jsonl

# Output in Whisper-style JSON format
kfa -a audio.wav -t text.txt --format whisper -o alignments.json
```
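
The default output can then be loaded back into Python. The sketch below assumes, based only on the `.jsonl` extension, that `alignments.jsonl` is in JSON Lines format (one JSON value per line). It makes no assumption about the field names inside each record, and `read_jsonl` is a hypothetical helper, not part of kfa.

```python
import json


def read_jsonl(path):
    """Load a JSON Lines file into a list, one parsed value per line.

    Hypothetical helper, not part of kfa. Assumes the file holds one
    JSON value per non-empty line.
    """
    records = []
    with open(path, encoding="utf-8") as infile:
        for line in infile:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

For example, `for record in read_jsonl("alignments.jsonl"): print(record)` prints each alignment record as parsed, which is a quick way to inspect the structure kfa actually produces.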

#### Python

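```python
from kfa import align, create_session
import librosa

# Read the transcript to align (UTF-8 encoded Khmer text).
with open("text.txt", encoding="utf-8") as infile:
    text = infile.read()

# Load the audio as 16 kHz mono; librosa resamples if needed.
y, sr = librosa.load("audio.wav", sr=16000, mono=True)

# Create the session once and reuse it across calls.
session = create_session()

for alignment in align(y, sr, text, session=session):
    print(alignment)
```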

#### References

- [MMS: Scaling Speech Technology to 1000+ languages](https://github.com/facebookresearch/fairseq/tree/main/examples/mms)

- [CTC FORCED ALIGNMENT API TUTORIAL](https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html)

- [Phonetisaurus](https://github.com/AdolfVonKleist/Phonetisaurus)

- [Fine-Tune Wav2Vec2 for English ASR with 🤗 Transformers](https://huggingface.co/blog/fine-tune-wav2vec2-english)

- [Thai Wav2vec2 model to ONNX model](https://pythainlp.github.io/tutorials/notebooks/thai_wav2vec2_onnx.html)

#### License

`Apache-2.0`
