https://github.com/viig99/esolafast
Fast C++ implementation of ESOLA using KFRLib, can be used for online time-stretch augmentation during SpeechToText training.
asr esola kfr pybind11 python-bindings speech speech-augmentation speech-processing speech-recognition speech-to-text time-stretch
- Host: GitHub
- URL: https://github.com/viig99/esolafast
- Owner: viig99
- License: mit
- Created: 2020-07-13T15:01:24.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-07-25T19:16:31.000Z (almost 5 years ago)
- Last Synced: 2025-04-04T17:04:20.824Z (about 2 months ago)
- Topics: asr, esola, kfr, pybind11, python-bindings, speech, speech-augmentation, speech-processing, speech-recognition, speech-to-text, time-stretch
- Language: C++
- Homepage:
- Size: 31.3 KB
- Stars: 15
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Epoch-Synchronous Overlap-Add (ESOLA)
Fast C++ implementation of ESOLA using KFRLib. It can be used for online time-stretch augmentation during speech-to-text training.

## C++ Rewrite
Mostly a C++ rewrite of https://github.com/BaronVladziu/ESOLA-Implementation, intended for use in online speech-to-text training.

## Build
```sh
git clone https://github.com/viig99/esolafast.git
cd esolafast
git submodule update --init --recursive
mkdir build && cd build
cmake ..
make -j`nproc`
```

## Run
```sh
./esolafast -i INPUT_PATH -o OUTPUT_PATH -t 1.5
./esolafast --help
```

## Performance
Currently faster than sox, soundstretch, and rubberband in wall-clock time, but CPU usage is much higher. Output quality is better than rubberband and on par with sox and soundstretch.
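The timings in this section come from single `time` runs on a short file, where a few milliseconds of noise can change the ranking. As a rough sketch (not part of this repository), the comparison could be repeated several times and summarized by the median. The helper name `median_wall_time` and the `repeats` default are assumptions made for illustration; the command lines are the ones used in this section.

```python
import statistics
import subprocess
import time

# Command lines taken from this README; all four tools must be on PATH
# and sample_file.wav must exist for these to run.
COMMANDS = {
    "sox": ["sox", "sample_file.wav", "sox_1_5.wav", "tempo", "1.5"],
    "soundstretch": ["soundstretch", "sample_file.wav", "st_1_5.wav", "-tempo=1.5"],
    "esolafast": ["esolafast", "-i", "sample_file.wav", "-o", "es_1_5.wav", "-t", "1.5"],
    "rubberband": ["rubberband", "-q", "-T", "1.5", "sample_file.wav", "rb_1_5.wav"],
}


def median_wall_time(cmd, repeats=20):
    """Run cmd `repeats` times and return the median wall-clock time in seconds.

    Hypothetical helper for illustration; it measures elapsed time only,
    not CPU utilization.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,  # fail loudly if a tool exits with an error
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Calling `median_wall_time(COMMANDS["esolafast"])` for each entry in `COMMANDS` gives one number per tool. Process start-up cost is included in every sample, so the figures are only meaningful relative to each other.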
```sh
$ time sox sample_file.wav sox_1_5.wav tempo 1.5
sox sox_1_5.wav tempo 2.0 0.01s user 0.01s system 17% cpu 0.022 total

$ time soundstretch sample_file.wav st_1_5.wav -tempo=1.5
soundstretch st_1_5.wav -tempo=1.5 0.01s user 0.00s system 44% cpu 0.022 total

$ time esolafast -i sample_file.wav -o es_1_5.wav -t 1.5
esolafast -i -o -t 1.5 0.01s user 0.00s system 92% cpu 0.010 total

$ time rubberband -q -T 1.5 sample_file.wav rb_1_5.wav
rubberband -q -T 1.5 rb_1_5.wav 0.01s user 0.00s system 89% cpu 0.022 total
```

## Python Bindings
Generate the Python bindings with pybind11:
```sh
$ python setup.py build
```

For an example of using the Python bindings, see `examples/test.py`.
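The binding API itself is not described in this README, so the following is only a sketch of online time-stretch augmentation that shells out to the command-line tool instead, using the `-i`, `-o`, and `-t` flags from the Run section. The function names, the `./esolafast` default path, and the `low`/`high` bounds are assumptions made for illustration, not values recommended by the project. Check `./esolafast --help` to confirm how `-t` is interpreted.

```python
import random
import subprocess
from pathlib import Path


def build_esolafast_cmd(binary, input_path, output_path, factor):
    """Build the CLI call with the -i/-o/-t flags shown in the Run section."""
    return [
        str(binary),
        "-i", str(input_path),
        "-o", str(output_path),
        "-t", f"{factor:g}",
    ]


def random_stretch(input_path, output_dir, binary="./esolafast",
                   low=0.8, high=1.2, rng=None):
    """Time-stretch one file by a random factor and return (output_path, factor).

    Hypothetical helper: low/high are illustrative bounds only.
    """
    rng = rng or random.Random()
    factor = round(rng.uniform(low, high), 2)
    input_path = Path(input_path)
    # Encode the factor in the file name so augmented copies do not collide.
    output_path = Path(output_dir) / f"{input_path.stem}_t{factor:g}{input_path.suffix}"
    # Requires the esolafast binary built as described in the Build section.
    subprocess.run(
        build_esolafast_cmd(binary, input_path, output_path, factor),
        check=True,
    )
    return output_path, factor
```

Passing a seeded `random.Random` as `rng` makes the sequence of stretch factors reproducible across training runs. Spawning a process per utterance adds overhead, so the in-process bindings shown in `examples/test.py` are likely the better fit for large datasets.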
## References
* [Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals](https://arxiv.org/abs/1801.06492)
* [Epoch Extraction From Speech Signals](https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=6D94C490DA889017DE4362D322E1A23C?doi=10.1.1.586.7214&rep=rep1&type=pdf)