https://github.com/mozilla/DSAlign
DeepSpeech based forced alignment tool
deepspeech forced-alignment
- Host: GitHub
- URL: https://github.com/mozilla/DSAlign
- Owner: mozilla
- License: mpl-2.0
- Created: 2019-06-20T14:31:02.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-12-12T08:45:52.000Z (about 4 years ago)
- Last Synced: 2024-11-18T21:42:00.463Z (about 2 months ago)
- Topics: deepspeech, forced-alignment
- Language: Python
- Size: 229 KB
- Stars: 234
- Watchers: 22
- Forks: 33
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# DSAlign
DeepSpeech based forced alignment tool

## Installation
It is recommended to use this tool from within a virtual environment.
After cloning the repository and changing to the root of the project,
you can run a script that creates one, with all requirements installed, in the git-ignored directory `venv`:

```shell script
$ bin/createenv.sh
$ ls venv
bin include lib lib64 pyvenv.cfg share
```

`bin/align.sh` will automatically use it.
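
If you run Python code from the project without going through `bin/align.sh`, it can help to confirm that the environment exists and is active. The sketch below is not part of DSAlign: `looks_like_venv` and `running_inside_venv` are hypothetical helper names. The first check relies only on the `pyvenv.cfg` file and `bin` directory visible in the listing above; the second assumes a Python 3 interpreter.

```python
import sys
from pathlib import Path


def looks_like_venv(path):
    """Return True if `path` contains the basic layout of a virtual environment."""
    root = Path(path)
    # pyvenv.cfg and bin/ both appear in the `ls venv` listing above.
    return (root / "pyvenv.cfg").is_file() and (root / "bin").is_dir()


def running_inside_venv():
    """Return True if the current interpreter was started from a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv directory present:", looks_like_venv("venv"))
    print("interpreter inside a venv:", running_inside_venv())
```

Run from the project root, the script reports whether `venv` was created and whether the current interpreter belongs to a virtual environment.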
Internally DSAlign uses the [DeepSpeech](https://github.com/mozilla/DeepSpeech/) STT engine.
To work, the engine needs a few files that are specific to
the language of the speech data you want to align.
If you want to align English, there is already a helper script that will download and prepare
all required data:

```shell script
$ bin/getmodel.sh
[...]
$ ls models/en/
alphabet.txt lm.binary output_graph.pb output_graph.pbmm output_graph.tflite trie
```

## Overview and documentation
A typical application of the aligner is done in three phases:
1. __Preparing__ the data. Although most of this has to be done individually,
there are some [tools for data preparation, statistics and maintenance](doc/tools.md).
All involved file formats are described [here](doc/files.md).
2. __Aligning__ the data using [the alignment tool and its algorithm](doc/algo.md).
3. __Exporting__ aligned data using [the data-set exporter](doc/export.md).

## Quickstart example
### Example data
There is a script for downloading and preparing some public domain speech and transcript data.
It requires `ffmpeg` for some sample conversion.

```shell script
$ bin/gettestdata.sh
$ ls data
test1 test2
```

### Alignment using example data
Now the aligner can be called either "manually" (specifying all involved files directly):
```shell script
$ bin/align.sh --audio data/test1/audio.wav --script data/test1/transcript.txt --aligned data/test1/aligned.json --tlog data/test1/transcript.log
```

Or "automatically" by specifying a so-called catalog file that bundles all involved paths:
```shell script
$ bin/align.sh --catalog data/test1.catalog
```
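
The manual call follows a fixed naming pattern per sample directory, so it can be generated for several directories at once. The following is a minimal sketch and not part of DSAlign: `build_align_command` and `align_samples` are hypothetical helpers. It assumes every sample directory uses the same file names as `data/test1` in the manual call above (`audio.wav`, `transcript.txt`), and it uses only the four flags shown in that call.

```python
import subprocess
from pathlib import Path


def build_align_command(sample_dir, align_script="bin/align.sh"):
    """Build the argument list for one manual bin/align.sh call."""
    sample = Path(sample_dir)
    return [
        align_script,
        "--audio", (sample / "audio.wav").as_posix(),
        "--script", (sample / "transcript.txt").as_posix(),
        "--aligned", (sample / "aligned.json").as_posix(),
        "--tlog", (sample / "transcript.log").as_posix(),
    ]


def align_samples(sample_dirs, runner=subprocess.run):
    """Run the aligner once per sample directory and return the commands used."""
    commands = []
    for sample_dir in sample_dirs:
        command = build_align_command(sample_dir)
        # check=True raises CalledProcessError if the aligner exits with an error.
        runner(command, check=True)
        commands.append(command)
    return commands
```

Calling `align_samples(["data/test1", "data/test2"])` from the project root, after `bin/gettestdata.sh` has populated `data`, would run the aligner on both example directories. The `runner` parameter exists only so the command construction can be exercised without invoking the aligner.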