https://github.com/revdotcom/fstalign
An efficient OpenFST-based tool for calculating WER and aligning two transcript sequences.
alignment asr-benchmark fst speech-recognition word-error-rate
Last synced: 6 months ago
- Host: GitHub
- URL: https://github.com/revdotcom/fstalign
- Owner: revdotcom
- License: apache-2.0
- Created: 2021-03-24T16:23:11.000Z (over 4 years ago)
- Default Branch: develop
- Last Pushed: 2025-01-27T19:37:30.000Z (8 months ago)
- Last Synced: 2025-03-28T14:09:19.289Z (6 months ago)
- Topics: alignment, asr-benchmark, fst, speech-recognition, word-error-rate
- Language: C++
- Homepage:
- Size: 339 KB
- Stars: 166
- Watchers: 35
- Forks: 9
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README

# fstalign
- [Overview](#Overview)
- [Installation](#Installation)
  * [Dependencies](#Dependencies)
  * [Build](#Build)
  * [Docker](#Docker)
- [Documentation](#Documentation)

## Overview
`fstalign` is a tool for creating an alignment between two sequences of tokens (hereafter referred to as “reference” and “hypothesis”). It has two key functions: computing word error rate (WER) and aligning [NLP-formatted](https://github.com/revdotcom/fstalign/blob/develop/docs/NLP-Format.md) references with CTM hypotheses.

Because it uses OpenFST and lazy algorithms for text-based alignment, `fstalign` calculates WER efficiently while remaining flexible enough to support different measurement features and error analysis.
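
For readers new to the metric, WER is conventionally the number of substituted, deleted, and inserted words divided by the number of reference words. The sketch below illustrates that definition with a plain dynamic-programming edit distance over whitespace-separated tokens. It is not how `fstalign` is implemented (the tool uses OpenFST), and the function names `Tokenize`, `EditDistance`, and `WordErrorRate` are hypothetical, chosen only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split a transcript on whitespace into word tokens.
std::vector<std::string> Tokenize(const std::string& text) {
  std::istringstream stream(text);
  std::vector<std::string> tokens;
  std::string word;
  while (stream >> word) tokens.push_back(word);
  return tokens;
}

// Minimum number of substitutions, deletions, and insertions needed to turn
// the reference token sequence into the hypothesis token sequence.
std::size_t EditDistance(const std::vector<std::string>& ref,
                         const std::vector<std::string>& hyp) {
  // prev[j]: distance between the first i-1 reference tokens and the first j
  // hypothesis tokens; curr[j]: the same for the first i reference tokens.
  std::vector<std::size_t> prev(hyp.size() + 1), curr(hyp.size() + 1);
  for (std::size_t j = 0; j <= hyp.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= ref.size(); ++i) {
    curr[0] = i;
    for (std::size_t j = 1; j <= hyp.size(); ++j) {
      const std::size_t sub = prev[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1);
      const std::size_t del = prev[j] + 1;      // reference word missing
      const std::size_t ins = curr[j - 1] + 1;  // extra hypothesis word
      curr[j] = std::min({sub, del, ins});
    }
    std::swap(prev, curr);
  }
  return prev[hyp.size()];
}

// WER = (substitutions + deletions + insertions) / reference length.
double WordErrorRate(const std::string& reference,
                     const std::string& hypothesis) {
  const std::vector<std::string> ref = Tokenize(reference);
  const std::vector<std::string> hyp = Tokenize(hypothesis);
  // Assumption for this sketch: an empty reference yields 0 if the
  // hypothesis is also empty, and 1 otherwise.
  if (ref.empty()) return hyp.empty() ? 0.0 : 1.0;
  return static_cast<double>(EditDistance(ref, hyp)) /
         static_cast<double>(ref.size());
}
```

For example, calling `WordErrorRate("a b c d", "a b d")` counts one deletion against four reference words. A tool such as `fstalign` additionally reports the alignment itself, which this sketch does not attempt.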
## Installation
### Dependencies
We use git submodules to manage third-party dependencies. Initialize and update submodules before proceeding to the main build steps.
```
git submodule update --init --recursive
```

This will pull the current dependencies:
- catch2 - for unit testing
- spdlog - for logging
- CLI11 - for CLI construction
- csv - for CTM and NLP input parsing
- jsoncpp - for JSON output construction
- strtk - for various string utilities

Additionally, we have dependencies outside of the third-party submodules:
- OpenFST - currently provided to the build system by setting the `$OPENFST_ROOT` environment variable or during the CMake command via `-DOPENFST_ROOT`.

### Build
The current build framework is CMake. Install CMake following the instructions here (https://cmake.org/install/).

To build fstalign, run:
```
mkdir build && cd build
cmake .. -DOPENFST_ROOT="<path to OpenFST>" -DDYNAMIC_OPENFST=ON
make
```

Note: `-DDYNAMIC_OPENFST=ON` is needed if OpenFST at `OPENFST_ROOT` is compiled as shared libraries. Otherwise static libraries are assumed.

Finally, tests can be run using:
```
make test
```

### Docker
The fstalign docker image is hosted on Docker Hub and can be easily pulled and run:
```
docker pull revdotcom/fstalign
docker run --rm -it revdotcom/fstalign
```

See https://hub.docker.com/r/revdotcom/fstalign/tags for the available versions/tags to pull. To run the tool on local files, mount local directories with the `-v` flag of the `docker run` command.

From inside the container:
```
/fstalign/build/fstalign --help
```

For development you can also build the docker image locally using:
```
docker build . -t fstalign-dev
```

## Documentation
For more information on how to use `fstalign`, see our [documentation](https://github.com/revdotcom/fstalign/blob/develop/docs/Usage.md).