https://github.com/bbc/stt-align-node
node version of stt-align https://github.com/bbc/stt-align by Chris Baume - R&D.
- Host: GitHub
- URL: https://github.com/bbc/stt-align-node
- Owner: bbc
- License: mit
- Created: 2018-12-10T18:15:22.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-07-18T20:39:38.000Z (over 1 year ago)
- Last Synced: 2024-04-08T21:02:56.889Z (7 months ago)
- Topics: alignement, labs, news-labs, newslabs, re-alignement, stt
- Language: JavaScript
- Homepage: https://bbc.github.io/stt-align-node
- Size: 1.59 MB
- Stars: 12
- Watchers: 17
- Forks: 5
- Open Issues: 26
Metadata Files:
- Readme: README.md
- License: LICENSE
# Stt-align-node
See [The alignment problem](./docs/the-alignment-problem.md) in the docs for more background on the problem this module sets out to address.
Originally developed as a node version of Python's [stt-align](https://github.com/bbc/stt-align) by Chris Baume - BBC R&D.
## Setup - development

```
git clone git@github.com:bbc/stt-align-node.git
```

```
cd stt-align-node
```

```
npm install
```

## Setup - in production
```
npm install @bbc/stt-align-node
```
---
## Usage
Other than realigning STT results with accurate text, this module can also be used to perform related operations in the same domain, such as benchmarking STT.
|Function| Description | type|
|:------|------|----|
|`alignSTT`|Realign STT json with accurate text by transposing words from the accurate text onto the timecodes of the STT. | `json`|
|`diffsList`|return a diff json of STT vs accurate text | `json`|
|`diffsListAsHtml`|return a diff of STT vs accurate text as HTML| `html`|
|`diffsCount`|return a count of the differences between STT and accurate text| `json`|
|`calculateWordDuration`|return the duration of words in the STT transcript| `Number`|

See the [`README` in the `example-usage` folder](./example-usage/README.md) as well as the [code examples](./example-usage) for more.
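
As a rough illustration of what realigning means, here is a minimal, self-contained sketch of the idea: match words between the accurate text and the STT output, carry over the timecodes of matching words, and linearly interpolate timings for the rest. It is not this module's implementation. The function names (`normalize`, `matchWords`, `interpolate`, `realignSketch`) are hypothetical, the `{ start, end, text }` word shape is an assumption, a simple longest-common-subsequence match stands in for `difflib` opcodes, hand-rolled linear interpolation stands in for `everpolate`, and number-to-word conversion is left out.

```javascript
// Hypothetical sketch only; assumes STT words look like { start, end, text }.

// Lowercase and strip punctuation so "Hello," matches "hello".
function normalize(word) {
  return word.toLowerCase().replace(/[^a-z0-9]/g, '');
}

// Longest-common-subsequence match between two word arrays.
// Returns [indexInA, indexInB] pairs for words treated as equal.
function matchWords(a, b) {
  const table = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      table[i][j] = a[i] === b[j]
        ? table[i + 1][j + 1] + 1
        : Math.max(table[i + 1][j], table[i][j + 1]);
    }
  }
  const pairs = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      pairs.push([i, j]);
      i++;
      j++;
    } else if (table[i + 1][j] >= table[i][j + 1]) {
      i++;
    } else {
      j++;
    }
  }
  return pairs;
}

// Fill null entries by linear interpolation between the nearest known
// neighbours; at the edges, reuse the nearest known value.
function interpolate(values) {
  const known = values
    .map((v, i) => (v === null ? null : i))
    .filter((i) => i !== null);
  if (known.length === 0) return values.slice();
  return values.map((v, i) => {
    if (v !== null) return v;
    const left = known.filter((k) => k < i).pop();
    const right = known.find((k) => k > i);
    if (left === undefined) return values[right];
    if (right === undefined) return values[left];
    const ratio = (i - left) / (right - left);
    return values[left] + ratio * (values[right] - values[left]);
  });
}

// Keep the accurate words, borrow timecodes from matching STT words.
function realignSketch(sttWords, baseText) {
  const baseWords = baseText.split(/\s+/).filter(Boolean);
  const pairs = matchWords(
    baseWords.map(normalize),
    sttWords.map((w) => normalize(w.text))
  );
  const starts = new Array(baseWords.length).fill(null);
  const ends = new Array(baseWords.length).fill(null);
  for (const [baseIndex, sttIndex] of pairs) {
    starts[baseIndex] = sttWords[sttIndex].start;
    ends[baseIndex] = sttWords[sttIndex].end;
  }
  const filledStarts = interpolate(starts);
  const filledEnds = interpolate(ends);
  return baseWords.map((text, i) => ({
    start: filledStarts[i],
    end: filledEnds[i],
    text,
  }));
}

// Example: the STT misheard "world" as "word".
const exampleStt = [
  { start: 0, end: 1, text: 'hello' },
  { start: 1, end: 2, text: 'word' },
  { start: 2, end: 3, text: 'today' },
];
const realigned = realignSketch(exampleStt, 'Hello, world today.');
console.log(realigned[1]); // { start: 1, end: 2, text: 'world' }
```

In this sketch the unmatched word takes its timing from its matched neighbours, so the output keeps the accurate text's spelling and punctuation while staying on the STT timeline.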
---
## System Architecture
Node version of [stt-align](https://github.com/bbc/stt-align) by Chris Baume - R&D.
Pseudo code overview of `alignSTT`:

- Input and output, as described in the example usage.
    - Accurate base text transcription, string.
    - Array of word objects transcription from STT service.
- Align words
    - Normalize words by removing capitalization and punctuation and converting numbers to letters.
    - Generate an array of words from the base text, and an array of words from the STT transcript.
    - Get [opcodes](https://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher.get_opcodes) using `difflib` to compare the two arrays.
    - For equal matches, add the matched STT word objects segment to the results array at the base text index position.
    - Then iterate over the results array to replace the STT word objects' text with words from the base text.
- Interpolate missing words
    - Calculate missing timecodes.
    - First optimization
        - Use neighboring words to do a first pass at setting missing start and end times when present.
    - Then missing word timings are interpolated using the interpolation library [`everpolate`](http://borischumichev.github.io/everpolate/#linear).

## Development env
- node `10`
- npm `6.1.0`
## Build
```
npm run build
```

Bundles the code with react into a `./build` folder.

## Build demo
```
npm run build:demo
```
Demo is in the `docs` folder.

Publish demo to GitHub Pages:
```
npm run deploy:ghpages
```

## Tests
```
npm run test:watch
```

- [ ] add more tests
## Deployment
Deploy to npm
```
npm run publish:public
```