{"id":18377038,"url":"https://github.com/bbc/stt-align-node","last_synced_at":"2025-04-06T20:32:08.260Z","repository":{"id":41780377,"uuid":"161217311","full_name":"bbc/stt-align-node","owner":"bbc","description":"node version of stt-align https://github.com/bbc/stt-align by Chris Baume - R\u0026D.","archived":false,"fork":false,"pushed_at":"2023-07-18T20:39:38.000Z","size":1667,"stargazers_count":12,"open_issues_count":26,"forks_count":5,"subscribers_count":17,"default_branch":"master","last_synced_at":"2024-04-08T21:02:56.889Z","etag":null,"topics":["alignement","labs","news-labs","newslabs","re-alignement","stt"],"latest_commit_sha":null,"homepage":"https://bbc.github.io/stt-align-node","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bbc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-12-10T18:15:22.000Z","updated_at":"2023-11-02T16:01:36.000Z","dependencies_parsed_at":"2022-08-11T16:51:50.568Z","dependency_job_id":null,"html_url":"https://github.com/bbc/stt-align-node","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbc%2Fstt-align-node","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbc%2Fstt-align-node/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbc%2Fstt-align-node/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbc%2Fstt-align-node/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bbc","download_url":"https://codeload.github.com/bbc/stt-a
lign-node/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223263843,"owners_count":17115993,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignement","labs","news-labs","newslabs","re-alignement","stt"],"created_at":"2024-11-06T00:25:59.534Z","updated_at":"2024-11-06T00:26:00.214Z","avatar_url":"https://github.com/bbc.png","language":"JavaScript","readme":"# Stt-align-node\n\n\u003c!-- _One liner + link to confluence page_  _Screenshot of UI - optional_ --\u003e\n\nSee [The alignment problem](./docs/the-alignment-problem.md) in the docs for more background on the problem this module set out to address.\n\nOriginally developed as a node version of Python's [stt-align](https://github.com/bbc/stt-align) by Chris Baume - BBC R\u0026D.\n \n## Setup - development\n\n```\ngit clone git@github.com:bbc/stt-align-node.git\n```\n\n```\ncd stt-align-node\n```\n\n```\nnpm install\n```\n\n## Setup - in production\n\n```\nnpm install @bbc/stt-align-node\n```\n \n\n---\n\n## Usage\n\n\nBesides realigning STT results with accurate text, this module can also be used to perform related operations in the same domain, such as benchmarking STT.\n\n|Function| Description | type|\n|:------|------|----|\n|`alignSTT`|Realign STT JSON with accurate text by transposing words from the accurate text onto the timecodes of the STT. 
| `json`|\n|`diffsList`|Return a diff JSON of STT vs accurate text | `json`|\n|`diffsListAsHtml`|Return a diff of STT vs accurate text as HTML| `html`|\n|`diffsCount`|Return a count of the diffs between STT and accurate text| `json`|\n|`calculateWordDuration`|Calculate word duration| `Number`|\n\n\nSee the [`README` in the `example-usage` folder](./example-usage/README.md) as well as the [code examples](./example-usage) for more.\n\n---\n\n## System Architecture\n\u003c!-- _High level overview of system architecture_ --\u003e\n\nNode version of [stt-align](https://github.com/bbc/stt-align) by Chris Baume - R\u0026D.\n\nA _pseudo code_ overview of `alignSTT`:\n\n- Input and output are as described in the example usage. The inputs are:\n    - Accurate base text transcription, string.\n    - Array of word objects transcription from STT service.\n\n- Align words\n    - normalize words by removing capitalization and punctuation and converting numbers to words\n    - generate an array of words from the base text, and an array of words from the STT transcript. 
\n        - get [opcodes](https://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher.get_opcodes) using `difflib` to compare the two arrays\n        - for equal matches, add the matched segment of STT word objects to the results array at the base text index position.\n        - Then iterate over the results array to replace the text of the STT word objects with words from the base text  \n\n    - interpolate missing words\n        - calculates missing timecodes\n        - first optimization \n            -  using neighboring words to do a first pass at setting missing start and end times when present \n        - Then missing word timings are interpolated using the interpolation library [`everpolate`](http://borischumichev.github.io/everpolate/#linear).\n\n\n\n## Development env\n \u003c!-- _How to run the development environment_\n_Coding style convention ref optional, eg which linter to use_\n_Linting, github pre-push hook - optional_ --\u003e\n\n- node `10`\n- npm `6.1.0`\n \n\n## Build\n\n```\nnpm run build\n```\n\nbundles the code with react, into a `./build` folder.\n\n\n## Build demo\n\n```\nnpm run build:demo\n```\nThe demo is in the docs folder.\n\nPublish the demo to GitHub Pages:\n\n```\nnpm run deploy:ghpages\n```\n\n## Tests\n\n```\nnpm run test:watch\n```\n\n- [ ] add more tests \n\n## Deployment\n\n\u003c!-- _How to deploy the code/app into test/staging/production_ --\u003e\n\nDeploy to npm \n\n```\nnpm run publish:public\n```\n\n\u003c!-- TODOs:\n\n- [ ] Clean up repository\n- [ ] change baseText and sttText mentions to be `referenceText` and `hypothesisText`\n- [ ] add linting \n- [x] add babel(?)\n- [ ] change if else to be switch statements\n --\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbbc%2Fstt-align-node","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbbc%2Fstt-align-node","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbbc%2Fstt-align-node/lists"}