Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wasertech/verified_sentences_stt
Verified sentenced for STT uses, by Data Manger.
https://github.com/wasertech/verified_sentences_stt
Last synced: 6 days ago
JSON representation
Verified sentenced for STT uses, by Data Manger.
- Host: GitHub
- URL: https://github.com/wasertech/verified_sentences_stt
- Owner: wasertech
- License: gpl-3.0
- Created: 2022-10-16T00:39:13.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-22T12:38:18.000Z (almost 2 years ago)
- Last Synced: 2024-10-30T06:40:40.420Z (about 2 months ago)
- Size: 1.63 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
README
# Verified Sentences for STT
Verified sentences for STT uses, by Data Manager.
## Small sample
Using the [Training Wizard for STT](https://gitlab.com/waser-technologies/models/en/stt/-/blob/master/model.train),
Users are presented with some sentences from the NLU data.
Their role is to determine if the sentences can be used to train a language model for STT.
If a sentence is usable, then it is accepted (`true += 1`).
Else it is rejected (`false += 1`).At the end of the process users can share their results.
They look like so.
```toml
# reviews.toml["can you lookup something for me please"]
true = 1
false = 0
lang = ["en",]["answer me this"]
true = 1
false = 0
lang = ["en",][ddgr]
true = 0
false = 1
lang = ["en",]
```To find if any sentence is valid, get the boolean with the highest count.
Checkout [`reviews.toml`](reviews.toml) to see everything.
`lang` is a list of languages the sentences has been or not verified
## Usage
This data can be used to quickly train a language model for STT applications (such as [Assistant](https://gitlab.com/waser-technologies/technologies/assistant)). It can also be used to train a classifier to automate even more the process.
Using the Training Wizard, you can pull requests automatically once you are done validating your data.