https://github.com/tomaarsen/ttstextnormalization
Convert English text from written expressions into spoken forms
competition nlp normalization spoken-forms text-normalization tts
- Host: GitHub
- URL: https://github.com/tomaarsen/ttstextnormalization
- Owner: tomaarsen
- License: mit
- Created: 2019-12-30T11:36:59.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-06-22T20:53:44.000Z (over 2 years ago)
- Last Synced: 2024-10-26T23:38:33.955Z (3 months ago)
- Topics: competition, nlp, normalization, spoken-forms, text-normalization, tts
- Language: Python
- Size: 12 MB
- Stars: 21
- Watchers: 5
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# TTSTextNormalization
This repository houses my solution to Google's [Text Normalization Challenge - English Language](https://www.kaggle.com/c/text-normalization-challenge-english-language). Most of the magic happens within the `converter` directory, which is responsible for the actual conversions from input tokens to output tokens.
Alongside the code is a [paper](https://github.com/tomaarsen/TTSTextNormalization/blob/master/paper.pdf) describing my solution. The abstract of this paper is as follows:

---
## Abstract
This paper proposes a method for solving, as well as a solution to, a text-to-speech normalization problem, which focuses on converting text from written expressions into spoken forms. The method parses input tokens through a gradient boosted decision tree model, which classifies the token as one of 16 different types of tokens. The token is then converted based on the predicted token type, resulting in a normalized output of the spoken form. Upon entering a related text-to-speech normalization competition, the solution achieved an accuracy of **99.590%**, placing 12th out of the 260 teams, or within the **top 5%** of all submissions.

---
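As a rough illustration of the classify-then-convert flow described in the abstract, the sketch below routes each token to a converter chosen by its predicted type. It is not the code from the `converter` directory: the `classify` function is a hand-written rule-based stand-in for the gradient boosted decision tree model, only three illustrative token types are shown instead of 16, and all names (`classify`, `CONVERTERS`, `normalize`, the type labels) are assumptions made for this example.

```python
from typing import Callable, Dict, List

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def cardinal_to_words(token: str) -> str:
    """Spell out an integer between 0 and 999 (illustrative range only)."""
    n = int(token)
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {cardinal_to_words(str(rest))}"


def letters_to_words(token: str) -> str:
    """Read an acronym letter by letter."""
    return " ".join(token.lower())


def classify(token: str) -> str:
    """Stand-in for the trained model: predict a token type with simple rules."""
    if token.isascii() and token.isdigit() and len(token) <= 3:
        return "CARDINAL"
    if token.isalpha() and token.isupper() and len(token) > 1:
        return "LETTERS"
    return "PLAIN"


# One converter per predicted token type; PLAIN tokens pass through unchanged.
CONVERTERS: Dict[str, Callable[[str], str]] = {
    "CARDINAL": cardinal_to_words,
    "LETTERS": letters_to_words,
    "PLAIN": lambda token: token,
}


def normalize(tokens: List[str]) -> List[str]:
    """Classify each token, then convert it based on the predicted type."""
    return [CONVERTERS[classify(token)](token) for token in tokens]


print(normalize(["I", "have", "123", "NLP", "books"]))
```

Keeping the classifier separate from the converters means the prediction step could be swapped for a trained model without touching the conversion logic.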
To run any of the Python files, the `data/raw` folder must contain the raw training and testing data from the competition itself. Due to the Terms and Conditions of the competition, this data cannot be shared in this repository.
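A quick way to confirm that the data is in place before running anything is a small check like the one below. This is a minimal sketch and not part of the repository: the helper name `raw_data_ready` is hypothetical, and because the expected file names are not listed here, it only checks that the folder exists and contains at least one file.

```python
from pathlib import Path


def raw_data_ready(raw_dir: str = "data/raw") -> bool:
    """Return True if raw_dir exists and holds at least one regular file."""
    path = Path(raw_dir)
    return path.is_dir() and any(entry.is_file() for entry in path.iterdir())


if not raw_data_ready():
    print("data/raw is missing or empty; download the competition data first.")
```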
This repository acts as an archive, and is not intended to be updated.
---
### Contributing
I am not taking contributions for this repository, as it is designed as an archive.

---
### License
This project is licensed under the MIT License - see the LICENSE file for details.