{"id":15519834,"url":"https://github.com/tomaarsen/ttstextnormalization","last_synced_at":"2025-04-23T04:19:09.401Z","repository":{"id":37995977,"uuid":"230904999","full_name":"tomaarsen/TTSTextNormalization","owner":"tomaarsen","description":"Convert English text from written expressions into spoken forms","archived":false,"fork":false,"pushed_at":"2022-06-22T20:53:44.000Z","size":12543,"stargazers_count":25,"open_issues_count":0,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-23T04:19:02.854Z","etag":null,"topics":["competition","nlp","normalization","spoken-forms","text-normalization","tts"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tomaarsen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-30T11:36:59.000Z","updated_at":"2025-04-05T23:50:06.000Z","dependencies_parsed_at":"2022-09-08T07:23:00.483Z","dependency_job_id":null,"html_url":"https://github.com/tomaarsen/TTSTextNormalization","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomaarsen%2FTTSTextNormalization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomaarsen%2FTTSTextNormalization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomaarsen%2FTTSTextNormalization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomaarsen%2FTTSTextNormalization/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tomaarsen"
,"download_url":"https://codeload.github.com/tomaarsen/TTSTextNormalization/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250366715,"owners_count":21418772,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["competition","nlp","normalization","spoken-forms","text-normalization","tts"],"created_at":"2024-10-02T10:23:03.796Z","updated_at":"2025-04-23T04:19:09.383Z","avatar_url":"https://github.com/tomaarsen.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TTSTextNormalization\n\nThis repository houses my solution to Google's [Text Normalization Challenge - English Language](https://www.kaggle.com/c/text-normalization-challenge-english-language). Most of the magic happens within the converter directory, which is responsible for the actual conversions from input to output tokens.\nAlongside the code is a [paper](https://github.com/tomaarsen/TTSTextNormalization/blob/master/paper.pdf) written regarding my solution. The abstract for this paper is as follows:\n\n---\n\n## Abstract\nThis paper proposes a method for solving, as well as a solution to, a text-to-speech normalization problem, which focuses on converting text from written expressions into spoken forms. The method parses input tokens through a gradient boosted decision tree model, which classifies the token as one of 16 different types of tokens. The token is then converted based on the predicted token type, resulting in a normalized output of the spoken form. 
Upon entering a related text-to-speech normalization competition, the solution achieved an accuracy of **99.590%**, placing 12th out of the 260 teams, or within the **top 5%** of all submissions.\n\n---\n\nTo run any of the Python files, the `data/raw` folder must contain the raw training and testing data from the competition itself. Due to the Terms and Conditions of the competition, this data cannot be shared in this repository.\n\nThis repository acts as an archive and is not intended to be updated.\n\n---\n\n### Contributing\nI am not taking contributions for this repository, as it is designed as an archive.\n\n---\n\n### License\nThis project is licensed under the MIT License - see the LICENSE file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomaarsen%2Fttstextnormalization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftomaarsen%2Fttstextnormalization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomaarsen%2Fttstextnormalization/lists"}