{"id":20703490,"url":"https://github.com/frederickroman/syllable-count-predictor","last_synced_at":"2026-04-27T04:31:48.852Z","repository":{"id":112471952,"uuid":"458481375","full_name":"FrederickRoman/syllable-count-predictor","owner":"FrederickRoman","description":"Neural network model that predicts the number of syllables in an English word. It shows its creation end-to-end: from data collection to evaluation of various models. One of the explored models is used in the Readgauge app.","archived":false,"fork":false,"pushed_at":"2022-02-18T06:13:08.000Z","size":30872,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-26T16:37:39.740Z","etag":null,"topics":["blstm","blstm-neural-networks","linguistics","neural-network","nlp","nlp-machine-learning","nltk","phonetics","syllable-count","tensorflow","tensorflow2","text-classification"],"latest_commit_sha":null,"homepage":"https://readscale.netlify.app","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FrederickRoman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-12T09:53:28.000Z","updated_at":"2025-03-20T13:13:54.000Z","dependencies_parsed_at":"2023-05-15T06:45:20.292Z","dependency_job_id":null,"html_url":"https://github.com/FrederickRoman/syllable-count-predictor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/FrederickRoman/syllable-count-predictor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrederickRoman%2Fsyllable-count-predictor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrederickRoman%2Fsyllable-count-predictor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrederickRoman%2Fsyllable-count-predictor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrederickRoman%2Fsyllable-count-predictor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FrederickRoman","download_url":"https://codeload.github.com/FrederickRoman/syllable-count-predictor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrederickRoman%2Fsyllable-count-predictor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32323212,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-
26T23:26:28.701Z","status":"online","status_checked_at":"2026-04-27T02:00:06.769Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blstm","blstm-neural-networks","linguistics","neural-network","nlp","nlp-machine-learning","nltk","phonetics","syllable-count","tensorflow","tensorflow2","text-classification"],"created_at":"2024-11-17T01:08:11.912Z","updated_at":"2026-04-27T04:31:48.833Z","avatar_url":"https://github.com/FrederickRoman.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# syllable-count-predictor\nNeural network model that predicts the number of syllables in an English word. The repo shows how the model was built end-to-end, from data collection to the evaluation of several models. This is the model design process followed while building the reading-level scoring app [Readgauge](https://readscale.netlify.app).\n\n\u003cfigure style=\"display:flex;justify-content:center;align-items:center;\" \u003e\n\u003cimg src=\"https://github.com/FrederickRoman/syllable-count-predictor/blob/main/docs/img/readgauge-app-syllable-pred-demo.png\" alt=\"Readgauge syllable prediction demo\" height=\"400\"/\u003e\n  \u003cfigcaption\u003eScreenshot from \u003ca href=\"https://readscale.netlify.app/about\"\u003ereadgauge/about\u003c/a\u003e\u003c/figcaption\u003e\n\u003c/figure\u003e\n\n## Getting Started\n\nThis repo has both the data and the code to run the models. All you need to do is meet the prerequisites. 
\n\n### Prerequisites\n\nPython\u003e=3.8.6\n\n```\nnltk\npandas\nnumpy\ntensorflow\n```\n### Preprocessing\n#### Syllable count dictionary creation\nRun the Jupyter notebook cells in syllable_count_dict_creation.ipynb under [/preprocess/syllable_count_dict_creation](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/preprocess/syllable_count_dict_creation/syllable_count_dict_creation.ipynb).\n#### Synthetic syllable count dictionary creation (for data augmentation)\n\n```\npython ./ML/preprocess/data_synthesizer/data_synthesizer.py\n```\n\n### Training\nRun the Jupyter notebook cells in train.ipynb under [training/feedforward](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/training/feedforward/ff_on_natural_data/train.ipynb) or under [training/blstm](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/training/blstm/blstm_on_natural_data/train.ipynb).\n\n\n## External deployment (not in this repo)\n\nThese models were trained to find one to integrate into the [Readgauge](https://readscale.netlify.app) client-side web app. 
It runs live [here](https://readscale.netlify.app) and its repository is [here](https://github.com/FrederickRoman/Readgauge).\n\n\u003cdiv style=\"display:flex; flex-direction:column;\"\u003e\n\u003cimg src=\"https://github.com/FrederickRoman/Readgauge/blob/main/public/android-chrome-512x512.png\" alt=\"Readgauge logo\" height=\"200\"/\u003e\n\u003cimg src=\"https://github.com/FrederickRoman/Readgauge/blob/main/docs/mockups/Home_Nest%20Hub.png\" height=\"200\" alt=\"Results mockup\"/\u003e\n\u003c/div\u003e\n\n### Data source\n\nThe syllableCountDict dataset contains the syllable count of each word.\n\nIt was created using [nltk's built-in CMU dictionary](https://www.nltk.org/_modules/nltk/corpus/reader/cmudict.html).\n\nThe Carnegie Mellon Pronouncing Dictionary [cmudict.0.6]\nCopyright 1998 Carnegie Mellon University\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrederickroman%2Fsyllable-count-predictor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffrederickroman%2Fsyllable-count-predictor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrederickroman%2Fsyllable-count-predictor/lists"}