https://github.com/frederickroman/syllable-count-predictor
Neural network model that predicts the number of syllables in an English word. It shows its creation end-to-end: from data collection to evaluation of various models. One of the explored models is used in the Readgauge app.
blstm blstm-neural-networks linguistics neural-network nlp nlp-machine-learning nltk phonetics syllable-count tensorflow tensorflow2 text-classification
- Host: GitHub
- URL: https://github.com/frederickroman/syllable-count-predictor
- Owner: FrederickRoman
- License: mit
- Created: 2022-02-12T09:53:28.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-02-18T06:13:08.000Z (about 3 years ago)
- Last Synced: 2025-01-17T19:55:01.464Z (4 months ago)
- Topics: blstm, blstm-neural-networks, linguistics, neural-network, nlp, nlp-machine-learning, nltk, phonetics, syllable-count, tensorflow, tensorflow2, text-classification
- Language: Jupyter Notebook
- Homepage: https://readscale.netlify.app
- Size: 29.4 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# syllable-count-predictor
Neural network model that predicts the number of syllables in an English word. It shows its creation end-to-end: from data collection to evaluation of various models. This is the model design process followed while building the reading-level scoring app [Readgauge](https://readscale.netlify.app).
Screenshot from readgauge/about

## Getting Started
This repo includes both the data and the code needed to run the models. You only need to meet the prerequisites below.
### Prerequisites
Python>=3.8.6
```
nltk
pandas
numpy
tensorflow
```
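As a quick sanity check before opening the notebooks, the prerequisites above can be verified with a small script. This is a minimal sketch, not part of the repo: the names `REQUIRED_PACKAGES`, `MIN_PYTHON`, `missing_packages`, and `python_is_recent_enough` are hypothetical, and the package list simply mirrors the one given above.

```python
import importlib.util
import sys

# Hypothetical names; the values mirror the prerequisites listed above.
REQUIRED_PACKAGES = ["nltk", "pandas", "numpy", "tensorflow"]
MIN_PYTHON = (3, 8, 6)


def missing_packages(names):
    """Return the top-level packages from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent_enough(version_info=None, minimum=MIN_PYTHON):
    """Check a (major, minor, micro) version against the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python >= 3.8.6 is required")
    for name in missing_packages(REQUIRED_PACKAGES):
        print(f"Missing package: {name}")
```

Any package reported as missing can then be installed with your usual package manager.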
### Preprocessing
#### Syllable count dictionary creation
Run the Jupyter notebook cells in syllable_count_dict_creation.ipynb under [/preprocess/syllable_count_dict_creation](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/preprocess/syllable_count_dict_creation/syllable_count_dict_creation.ipynb)
#### Synthetic syllable count dictionary creation (for data augmentation)
```
python ./ML/preprocess/data_synthesizer/data_synthesizer.py
```
### Training
Run the Jupyter notebook cells in train.ipynb under [training/feedforward](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/training/feedforward/ff_on_natural_data/train.ipynb) or under [training/blstm](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/training/blstm/blstm_on_natural_data/train.ipynb).

## External deployment (not in this repo)
These models were trained to find one to integrate into the [Readgauge](https://readscale.netlify.app) client-side web app. The app runs live [here](https://readscale.netlify.app) and its repository is [here](https://github.com/FrederickRoman/Readgauge).
### Data source
The syllableCountDict dataset contains the syllable count of each word.
It was created using [nltk's built-in CMU dictionary](https://www.nltk.org/_modules/nltk/corpus/reader/cmudict.html).
The Carnegie Mellon Pronouncing Dictionary [cmudict.0.6]
Copyright 1998 Carnegie Mellon University
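
For illustration, a syllable count can be derived from a CMU dictionary entry by counting its vowel phonemes, since each ARPAbet vowel in the dictionary ends with a stress digit (0, 1, or 2). The sketch below is an assumption about how such a dictionary could be built, not necessarily what the notebook does: the function names are hypothetical, and using only the first listed pronunciation of each word is a simplifying choice.

```python
def count_syllables(phonemes):
    """Count syllables in one ARPAbet pronunciation.

    Vowel phonemes carry a trailing stress digit, so each one marks a syllable.
    """
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())


def build_syllable_count_dict(pronunciations):
    """Map each word to the syllable count of its first pronunciation.

    Assumption: words with several pronunciations use the first one listed.
    Words with no pronunciation are skipped.
    """
    return {
        word: count_syllables(prons[0])
        for word, prons in pronunciations.items()
        if prons
    }


def load_cmudict():
    """Load the CMU dictionary through nltk (downloads the corpus if needed)."""
    import nltk
    from nltk.corpus import cmudict

    nltk.download("cmudict", quiet=True)
    return cmudict.dict()  # word -> list of pronunciations


# Small hand-written sample in the same shape as cmudict.dict()
sample = {
    "syllable": [["S", "IH1", "L", "AH0", "B", "AH0", "L"]],
    "word": [["W", "ER1", "D"]],
}
sample_counts = build_syllable_count_dict(sample)
```

To run this on the full dictionary, pass the result of `load_cmudict()` to `build_syllable_count_dict`.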