https://github.com/frederickroman/syllable-count-predictor

Neural network model that predicts the number of syllables in an English word. It shows its creation end-to-end: from data collection to evaluation of various models. One of the explored models is used in the Readgauge app.

# syllable-count-predictor
Neural network model that predicts the number of syllables in an English word. It shows its creation end-to-end: from data collection to evaluation of various models. This is the model design process followed while building the reading-level scoring app [Readgauge](https://readscale.netlify.app).

## Getting Started

This repo contains both the data and the code needed to run the models. All you need to do is meet the prerequisites.

### Prerequisites

Python>=3.8.6

```
nltk
pandas
numpy
tensorflow
```
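
Before running the notebooks, it can help to confirm that the environment meets the requirements listed above. The following is a minimal sketch, not part of this repo: the helper names `python_is_recent_enough` and `missing_packages` are hypothetical, and the check only tests whether each package can be found for import, not which version is installed.

```python
import sys
from importlib.util import find_spec

# Values taken from the Prerequisites section above.
MIN_PYTHON = (3, 8, 6)
REQUIRED_PACKAGES = ("nltk", "pandas", "numpy", "tensorflow")


def python_is_recent_enough(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) Python version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) >= tuple(minimum)


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python >= 3.8.6 is required, found", sys.version.split()[0])
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Any package reported as missing can then be installed with your usual package manager.
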
### Preprocessing
#### Syllable count dictionary creation
Run the Jupyter notebook cells in syllable_count_dict_creation.ipynb under [/preprocess/syllable_count_dict_creation](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/preprocess/syllable_count_dict_creation/syllable_count_dict_creation.ipynb).
#### Synthetic syllable count dictionary creation (for data augmentation)

```
python ./ML/preprocess/data_synthesizer/data_synthesizer.py
```

### Training
Run the Jupyter notebook cells in train.ipynb under [training/feedforward](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/training/feedforward/ff_on_natural_data/train.ipynb) or under [training/blstm](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/training/blstm/blstm_on_natural_data/train.ipynb).

## External deployment (not in this repo)

These models were trained to find one to integrate into the [Readgauge](https://readscale.netlify.app) client-side web app. It runs live [here](https://readscale.netlify.app) and its repository is [here](https://github.com/FrederickRoman/Readgauge).


### Data source

The syllableCountDict dataset contains the syllable count of each word.

It was created using [nltk's built-in CMU dictionary](https://www.nltk.org/_modules/nltk/corpus/reader/cmudict.html).

The Carnegie Mellon Pronouncing Dictionary [cmudict.0.6]
Copyright 1998 Carnegie Mellon University
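
As an illustration of how syllable counts can be derived from CMUdict-style data, here is a minimal sketch. It is not the code from the notebook. It relies on the fact that CMUdict marks every vowel phoneme with a stress digit (0, 1 or 2), so counting those phonemes gives the number of syllables. The function names, the hand-copied `SAMPLE` entries, and the choice to use only the first listed pronunciation of each word are assumptions made for this example. The sample has the same shape as the mapping returned by `nltk.corpus.cmudict.dict()` (word to list of pronunciations).

```python
def count_syllables(phonemes):
    """Count syllables in one ARPAbet pronunciation.

    Vowel phonemes in CMUdict end in a stress digit (0, 1 or 2),
    and each vowel is the nucleus of one syllable.
    """
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())


def build_syllable_count_dict(pronouncing_dict):
    """Map each word to the syllable count of its first listed pronunciation.

    Using only the first pronunciation is an assumption of this sketch.
    """
    return {
        word: count_syllables(pronunciations[0])
        for word, pronunciations in pronouncing_dict.items()
        if pronunciations  # skip words with no pronunciation listed
    }


# Hand-copied sample entries in the shape of nltk.corpus.cmudict.dict().
SAMPLE = {
    "cat": [["K", "AE1", "T"]],
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
    "syllable": [["S", "IH1", "L", "AH0", "B", "AH0", "L"]],
}

print(build_syllable_count_dict(SAMPLE))
# → {'cat': 1, 'hello': 2, 'syllable': 3}
```

To run the same functions on the full dictionary, `SAMPLE` could be replaced with `nltk.corpus.cmudict.dict()` after downloading the corpus with `nltk.download("cmudict")`.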