https://github.com/frederickroman/syllable-count-predictor

Neural network model that predicts the number of syllables in an English word. It shows its creation end-to-end: from data collection to evaluation of various models. One of the explored models is used in the Readgauge app.



Host: GitHub
URL: https://github.com/frederickroman/syllable-count-predictor
Owner: FrederickRoman
License: mit
Created: 2022-02-12T09:53:28.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-02-18T06:13:08.000Z (about 3 years ago)
Last Synced: 2025-01-17T19:55:01.464Z (4 months ago)
Topics: blstm, blstm-neural-networks, linguistics, neural-network, nlp, nlp-machine-learning, nltk, phonetics, syllable-count, tensorflow, tensorflow2, text-classification
Language: Jupyter Notebook
Homepage: https://readscale.netlify.app
Size: 29.4 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# syllable-count-predictor
Neural network model that predicts the number of syllables in an English word. The repository documents the model's creation end to end, from data collection to the evaluation of several models. This work was done as part of building the reading-level scoring app [Readgauge](https://readscale.netlify.app).

Readgauge logo
Screenshot from readgauge/about

## Getting Started

This repo contains both the data and the code needed to run the models. All you need to do is install the prerequisites.

### Prerequisites

Python>=3.8.6

```
nltk
pandas
numpy
tensorflow
```
### Preprocessing
#### Syllable count dictionary creation
Run the Jupyter notebook cells in syllable_count_dict_creation.ipynb under [/preprocess/syllable_count_dict_creation](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/preprocess/syllable_count_dict_creation/syllable_count_dict_creation.ipynb).
#### Synthetic syllable count dictionary creation (for data augmentation)

```
python ./ML/preprocess/data_synthesizer/data_synthesizer.py
```

### Training
Run the Jupyter notebook cells in train.ipynb under [training/feedforward](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/training/feedforward/ff_on_natural_data/train.ipynb) or under [training/blstm](https://github.com/FrederickRoman/syllable-count-predictor/blob/main/ML/training/blstm/blstm_on_natural_data/train.ipynb).
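
For orientation, here is a minimal sketch of what a character-level BLSTM classifier for this task could look like in TensorFlow 2. It is not the code from the notebooks: the alphabet, `MAX_WORD_LEN`, `MAX_SYLLABLES`, the layer sizes, and the helper names `encode_word` and `build_blstm_classifier` are all assumptions made for illustration.

```python
import string

# Hypothetical settings, not taken from the repository's notebooks.
ALPHABET = string.ascii_lowercase + "'"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}  # id 0 is reserved for padding
MAX_WORD_LEN = 20
MAX_SYLLABLES = 10  # the classifier predicts one of the classes 0..10


def encode_word(word, max_len=MAX_WORD_LEN):
    """Map a word to a fixed-length list of integer character ids, zero-padded."""
    ids = [CHAR_TO_ID[ch] for ch in word.lower() if ch in CHAR_TO_ID][:max_len]
    return ids + [0] * (max_len - len(ids))


def build_blstm_classifier():
    """Build a small BLSTM that treats the syllable count as a class label."""
    import tensorflow as tf  # imported lazily so encode_word works without TensorFlow

    model = tf.keras.Sequential([
        # mask_zero=True lets the LSTM ignore the padding positions.
        tf.keras.layers.Embedding(input_dim=len(ALPHABET) + 1, output_dim=16, mask_zero=True),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
        tf.keras.layers.Dense(MAX_SYLLABLES + 1, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With encoded words stacked into an integer array and syllable counts as integer labels, such a model could be trained with `model.fit(x, y)`. The notebooks remain the reference for the architectures that were actually evaluated.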

## External deployment (not on this repo)

These models were trained to find one to integrate into the [Readgauge](https://readscale.netlify.app) client-side web app. The app runs live [here](https://readscale.netlify.app) and its repository is [here](https://github.com/FrederickRoman/Readgauge).

## Data source

The syllableCountDict dataset contains the syllable count of each word.

It was created using [nltk's built-in CMU dictionary](https://www.nltk.org/_modules/nltk/corpus/reader/cmudict.html).

The Carnegie Mellon Pronouncing Dictionary [cmudict.0.6]
Copyright 1998 Carnegie Mellon University
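
As a rough illustration of how syllable counts can be derived from the CMU dictionary, the sketch below counts the vowel phonemes in a pronunciation. In CMUdict, each vowel phoneme carries a trailing stress digit (0, 1, or 2), so counting the digit-suffixed phonemes gives the syllable count. The helper names `count_syllables` and `build_syllable_count_dict`, and the choice to keep only the first listed pronunciation, are assumptions for this example and may differ from what the preprocessing notebook does.

```python
def count_syllables(phonemes):
    """Count syllables in one ARPAbet pronunciation (a list of phoneme strings).

    Vowel phonemes end in a stress digit, e.g. "IH1" or "AH0"; consonants do not.
    """
    return sum(1 for p in phonemes if p[-1].isdigit())


def build_syllable_count_dict():
    """Build {word: syllable count} from NLTK's CMUdict (hypothetical helper)."""
    import nltk  # imported lazily; requires the nltk package

    nltk.download("cmudict", quiet=True)
    from nltk.corpus import cmudict

    # Each word maps to a list of pronunciations; this sketch keeps the first one.
    return {word: count_syllables(prons[0]) for word, prons in cmudict.dict().items()}


# Example with a hard-coded pronunciation of "syllable":
print(count_syllables(["S", "IH1", "L", "AH0", "B", "AH0", "L"]))  # prints 3
```

Words with several pronunciations can have different syllable counts, so a real pipeline needs a policy for choosing among them.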
