https://github.com/alwaysvivek/next-word-prediction

🔮 Predicts the next word in a text sequence using either an N-gram statistical model or an LSTM-based neural network.

# 📚 Next Word Prediction

This project predicts the next word in a sequence based on a given text corpus. It implements two different approaches:

1. **N-gram Model:** A statistical language model that predicts the next word based on the preceding N-1 words.
2. **Neural Network Model:** A deep learning model (specifically, an LSTM network) that learns to predict the next word from the text data.

## ⚙️ Requirements

* Python 3.x
* nltk
* tensorflow
* numpy

To install the dependencies, run:

```bash
pip install nltk tensorflow numpy
```

## 🚀 Usage

1. Clone the repository.
2. Run the `next_word_prediction.py` script.
3. You will be prompted to enter the following:
   * The value of `n` for the N-gram model (e.g., 3 for trigrams).
   * The seed text for prediction.
   * The model type (`ngram` or `nn`).

```bash
python next_word_prediction.py --corpus
```

## 📊 N-gram Model

The N-gram model uses Laplace smoothing to handle unseen N-grams. The smoothing factor (alpha) is set to 1 by default. The vocabulary size is determined from the training corpus.
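
To make the smoothing concrete, here is a minimal sketch of add-alpha (Laplace) estimation, where P(word | context) = (count(context, word) + alpha) / (count(context) + alpha * V) and V is the vocabulary size. The function names (`train_ngram`, `laplace_prob`, `predict_next`) and the toy corpus are illustrative assumptions, not the identifiers used in `next_word_prediction.py`.

```python
from collections import Counter, defaultdict


def train_ngram(tokens, n):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts


def laplace_prob(counts, vocab, context, word, alpha=1.0):
    """Add-alpha estimate of P(word | context); unseen contexts become uniform."""
    followers = counts.get(tuple(context), Counter())
    total = sum(followers.values())
    return (followers[word] + alpha) / (total + alpha * len(vocab))


def predict_next(counts, vocab, context, alpha=1.0):
    """Return the most probable next word (ties broken alphabetically)."""
    return max(sorted(vocab),
               key=lambda w: laplace_prob(counts, vocab, context, w, alpha))


# Hypothetical toy corpus; the project itself trains on a full text file.
tokens = "the cat sat on the mat and the cat slept".split()
vocab = set(tokens)
counts = train_ngram(tokens, n=3)

print(predict_next(counts, vocab, ["the", "cat"]))        # sat
print(laplace_prob(counts, vocab, ["on", "the"], "cat"))  # 0.125
```

In this sketch the trigram "on the cat" never occurs in the toy corpus, yet it still receives a non-zero probability of (0 + 1) / (1 + 7), which is the effect smoothing is meant to have.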

## 🧠 Neural Network Model

The neural network model is a simple LSTM network with an embedding layer, an LSTM layer, and a dense output layer. The model is trained on the input text data. It's crucial to have a large enough corpus for effective training.
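
The architecture described above could be sketched in Keras roughly as follows. The layer sizes, the `build_model` helper, the window length, and the toy data are all assumptions chosen for illustration; the actual script may use different hyperparameters and preprocessing.

```python
import numpy as np
import tensorflow as tf


def build_model(vocab_size, embed_dim=64, lstm_units=128):
    """Embedding -> LSTM -> softmax over the vocabulary (sizes are assumed)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        tf.keras.layers.LSTM(lstm_units),
        tf.keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    # Targets are integer word ids, so use the sparse variant of cross-entropy.
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model


# Hypothetical toy corpus mapped to integer ids.
words = "the cat sat on the mat and the cat slept".split()
word_to_id = {w: i for i, w in enumerate(sorted(set(words)))}
ids = np.array([word_to_id[w] for w in words])

# Sliding windows: each run of seq_len words predicts the word that follows it.
seq_len = 3
X = np.array([ids[i:i + seq_len] for i in range(len(ids) - seq_len)])
y = ids[seq_len:]

model = build_model(vocab_size=len(word_to_id), embed_dim=8, lstm_units=16)
model.fit(X, y, epochs=5, verbose=0)

# One probability per vocabulary word for the first window.
probs = model.predict(X[:1], verbose=0)[0]
print(probs.shape)
```

A corpus this small only demonstrates the input and output shapes; the predictions become meaningful only with far more text.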

## 💾 Training Data

The project uses `Pride and Prejudice.txt` as the default training data. You can replace this file with your own text file.

## 📝 Notes

* The neural network model may require significant training time depending on the size of the corpus.
* The performance of the models depends on the quality and size of the training data.
* The neural network implementation is a basic example and can be further improved by tuning hyperparameters, adding more layers, or using different architectures.