https://github.com/alwaysvivek/next-word-prediction

🔮 Predicts the next word in a text sequence using either an N-gram statistical model or an LSTM-based neural network.

argparse laplace-smoothing machine-learning neural-network ngrams nlp nltk numpy python3 tensorflow

Last synced: 3 months ago


Host: GitHub
URL: https://github.com/alwaysvivek/next-word-prediction
Owner: alwaysvivek
License: mit
Created: 2025-11-02T19:01:30.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-11-02T19:05:22.000Z (9 months ago)
Last Synced: 2025-11-02T20:24:00.295Z (9 months ago)
Topics: argparse, laplace-smoothing, machine-learning, neural-network, ngrams, nlp, nltk, numpy, python3, tensorflow
Language: Python
Homepage:
Size: 326 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# 📚 Next Word Prediction

This project predicts the next word in a sequence using a model trained on a given text corpus. It implements two approaches:

1. **N-gram Model:** A statistical language model that predicts the next word based on the preceding N-1 words.
2. **Neural Network Model:** A deep learning model (specifically, an LSTM network) that learns to predict the next word from the text data.

## ⚙️ Requirements

* Python 3.x
* nltk
* tensorflow
* numpy

To install the dependencies, run:

```bash
pip install nltk tensorflow numpy
```

## 🚀 Usage

1. Clone the repository.
2. Run the `next_word_prediction.py` script.
3. You will be prompted to enter the following:
   * The value of `n` for the N-gram model (e.g., 3 for trigrams).
   * The seed text for prediction.
   * The model type (`ngram` or `nn`).

```bash
python next_word_prediction.py --corpus
```

## 📊 N-gram Model

The N-gram model uses Laplace smoothing to handle unseen N-grams. The smoothing factor (alpha) is set to 1 by default. The vocabulary size is determined from the training corpus.
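
As an illustration of the smoothing described above, here is a minimal sketch of an add-alpha (Laplace) N-gram predictor. It is not taken from `next_word_prediction.py`; the function names `train_ngram`, `laplace_prob`, and `predict_next` are hypothetical, and the toy corpus is made up for the example.

```python
from collections import Counter, defaultdict


def train_ngram(tokens, n):
    """Count how often each word follows each (n-1)-word context."""
    context_counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        context_counts[context][tokens[i + n - 1]] += 1
    # The vocabulary is taken from the training tokens.
    return context_counts, sorted(set(tokens))


def laplace_prob(context_counts, vocab, context, word, alpha=1.0):
    """P(word | context) = (count(context, word) + alpha) / (count(context) + alpha * |V|)."""
    counts = context_counts.get(tuple(context), Counter())
    return (counts[word] + alpha) / (sum(counts.values()) + alpha * len(vocab))


def predict_next(context_counts, vocab, seed_tokens, n, alpha=1.0):
    """Return the most probable next word given the last n-1 seed tokens."""
    context = seed_tokens[-(n - 1):] if n > 1 else []
    return max(vocab, key=lambda w: laplace_prob(context_counts, vocab, context, w, alpha))


# Toy corpus (hypothetical), bigram model (n = 2).
tokens = "the cat sat on the mat the cat ran".split()
counts, vocab = train_ngram(tokens, n=2)
next_word = predict_next(counts, vocab, ["the"], n=2)
```

With `alpha = 1`, a word never seen after a context still gets probability `1 / (count(context) + |V|)` instead of zero, so the distribution over the vocabulary always sums to 1.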

## 🧠 Neural Network Model

The neural network model is a simple LSTM network with an embedding layer, an LSTM layer, and a dense output layer. The model is trained on the input text data. It's crucial to have a large enough corpus for effective training.
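
The architecture described above could look roughly like the following Keras sketch. The layer sizes (`embedding_dim=64`, `lstm_units=128`), the loss, the optimizer, and the helper name `build_lstm_model` are assumptions for illustration and may differ from what the script actually uses.

```python
import numpy as np
import tensorflow as tf


def build_lstm_model(vocab_size, embedding_dim=64, lstm_units=128):
    """Embedding -> LSTM -> Dense softmax over the vocabulary."""
    model = tf.keras.Sequential([
        # Maps each word index to a dense vector.
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
        # Summarizes the input sequence into a single hidden state.
        tf.keras.layers.LSTM(lstm_units),
        # One probability per vocabulary word for the next position.
        tf.keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer="adam",
        metrics=["accuracy"],
    )
    return model


# Dummy batch (hypothetical data): 2 sequences of 5 word indices each.
model = build_lstm_model(vocab_size=50)
probs = model.predict(np.random.randint(0, 50, size=(2, 5)), verbose=0)
```

Each row of `probs` is a probability distribution over the vocabulary. Before training the values are close to uniform, which is why a sufficiently large corpus matters.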

## 💾 Training Data

The project uses `Pride and Prejudice.txt` as the default training data. You can replace this file with your own text file.
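
If you swap in your own corpus, it needs to be turned into a list of word tokens before either model can use it. The sketch below shows one simple way to do that with a regular expression. This is a stand-in, not the project's own preprocessing (which may rely on `nltk`), and `load_tokens` is a hypothetical helper name.

```python
import re
from pathlib import Path


def load_tokens(path):
    """Read a UTF-8 text file and return lowercase word tokens."""
    text = Path(path).read_text(encoding="utf-8")
    # Keep runs of ASCII letters and apostrophes; drop punctuation and digits.
    return re.findall(r"[a-z']+", text.lower())
```

For example, `load_tokens("Pride and Prejudice.txt")` would return the novel as a flat list of lowercase words.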

## 📝 Notes

* The neural network model may require significant training time depending on the size of the corpus.
* The performance of the models depends on the quality and size of the training data.
* The neural network implementation is a basic example and can be further improved by tuning hyperparameters, adding more layers, or using different architectures.
