https://github.com/alwaysvivek/next-word-prediction
🔮 Predicts the next word in a text sequence using either an N-gram statistical model or an LSTM-based neural network.
argparse laplace-smoothing machine-learning neural-network ngrams nlp nltk numpy python3 tensorflow
- Host: GitHub
- URL: https://github.com/alwaysvivek/next-word-prediction
- Owner: alwaysvivek
- License: mit
- Created: 2025-11-02T19:01:30.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-11-02T19:05:22.000Z (8 months ago)
- Last Synced: 2025-11-02T20:24:00.295Z (8 months ago)
- Topics: argparse, laplace-smoothing, machine-learning, neural-network, ngrams, nlp, nltk, numpy, python3, tensorflow
- Language: Python
- Homepage:
- Size: 326 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 📚 Next Word Prediction
This project 🔮 predicts the next word in a sequence based on a given text corpus. It implements two different approaches:
1. **N-gram Model:** A statistical language model that predicts the next word based on the preceding N-1 words.
2. **Neural Network Model:** A deep learning model (specifically, an LSTM network) that learns to predict the next word from the text data.
## ⚙️ Requirements
* Python 3.x
* nltk
* tensorflow
* numpy
To install the dependencies, run:
```bash
pip install nltk tensorflow numpy
```
## 🚀 Usage
1. Clone the repository.
2. Run the `next_word_prediction.py` script.
3. You will be prompted to enter the following:
   * The value of `n` for the N-gram model (e.g., 3 for trigrams).
   * The seed text for prediction.
   * The model type (`ngram` or `nn`).
```bash
python next_word_prediction.py --corpus
```
## 📊 N-gram Model
The N-gram model uses Laplace smoothing so that N-grams never seen in training still receive a small nonzero probability. The smoothing factor (alpha) is set to 1 by default, and the vocabulary size is determined from the training corpus.
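As a rough illustration of Laplace smoothing, the sketch below estimates `P(word | context)` as `(count(context, word) + alpha) / (count(context) + alpha * V)`, where `V` is the vocabulary size. The function names (`train_ngram_counts`, `laplace_prob`, `predict_next`) and the toy sentence are made up for this example and are not taken from `next_word_prediction.py`, which may organize the computation differently.

```python
from collections import Counter


def train_ngram_counts(tokens, n):
    """Count every n-gram and its (n-1)-word context in a token list."""
    ngram_counts = Counter()
    context_counts = Counter()
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        ngram_counts[ngram] += 1
        context_counts[ngram[:-1]] += 1
    return ngram_counts, context_counts


def laplace_prob(word, context, ngram_counts, context_counts, vocab_size, alpha=1.0):
    """Smoothed P(word | context); unseen n-grams get alpha / (count + alpha * V)."""
    numerator = ngram_counts[tuple(context) + (word,)] + alpha
    denominator = context_counts[tuple(context)] + alpha * vocab_size
    return numerator / denominator


def predict_next(context, vocab, ngram_counts, context_counts, alpha=1.0):
    """Return the vocabulary word with the highest smoothed probability."""
    return max(
        sorted(vocab),  # sorted so that ties are broken deterministically
        key=lambda w: laplace_prob(
            w, context, ngram_counts, context_counts, len(vocab), alpha
        ),
    )


if __name__ == "__main__":
    # Hypothetical toy corpus, used only to exercise the functions above.
    tokens = "the cat sat on the mat the cat ran".split()
    vocab = set(tokens)
    ngram_counts, context_counts = train_ngram_counts(tokens, n=2)
    print(predict_next(("the",), vocab, ngram_counts, context_counts))  # cat
```

With `alpha=1` and six distinct words, the seen bigram `the cat` gets probability 3/9 while an unseen pair such as `the sat` gets 1/9, so the probabilities over the vocabulary still sum to 1.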
## 🧠 Neural Network Model
The neural network model is a simple network with three layers: an embedding layer, an LSTM layer, and a dense output layer. It is trained on the input text, so a sufficiently large corpus is needed for useful predictions.
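A minimal sketch of such an architecture in Keras might look like the following. The helper names (`make_training_pairs`, `build_model`), the fixed-size input window, the layer sizes (64 and 128), and the optimizer and loss are all assumptions chosen for illustration; the project's actual script may use different values and a different way of preparing sequences.

```python
import numpy as np


def make_training_pairs(token_ids, window):
    """Slide a window over the corpus: `window` word ids in, the next id out."""
    inputs, targets = [], []
    for i in range(len(token_ids) - window):
        inputs.append(token_ids[i:i + window])
        targets.append(token_ids[i + window])
    return np.array(inputs, dtype=np.int32), np.array(targets, dtype=np.int32)


def build_model(vocab_size, embed_dim=64, lstm_units=128):
    """Embedding -> LSTM -> softmax over the vocabulary (sizes are illustrative)."""
    # Imported here so the data helper above stays usable without TensorFlow.
    import tensorflow as tf

    model = tf.keras.Sequential([
        # Maps each integer word id to a dense vector.
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        # Summarizes the input window into a single hidden state.
        tf.keras.layers.LSTM(lstm_units),
        # One probability per vocabulary word for the next position.
        tf.keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    # Targets are integer word ids, hence the sparse cross-entropy loss.
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model
```

Given integer-encoded text, `X, y = make_training_pairs(ids, window=3)` followed by `build_model(vocab_size).fit(X, y, epochs=...)` would train the sketch, and the `argmax` of `model.predict(...)` for a seed window gives the id of the predicted next word.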
## 💾 Training Data
The project uses `Pride and Prejudice.txt` as the default training data. You can replace this file with your own text file.
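If you swap in your own file, the text has to be turned into a list of word tokens before either model can use it. The sketch below shows one simple way to do that with a regular expression; `load_tokens` is a hypothetical helper, and the project itself lists `nltk` as a dependency, so its actual tokenization may differ.

```python
import re
from pathlib import Path


def load_tokens(path):
    """Read a UTF-8 text file and split it into lowercase word tokens."""
    text = Path(path).read_text(encoding="utf-8").lower()
    # Keep runs of ASCII letters, allowing one internal apostrophe ("don't").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)
```

This pattern drops digits, punctuation, and non-ASCII letters, which is adequate for an English novel but would need adjusting for other kinds of text.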
## 📝 Notes
* The neural network model may require significant training time depending on the size of the corpus.
* The performance of the models depends on the quality and size of the training data.
* The neural network implementation is a basic example and can be further improved by tuning hyperparameters, adding more layers, or using different architectures.