https://github.com/mozeel-v/word-wave
WordWave is an intelligent next-word and short-sequence predictor built on a Bidirectional LSTM with an attention mechanism, trained on a subset of the Wikipedia dataset. The app provides real-time word generation and metric-based evaluation, accessible via a user-friendly Streamlit dashboard.
- Host: GitHub
- URL: https://github.com/mozeel-v/word-wave
- Owner: Mozeel-V
- License: mit
- Created: 2025-06-21T07:29:57.000Z
- Default Branch: main
- Last Pushed: 2025-06-21T07:41:23.000Z
- Last Synced: 2025-06-21T08:29:51.745Z
- Topics: keras, lstm, rnn, streamlit
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# WordWave: Next-Word & Sequence Predictor
WordWave is an intelligent next-word and short-sequence predictor built on a **Bidirectional LSTM** with an **attention mechanism**, trained on a subset of the Wikipedia dataset. The app provides real-time word generation and metric-based evaluation, accessible via a user-friendly **Streamlit dashboard**.
---
## Features
- Built using a deep **Embedding → BiLSTM → Attention → Dense** pipeline for next-word prediction
- Supports **beam search decoding** to improve generation quality over greedy search
- Evaluates with key metrics:
  - **Top-5 Accuracy**
  - **Perplexity**
  - **BLEU Score**
- Users enter a **seed sentence** and a **target length**, and the app generates a continuation of that length
- Designed as a **Streamlit web app** for easy interaction and visualization
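The difference between greedy and beam search decoding can be illustrated with a small sketch. It is not the app's implementation: `next_word` is a hypothetical callback standing in for the Keras model plus tokenizer, and the toy bigram table is invented for illustration. Greedy decoding commits to the single most probable word at each step, whereas beam search keeps the `beam_width` best partial sequences ranked by summed log-probability.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: given the tokens so far, return {next_word: probability}.
# In the real app this role would be played by the Keras model plus tokenizer.
NextWordFn = Callable[[List[str]], Dict[str, float]]

# Invented toy bigram table (only the last token matters), for illustration only.
TOY_BIGRAMS: Dict[str, Dict[str, float]] = {
    "the": {"big": 0.6, "red": 0.4},
    "big": {"dog": 0.4, "cat": 0.3, "car": 0.3},
    "red": {"car": 0.9, "dog": 0.1},
}


def toy_next_word(tokens: List[str]) -> Dict[str, float]:
    # Unknown context falls back to an end-of-sequence marker.
    return TOY_BIGRAMS.get(tokens[-1], {"<eos>": 1.0})


def greedy_decode(seed: List[str], n_words: int, next_word: NextWordFn) -> List[str]:
    """Append the single most probable word at every step."""
    tokens = list(seed)
    for _ in range(n_words):
        probs = next_word(tokens)
        tokens.append(max(probs, key=probs.get))
    return tokens


def beam_search(seed: List[str], n_words: int, next_word: NextWordFn,
                beam_width: int = 3) -> List[str]:
    """Keep the beam_width best partial sequences, scored by summed log-probability."""
    beams: List[Tuple[float, List[str]]] = [(0.0, list(seed))]
    for _ in range(n_words):
        candidates: List[Tuple[float, List[str]]] = []
        for score, tokens in beams:
            for word, p in next_word(tokens).items():
                if p > 0:  # log(0) is undefined; skip impossible words
                    candidates.append((score + math.log(p), tokens + [word]))
        # Prune to the highest-scoring beam_width candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]


print(greedy_decode(["the"], 2, toy_next_word))  # ['the', 'big', 'dog']
print(beam_search(["the"], 2, toy_next_word, beam_width=2))  # ['the', 'red', 'car']
```

With the toy table, greedy decoding commits to "big" (0.6) and ends at "the big dog" (joint probability 0.24), while beam search with width 2 keeps "red" alive and finds "the red car" (0.36). A beam width of 1 reduces to greedy decoding.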
---
## Project Structure
```text
word-wave/
├── app.py              # Streamlit app UI and core functionality
├── word-wave.ipynb     # Jupyter Notebook for training and saving the model
├── word-wave.keras     # Trained Keras model (native .keras format)
├── tokenizer.pkl       # Fitted tokenizer object (pickled)
├── requirements.txt    # Python dependencies
├── README.md           # This file
└── .gitignore          # Standard .gitignore template
```
---
## How to Run
### 1. Clone the repository
```bash
git clone https://github.com/Mozeel-V/word-wave.git
cd word-wave
```
### 2. Install dependencies
```bash
pip install -r requirements.txt
```
### 3. Run the Streamlit app
```bash
streamlit run app.py
```
You'll be able to enter text, pick how many words to generate, and see live predictions along with evaluation metrics.
---
## Model Overview
- **Architecture**: `Embedding → Bidirectional LSTM → Attention → Dense`
- **Loss**: Sparse Categorical Crossentropy
- **Optimizer**: Adam
- **Evaluation Metrics**:
  - Top-5 Accuracy (37%+ on the evaluation subset)
  - BLEU Score
  - Perplexity (>200 baseline)
- **Decoding**: Supports both **greedy** and **beam search** decoding
- **Training Data**: English Wikipedia (`0.1%` slice from `20220301.en`)
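The README does not say which attention variant sits between the BiLSTM and the Dense layer. The sketch below assumes a common choice: each BiLSTM time step (forward and backward states concatenated) is scored against a query vector, the scores are softmax-normalised over time, and the weighted sum of states becomes the context vector that a Dense softmax layer would map to vocabulary probabilities. The names `attention_pool` and `query` are illustrative, the query would be learned in a real model, and plain Python lists stand in for tensors.

```python
import math
from typing import List

Vector = List[float]


def concat_bidirectional(forward: List[Vector], backward: List[Vector]) -> List[Vector]:
    """Join forward and backward hidden states per time step, as Keras'
    Bidirectional wrapper does with its default merge_mode='concat'."""
    return [f + b for f, b in zip(forward, backward)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: List[Vector], query: Vector) -> Vector:
    """Dot-product attention pooling over time steps (assumed variant):
    score each state against the query, softmax the scores over time,
    and return the attention-weighted sum of the states (the context vector)."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]


# Two time steps with 1-unit forward/backward states -> 2-dim BiLSTM outputs.
states = concat_bidirectional([[1.0], [0.0]], [[0.0], [1.0]])
print(attention_pool(states, query=[0.0, 0.0]))  # equal weights: [0.5, 0.5]
```

A zero query weights every time step equally; a query aligned with one state concentrates nearly all the weight on that step, which is what lets the model focus on the most informative words of the seed.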
---
## Sample Generation
```text
Seed: "deep learning models are"
Generated: "deep learning models are used to perform various tasks including natural language processing"
```
- BLEU Score: 0.38
- Perplexity: 215.4
---
## Evaluation
### Top-5 Accuracy
Implemented using `sklearn.metrics.top_k_accuracy_score`, which measures how often the true next word appears among the model's top 5 predictions.
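For readers without scikit-learn at hand, the same top-k metric can be written out directly. This is a dependency-free sketch, not the project's code; `y_prob` is assumed to hold one probability row per example (as `model.predict` would return) and `y_true` the index of the true next word. The numbers below are invented.

```python
from typing import Sequence


def top_k_accuracy(y_true: Sequence[int], y_prob: Sequence[Sequence[float]], k: int = 5) -> float:
    """Fraction of examples whose true word id is among the k highest-scoring ids."""
    hits = 0
    for true_id, probs in zip(y_true, y_prob):
        # Indices sorted by descending probability; ties keep the lower index first.
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if true_id in ranked[:k]:
            hits += 1
    return hits / len(y_true)


# Toy predictions over a 4-word vocabulary (invented numbers).
y_true = [2, 0, 3]
y_prob = [
    [0.1, 0.2, 0.6, 0.1],  # true id 2 ranked 1st
    [0.1, 0.5, 0.3, 0.1],  # true id 0 ranked 3rd
    [0.4, 0.3, 0.2, 0.1],  # true id 3 ranked 4th
]
print(top_k_accuracy(y_true, y_prob, k=2))  # 1 hit out of 3
```

`sklearn.metrics.top_k_accuracy_score(y_true, y_prob, k=5)` computes the same quantity; its `labels` argument is needed when `y_true` does not contain every vocabulary id, and its tie-breaking between equal scores may differ from this sketch.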
### BLEU Score
Compares the generated text with a reference sentence using NLTK's BLEU implementation (`sentence_bleu`), with 1-gram to 4-gram weights.
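To make the BLEU computation concrete, the sketch below implements a simplified single-reference sentence BLEU under stated assumptions: clipped (modified) n-gram precisions for n = 1..4 with equal weights, combined by a geometric mean and multiplied by a brevity penalty. It is an illustration, not a replacement for NLTK's `sentence_bleu`, which also supports multiple references and smoothing functions and handles zero-count n-gram orders differently; the name `sentence_bleu_simple` is hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu_simple(reference: List[str], hypothesis: List[str], max_n: int = 4) -> float:
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: an order with no matches zeroes the score
        log_precision_sum += math.log(clipped / total) / max_n  # equal weights
    c, r = len(hypothesis), len(reference)
    # Brevity penalty: punish hypotheses shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


ref = "a b c d e".split()
print(sentence_bleu_simple(ref, ref))                 # 1.0 (identical sentences)
print(sentence_bleu_simple(ref, "a b c d".split()))   # exp(-0.25), about 0.7788
```

A 4-word hypothesis that is an exact prefix of a 5-word reference has every precision equal to 1, so its score is just the brevity penalty, exp(1 - 5/4), about 0.78.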
### Perplexity
Calculated as the exponential of the negative average log-likelihood that the model assigns to the true next words; lower is better.
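Perplexity follows directly from the probabilities the model assigns to the true next words. The sketch below is a minimal version under that assumption. Because the mean negative log-likelihood is the same quantity that sparse categorical crossentropy averages (natural log), perplexity also equals `exp(loss)` when the loss is averaged per predicted word.

```python
import math
from typing import Sequence


def perplexity(true_word_probs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood of the true next words.
    The mean NLL is the per-word sparse categorical crossentropy (natural log)."""
    mean_nll = -sum(math.log(p) for p in true_word_probs) / len(true_word_probs)
    return math.exp(mean_nll)


# Uniform guessing over V words gives perplexity V.
print(perplexity([1 / 200] * 10))   # about 200.0
print(perplexity([0.5, 0.125]))     # about 4.0, i.e. exp((ln 2 + ln 8) / 2)
```

A model that spreads probability uniformly over V words has perplexity exactly V, so a value around 215 is comparable to choosing uniformly among roughly 215 words at each step.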
---
## Future Improvements
- Add character-level prediction
- Fine-tune with larger dataset portions
- Integrate GPT-style transformer decoder for comparison
- Export as REST API for backend integration
---
## How to Evaluate the Model
You can run the evaluation metrics either:
1. **Automatically via the Streamlit app**, or
2. **Manually from the notebook**:
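```python
from sklearn.metrics import top_k_accuracy_score
from nltk.translate.bleu_score import sentence_bleu

# Illustrative usage; y_true, y_prob, reference and generated are placeholder names.
# top5 = top_k_accuracy_score(y_true, y_prob, k=5, labels=list(range(y_prob.shape[1])))
# bleu = sentence_bleu([reference.split()], generated.split())
```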
---
## Requirements
- Python 3.7+
- TensorFlow 2.x
- Streamlit
- NLTK
- scikit-learn
- NumPy
---
## License
MIT License: use freely for research and educational purposes.
---
## Contributions
Contributions, feature requests, and feedback are welcome!
---
## Author
**Mozeel Vanwani** | IIT Kharagpur CSE
[vanwani.mozeel@gmail.com](mailto:vanwani.mozeel@gmail.com)
---