{"id":29061712,"url":"https://github.com/mozeel-v/word-wave","last_synced_at":"2026-04-14T06:33:57.229Z","repository":{"id":300357880,"uuid":"1005967531","full_name":"Mozeel-V/word-wave","owner":"Mozeel-V","description":"WordWave is an intelligent next-word and short-sequence predictor built on a Bidirectional LSTM with attention mechanism, trained on a subset of the Wikipedia dataset. The app provides real-time word generation and metric-based evaluation, accessible via a user-friendly Streamlit dashboard.","archived":false,"fork":false,"pushed_at":"2025-06-21T07:41:23.000Z","size":0,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-21T08:29:51.745Z","etag":null,"topics":["keras","lstm","rnn","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mozeel-V.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-21T07:29:57.000Z","updated_at":"2025-06-21T07:42:44.000Z","dependencies_parsed_at":"2025-06-21T08:41:54.255Z","dependency_job_id":null,"html_url":"https://github.com/Mozeel-V/word-wave","commit_stats":null,"previous_names":["mozeel-v/word-wave"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Mozeel-V/word-wave","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mozeel-V%2Fword-wave","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mozeel-V%2Fword-wave/tags","releases_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mozeel-V%2Fword-wave/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mozeel-V%2Fword-wave/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mozeel-V","download_url":"https://codeload.github.com/Mozeel-V/word-wave/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mozeel-V%2Fword-wave/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262219795,"owners_count":23276888,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["keras","lstm","rnn","streamlit"],"created_at":"2025-06-27T08:04:17.514Z","updated_at":"2026-04-14T06:33:57.223Z","avatar_url":"https://github.com/Mozeel-V.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# WordWave – Next Word \u0026 Sequence Predictor\n\n[![Made with Python](https://img.shields.io/badge/Made%20with-Python-3670A0?logo=python\u0026logoColor=white)](https://www.python.org/)\n[![Deep Learning](https://img.shields.io/badge/Model-TensorFlow%2FKeras-orange?logo=tensorflow\u0026logoColor=white)](https://www.tensorflow.org/)\n[![Streamlit App](https://img.shields.io/badge/Frontend-Streamlit-FF4B4B?logo=streamlit\u0026logoColor=white)](https://streamlit.io/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Status](https://img.shields.io/badge/Build-Active-blue)](https://github.com/Mozeel-V/word-wave)\n\nWordWave is an intelligent next-word 
and short-sequence predictor built on a **Bidirectional LSTM** with an **attention mechanism**, trained on a subset of the Wikipedia dataset. The app provides real-time word generation and metric-based evaluation, accessible via a user-friendly **Streamlit dashboard**.\n\n---\n\n## 🚀 Features\n\n- Built using a deep **Embedding → BiLSTM → Attention → Dense** pipeline for next-word prediction\n- Supports **beam search decoding** to improve generation quality over greedy search\n- Evaluates with key metrics:  \n  - ✅ **Top-5 Accuracy**\n  - 📉 **Perplexity**\n  - 🔵 **BLEU Score**\n- Users can input a **seed sentence and target length**, and the app generates fluent text\n- Designed as a **Streamlit web app** for easy interaction and visualization\n\n---\n\n## 🛠️ Project Structure\n\n```sh\nwordwave/\n├── app.py                 # Streamlit app UI and core functionalities\n├── word-wave.ipynb        # Jupyter Notebook for training and saving model\n├── word-wave.keras        # Trained Keras model (saved format)\n├── tokenizer.pkl          # Fitted tokenizer object (Pickle)\n├── requirements.txt       # All Python dependencies\n├── README.md              # This file\n└── .gitignore             # Standard gitignore file template\n```\n\n---\n\n## ⚙️ How to Run\n\n### 1. Clone the repository\n```bash\ngit clone https://github.com/Mozeel-V/word-wave.git\ncd word-wave\n```\n\n### 2. Install dependencies\n```bash\npip install -r requirements.txt\n```\n\n### 3. 
Run the Streamlit app\n```bash\nstreamlit run app.py\n```\n\nYou’ll be able to enter text, pick how many words to generate, and see live predictions along with evaluation metrics.\n\n---\n\n## 📈 Model Overview\n\n- **Architecture**:  \n  `Embedding → Bidirectional LSTM → Attention → Dense`\n- **Loss**: Sparse Categorical Crossentropy  \n- **Optimizer**: Adam  \n- **Evaluation Metrics**:  \n  - Top-5 Accuracy (37%+ on eval subset)  \n  - BLEU Score  \n  - Perplexity (\u003e200 baseline)\n\n- **Decoding**: Supports both **greedy** and **beam search** decoding\n- **Training Data**: English Wikipedia (`0.1%` slice from `20220301.en`)\n\n---\n\n## 🧠 Sample Generation\n\n```text\nSeed: \"deep learning models are\"\nGenerated: \"deep learning models are used to perform various tasks including natural language processing\"\n```\n\n- BLEU Score: 0.38 \n- Perplexity: 215.4\n\n---\n\n## 🧪 Evaluation\n\n### Top-5 Accuracy\n\nImplemented using `sklearn.metrics.top_k_accuracy_score`, measuring how often the true word appears in the model’s top 5 predictions.\n\n### BLEU Score\n\nCompares generated text to a reference sentence using `nltk` BLEU metric (1-gram to 4-gram weights).\n\n### Perplexity\n\nCalculated as the exponentiated negative average log-likelihood of predicted next words — lower is better.\n\n---\n\n## 🧰 Future Improvements\n\n- Add character-level prediction\n- Fine-tune with larger dataset portions\n- Integrate GPT-style transformer decoder for comparison\n- Export as REST API for backend integration\n\n---\n\n## ✅ How to Evaluate Model\n\nYou can run evaluation metrics either:\n\n1. **Automatically via Streamlit app**, or  \n2. 
**Manually from notebook**:\n\n```python\nfrom sklearn.metrics import top_k_accuracy_score\nfrom nltk.translate.bleu_score import sentence_bleu\n```\n\n---\n\n## 📦 Requirements\n\n- Python 3.7+\n- TensorFlow 2.x\n- Streamlit\n- NLTK\n- scikit-learn\n- NumPy\n\n---\n\n## 📝 License\n\nMIT License — use freely for research and educational purposes.\n\n---\n\n## 🤝 Contributions\n\nContributions, feature requests, and feedback are welcome!\n\n---\n\n## 👨‍💻 Author\n\n**Mozeel Vanwani** | IIT Kharagpur CSE  \n📧 [vanwani.mozeel@gmail.com](mailto:vanwani.mozeel@gmail.com)\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmozeel-v%2Fword-wave","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmozeel-v%2Fword-wave","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmozeel-v%2Fword-wave/lists"}