{"id":50432704,"url":"https://github.com/ahmadrazacdx/seq-modeling-from-scratch","last_synced_at":"2026-05-31T15:01:23.822Z","repository":{"id":325233073,"uuid":"1100355653","full_name":"ahmadrazacdx/seq-modeling-from-scratch","owner":"ahmadrazacdx","description":"From Scratch RNN, LSTM, GRU, and Seq2Seq architectures for language modeling and educational purposes.","archived":false,"fork":false,"pushed_at":"2025-11-26T09:31:18.000Z","size":801,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-29T08:46:11.429Z","etag":null,"topics":["attention-mechanism","deep-learning","educational","from-scratch","gru","language-modeling","lstm","rnn","seq2seq"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ahmadrazacdx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-20T07:08:23.000Z","updated_at":"2025-11-26T09:31:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ahmadrazacdx/seq-modeling-from-scratch","commit_stats":null,"previous_names":["ahmadrazacdx/seq-modeling-from-scratch"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ahmadrazacdx/seq-modeling-from-scratch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmadrazacdx%2Fseq-modeling-from-scratch","tags_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/ahmadrazacdx%2Fseq-modeling-from-scratch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmadrazacdx%2Fseq-modeling-from-scratch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmadrazacdx%2Fseq-modeling-from-scratch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ahmadrazacdx","download_url":"https://codeload.github.com/ahmadrazacdx/seq-modeling-from-scratch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmadrazacdx%2Fseq-modeling-from-scratch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33735663,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention-mechanism","deep-learning","educational","from-scratch","gru","language-modeling","lstm","rnn","seq2seq"],"created_at":"2026-05-31T15:01:22.874Z","updated_at":"2026-05-31T15:01:23.813Z","avatar_url":"https://github.com/ahmadrazacdx.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sequence Modeling from Scratch [![DOI](https://zenodo.org/badge/1100355653.svg)](https://doi.org/10.5281/zenodo.17720229)\n\n\u003e *What I cannot create, I 
do not understand.* — **Richard Feynman**\n\n## Abstract\nIn an era where deep learning frameworks increasingly abstract away the underlying mathematics of neural networks, an intuitive grasp of gradient dynamics is often lost. **Designed as a comprehensive educational resource**, this repository presents a rigorous, first-principles reconstruction of sequence modeling architectures, spanning Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and Encoder-Decoder (Seq2Seq) models with Attention, implemented entirely in **NumPy**. Unlike standard tutorials that rely on automatic differentiation engines (e.g., Autograd), this curriculum explicitly derives and implements Backpropagation Through Time (BPTT). By engineering the forward and backward passes by hand, the work exposes the specific algebraic mechanisms that govern memory retention, gradient flow, and attention scoring. The implementations are validated against the foundational literature *(Elman, 1990; Hochreiter \u0026 Schmidhuber, 1997; Cho et al., 2014; Bahdanau et al., 2014; Luong et al., 2015)*, offering a transparent view of algorithms that are often hidden behind black-box APIs.\n\n## Key Learning Outcomes\nBy completing this curriculum, you will be able to:\n\n* **Master Backpropagation Through Time (BPTT):** Go beyond \"black box\" APIs by manually deriving and implementing the calculus behind RNNs, LSTMs, and GRUs.\n* **Bridge Theory and Implementation:** Develop the skill to translate complex mathematical equations from research papers (e.g., *Cho et al. 
2014*) directly into working, vectorized NumPy code.\n* **Understand NLP Efficiency:** Learn how discrete tokens are mapped to continuous vectors and how to implement sparse gradient updates for embedding layers by hand.\n* **Analyze Model Internals:** Gain deep intuition into *why* architectures behave the way they do, enabling you to debug convergence issues such as vanishing gradients.\n\n## Curriculum\nThe repository is organized into three phases, progressing from character-level to word-level language modeling and then to sequence-to-sequence architectures.\n\n### Phase 1: Character-Level Fundamentals\n*Objective: Derive and implement the core algorithms of recurrence and memory.*\n\n* **[01_RNN_NumPy.ipynb](./char_level_lm/01_RNN_NumPy.ipynb):** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahmadrazacdx/seq-modeling-from-scratch/blob/main/char_level_lm/01_RNN_NumPy.ipynb) Vanilla RNN. Implements the basic recurrence relation `h_t = tanh(W x_t + U h_{t-1})` and visualizes the vanishing gradient problem in practice.\n* **[02_LSTM_NumPy.ipynb](./char_level_lm/02_LSTM_NumPy.ipynb):** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahmadrazacdx/seq-modeling-from-scratch/blob/main/char_level_lm/02_LSTM_NumPy.ipynb) Vanilla LSTM. Constructs the complete four-gate architecture (Forget, Input, Candidate, Output) and the cell-state highway that helps preserve gradients over long time spans.\n* **[03_GRU_NumPy.ipynb](./char_level_lm/03_GRU_NumPy.ipynb):** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahmadrazacdx/seq-modeling-from-scratch/blob/main/char_level_lm/03_GRU_NumPy.ipynb) Vanilla GRU. Implements the original Gated Recurrent Unit formulation as defined by **Cho et al. 
(2014)**.\n\n### Phase 2: Word-Level Modeling \u0026 Embeddings\n*Objective: Transition from one-hot character inputs to dense, continuous word representations (embeddings).*\n\n* **[01_RNN_NumPy.ipynb](./word_level_lm/01_RNN_NumPy.ipynb):** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahmadrazacdx/seq-modeling-from-scratch/blob/main/word_level_lm/01_RNN_NumPy.ipynb) Word-Level RNN with Embedding Layers. Replaces one-hot encoding with embedding lookup tables and implements **sparse gradient updates** manually during backpropagation, so that only the embedding rows of tokens present in the input receive gradients.\n* **[02_LSTM_NumPy.ipynb](./word_level_lm/02_LSTM_NumPy.ipynb):** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahmadrazacdx/seq-modeling-from-scratch/blob/main/word_level_lm/02_LSTM_NumPy.ipynb) Word-Level LSTM. Integrates learned embeddings with the LSTM architecture to handle larger vocabularies.\n* **[03_GRU_NumPy.ipynb](./word_level_lm/03_GRU_NumPy.ipynb):** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahmadrazacdx/seq-modeling-from-scratch/blob/main/word_level_lm/03_GRU_NumPy.ipynb) Word-Level GRU. Implements the **PyTorch definition** of the GRU, in which the reset gate is applied after the hidden-state matrix multiplication (`n_t = tanh(W_in x_t + b_in + r_t * (W_hn h_{t-1} + b_hn))`), and contrasts it with the Cho et al. (2014) definition, in which the reset gate is applied to `h_{t-1}` before the multiplication.\n\n### Phase 3: Sequence-to-Sequence Architectures\n*Objective: Build architectures that map variable-length input sequences to variable-length output sequences.*\n\n* **[01_Encoder_Decoder_NumPy.ipynb](./seq_2_seq/01_Encoder_Decoder_NumPy.ipynb):** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahmadrazacdx/seq-modeling-from-scratch/blob/main/seq_2_seq/01_Encoder_Decoder_NumPy.ipynb) Implements a full Encoder-Decoder architecture with a non-linear **Bridge** layer connecting the two networks. 
Features **Teacher Forcing** for training and **Autoregressive Inference** for generating output sentences, and derives the full BPTT chain rule across the bridge.\n* **[02_Bahdanau_Attention_NumPy.ipynb](./seq_2_seq/02_Bahdanau_Attention_NumPy.ipynb):** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahmadrazacdx/seq-modeling-from-scratch/blob/main/seq_2_seq/02_Bahdanau_Attention_NumPy.ipynb) Overcomes the fixed-length context-vector bottleneck by implementing **Bahdanau (Additive) Attention**. Features a fully differentiable attention mechanism that computes a dynamic context vector ($c_t$) at each decoding step and visualizes the **alignment matrix** to show source-target word relationships.\n* **[03_Luong_Attention_NumPy.ipynb](./seq_2_seq/03_Luong_Attention_NumPy.ipynb):** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahmadrazacdx/seq-modeling-from-scratch/blob/main/seq_2_seq/03_Luong_Attention_NumPy.ipynb) Extends the Bahdanau model to Luong's \"Update $\\rightarrow$ Look $\\rightarrow$ Predict\" ordering, in which the decoder first updates its hidden state $h_t$, then attends over the encoder states, then predicts. Implements the **General** scoring function (`h_t @ Wa @ h_s`) and introduces the **Attentional Vector** ($\\tilde{h}_t$) for final predictions.\n\n## Implementation Details\nThis repository applies the following engineering practices to support model convergence and numerical stability:\n\n1.  **Architectural Fidelity:** All models are verified against the equations in the foundational research papers (e.g., *Cho et al., 2014*; *Hochreiter \u0026 Schmidhuber, 1997*) so that the code follows the published formulations.\n2.  **Adaptive Optimization:** Manual implementation of the **Adam Optimizer**, incorporating first- and second-moment estimates with bias correction ($\\hat{m}, \\hat{v}$) for stable convergence.\n3.  
**Gradient Stabilization:** Norm-based **Gradient Clipping** applied during Backpropagation Through Time (BPTT) to mitigate the exploding gradient problem common in recurrent architectures.\n4.  **Numerical Stability:** Softmax and Cross-Entropy implementations use the **Log-Sum-Exp** trick to prevent floating-point overflow and `NaN` errors.\n5.  **Vectorized Calculus:** Matrix operations within each time step are vectorized using NumPy matrix products and broadcasting, minimizing scalar Python loops for computational efficiency (the recurrence over time steps is still processed sequentially, as BPTT requires).\n\n## Quick Start\n\n1.  **Clone the repository:**\n    ```bash\n    git clone https://github.com/ahmadrazacdx/seq-modeling-from-scratch.git\n    cd seq-modeling-from-scratch\n    ```\n\n2.  **Install dependencies:**\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n3.  **Run a notebook:**\n    Open any notebook in Google Colab or Jupyter. The data loader automatically looks for `data/thirsty_crow.txt`.\n\n## Sample Output\n*From `seq_2_seq/01_Encoder_Decoder_NumPy.ipynb` after 5000 iterations:*\n\n```text\nIteration 5000 | Loss: 0.4254\nInput:  The crow was thirsty .\nOutput: then he got an idea !\n```\n\n## References\n\n* **[RNN]** Elman, J. L. (1990). Finding structure in time. *Cognitive Science*, 14(2), 179-211. [PDF](https://jontalle.web.engr.illinois.edu/Public/Elman-FindingStructureinTime.90.pdf)\n* **[LSTM]** Hochreiter, S., \u0026 Schmidhuber, J. (1997). Long short-term memory. *Neural Computation*, 9(8), 1735-1780. [PDF](https://www.bioinf.jku.at/publications/older/2604.pdf)\n* **[GRU/Encoder-Decoder]** Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., \u0026 Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. *Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)*. [arXiv](https://arxiv.org/abs/1406.1078)\n* **[Bahdanau Attention]** Bahdanau, D., Cho, K., \u0026 Bengio, Y. (2014). 
Neural machine translation by jointly learning to align and translate. *International Conference on Learning Representations (ICLR)*. [arXiv](https://arxiv.org/abs/1409.0473)\n* **[Luong Attention]** Luong, M.-T., Pham, H., \u0026 Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. *Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)*. [arXiv](https://arxiv.org/abs/1508.04025)\n* **[Adam]** Kingma, D. P., \u0026 Ba, J. (2015). Adam: A method for stochastic optimization. *International Conference on Learning Representations (ICLR)*. [arXiv](https://arxiv.org/abs/1412.6980)\n\n## Citation\n\nIf you use this work, please cite it using `CITATION.cff` or the following BibTeX entry:\n\n```bibtex\n@misc{ahmad2025seqmodeling,\n  author = {{Ahmad Raza}},\n  title = {Sequence Modeling from Scratch: Rigorous NumPy Implementations of RNN, LSTM, GRU, and Attention},\n  year = {2025},\n  publisher = {GitHub},\n  url = {https://github.com/ahmadrazacdx/seq-modeling-from-scratch}\n}\n```\n\n## Feedback\nThis repository is a living curriculum. I built it to truly understand recurrent networks and their dynamics. If you spot a mathematical inconsistency or have a question about a derivation or the code, please open an issue.\n\n---\n**Find this resource helpful? A star ⭐ is the best way to support the project!**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmadrazacdx%2Fseq-modeling-from-scratch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fahmadrazacdx%2Fseq-modeling-from-scratch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmadrazacdx%2Fseq-modeling-from-scratch/lists"}