{"id":28179673,"url":"https://github.com/notshrirang/tinygpt","last_synced_at":"2025-05-16T02:13:28.621Z","repository":{"id":289431037,"uuid":"961966156","full_name":"NotShrirang/tinygpt","owner":"NotShrirang","description":"🎈 TinyGPT — A playful, lightweight 50M parameter GPT model fine-tuned on whimsical short stories. Fast, fun, and surprisingly creative!","archived":false,"fork":false,"pushed_at":"2025-04-23T08:09:53.000Z","size":141,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-23T08:47:16.927Z","etag":null,"topics":["gpt","large-language-models","llm","tinystories","transformer"],"latest_commit_sha":null,"homepage":"https://tinygpt.streamlit.app/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NotShrirang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-07T12:55:19.000Z","updated_at":"2025-04-23T08:09:56.000Z","dependencies_parsed_at":"2025-04-23T08:57:19.641Z","dependency_job_id":null,"html_url":"https://github.com/NotShrirang/tinygpt","commit_stats":null,"previous_names":["notshrirang/tinygpt"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotShrirang%2Ftinygpt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotShrirang%2Ftinygpt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotShrirang%2Ftinygpt/releases","manifests_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/NotShrirang%2Ftinygpt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NotShrirang","download_url":"https://codeload.github.com/NotShrirang/tinygpt/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254453586,"owners_count":22073618,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gpt","large-language-models","llm","tinystories","transformer"],"created_at":"2025-05-16T02:13:28.152Z","updated_at":"2025-05-16T02:13:28.605Z","avatar_url":"https://github.com/NotShrirang.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"https://github.com/user-attachments/assets/8a90d976-57cb-4e3b-a9e7-33e37816eb81\" alt=\"TinyGPT Banner\" /\u003e\n\n\n# TinyGPT 🤖\n\n[![GitHub stars](https://img.shields.io/github/stars/NotShrirang/tinygpt?style=social)](https://github.com/NotShrirang/tinygpt/stargazers)\n[![GitHub forks](https://img.shields.io/github/forks/NotShrirang/tinygpt?style=social)](https://github.com/NotShrirang/tinygpt/network/members)\n[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://tinygpt.streamlit.app/)\n\nTinyGPT is a compact 50M-parameter GPT model trained on the TinyStories dataset, designed to generate coherent and creative text based on user input. 
✨\n\nHugging Face Repository: https://huggingface.co/NotShrirang/tinygpt\n\nHosted Streamlit Application: https://tinygpt.streamlit.app/\n\n## Overview 🔍\n\nTinyGPT is a lightweight GPT implementation trained on a large collection of short stories. With 50M parameters, it strikes a balance between computational efficiency and generative capability. The model uses a transformer architecture whose self-attention layers capture contextual relationships in text.\n\n## Model Architecture 🏗️\n\nTinyGPT uses a standard decoder-only GPT transformer architecture with:\n\n- 8 transformer blocks 🧱\n- 8 attention heads 👁️\n- Embedding dimension of 512 📊\n- Vocabulary size of 50,304 tokens 📚\n- Context window of 512 tokens 🪟\n\n## Dataset 📖\n\nThe model was trained on the TinyStories dataset, a collection of short stories designed for training small language models. Its simple narratives let the model learn coherent story generation while staying far smaller than general-purpose language models.\n\n### Training Data Improvements 📈\n\n- **Scale**: TinyGPT was trained on approximately 300M tokens, which improved its language understanding.\n- **Data Processing**: Early training runs had problems in the data preprocessing pipeline that affected how data was passed to the model. 
These issues have been resolved, leading to more consistent and higher-quality training.\n\n## Installation 💿\n\nTo install TinyGPT, follow these steps:\n\n```bash\n# Clone the repository\ngit clone https://github.com/NotShrirang/tinygpt.git\n\n# Navigate to the project directory\ncd tinygpt\n\n# Install the required packages\npip install -r requirements.txt\n\n# Create the directory for the model weights\nmkdir -p tinygpt/weights\n```\n\n## Usage 🚀\n\n### Streamlit Interface 🖥️\n\nThe easiest way to interact with TinyGPT is through its Streamlit interface:\n\n```bash\nstreamlit run main.py\n```\n\nThis launches a web application where you can enter a prompt and see the text the model generates.\n\n## Training ⚙️\n\nTinyGPT was trained using PyTorch on the TinyStories dataset. The training process involved:\n\n1. Tokenizing the input text\n2. Splitting the token stream into sliding windows of a fixed block size\n3. Training the model with a next-token cross-entropy loss\n4. Scheduling the learning rate with warmup followed by cosine decay\n\n\u003cimg src=\"https://github.com/user-attachments/assets/fd318849-d83b-4e44-aa3e-3119897cd4ae\" alt=\"Loss Curve\" width=\"70%\"/\u003e\n\n### Training Optimizations 🚀\n\nTinyGPT's training uses several optimization techniques to improve speed, stability, and performance:\n\n- **Kernel Fusion**: Fuses consecutive operations into fewer kernels to reduce memory-bandwidth bottlenecks and speed up training\n- **Mixed Precision Training**: Uses the bfloat16 format for significantly faster training while maintaining numerical stability\n- **Gradient Accumulation**: Accumulates gradients over several micro-batches before each optimizer step, giving a larger effective batch size than fits in memory and more stable training\n- **Cosine Scheduler**: Decays the learning rate along a cosine curve after warmup for better convergence\n- **PyTorch's Multi-Head Attention**: Uses PyTorch's built-in multi-head attention implementation instead of a hand-written one to speed up training\n\nWhile using PyTorch's native attention implementation deviates from the \"from scratch\" philosophy, it enables 
faster model iteration and training with the available resources.\n\nFor details on the training process, see the training notebook in the `notebooks/` directory.\n\n## Sample Outputs 📝\n\n### Example 1\n```text\nPrompt: One day, a dragon\n\nOutput:\nOne day, a dragon named Bobo was walking in the forest when he saw a little bunny. The bunny was sad because he had no friends. Bobo wanted to help the bunny, so he asked the bunny to give him a hug. The bunny said yes, and the bunny gave the bunny a hug.\n\nBobo was very happy and thanked the bunny. He named the bunny, and they became good friends. The bunny was always grateful for Bobo's help. They became good friends, and they always shared their toys and treats!\n```\n\n### Example 2\n```text\nPrompt: A dog named\n\nOutput:\nA dog named Max went for a walk. He saw a big tree and wanted to climb it. Max was very excited and started to climb the tree. He was very careful and did not fall.\n\nMax saw a little girl named Sue. Sue was sad because she lost her toy. Max wanted to help Sue. He said, \"Don't worry, Sue. I will help you find your toy.\"\n\nMax and Sue looked for the toy together. They looked under the tree, behind the tree, and behind the tree. Finally, they found the toy under a big tree. Max was so happy and said, \"Thank you, Sue! You are a good friend.\"\n\nSue and Max played with the toy all day. They were very happy and had a fun day!\n```\n\n## Inference 🔮\n\nDuring inference, TinyGPT uses several techniques to produce high-quality text:\n\n- Temperature scaling to control randomness\n- Top-k and top-p (nucleus) sampling to balance focus and diversity\n- Autoregressive decoding that generates one token at a time\n\n## License 📜\n\nThis project is licensed under the Apache License 2.0; see the LICENSE file for details.\n\n## Contributing 👥\n\nContributions are welcome! 
Feel free to submit pull requests, create issues, or suggest improvements to the model or codebase.\n\n## Support ❤️\n\nIf you find TinyGPT useful, please consider starring the repository ⭐\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnotshrirang%2Ftinygpt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnotshrirang%2Ftinygpt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnotshrirang%2Ftinygpt/lists"}