{"id":14964592,"url":"https://github.com/jinhanlei/llm-stream-service","last_synced_at":"2026-01-03T02:32:36.016Z","repository":{"id":240383957,"uuid":"802482690","full_name":"JinHanLei/LLM-Stream-Service","owner":"JinHanLei","description":"Streaming API and Web page for Large Language Models (Llama3) based on transformers+flask+gradio.","archived":false,"fork":false,"pushed_at":"2024-05-19T03:10:49.000Z","size":22,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-16T08:12:14.415Z","etag":null,"topics":["flask","gradio","huggingface","llama2","llama3","python","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JinHanLei.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-18T12:23:36.000Z","updated_at":"2025-01-16T09:02:21.000Z","dependencies_parsed_at":"2024-12-03T15:15:05.307Z","dependency_job_id":null,"html_url":"https://github.com/JinHanLei/LLM-Stream-Service","commit_stats":{"total_commits":7,"total_committers":1,"mean_commits":7.0,"dds":0.0,"last_synced_commit":"747681d7d52c705333db802abab3a04cd97a3d79"},"previous_names":["jinhanlei/llm-stream-service"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JinHanLei%2FLLM-Stream-Service","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JinHanLei%2FLLM-Stream-Service/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JinHanLei%2FLLM-Stream-Service/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JinHanLei%2FLLM-Stream-Service/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JinHanLei","download_url":"https://codeload.github.com/JinHanLei/LLM-Stream-Service/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243841217,"owners_count":20356446,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["flask","gradio","huggingface","llama2","llama3","python","transformers"],"created_at":"2024-09-24T13:33:28.347Z","updated_at":"2025-11-17T03:22:15.835Z","avatar_url":"https://github.com/JinHanLei.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLM Stream Service\n\n![](https://img.shields.io/badge/license-MIT-blue)[![](https://img.shields.io/badge/Engilsh-0000FF)](README.md)[![](https://img.shields.io/badge/中文-FF0000)](README_zh.md)\n\n**Streaming API** and **Web page** for Large Language Models based on Python.\n\nThis repository contains:\n1. Transformers streaming generation: **REAL** streaming generation for all pre-trained models (based on transformers).\n2. Flask API: streaming response interface.\n3. Gradio APP: fast and easy LLM web page.\n\n## Quick Start\n\nTake Llama3 for example: \n\n1. Follow [Llama3 download](https://github.com/meta-llama/llama3?tab=readme-ov-file#download) to download Meta-Llama-3-8B-Instruct model, or from [huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) / [modelscope](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary).\n2. Follow [Llama3 quick-start](https://github.com/meta-llama/llama3?tab=readme-ov-file#quick-start) to install dependencies for Llama3.\n3. Clone this repository and install dependencies:\n\n    ```bash\n    git clone https://github.com/JinHanLei/LLM-Stream-Service\n    pip install flask gradio transformers\n    ```\n\n\n4. Run Flask service:\n\n   ```bash\n   python llama3_service.py --host 0.0.0.0 --port 8800 --ckpts /Meta-Llama-3-8B-Instruct\n   ```\n\n   **Note**\n   - Replace  `Meta-Llama-3-8B-Instruct/` with the path to your checkpoint directory.\n\n5. Run Gradio service:\n\n   ```bash\n   gradio llama3_app.py\n   ```\n\n   **Note**\n\n   - Replace the `Address` variable in `llama3_app.py` with your service address.\n\n# Journey and Challenges\n\n1. The initial streaming output scheme adopted by the project was the **TextIteratorStreamer** that comes with the official transformers library. However, the generation speed was still very slow. After researching, I found that the TextIteratorStreamer actually converts print-ready text into a streaming structure, meaning that the LLM first needs to generate the entire text block before converting it, which is not what I wanted. I wanted the LLM to yield each token as it is generated.\n\n2. Subsequently, I came across [LowinLi's project](https://github.com/LowinLi/transformers-stream-generator) that truly implemented streaming output for pretrained models. When I eagerly applied it to the Llama3 model, it threw an error. After debugging, I found that Llama3 has two **eos_tokens**, which caused the loop to generate negative ids. Thus, I made modifications based on this project, cleaned up redundancies, adapted it for Llama3, and made it easier to read and understand.\n\n# Thanks 🙇\n\n- https://github.com/meta-llama/llama3\n- https://github.com/TylunasLi/ChatGLM-web-stream-demo\n- https://github.com/LowinLi/transformers-stream-generator\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjinhanlei%2Fllm-stream-service","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjinhanlei%2Fllm-stream-service","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjinhanlei%2Fllm-stream-service/lists"}