{"id":20685371,"url":"https://github.com/yas-sim/openvino-llm-minimal-code","last_synced_at":"2025-04-22T13:39:27.695Z","repository":{"id":220034594,"uuid":"750390156","full_name":"yas-sim/openvino-llm-minimal-code","owner":"yas-sim","description":"Most simple and minimal code to run an LLM chatbot from HuggingFace hub with OpenVINO","archived":false,"fork":false,"pushed_at":"2024-02-19T01:20:44.000Z","size":135,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-08T08:47:06.559Z","etag":null,"topics":["chatbot","huggingface","huggingface-transformers","intel","large-language-models","llama","llm","llm-post-process","neuralchat","openvino","optimum-intel","python","text-generation","tinyllama","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yas-sim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-30T14:58:13.000Z","updated_at":"2024-06-23T16:47:27.000Z","dependencies_parsed_at":"2024-02-05T03:43:55.478Z","dependency_job_id":"6ba2b9ad-7002-472f-a23d-a9322e410d69","html_url":"https://github.com/yas-sim/openvino-llm-minimal-code","commit_stats":{"total_commits":36,"total_committers":2,"mean_commits":18.0,"dds":0.08333333333333337,"last_synced_commit":"087e280b25a960036061674bb541cf2c79b436c9"},"previous_names":["yas-sim/openvino-llm-minimal-code"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ya
s-sim%2Fopenvino-llm-minimal-code","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yas-sim%2Fopenvino-llm-minimal-code/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yas-sim%2Fopenvino-llm-minimal-code/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yas-sim%2Fopenvino-llm-minimal-code/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yas-sim","download_url":"https://codeload.github.com/yas-sim/openvino-llm-minimal-code/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249601865,"owners_count":21297956,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbot","huggingface","huggingface-transformers","intel","large-language-models","llama","llm","llm-post-process","neuralchat","openvino","optimum-intel","python","text-generation","tinyllama","transformers"],"created_at":"2024-11-16T22:27:06.131Z","updated_at":"2025-04-22T13:39:27.648Z","avatar_url":"https://github.com/yas-sim.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Minimum code to run an LLM model from HuggingFace with OpenVINO\r\n\r\n## Programs / Files\r\n|#|file name|description|\r\n|---|---|---|\r\n|1|[download_model.py](download_model.py)|Download a LLM model, and convert it into OpenVINO IR model|\r\n|2|[inference.py](inference.py)|Run an LLM model with OpenVINO. 
One of the simplest LLM inference programs using OpenVINO and the `optimum-intel` library.|\r\n|3|[inference-stream.py](inference-stream.py)|Run an LLM model with OpenVINO and `optimum-intel`.\u003cbr\u003eDisplay the answer in streaming mode (word by word).|\r\n|4|[inference-stream-openvino-only.py](inference-stream-openvino-only.py)|Run an LLM model with only OpenVINO.\u003cbr\u003eThis program doesn't require any DL frameworks such as TF or PyTorch. It doesn't even use the '`optimum-intel`' library or HuggingFace tokenizers to run. Instead of HF tokenizers, it uses a simple and dumb tokenizer (that I wrote).\u003cbr\u003eIf you see only garbage text from the program, try swapping in the HF tokenizer (uncomment `AutoTokenizer` and comment out `SimpleTokenizer`).| \r\n|5|[inference-stream-openvino-only-greedy.py](inference-stream-openvino-only-greedy.py)|Same as program #4 but uses 'greedy decoding' instead of sampling.\u003cbr\u003eThis program generates fixed output text because it always picks the most probable token ID from the predictions (=greedy decoding).|\r\n|6|[inference-stream-openvino-only-stateless.py](inference-stream-openvino-only-stateless.py)|Same as program #4 but supports **STATELESS** models (which do not use internal state variables to keep KV-cache values inside the model) instead of stateful models.|\r\n\r\n## How to run\r\n\r\n1. Preparation\r\n\r\nNote: Converting an LLM model requires a large amount of memory (\u003e=32GB).\r\n```sh\r\npython -m venv venv\r\nvenv\\Scripts\\activate\r\npython -m pip install -U pip\r\npip install -U setuptools wheel\r\npip install -r requirements.txt\r\n```\r\n\r\n2. Download an LLM model and generate OpenVINO IR models\r\n```sh\r\npython download_model.py\r\n```\r\n**Hint**: You can also use the `optimum-cli` tool to download models from the Hugging Face Hub. You need to install the `optimum-intel` Python package to export the model for OpenVINO.  
\r\n**Hint**: You can generate a *stateless* model by adding the `--disable-stateful` option.\r\n```sh\r\noptimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4_asym_g64 TinyLlama-1.1B-Chat-v1.0/INT4\r\noptimum-cli export openvino -m intel/neural-chat-7b-v3 --weight-format int4_asym_g64 neural-chat-7b-v3/INT4\r\n```\r\n\r\n3. Run inference\r\n```sh\r\npython inference.py\r\n# or\r\npython inference-stream.py\r\n```\r\n\r\n![stream.gif GitHub repository](./resources/stream.gif)\r\n\r\n\r\n## Official '`optimum-intel`' documents  \r\nThe following websites are also informative and helpful for `optimum-intel` users.  \r\n- ['optimum-intel' GitHub Repository](https://github.com/huggingface/optimum-intel)  \r\n- [Detailed description of inference API](https://huggingface.co/docs/optimum/intel/inference)\r\n\r\n## Test environment\r\n- Windows 11\r\n- OpenVINO 2023.3.0 LTS\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyas-sim%2Fopenvino-llm-minimal-code","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyas-sim%2Fopenvino-llm-minimal-code","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyas-sim%2Fopenvino-llm-minimal-code/lists"}