https://github.com/yas-sim/openvino-llm-minimal-code

Minimal code to run an LLM chatbot from the Hugging Face hub with OpenVINO

Topics: chatbot, huggingface, huggingface-transformers, intel, large-language-models, llama, llm, llm-post-process, neuralchat, openvino, optimum-intel, python, text-generation, tinyllama, transformers

# Minimum code to run an LLM model from HuggingFace with OpenVINO

## Programs / Files
|#|File name|Description|
|---|---|---|
|1|[download_model.py](download_model.py)|Download an LLM model and convert it into an OpenVINO IR model.|
|2|[inference.py](inference.py)|Run an LLM model with OpenVINO. One of the simplest LLM inference examples using OpenVINO and the `optimum-intel` library (a minimal sketch of this usage appears after the table).|
|3|[inference-stream.py](inference-stream.py)|Run an LLM model with OpenVINO and `optimum-intel`. Displays the answer in streaming mode (word by word).|
|4|[inference-stream-openvino-only.py](inference-stream-openvino-only.py)|Run an LLM model with OpenVINO only. This program doesn't require any DL framework such as TensorFlow or PyTorch, and it doesn't use the `optimum-intel` library or the Hugging Face tokenizers either. It uses a simple, naive tokenizer (written by the author) instead of the HF tokenizers. If you see only garbage text, try switching to the HF tokenizer (uncomment `AutoTokenizer` and comment out `SimpleTokenizer`).|
|5|[inference-stream-openvino-only-greedy.py](inference-stream-openvino-only-greedy.py)|Same as program #4 but uses greedy decoding instead of sampling. The output text is deterministic because the program always picks the highest-probability token ID from the predictions (see the decoding sketch after the table).|
|6|[inference-stream-openvino-only-stateless.py](inference-stream-openvino-only-stateless.py)|Same as program #4 but supports **stateless** models (which do not use internal state variables to keep the KV-cache inside the model) instead of stateful models.|
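
For orientation, the sketch below shows roughly what programs #2 and #3 do with `optimum-intel`: load the exported IR model, tokenize a prompt, and generate text, optionally streaming tokens as they are produced. This is not the repository's code; the model directory and prompt are assumptions.

```python
# Minimal sketch (assumed paths and prompt): run an OpenVINO IR LLM with optimum-intel.
from transformers import AutoTokenizer, TextStreamer
from optimum.intel import OVModelForCausalLM

model_dir = "TinyLlama-1.1B-Chat-v1.0/INT4"             # assumed export directory
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = OVModelForCausalLM.from_pretrained(model_dir, device="CPU")

prompt = "What is OpenVINO?"
inputs = tokenizer(prompt, return_tensors="pt")

# TextStreamer prints tokens as they are generated (the 'streaming mode' of program #3).
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
output_ids = model.generate(**inputs, max_new_tokens=128, streamer=streamer)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```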

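The only difference between programs #4 and #5 is how the next token ID is chosen from the model's output logits. The function below illustrates the two strategies; it is not the repository's code, and the default temperature value is an assumption.

```python
import numpy as np

def pick_next_token(logits: np.ndarray, greedy: bool, temperature: float = 0.8) -> int:
    """Pick the next token ID from logits of shape (1, seq_len, vocab_size)."""
    last = logits[0, -1, :]                            # logits for the last position
    if greedy:
        return int(np.argmax(last))                    # greedy decoding: always the most probable token
    scaled = (last - last.max()) / temperature         # numerically stable softmax with temperature
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))  # sampling: draw a token from the distribution
```
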
## How to run

1. Preparation

Note: Converting an LLM model requires a large amount of memory (32 GB or more).
```sh
python -m venv venv
venv\Scripts\activate    # Windows; on Linux/macOS use: source venv/bin/activate
python -m pip install -U pip
pip install -U setuptools wheel
pip install -r requirements.txt
```

2. Download an LLM model and generate OpenVINO IR models
```sh
python download_model.py
```
**Hint**: You can also use the `optimum-cli` tool to download and export models from the Hugging Face hub. You need to install the `optimum-intel` Python package to export models for OpenVINO (a Python-API sketch follows the commands below).

**Hint**: You can generate a *stateless* model by adding the `--disable-stateful` option.
```sh
optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4_asym_g64 TinyLlama-1.1B-Chat-v1.0/INT4
optimum-cli export openvino -m intel/neural-chat-7b-v3 --weight-format int4_asym_g64 neural-chat-7b-v3/INT4
```
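
If you prefer the Python API over `optimum-cli`, `optimum-intel` can also export the model directly from a script. The snippet below is only a rough sketch of what `download_model.py` is expected to do; the model ID and output directory are assumptions, and the actual script may additionally apply INT4 weight compression.

```python
# Rough sketch (not the repository code): download a model from the Hugging Face hub
# and export it as an OpenVINO IR model with optimum-intel.
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"       # assumed model
export_dir = "TinyLlama-1.1B-Chat-v1.0/INT4"          # assumed output directory

model = OVModelForCausalLM.from_pretrained(model_id, export=True)    # convert to OpenVINO IR
model.save_pretrained(export_dir)                                    # writes openvino_model.xml/.bin
AutoTokenizer.from_pretrained(model_id).save_pretrained(export_dir)  # keep the tokenizer next to the IR
```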

3. Run inference
```sh
python inference.py
# or
python inference-stream.py
```

![stream.gif](./resources/stream.gif)

## Official `optimum-intel` documentation
The following websites are also informative and helpful for `optimum-intel` users.
- [`optimum-intel` GitHub repository](https://github.com/huggingface/optimum-intel)
- [Detailed description of inference API](https://huggingface.co/docs/optimum/intel/inference)

## Test environment
- Windows 11
- OpenVINO 2023.3.0 LTS