https://github.com/lopatnov/ovms-continue

OpenVINO Model Server for Continue VSCode Extension
https://github.com/lopatnov/ovms-continue

ai-agents continue local-llm openai openvino ovms qwen visual-studio-code

Last synced: about 2 months ago
JSON representation

OpenVINO Model Server for Continue VSCode Extension

Host: GitHub
URL: https://github.com/lopatnov/ovms-continue
Owner: lopatnov
Created: 2025-12-19T16:39:56.000Z (7 months ago)
Default Branch: main
Last Pushed: 2025-12-19T19:41:57.000Z (7 months ago)
Last Synced: 2025-12-22T07:18:45.823Z (7 months ago)
Topics: ai-agents, continue, local-llm, openai, openvino, ovms, qwen, visual-studio-code
Language: Rich Text Format
Homepage:
Size: 76.3 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # OpenVINO Model Server for Continue VSCode Extension

Local AI models for code completion and chat in VS Code via [Continue Extension][continue]. This project can run various models, but the Qwen2.5-Coder model has shown good results experimentally. Server is precompiled for Windows.

## The hardware and software on which the system was run

- Laptop: Acer Swift Go 14 (iGPU Intel Arc, 18Gb GPU, 32Gb RAM)

- Software: Windows 11, [Intel® oneAPI Base Toolkit 2025.3.0][one-api], [Intel® Deep Learning Essentials 2025.3.0][deep-learning] (not sure that it's required)

- Folder: `C:\ai-projects\ovms-continue` (should work from any folder)

## Current models

| Model  | Size   | Speed    | Goal       | Pushed to GitHub |

|--------|--------|----------|------------|------------------|

| Qwen2.5-Coder-14B | ~8GB | ⚡ | Best quality | No, too huge |

| Qwen2.5-Coder-7B | ~4GB | ⚡⚡ | Balance | No, too huge |

| Qwen2.5-Coder-3B | ~2GB | ⚡⚡⚡ | Quick | Yes |

| Qwen2.5-Coder-1.5B | ~1GB | ⚡⚡⚡⚡ | Autocomplete | Yes |

| Qwen2.5-Coder-0.5B | ~300MB | ⚡⚡⚡⚡⚡ | Minimal resources | Yes |

## Quick start

### 1. Configure models

Edit `config_all.json` in `models` folder to configure models that should be loaded.

For example:

```json

{

    "model_config_list": [

        {

            "config": {

                "name": "Qwen2.5-Coder-0.5B-Instruct-int4-ov",

                "base_path": "Qwen2.5-Coder-0.5B-Instruct-int4-ov"

            }

        },

        {

            "config": {

                "name": "Qwen2.5-Coder-1.5B-Instruct-int4-ov",

                "base_path": "Qwen2.5-Coder-1.5B-Instruct-int4-ov"

            }

        },

        {

            "config": {

                "name": "Qwen2.5-Coder-3B-Instruct-int4-ov",

                "base_path": "Qwen2.5-Coder-3B-Instruct-int4-ov"

            }

        },

        {

            "config": {

                "name": "Qwen2.5-Coder-7B-Instruct-int4-ov",

                "base_path": "Qwen2.5-Coder-7B-Instruct-int4-ov"

            }

        }

        {

            "config": {

                "name": "Qwen2.5-Coder-14B-Instruct-int4-ov",

                "base_path": "Qwen2.5-Coder-14B-Instruct-int4-ov"

            }

        }

    ]

}

```

I not recommend to use many models. More models requires more memory.

### 2. Start OVMS

```cmd

cd C:\ai-projects\ovms-continue

start-ovms.bat

```

### 3. Check

Open in browser: http://localhost:8000/v3/models

### 4. Configure VS Code

[Continue][continue] will automatically connect to OVMS (port 8000).

Edit configuration file `C:\Users\\.continue\config.yaml` like so:

```yaml

models:

  - name: Qwen2.5-Coder-3B (GPU)

    provider: openai

    model: Qwen2.5-Coder-3B-Instruct-int4-ov

    apiKey: unused

    apiBase: http://localhost:8000/v3

    roles:

      - chat

      - edit

      - apply

  - name: Qwen2.5-Coder-1.5B (GPU)

    provider: openai

    model: Qwen2.5-Coder-1.5B-Instruct-int4-ov

    apiKey: unused

    apiBase: http://localhost:8000/v3

    roles:

      - autocomplete

```

Set your local models in Continue extension:

![Continue Models](./continue-config.jpg)

## Check that it's working

![Continue works](./continue-work.jpg)

---

---

## Adding New Models from HuggingFace

### Download Model

```cmd

cd .\models

git clone https://huggingface.co/OpenVINO/Qwen2.5-Coder-7B-Instruct-int4-ov

```

### Prepare Model Structure

OVMS requires specific folder structure. After downloading:

**1. Create version subfolder `1/`:**

```cmd

cd Qwen2.5-Coder-7B-Instruct-int4-ov

mkdir 1

```

**2. Move model files into `1/`:**

```cmd

move *.json 1\

move *.bin 1\

move *.xml 1\

move *.txt 1\

move *.model 1\

```

**3. Create `graph.pbtxt`** in model root folder (not in `1/`):

```protobuf

input_stream: "HTTP_REQUEST_PAYLOAD:input"

output_stream: "HTTP_RESPONSE_PAYLOAD:output"

node: {

  name: "LLMExecutor"

  calculator: "HttpLLMCalculator"

  input_stream: "LOOPBACK:loopback"

  input_stream: "HTTP_REQUEST_PAYLOAD:input"

  input_side_packet: "LLM_NODE_RESOURCES:llm"

  output_stream: "LOOPBACK:loopback"

  output_stream: "HTTP_RESPONSE_PAYLOAD:output"

  input_stream_info: {

    tag_index: 'LOOPBACK:0',

    back_edge: true

  }

  node_options: {

      [type.googleapis.com / mediapipe.LLMCalculatorOptions]: {

          models_path: "./1",

          cache_size: 4,

          max_num_seqs: 256,

          dynamic_split_fuse: true,

          device: "GPU"

      }

  }

  input_stream_handler {

    input_stream_handler: "SyncSetInputStreamHandler",

    options {

      [mediapipe.SyncSetInputStreamHandlerOptions.ext] {

        sync_set {

          tag_index: "LOOPBACK:0"

        }

      }

    }

  }

}

```

Change `device: "GPU"` to `"CPU"` or `"NPU"` if needed.

**4. Create `chat_template.jinja`** for chat models (example for Qwen):

```jinja

{%- for message in messages -%}

    {%- if message['role'] == 'system' -%}

        {{- '<|im_start|>system\n' + message['content'] + '<|im_end|>\n' -}}

    {%- elif message['role'] == 'user' -%}

        {{- '<|im_start|>user\n' + message['content'] + '<|im_end|>\n' -}}

    {%- elif message['role'] == 'assistant' -%}

        {{- '<|im_start|>assistant\n' + message['content'] + '<|im_end|>\n' -}}

    {%- endif -%}

{%- endfor -%}

{%- if add_generation_prompt -%}

    {{- '<|im_start|>assistant\n' -}}

{%- endif -%}

```

### Final Model Structure

```

models/

└── Qwen2.5-Coder-7B-Instruct-int4-ov/

    ├── graph.pbtxt              # OVMS graph config

    ├── chat_template.jinja      # Chat format template

    └── 1/                       # Version folder (required!)

        ├── config.json

        ├── generation_config.json

        ├── openvino_model.xml

        ├── openvino_model.bin

        ├── openvino_tokenizer.xml

        ├── openvino_tokenizer.bin

        ├── openvino_detokenizer.xml

        ├── openvino_detokenizer.bin

        ├── tokenizer.json

        ├── tokenizer_config.json

        └── ...

```

### Add to config_all.json

```json

{

    "model_config_list": [

        {

            "config": {

                "name": "Qwen2.5-Coder-7B-Instruct-int4-ov",

                "base_path": "Qwen2.5-Coder-7B-Instruct-int4-ov"

            }

        }

    ]

}

```

### Important Notes

- **File encoding:** `graph.pbtxt` must be UTF-8 **without BOM**. PowerShell's `Out-File` adds BOM — use `[System.IO.File]::WriteAllText($path, $content, [System.Text.UTF8Encoding]::new($false))` instead.

- **Context length:** Models with small context (2K tokens like GPT-J) don't work well with Continue. Use models with 4K+ context.

- **Vision models:** May require newer OpenVINO version.

## Links

- [OpenVINO Model Server](https://docs.openvino.ai/2025/model-server/ovms_what_is_openvino_model_server.html)

- [Continue Documentation](https://docs.continue.dev/)

- [OpenVINO models](https://huggingface.co/OpenVINO)

## Authors

This project was created for fun by [Oleksandr Lopatnov](https://www.linkedin.com/in/lopatnov/). 

[continue]: https://marketplace.visualstudio.com/items?itemName=Continue.continue

[one-api]: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html

[deep-learning]: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?packages=dl-essentials&dl-essentials-os=linux&dl-lin=offline

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/lopatnov/ovms-continue

Awesome Lists containing this project

README