{"id":34713900,"url":"https://github.com/lopatnov/ovms-continue","last_synced_at":"2026-05-27T09:34:05.629Z","repository":{"id":329455629,"uuid":"1119668766","full_name":"lopatnov/ovms-continue","owner":"lopatnov","description":"OpenVINO Model Server for Continue VSCode Extension","archived":false,"fork":false,"pushed_at":"2025-12-19T19:41:57.000Z","size":80026,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-12-22T07:18:45.823Z","etag":null,"topics":["ai-agents","continue","local-llm","openai","openvino","ovms","qwen","visual-studio-code"],"latest_commit_sha":null,"homepage":"","language":"Rich Text Format","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lopatnov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-19T16:39:56.000Z","updated_at":"2025-12-19T19:42:01.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/lopatnov/ovms-continue","commit_stats":null,"previous_names":["lopatnov/ovms-continue"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/lopatnov/ovms-continue","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lopatnov%2Fovms-continue","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lopatnov%2Fovms-continue/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lopatnov%2Fovms-continue/releases","man
ifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lopatnov%2Fovms-continue/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lopatnov","download_url":"https://codeload.github.com/lopatnov/ovms-continue/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lopatnov%2Fovms-continue/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33560727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-27T02:00:06.184Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","continue","local-llm","openai","openvino","ovms","qwen","visual-studio-code"],"created_at":"2025-12-25T00:52:57.122Z","updated_at":"2026-05-27T09:34:05.622Z","avatar_url":"https://github.com/lopatnov.png","language":"Rich Text Format","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenVINO Model Server for Continue VSCode Extension\n\nLocal AI models for code completion and chat in VS Code via [Continue Extension][continue]. This project can run various models, but the Qwen2.5-Coder model has shown good results experimentally. 
The server is precompiled for Windows.\n\n## Tested hardware and software\n\n- Laptop: Acer Swift Go 14 (Intel Arc iGPU, 18 GB GPU memory, 32 GB RAM)\n- Software: Windows 11, [Intel® oneAPI Base Toolkit 2025.3.0][one-api], [Intel® Deep Learning Essentials 2025.3.0][deep-learning] (possibly not required)\n- Folder: `C:\\ai-projects\\ovms-continue` (any folder should work)\n\n## Current models\n\n| Model  | Size   | Speed    | Goal       | Pushed to GitHub |\n|--------|--------|----------|------------|------------------|\n| Qwen2.5-Coder-14B | ~8GB | ⚡ | Best quality | No (too large) |\n| Qwen2.5-Coder-7B | ~4GB | ⚡⚡ | Balance | No (too large) |\n| Qwen2.5-Coder-3B | ~2GB | ⚡⚡⚡ | Quick | Yes |\n| Qwen2.5-Coder-1.5B | ~1GB | ⚡⚡⚡⚡ | Autocomplete | Yes |\n| Qwen2.5-Coder-0.5B | ~300MB | ⚡⚡⚡⚡⚡ | Minimal resources | Yes |\n\n## Quick start\n\n### 1. Configure models\n\nEdit `config_all.json` in the `models` folder to choose which models are loaded.\n\nFor example:\n\n```json\n{\n    \"model_config_list\": [\n        {\n            \"config\": {\n                \"name\": \"Qwen2.5-Coder-0.5B-Instruct-int4-ov\",\n                \"base_path\": \"Qwen2.5-Coder-0.5B-Instruct-int4-ov\"\n            }\n        },\n        {\n            \"config\": {\n                \"name\": \"Qwen2.5-Coder-1.5B-Instruct-int4-ov\",\n                \"base_path\": \"Qwen2.5-Coder-1.5B-Instruct-int4-ov\"\n            }\n        },\n        {\n            \"config\": {\n                \"name\": \"Qwen2.5-Coder-3B-Instruct-int4-ov\",\n                \"base_path\": \"Qwen2.5-Coder-3B-Instruct-int4-ov\"\n            }\n        },\n        {\n            \"config\": {\n                \"name\": \"Qwen2.5-Coder-7B-Instruct-int4-ov\",\n                \"base_path\": \"Qwen2.5-Coder-7B-Instruct-int4-ov\"\n            }\n        },\n        {\n            \"config\": {\n                \"name\": \"Qwen2.5-Coder-14B-Instruct-int4-ov\",\n                \"base_path\": 
\"Qwen2.5-Coder-14B-Instruct-int4-ov\"\n            }\n        }\n    ]\n}\n```\n\nAvoid loading more models than you need: every loaded model takes additional memory.\n\n### 2. Start OVMS\n\n```cmd\ncd C:\\ai-projects\\ovms-continue\nstart-ovms.bat\n```\n\n### 3. Check\n\nOpen in a browser: http://localhost:8000/v3/models\n\n### 4. Configure VS Code\n\n[Continue][continue] connects to OVMS through its OpenAI-compatible API on port 8000.\n\nEdit the configuration file `C:\\Users\\\u003cUsername\u003e\\.continue\\config.yaml` as follows:\n\n```yaml\nmodels:\n  - name: Qwen2.5-Coder-3B (GPU)\n    provider: openai\n    model: Qwen2.5-Coder-3B-Instruct-int4-ov\n    apiKey: unused\n    apiBase: http://localhost:8000/v3\n    roles:\n      - chat\n      - edit\n      - apply\n\n  - name: Qwen2.5-Coder-1.5B (GPU)\n    provider: openai\n    model: Qwen2.5-Coder-1.5B-Instruct-int4-ov\n    apiKey: unused\n    apiBase: http://localhost:8000/v3\n    roles:\n      - autocomplete\n```\n\nSelect your local models in the Continue extension:\n\n![Continue Models](./continue-config.jpg)\n\n## Check that it's working\n\n![Continue works](./continue-work.jpg)\n\n---\n---\n\n## Adding New Models from HuggingFace\n\n### Download Model\n\nMake sure Git LFS is installed; without it, the large model files are cloned only as LFS pointer files.\n\n```cmd\ncd .\\models\ngit clone https://huggingface.co/OpenVINO/Qwen2.5-Coder-7B-Instruct-int4-ov\n```\n\n### Prepare Model Structure\n\nOVMS requires a specific folder structure. After downloading:\n\n**1. Create version subfolder `1/`:**\n\n```cmd\ncd Qwen2.5-Coder-7B-Instruct-int4-ov\nmkdir 1\n```\n\n**2. Move model files into `1/`:**\n\n```cmd\nmove *.json 1\\\nmove *.bin 1\\\nmove *.xml 1\\\nmove *.txt 1\\\nmove *.model 1\\\n```\n\n**3. 
Create `graph.pbtxt`** in the model root folder (not in `1/`):\n\n```protobuf\ninput_stream: \"HTTP_REQUEST_PAYLOAD:input\"\noutput_stream: \"HTTP_RESPONSE_PAYLOAD:output\"\n\nnode: {\n  name: \"LLMExecutor\"\n  calculator: \"HttpLLMCalculator\"\n  input_stream: \"LOOPBACK:loopback\"\n  input_stream: \"HTTP_REQUEST_PAYLOAD:input\"\n  input_side_packet: \"LLM_NODE_RESOURCES:llm\"\n  output_stream: \"LOOPBACK:loopback\"\n  output_stream: \"HTTP_RESPONSE_PAYLOAD:output\"\n  input_stream_info: {\n    tag_index: 'LOOPBACK:0',\n    back_edge: true\n  }\n  node_options: {\n      [type.googleapis.com / mediapipe.LLMCalculatorOptions]: {\n          models_path: \"./1\",\n          cache_size: 4,\n          max_num_seqs: 256,\n          dynamic_split_fuse: true,\n          device: \"GPU\"\n      }\n  }\n  input_stream_handler {\n    input_stream_handler: \"SyncSetInputStreamHandler\",\n    options {\n      [mediapipe.SyncSetInputStreamHandlerOptions.ext] {\n        sync_set {\n          tag_index: \"LOOPBACK:0\"\n        }\n      }\n    }\n  }\n}\n```\n\nTo run on a different device, change `device: \"GPU\"` to `\"CPU\"` or `\"NPU\"`.\n\n**4. 
Create `chat_template.jinja`** for chat models (example for Qwen):\n\n```jinja\n{%- for message in messages -%}\n    {%- if message['role'] == 'system' -%}\n        {{- '\u003c|im_start|\u003esystem\\n' + message['content'] + '\u003c|im_end|\u003e\\n' -}}\n    {%- elif message['role'] == 'user' -%}\n        {{- '\u003c|im_start|\u003euser\\n' + message['content'] + '\u003c|im_end|\u003e\\n' -}}\n    {%- elif message['role'] == 'assistant' -%}\n        {{- '\u003c|im_start|\u003eassistant\\n' + message['content'] + '\u003c|im_end|\u003e\\n' -}}\n    {%- endif -%}\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n    {{- '\u003c|im_start|\u003eassistant\\n' -}}\n{%- endif -%}\n```\n\n### Final Model Structure\n\n```\nmodels/\n└── Qwen2.5-Coder-7B-Instruct-int4-ov/\n    ├── graph.pbtxt              # OVMS graph config\n    ├── chat_template.jinja      # Chat format template\n    └── 1/                       # Version folder (required!)\n        ├── config.json\n        ├── generation_config.json\n        ├── openvino_model.xml\n        ├── openvino_model.bin\n        ├── openvino_tokenizer.xml\n        ├── openvino_tokenizer.bin\n        ├── openvino_detokenizer.xml\n        ├── openvino_detokenizer.bin\n        ├── tokenizer.json\n        ├── tokenizer_config.json\n        └── ...\n```\n\n### Add to config_all.json\n\n```json\n{\n    \"model_config_list\": [\n        {\n            \"config\": {\n                \"name\": \"Qwen2.5-Coder-7B-Instruct-int4-ov\",\n                \"base_path\": \"Qwen2.5-Coder-7B-Instruct-int4-ov\"\n            }\n        }\n    ]\n}\n```\n\n### Important Notes\n\n- **File encoding:** `graph.pbtxt` must be UTF-8 **without BOM**. Windows PowerShell's `Out-File` writes a BOM when UTF-8 encoding is requested, so use `[System.IO.File]::WriteAllText($path, $content, [System.Text.UTF8Encoding]::new($false))` instead.\n- **Context length:** Models with a small context window (2K tokens, like GPT-J) don't work well with Continue. 
Use models with a context window of at least 4K tokens.\n- **Vision models:** May require a newer OpenVINO version.\n\n## Links\n\n- [OpenVINO Model Server](https://docs.openvino.ai/2025/model-server/ovms_what_is_openvino_model_server.html)\n- [Continue Documentation](https://docs.continue.dev/)\n- [OpenVINO models](https://huggingface.co/OpenVINO)\n\n## Authors\n\nThis project was created for fun by [Oleksandr Lopatnov](https://www.linkedin.com/in/lopatnov/).\n\n[continue]: https://marketplace.visualstudio.com/items?itemName=Continue.continue\n[one-api]: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html\n[deep-learning]: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?packages=dl-essentials\u0026dl-essentials-os=linux\u0026dl-lin=offline\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flopatnov%2Fovms-continue","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flopatnov%2Fovms-continue","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flopatnov%2Fovms-continue/lists"}