# LLM powered development for VSCode

**llm-vscode** is an extension for all things LLM. It uses [**llm-ls**](https://github.com/huggingface/llm-ls) as its backend.

We also have extensions for:
* [neovim](https://github.com/huggingface/llm.nvim)
* [jupyter](https://github.com/bigcode-project/jupytercoder)
* [intellij](https://github.com/huggingface/llm-intellij)

Previously **huggingface-vscode**.

> [!NOTE]
> When using the Inference API, you will probably encounter some limitations.
> Subscribe to the *PRO* plan to avoid getting rate limited in the free tier.
>
> https://huggingface.co/pricing#pro

## Features

### Code completion

This plugin supports "ghost-text" code completion, à la Copilot.

### Choose your model

Requests for code generation are made via an HTTP request.

You can use the Hugging Face [Inference API](https://huggingface.co/inference-api) or your own HTTP endpoint, provided it adheres to the APIs listed in [backend](#backend).

The list of officially supported models is located in the config template section.

### Always fit within the context window

The prompt sent to the model will always be sized to fit within the context window, with the number of tokens determined using [tokenizers](https://github.com/huggingface/tokenizers).

### Code attribution

Hit `Cmd+shift+a` to check if the generated code is in [The Stack](https://huggingface.co/datasets/bigcode/the-stack).
This is a rapid first-pass attribution check using [stack.dataportraits.org](https://stack.dataportraits.org).
We check for sequences of at least 50 characters that match a Bloom filter.
This means false positives are possible, and long enough surrounding context is necessary (see the [paper](https://dataportraits.org/) for details on n-gram striding and sequence length).
[The dedicated Stack search tool](https://hf.co/spaces/bigcode/search) is a full dataset index and can be used for a complete second pass.

## Installation

Install like any other [VSCode extension](https://marketplace.visualstudio.com/items?itemName=HuggingFace.huggingface-vscode).

By default, this extension uses [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) & the [Hugging Face Inference API](https://huggingface.co/inference-api) for inference.

#### HF API token

You can supply your HF API token ([hf.co/settings/token](https://hf.co/settings/token)) with this command:
1. `Cmd/Ctrl+Shift+P` to open the VSCode command palette
2. Type: `Llm: Login`

If you previously logged in with `huggingface-cli login` on your system, the extension will read the token from disk.

## Configuration

You can check the full list of configuration settings by opening your settings page (`cmd+,`) and typing `Llm`.

### Backend

You can configure the backend to which requests will be sent. **llm-vscode** supports the following backends:
- `huggingface`: The Hugging Face Inference API (default)
- `ollama`: [Ollama](https://ollama.com)
- `openai`: any OpenAI compatible API (e.g. [llama-cpp-python](https://github.com/abetlen/llama-cpp-python))
- `tgi`: [Text Generation Inference](https://github.com/huggingface/text-generation-inference)

Let's say your current code is this:
```py
import numpy as np
import scipy as sp
{YOUR_CURSOR_POSITION}
def hello_world():
    print("Hello world")
```

The request body will then look like:
```js
const inputs = `{start token}import numpy as np\nimport scipy as sp\n{end token}def hello_world():\n    print("Hello world"){middle token}`;
const data = { inputs, ...configuration.requestBody };

const model = configuration.modelId;
// cf. URL construction below: the endpoint depends on the configured backend
const endpoint = build_url(configuration);

const res = await fetch(endpoint, {
    body: JSON.stringify(data),
    headers,
    method: "POST"
});

const json = await res.json(); // { generated_text: string }
```

Note that the example above is a simplified version to explain what is happening under the hood.

#### URL construction

The endpoint URL that is queried to fetch suggestions is built the following way:
- depending on the backend, it will try to append the correct path to the base URL located in the configuration (e.g.
`{url}/v1/completions` for the `openai` backend)
- if no URL is set for the `huggingface` backend, it will automatically use the default URL
  - it will error for other backends, as there is no sensible default URL
- if you already set the **correct** path at the end of the URL, it will not be appended a second time; the extension checks whether it is already present
- there is an option to disable this behavior: `llm.disableUrlPathCompletion`

### Suggestion behavior

You can tune the way the suggestions behave:
- `llm.enableAutoSuggest` lets you enable or disable "suggest-as-you-type" suggestions.
- `llm.documentFilter` lets you enable suggestions only on files that match the pattern you provide. The object must be of type [`DocumentFilter | DocumentFilter[]`](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#documentFilter):
  - to match on all types of buffers: `llm.documentFilter: { pattern: "**" }`
  - to match on all files in `my_project/`: `llm.documentFilter: { pattern: "/path/to/my_project/**" }`
  - to match on all python and rust files: `llm.documentFilter: { pattern: "**/*.{py,rs}" }`

### Keybindings

**llm-vscode** sets two keybindings:
* you can trigger suggestions with `Cmd+shift+l` by default, which corresponds to the `editor.action.inlineSuggest.trigger` command
* [code attribution](#code-attribution) is set to `Cmd+shift+a` by default, which corresponds to the `llm.attribution` command

### [**llm-ls**](https://github.com/huggingface/llm-ls)

By default, **llm-ls** is bundled with the extension.
When developing locally, or if you built your own binary because your platform is not supported, you can set the `llm.lsp.binaryPath` setting to the path of the binary.

### Tokenizer

**llm-ls** uses [**tokenizers**](https://github.com/huggingface/tokenizers) to make sure the prompt fits the `context_window`.

To configure it, you have a few options:
* No tokenization; **llm-ls** will count the number of characters instead:
```json
{
  "llm.tokenizer": null
}
```
* from a local file on your disk:
```json
{
  "llm.tokenizer": {
    "path": "/path/to/my/tokenizer.json"
  }
}
```
* from a Hugging Face repository; **llm-ls** will attempt to download `tokenizer.json` at the root of the repository:
```json
{
  "llm.tokenizer": {
    "repository": "myusername/myrepo",
    "api_token": null
  }
}
```
Note: when `api_token` is set to null, it will use the token you set with the `Llm: Login` command. If you want to use a different token, you can set it here.

* from an HTTP endpoint; **llm-ls** will attempt to download a file via an HTTP GET request:
```json
{
  "llm.tokenizer": {
    "url": "https://my-endpoint.example.com/mytokenizer.json",
    "to": "/download/path/of/mytokenizer.json"
  }
}
```

### Code Llama

To test the Code Llama 13B model:
1. Make sure you have the [latest version of this extension](#installation).
2. Make sure you have [supplied your HF API token](#hf-api-token).
3. Open the VSCode settings (`cmd+,`) & type: `Llm: Config Template`
4. From the dropdown menu, choose `hf/codellama/CodeLlama-13b-hf`

Read more [here](https://huggingface.co/blog/codellama) about Code Llama.

### Phind and WizardCoder

To test [Phind/Phind-CodeLlama-34B-v2](https://hf.co/Phind/Phind-CodeLlama-34B-v2) and/or [WizardLM/WizardCoder-Python-34B-V1.0](https://hf.co/WizardLM/WizardCoder-Python-34B-V1.0):
1. Make sure you have the [latest version of this extension](#installation).
2. Make sure you have [supplied your HF API token](#hf-api-token).
3. Open the VSCode settings (`cmd+,`) & type: `Llm: Config Template`
4. From the dropdown menu, choose `hf/Phind/Phind-CodeLlama-34B-v2` or `hf/WizardLM/WizardCoder-Python-34B-V1.0`

Read more about Phind-CodeLlama-34B-v2 [here](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) and WizardCoder-Python-34B-V1.0 [here](https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0).

## Developing

1. Clone `llm-ls`: `git clone https://github.com/huggingface/llm-ls`
2. Build `llm-ls`: `cd llm-ls && cargo build` (you can also use `cargo build --release` for a release build)
3. Clone this repo: `git clone https://github.com/huggingface/llm-vscode`
4. Install deps: `cd llm-vscode && npm ci`
5. In VSCode, open the `Run and Debug` side bar & click `Launch Extension`
6. In the new VSCode window, set the `llm.lsp.binaryPath` setting to the path of the `llm-ls` binary you built in step 2 (e.g. `/path/to/llm-ls/target/debug/llm-ls`)
7. Close the window and restart the extension with `F5`, or as in step 5

## Community

| Repository | Description |
| --- | --- |
| [huggingface-vscode-endpoint-server](https://github.com/LucienShui/huggingface-vscode-endpoint-server) | Custom code generation endpoint for this repository |
| [llm-vscode-inference-server](https://github.com/wangcx18/llm-vscode-inference-server) | An endpoint server for efficiently serving quantized open-source LLMs for code. |
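The [URL construction](#url-construction) rules can be sketched as a small function. This is an illustrative sketch only, not the extension's actual implementation: the `buildUrl` name, the `UrlConfig` shape, the default Hugging Face URL, and the per-backend paths other than the documented `{url}/v1/completions` for `openai` are assumptions.

```typescript
type Backend = "huggingface" | "ollama" | "openai" | "tgi";

interface UrlConfig {
  backend: Backend;
  url?: string; // base URL from your settings, which may already include the path
  disableUrlPathCompletion?: boolean; // mirrors the llm.disableUrlPathCompletion setting
}

// Per-backend paths (assumed here for illustration; only the openai one is documented above)
const BACKEND_PATHS: Record<Backend, string> = {
  huggingface: "",
  ollama: "/api/generate",
  openai: "/v1/completions",
  tgi: "/generate",
};

// Hypothetical default for the huggingface backend
const DEFAULT_HF_URL = "https://api-inference.huggingface.co/models";

function buildUrl(config: UrlConfig): string {
  let base = config.url;
  if (!base) {
    if (config.backend === "huggingface") {
      base = DEFAULT_HF_URL; // only the huggingface backend has a sensible default URL
    } else {
      throw new Error(`no URL configured for backend ${config.backend}`);
    }
  }
  const path = BACKEND_PATHS[config.backend];
  // do not append the path a second time if it is already present,
  // and skip appending entirely when path completion is disabled
  if (config.disableUrlPathCompletion || path === "" || base.endsWith(path)) {
    return base;
  }
  return base.replace(/\/$/, "") + path;
}
```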