# LLM powered development for VSCode

**llm-vscode** is an extension for all things LLM. It uses [**llm-ls**](https://github.com/huggingface/llm-ls) as its backend.

We also have extensions for:
* [neovim](https://github.com/huggingface/llm.nvim)
* [jupyter](https://github.com/bigcode-project/jupytercoder)
* [intellij](https://github.com/huggingface/llm-intellij)

Previously **huggingface-vscode**.

> [!NOTE]
> When using the Inference API, you will probably encounter some limitations. Subscribe to the *PRO* plan to avoid getting rate limited in the free tier.
>
> https://huggingface.co/pricing#pro

## Features

### Code completion

This plugin supports "ghost-text" code completion, à la Copilot.

### Choose your model

Requests for code generation are made over HTTP.

You can use the Hugging Face [Inference API](https://huggingface.co/inference-api) or your own HTTP endpoint, provided it adheres to the APIs listed in [backend](#backend).

The list of officially supported models is located in the config template section of the settings (`Llm: Config Template`).
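
For illustration, a minimal `settings.json` sketch for the default setup might look like the following. The `llm.backend` and `llm.modelId` keys are assumed here to mirror the `configuration.backend`/`configuration.modelId` fields used in the request example in the [backend](#backend) section; check your settings page (`cmd+,`) for the authoritative names:
```json
{
  "llm.backend": "huggingface",
  "llm.modelId": "bigcode/starcoder"
}
```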

### Always fit within the context window

The prompt sent to the model will always be sized to fit within the context window, with the number of tokens determined using [tokenizers](https://github.com/huggingface/tokenizers).

### Code attribution

Hit `Cmd+shift+a` to check if the generated code is in [The Stack](https://huggingface.co/datasets/bigcode/the-stack).
This is a rapid first-pass attribution check using [stack.dataportraits.org](https://stack.dataportraits.org).
We check for sequences of at least 50 characters that match a Bloom filter.
This means false positives are possible and long enough surrounding context is necessary (see the [paper](https://dataportraits.org/) for details on n-gram striding and sequence length).
[The dedicated Stack search tool](https://hf.co/spaces/bigcode/search) is a full dataset index and can be used for a complete second pass.

## Installation

Install like any other [vscode extension](https://marketplace.visualstudio.com/items?itemName=HuggingFace.huggingface-vscode).

By default, this extension uses [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) and the [Hugging Face Inference API](https://huggingface.co/inference-api) for inference.

#### HF API token

You can supply your HF API token ([hf.co/settings/token](https://hf.co/settings/token)) with this command:
1. `Cmd/Ctrl+Shift+P` to open the VSCode command palette
2. Type: `Llm: Login`

If you previously logged in with `huggingface-cli login` on your system, the extension will read the token from disk.

## Configuration

You can check the full list of configuration settings by opening your settings page (`cmd+,`) and typing `Llm`.

### Backend

You can configure the backend to which requests will be sent. **llm-vscode** supports the following backends:
- `huggingface`: The Hugging Face Inference API (default)
- `ollama`: [Ollama](https://ollama.com)
- `openai`: any OpenAI compatible API (e.g. [llama-cpp-python](https://github.com/abetlen/llama-cpp-python))
- `tgi`: [Text Generation Inference](https://github.com/huggingface/text-generation-inference)
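
As a sketch, pointing the extension at a local [Ollama](https://ollama.com) server could look like this in your `settings.json`. The `llm.backend`, `llm.url` and `llm.modelId` keys are assumed to mirror the `configuration.*` fields in the request example below, and the model name is illustrative:
```json
{
  "llm.backend": "ollama",
  "llm.url": "http://localhost:11434",
  "llm.modelId": "codellama:13b"
}
```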

Let's say your current code is this:
```py
import numpy as np
import scipy as sp
{YOUR_CURSOR_POSITION}
def hello_world():
print("Hello world")
```

The request body will then look like:
```js
const inputs = `{start token}import numpy as np\nimport scipy as sp\n{end token}def hello_world():\n    print("Hello world"){middle token}`;
const data = { inputs, ...configuration.requestBody };

const model = configuration.modelId;
// the endpoint depends on the configured backend, cf. URL construction below
const endpoint = build_url(configuration);

const res = await fetch(endpoint, {
  body: JSON.stringify(data),
  headers, // e.g. the Authorization header when an API token is set
  method: "POST"
});

const json = await res.json(); // { generated_text: string }
```

Note that the example above is a simplified version to explain what is happening under the hood.

#### URL construction

The endpoint URL queried to fetch suggestions is built as follows (a small settings sketch follows the list):
- depending on the backend, the correct path is appended to the base URL set in the configuration (e.g. `{url}/v1/completions` for the `openai` backend)
- if no URL is set for the `huggingface` backend, the default Inference API URL is used automatically
- other backends error out when no URL is set, as there is no sensible default
- if the URL you set already ends with the **correct** path, it is not appended a second time
- this behavior can be disabled with the `llm.disableUrlPathCompletion` setting
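
For example, with the `openai` backend the base URL below (illustrative host) is expanded to `http://localhost:8000/v1/completions`, and setting it to the full path directly would give the same result since the suffix is not appended twice. The `llm.url` key is assumed to be the base-URL setting mentioned above; `llm.disableUrlPathCompletion` opts out of the path completion:
```json
{
  "llm.url": "http://localhost:8000",
  "llm.disableUrlPathCompletion": false
}
```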

### Suggestion behavior

You can tune the way the suggestions behave (a combined settings example follows the list):
- `llm.enableAutoSuggest` enables or disables "suggest-as-you-type" suggestions.
- `llm.documentFilter` restricts suggestions to files matching the pattern(s) you provide. The value must be of type [`DocumentFilter | DocumentFilter[]`](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#documentFilter):
  - to match all types of buffers: `llm.documentFilter: { pattern: "**" }`
  - to match all files in `my_project/`: `llm.documentFilter: { pattern: "/path/to/my_project/**" }`
  - to match all Python and Rust files: `llm.documentFilter: { pattern: "**/*.{py,rs}" }`
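
Put together in `settings.json`, this could look like the following (the glob pattern is illustrative):
```json
{
  "llm.enableAutoSuggest": true,
  "llm.documentFilter": {
    "pattern": "**/*.{py,rs}"
  }
}
```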

### Keybindings

**llm-vscode** sets two keybindings, both of which can be remapped (see the sketch after this list):
* you can trigger suggestions with `Cmd+shift+l` by default, which corresponds to the `editor.action.inlineSuggest.trigger` command
* [code attribution](#code-attribution) is set to `Cmd+shift+a` by default, which corresponds to the `llm.attribution` command
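
If these chords clash with other keybindings, they can be remapped in your `keybindings.json`. A minimal sketch using the command IDs above (the replacement chords are illustrative):
```json
[
  {
    "key": "cmd+alt+l",
    "command": "editor.action.inlineSuggest.trigger"
  },
  {
    "key": "cmd+alt+a",
    "command": "llm.attribution"
  }
]
```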

### [**llm-ls**](https://github.com/huggingface/llm-ls)

By default, **llm-ls** is bundled with the extension. When developing locally or if you built your own binary because your platform is not supported, you can set the `llm.lsp.binaryPath` setting to the path of the binary.
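
For example, to point the extension at a locally built debug binary (the path is illustrative, cf. the [Developing](#developing) section):
```json
{
  "llm.lsp.binaryPath": "/path/to/llm-ls/target/debug/llm-ls"
}
```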

### Tokenizer

**llm-ls** uses [**tokenizers**](https://github.com/huggingface/tokenizers) to make sure the prompt fits the `context_window`.

To configure it, you have a few options:
* No tokenization; **llm-ls** will count the number of characters instead:
```json
{
  "llm.tokenizer": null
}
```
* from a local file on your disk:
```json
{
  "llm.tokenizer": {
    "path": "/path/to/my/tokenizer.json"
  }
}
```
* from a Hugging Face repository, **llm-ls** will attempt to download `tokenizer.json` at the root of the repository:
```json
{
  "llm.tokenizer": {
    "repository": "myusername/myrepo",
    "api_token": null
  }
}
```
Note: when `api_token` is set to `null`, the token you set with the `Llm: Login` command is used. If you want to use a different token, you can set it here.

* from an HTTP endpoint, **llm-ls** will attempt to download a file via an HTTP GET request:
```json
{
  "llm.tokenizer": {
    "url": "https://my-endpoint.example.com/mytokenizer.json",
    "to": "/download/path/of/mytokenizer.json"
  }
}
```

### Code Llama

To test the Code Llama 13B model:
1. Make sure you have the [latest version of this extension](#installation).
2. Make sure you have [supplied your HF API token](#hf-api-token).
3. Open the VSCode settings (`cmd+,`) & type: `Llm: Config Template`
4. From the dropdown menu, choose `hf/codellama/CodeLlama-13b-hf`

Read more about Code Llama [here](https://huggingface.co/blog/codellama).

### Phind and WizardCoder

To test [Phind/Phind-CodeLlama-34B-v2](https://hf.co/Phind/Phind-CodeLlama-34B-v2) and/or [WizardLM/WizardCoder-Python-34B-V1.0](https://hf.co/WizardLM/WizardCoder-Python-34B-V1.0):
1. Make sure you have the [latest version of this extension](#installation).
2. Make sure you have [supplied your HF API token](#hf-api-token).
3. Open the VSCode settings (`cmd+,`) & type: `Llm: Config Template`
4. From the dropdown menu, choose `hf/Phind/Phind-CodeLlama-34B-v2` or `hf/WizardLM/WizardCoder-Python-34B-V1.0`

Read more about Phind-CodeLlama-34B-v2 [here](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) and WizardCoder-15B-V1.0 [here](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0).

## Developing

1. Clone `llm-ls`: `git clone https://github.com/huggingface/llm-ls`
2. Build `llm-ls`: `cd llm-ls && cargo build` (you can also use `cargo build --release` for a release build)
3. Clone this repo: `git clone https://github.com/huggingface/llm-vscode`
4. Install deps: `cd llm-vscode && npm ci`
5. In VSCode, open the `Run and Debug` sidebar & click `Launch Extension`
6. In the new vscode window, set the `llm.lsp.binaryPath` setting to the path of the `llm-ls` binary you built in step 2 (e.g. `/path/to/llm-ls/target/debug/llm-ls`)
7. Close the window and restart the extension with `F5` or as in step 5.

## Community

| Repository | Description |
| --- | --- |
| [huggingface-vscode-endpoint-server](https://github.com/LucienShui/huggingface-vscode-endpoint-server) | Custom code generation endpoint for this repository |
| [llm-vscode-inference-server](https://github.com/wangcx18/llm-vscode-inference-server) | An endpoint server for efficiently serving quantized open-source LLMs for code. |