Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Turbopilot is an open source large-language-model based code completion engine that runs locally on CPU
- Host: GitHub
- URL: https://github.com/ravenscroftj/turbopilot
- Owner: ravenscroftj
- License: bsd-3-clause
- Archived: true
- Created: 2023-04-09T16:46:33.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-30T08:16:59.000Z (about 1 year ago)
- Last Synced: 2024-09-21T17:12:10.709Z (3 months ago)
- Topics: code-completion, cpp, language-model, machine-learning
- Language: C++
- Homepage:
- Size: 2.54 MB
- Stars: 3,832
- Watchers: 43
- Forks: 127
- Open Issues: 18
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md
Awesome Lists containing this project
- awesome-generative-ai - TurboPilot - A self-hosted copilot clone which uses the library behind llama.cpp to run the 6 billion parameter Salesforce Codegen model in 4 GB of RAM. (Coding / Coding Assistants)
- my-awesome - ravenscroftj/turbopilot - completion,cpp,language-model,machine-learning pushed_at:2023-09 star:3.8k fork:0.1k Turbopilot is an open source large-language-model based code completion engine that runs locally on CPU (C++)
- Self-Hosting-Guide - Turbopilot - language-model based code completion engine that runs locally on your CPU. (Tools for Self-Hosting / Development)
- awesome-ai-coding - TurboPilot
- awesome-coding-assistants - TurboPilot (self-hosted) (Uncategorized / Uncategorized)
- stars - ravenscroftj/turbopilot - Turbopilot is an open source large-language-model based code completion engine that runs locally on CPU (C++)
- awesome-ai-tools - TurboPilot - A self-hosted copilot clone that uses the library behind llama.cpp to run the 6 billion parameter Salesforce Codegen model in 4 GB of RAM. (Code / Developer tools)
- awesome-privacy - Turbopilot - [Archived] Turbopilot is an open source large-language-model based code completion engine that runs locally on CPU. (Artificial Intelligence / Android Launcher)
- fucking-awesome-privacy - Turbopilot - [Archived] Turbopilot is an open source large-language-model based code completion engine that runs locally on CPU. (Artificial Intelligence / Android Launcher)
- awesome-ai - TurboPilot - A self-hosted copilot clone which uses the library behind llama.cpp to run the 6 billion parameter Salesforce Codegen model in 4 GB of RAM. (Coding / Other text generators)
- awesome-AI-driven-development - turbopilot - an open source large-language-model based code completion engine that runs locally on CPU (Uncategorized / Uncategorized)
- AiTreasureBox - ravenscroftj/turbopilot - Turbopilot is an open source large-language-model based code completion engine that runs locally on CPU (Repos)
- awesome-ai-dev-tools - TurboPilot - Self-hosted copilot clone using Salesforce Codegen model. (Specialized Tools / IDE Extensions)
- awesome-ai-tool - TurboPilot - Self-hosted open-source Copilot tool. (🌟 Editor's Picks / Coding Assistant Tools)
README
# TurboPilot 🚀
## Turbopilot is deprecated/archived as of 30/9/23. There are other mature solutions that meet the community's needs better. Please read [my blog post](https://brainsteam.co.uk/posts/2023/09/30/turbopilot-obit/) about my decision to down tools and for recommended alternatives.
-----------------------------------
[![Mastodon Follow](https://img.shields.io/mastodon/follow/000117012?domain=https%3A%2F%2Ffosstodon.org%2F&style=social)](https://fosstodon.org/@jamesravey) ![BSD Licensed](https://img.shields.io/github/license/ravenscroftj/turbopilot) ![Time Spent](https://img.shields.io/endpoint?url=https://wakapi.nopro.be/api/compat/shields/v1/jamesravey/all_time/label%3Aturbopilot)
TurboPilot is a self-hosted [copilot](https://github.com/features/copilot) clone which uses the library behind [llama.cpp](https://github.com/ggerganov/llama.cpp) to run the [6 Billion Parameter Salesforce Codegen model](https://github.com/salesforce/CodeGen) in 4GiB of RAM. It is heavily based on and inspired by the [fauxpilot](https://github.com/fauxpilot/fauxpilot) project.
***NB: This is a proof of concept right now rather than a stable tool. Autocompletion is quite slow in this version of the project. Feel free to play with it, but your mileage may vary.***
![a screen recording of turbopilot running through fauxpilot plugin](assets/vscode-status.gif)
**✨ Now Supports [StableCode 3B Instruct](https://huggingface.co/stabilityai/stablecode-instruct-alpha-3b)** simply use [TheBloke's Quantized GGML models](https://huggingface.co/TheBloke/stablecode-instruct-alpha-3b-GGML) and set `-m stablecode`.
**✨ New: Refactored + Simplified**: The source code has been improved to make it easier to extend and add new models to Turbopilot. The system now supports multiple flavours of model.
**✨ New: Wizardcoder, Starcoder, Santacoder support** - Turbopilot now supports state-of-the-art local code completion models which cover more programming languages and offer "fill in the middle" support.
## 🤝 Contributing
PRs to this project and the corresponding [GGML fork](https://github.com/ravenscroftj/ggml) are very welcome.
Make a fork, make your changes and then open a [PR](https://github.com/ravenscroftj/turbopilot/pulls).
## 👋 Getting Started
The easiest way to try the project out is to grab the pre-processed models and then run the server in docker.
### Getting The Models
You have two options for getting the models:
#### Option A: Direct Download - Easy, Quickstart
You can download the pre-converted, pre-quantized models from Huggingface.
For low RAM users (4-8 GiB), I recommend [StableCode](https://huggingface.co/TheBloke/stablecode-instruct-alpha-3b-GGML) and for high power users (16+ GiB RAM, discrete GPU or apple silicon) I recommend [WizardCoder](https://huggingface.co/TheBloke/WizardCoder-15B-1.0-GGML/resolve/main/WizardCoder-15B-1.0.ggmlv3.q4_0.bin).
Turbopilot still supports the first generation codegen models from `v0.0.5` and earlier builds, although old models do need to be re-quantized.
You can find a full catalogue of models in [MODELS.md](MODELS.md).
#### Option B: Convert The Models Yourself - Hard, More Flexible
Follow [this guide](https://github.com/ravenscroftj/turbopilot/wiki/Converting-and-Quantizing-The-Models) if you want to experiment with quantizing the models yourself.
### ⚙️ Running TurboPilot Server
Download the [latest binary](https://github.com/ravenscroftj/turbopilot/releases) and extract it to the root project folder. If a binary is not provided for your OS, or you'd prefer to build it yourself, follow the [build instructions](BUILD.md).
Run:
```bash
./turbopilot -m starcoder -f ./models/santacoder-q4_0.bin
```

The application should start a server on port `18080`. You can change this with the `-p` option, but `18080` is the default port that vscode-fauxpilot tries to connect to, so you probably want to leave it alone unless you are sure you know what you're doing.
If you have a multi-core system you can control how many CPUs are used with the `-t` option - for example, on my AMD Ryzen 5000 which has 6 cores/12 threads I use:
```bash
./turbopilot -t 6 -m starcoder -f ./models/santacoder-q4_0.bin
```

To run the legacy codegen models, just change the model type flag `-m` to `codegen` instead.
**NOTE: Turbopilot 0.1.0 and newer require you to re-quantize codegen models from v0.0.5 and older. I am working on providing updated quantized codegen models.**
### 📦 Running From Docker
You can also run Turbopilot from the pre-built docker image supplied [here](https://github.com/users/ravenscroftj/packages/container/package/turbopilot)
You will still need to download the models separately, then you can run:
```bash
docker run --rm -it \
  -v ./models:/models \
  -e THREADS=6 \
  -e MODEL_TYPE=starcoder \
  -e MODEL="/models/santacoder-q4_0.bin" \
  -p 18080:18080 \
  ghcr.io/ravenscroftj/turbopilot:latest
```

#### Docker and CUDA
As of release v0.0.5 turbopilot supports CUDA inference. In order to run the cuda-enabled container you will need to have [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) enabled, use the cuda-tagged versions, and pass `--gpus=all` to docker so the container has access to your GPU, like so:
```bash
docker run --gpus=all --rm -it \
  -v ./models:/models \
  -e THREADS=6 \
  -e MODEL_TYPE=starcoder \
  -e MODEL="/models/santacoder-q4_0.bin" \
  -e GPU_LAYERS=32 \
  -p 18080:18080 \
  ghcr.io/ravenscroftj/turbopilot:v0.2.0-cuda11-7
```

If you have a big enough GPU then setting `GPU_LAYERS` will allow turbopilot to fully offload computation onto your GPU rather than copying data backwards and forwards, dramatically speeding up inference.
Swap `ghcr.io/ravenscroftj/turbopilot:v0.2.0-cuda11-7` for `ghcr.io/ravenscroftj/turbopilot:v0.2.0-cuda12-0` or `ghcr.io/ravenscroftj/turbopilot:v0.2.0-cuda12-2` if you are using CUDA 12.0 or 12.2 respectively.
You will need CUDA 11 or CUDA 12 or later to run this container. You should be able to see `/app/turbopilot` listed when you run `nvidia-smi`.
#### Executable and CUDA
As of v0.0.5 a CUDA version of the Linux executable is available. It requires that libcublas 11 be installed on the machine. I might build Ubuntu debs at some point, but for now running in docker may be more convenient if you want to use a CUDA GPU.
You can use GPU offloading via the `--ngl` option.
### 🌐 Using the API
#### Support for the official Copilot Plugin
Support for the official VS Code copilot plugin is underway (See ticket #11). The API should now be broadly compatible with OpenAI.
#### Using the API with FauxPilot Plugin
To use the API from VSCode, I recommend the vscode-fauxpilot plugin. Once you install it, you will need to change a few settings in your settings.json file.
- Open settings (CTRL/CMD + SHIFT + P) and select `Preferences: Open User Settings (JSON)`
- Add the following values:

```json
{
    // ... other settings
    "fauxpilot.enabled": true,
    "fauxpilot.server": "http://localhost:18080/v1/engines"
}
```

Now you can enable fauxpilot with `CTRL + SHIFT + P` and select `Enable Fauxpilot`.
The plugin will send API calls to the running `turbopilot` process when you make a keystroke. It will then wait for each request to complete before sending further requests.
#### Calling the API Directly
You can make requests to `http://localhost:18080/v1/engines/codegen/completions`, which will behave just like the equivalent Copilot endpoint.
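The same endpoint can be called from any HTTP client. As a minimal sketch, here is a Python version using only the standard library (the `build_payload` and `complete` helpers are hypothetical names introduced for illustration; the URL and request fields are those described above):

```python
import json
import urllib.request

def build_payload(prompt, max_tokens=100):
    # Request body expected by the completions endpoint.
    return {"model": "codegen", "prompt": prompt, "max_tokens": max_tokens}

def complete(prompt, host="http://localhost:18080", max_tokens=100):
    # POST the prompt to a locally running TurboPilot server and
    # return the text of the first completion choice.
    req = urllib.request.Request(
        host + "/v1/engines/codegen/completions",
        data=json.dumps(build_payload(prompt, max_tokens)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

With a server running on the default port, `complete("def main():")` should return a completion string drawn from the `choices` array of the JSON response.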
For example:
```bash
curl --request POST \
--url http://localhost:18080/v1/engines/codegen/completions \
--header 'Content-Type: application/json' \
--data '{
"model": "codegen",
"prompt": "def main():",
"max_tokens": 100
}'
```

This should get you something like:
```json
{
"choices": [
{
"logprobs": null,
"index": 0,
"finish_reason": "length",
"text": "\n \"\"\"Main entry point for this script.\"\"\"\n logging.getLogger().setLevel(logging.INFO)\n logging.basicConfig(format=('%(levelname)s: %(message)s'))\n\n parser = argparse.ArgumentParser(\n description=__doc__,\n formatter_class=argparse.RawDescriptionHelpFormatter,\n epilog=__doc__)\n "
}
],
"created": 1681113078,
"usage": {
"total_tokens": 105,
"prompt_tokens": 3,
"completion_tokens": 102
},
"object": "text_completion",
"model": "codegen",
"id": "01d7a11b-f87c-4261-8c03-8c78cbe4b067"
}
```

## 👉 Known Limitations
- Currently Turbopilot only supports one GPU device at a time (it will not try to make use of multiple devices).
## 👏 Acknowledgements
- This project would not have been possible without [Georgi Gerganov's work on GGML and llama.cpp](https://github.com/ggerganov/ggml)
- It was completely inspired by [fauxpilot](https://github.com/fauxpilot/fauxpilot) which I did experiment with for a little while but wanted to try to make the models work without a GPU
- The frontend of the project is powered by [Venthe's vscode-fauxpilot plugin](https://github.com/Venthe/vscode-fauxpilot)
- The project uses the [Salesforce Codegen](https://github.com/salesforce/CodeGen) models.
- Thanks to [Moyix](https://huggingface.co/moyix) for his work on converting the Salesforce models to run in a GPT-J architecture. Not only does this [confer some speed benefits](https://gist.github.com/moyix/7896575befbe1b99162ccfec8d135566) but it also made it much easier for me to port the models to GGML using the [existing gpt-j example code](https://github.com/ggerganov/ggml/tree/master/examples/gpt-j)
- The model server uses [CrowCPP](https://crowcpp.org/master/) to serve suggestions.
- Check out the [original scientific paper](https://arxiv.org/pdf/2203.13474.pdf) for CodeGen for more info.