Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mrseanryan/gpt-local
Local GPT (llama 2 or dolly or gpt etc.) via Python - using ctransforers project
https://github.com/mrseanryan/gpt-local
ai gpt llm python transformers
Last synced: 7 days ago
JSON representation
Local GPT (llama 2 or dolly or gpt etc.) via Python - using ctransforers project
- Host: GitHub
- URL: https://github.com/mrseanryan/gpt-local
- Owner: mrseanryan
- License: mit
- Created: 2023-08-02T13:54:08.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2023-08-09T17:18:52.000Z (over 1 year ago)
- Last Synced: 2024-04-18T01:53:37.803Z (7 months ago)
- Topics: ai, gpt, llm, python, transformers
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# gpt-local README
Local LLM for GPT (llama 2 or dolly or gpt etc.) via Python - using the excellent ctransformers project.
This is basically a wrapper for quickly setting up a local LLM.
## What is GPT ?
*Generative Pre-trained Transformers, commonly known as GPT, are a family of neural network models that uses the transformer architecture and is a key advancement in artificial intelligence (AI) powering generative AI applications such as ChatGPT. GPT models give applications the ability to create human-like text and content (images, music, and more), and answer questions in a conversational manner.*
## Usage
```
./go.sh
```OR
```
python3 local-llm.py
```Example Output:
```
AI model: /home/sean/Downloads/models/llama-2-13b-chat.ggmlv3.q4_0.bin [llama]
>> If Mary is faster than Joe and Sam is slower than Mary, then who is the fastest?The answer is Mary.
How can I help you? [press ENTER to exit] >>
```## Test
1. Download a compatible model. To know what model types are supported, see the [ctransformers](https://github.com/marella/ctransformers) project.
Quality models are available at hugging face - see [TheBloke](https://huggingface.co/TheBloke).
Example: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin
OR via bash:
```
./download-llama-2-13B-model.sh
```2. Load the model and send it a prompt
note: This test assumes that the model is located under ~/Downloads/model.
```
./test.sh
```Output:
```
AI model: /home/sean/Downloads/models/llama-2-13b-chat.ggmlv3.q4_0.bin [llama]
>> If Mary is faster than Joe and Sam is slower than Mary, then who is the fastest?The answer is Mary.
How can I help you? [press ENTER to exit] >>
```## Dependencies
- Python 3
- pip3
- OS: Unix - Tested on Ubuntu```
pip install ctransformers
```## Using GPU with the local model
If you have an NVIDIA graphics card, then you can run part or all of the model (depending on the card's RAM) on the GPU,
which has a much higher level of parallelism than the typical CPU.Required:
- latest NVIDIA graphic driver
- up to date version of CUDA**Tip: if you get errors when running the model, like this:**
```
>> Cuda error: no kernel image is available for execution on the device
```THEN recommend to build ctransformers locally.
This is actually quite simple:
```
pip3 uninstall ctransformers
pip3 install ctransformers --no-binary ctransformers # use --no-binary to force a local build. This ensures that the local version of CUDA and NVIDIA graphics driver will be used.
```You can tweak the settings in `config.py`.
For more details, see the main tool [ctransformers](https://github.com/marella/ctransformers).
## Alternatives to ctransformers
### [Alternative 1] (more complicated to setup)(has nice web user interface) Python web UI via pytorch and text-generation-webui
These are some rough notes, taken from [YouTube](https://www.youtube.com/watch?v=k2FHUP0krqg&ab_channel=MatthewBerman) - thanks to the guy who made that video! See also the [gist](https://gist.github.com/mberman84/45545e48040ef6aafb6a1cb3442edb83).
1. install conda (package manager)
2. use conda to install:```
conda create -n textgen python=3.10.9
conda activate textgen
```3. install pytorch:
```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
python server.py
```
4. Download a modelExample: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML
5. In the web UI: refresh the model list
6. In the web UI: load the model
7. In the web UI: switch to chat mode
### [Alternative 2] C++ (better performance, harder to customize)
https://replicate.com/blog/run-llama-locally
note: make sure you pick the correct script for your OS!
# References
- [ctransformers](https://github.com/marella/ctransformers)