https://github.com/koji/llm_api_template
API template for LLM model with llama.cpp
googlecolab llamacpp llm python
- Host: GitHub
- URL: https://github.com/koji/llm_api_template
- Owner: koji
- Created: 2023-11-06T00:36:54.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-06T01:06:02.000Z (over 2 years ago)
- Last Synced: 2025-03-21T19:04:15.737Z (about 1 year ago)
- Topics: googlecolab, llamacpp, llm, python
- Language: Jupyter Notebook
- Homepage:
- Size: 9.77 KB
- Stars: 4
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# LLM API Template
## What is this?
This is a Jupyter notebook that serves an LLM through API endpoints. It uses [llama.cpp](https://github.com/ggerganov/llama.cpp), [ngrok](https://ngrok.com/), and a model from [TheBloke](https://huggingface.co/TheBloke).
The base Jupyter notebook uses [zephyr-7b](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF).
## How to use
[Requirements]
- Google account for Google Colab https://colab.google/
- ngrok account https://ngrok.com
### step0. Create the above accounts
### step1. Copy the jupyter notebook
### step2. Create a secret key
There is a key icon in Google Colab's sidebar. Use it to add your ngrok token as a secret. In the Jupyter notebook, the secret is named `NGROK`. You can change that name to anything you want, as long as the notebook uses the same name.
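As a rough illustration of how a notebook cell can read that secret, here is a minimal sketch. It assumes Colab's `google.colab.userdata` module for the Secrets panel; the helper name `get_ngrok_token` and the environment-variable fallback for runs outside Colab are hypothetical additions, not part of the original notebook.

```python
import os


def get_ngrok_token(secret_name: str = "NGROK") -> str:
    """Return the ngrok token stored under `secret_name`.

    Inside Colab the value is read from the Secrets panel. Outside Colab this
    sketch falls back to an environment variable of the same name (assumption).
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata

        token = userdata.get(secret_name)
    except ImportError:
        token = os.environ.get(secret_name)
    if not token:
        raise RuntimeError(f"Secret '{secret_name}' is not set")
    return token
```

If you rename the secret in the sidebar, pass the new name to the helper, for example `get_ngrok_token("MY_NGROK_TOKEN")`.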
### step3. Run the jupyter notebook
After setting the new secret key, run the Jupyter notebook to start the API server.
### step4. Check the API server
If everything works properly, you can access https://ngrok_address/docs, where you can see all the available endpoints.

**Do not run the last two lines** while you still need the server.
The first one kills the FastAPI server and the second one kills ngrok.
```shell
!pkill uvicorn
!pkill ngrok
```
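The check in this step can also be done from code instead of the browser. The sketch below relies on the fact that FastAPI serves its OpenAPI schema at `/openapi.json` by default; the helper names `list_endpoints` and `fetch_schema` are hypothetical, and `https://ngrok_address` is a placeholder for your own ngrok address.

```python
import json
from urllib.request import urlopen

# Keys in an OpenAPI path item that describe HTTP operations.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def list_endpoints(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints)


def fetch_schema(base_url: str, timeout: float = 10.0) -> dict:
    """Download the OpenAPI schema from a running FastAPI server."""
    with urlopen(f"{base_url.rstrip('/')}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


# Usage (replace the placeholder with your own ngrok address):
# for method, path in list_endpoints(fetch_schema("https://ngrok_address")):
#     print(method, path)
```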
### step5. Call the endpoint
Now you can call the endpoint from any language you like. This repo includes sample Python code.
All you need to change is the URL.
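For readers without the notebook open, a call could look roughly like the sketch below. This is not the repo's sample: the `/generate` path and the `prompt` field are hypothetical placeholders, so check https://ngrok_address/docs for the real endpoint names and request body before using it.

```python
import json
from urllib.request import Request, urlopen


def build_request(base_url: str, prompt: str, path: str = "/generate") -> Request:
    """Build a JSON POST request.

    `path` and the `prompt` field are placeholders (assumptions); replace them
    with the endpoint and schema shown on the server's /docs page.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_endpoint(base_url: str, prompt: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response."""
    with urlopen(build_request(base_url, prompt), timeout=timeout) as resp:
        return json.load(resp)


# Usage (replace the placeholder with your own ngrok address):
# print(call_endpoint("https://ngrok_address", "Write fizz buzz in Python."))
```

The generous timeout is a deliberate choice in this sketch, since generation on a Colab runtime can take a while.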

## How to use a different model?
If you want to use a different model such as llama2-7b, mistral-7b, etc., download the model from https://huggingface.co/TheBloke and update the model path in the Jupyter notebook.
```py
from llama_cpp import Llama

# 👈 Change this model path to the GGUF file you downloaded.
llm = Llama(model_path="zephyr-7b-alpha.Q5_K_M.gguf")
# echo=True returns the prompt together with the completion.
output = llm("Q: can you write a python script for fizz buzz? A: ", max_tokens=2048, stop=["Q:", "\n"], echo=True)
print(output)
```
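To get a different GGUF file into the notebook's working directory, one option is to download it directly from Hugging Face. The sketch below uses only the standard library and the Hub's `resolve/<revision>/<filename>` URL pattern; the helper names are hypothetical, and the filename in the usage comment is an assumption that you should check against the model repo's file list.

```python
from urllib.parse import quote
from urllib.request import urlretrieve


def gguf_download_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct download URL for a file in a Hugging Face model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


def download_model(repo_id: str, filename: str) -> str:
    """Download the file next to the notebook and return its local path."""
    local_path, _headers = urlretrieve(gguf_download_url(repo_id, filename), filename)
    return local_path


# Usage (the filename is an assumption, check the repo's "Files" tab):
# model_path = download_model("TheBloke/zephyr-7B-beta-GGUF", "zephyr-7b-beta.Q5_K_M.gguf")
# llm = Llama(model_path=model_path)
```

GGUF files for 7B models are several gigabytes, so the download can take a few minutes.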