https://github.com/avatsaev/av-local-llm-api
Allows to easily run local REST API with a custom LLM, running locally or remotely, with user defined system instructions. Useful for quick local autmations that require problem solving with large langague models and interaction via a REST API.
https://github.com/avatsaev/av-local-llm-api
fastapi llama llamacpp llm machine-learning mistral openai transformers
Last synced: 2 months ago
JSON representation
Allows to easily run local REST API with a custom LLM, running locally or remotely, with user defined system instructions. Useful for quick local autmations that require problem solving with large langague models and interaction via a REST API.
- Host: GitHub
- URL: https://github.com/avatsaev/av-local-llm-api
- Owner: avatsaev
- License: mit
- Created: 2024-01-23T14:17:51.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-23T17:08:55.000Z (over 2 years ago)
- Last Synced: 2024-10-18T23:15:55.085Z (over 1 year ago)
- Topics: fastapi, llama, llamacpp, llm, machine-learning, mistral, openai, transformers
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Local LLM API
Allows to easily run local REST API with a custom LLM, running locally or remotely, with user defined system instructions.
Useful for quick local autmations that require problem solving with large langague models and interaction via a REST API.
## Installation with local inference
Requirements:
- Apple Silicon Machine with at least 16GB RAM (M1/M2/M3)
- OR an Nvidia CUDA GPU
- Locally downloaded LLAMA Compatible GGUF model
- [Download Mistral-7B-Instruct Q8.0](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/tree/main)
### Setup
#### Run the following in the terminal
For Apple Silicon
```bash
CMAKE_ARGS="-DLLAMA_METAL=on" pip install -r requirements.txt
```
For CUDA:
```bash
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install -r requirements.txt
```
#### Setup the env config:
Update the `system_prompt.txt` with instructions of your choice
rename `.env.example` to `.env`
Modify the `MODEL_PATH` var to point it to your locally downloaded `gguf` file
#### Run the web server:
```bash
uvicorn main:app
```
Inference endpoint is available at: `http://127.0.0.1:8000/inference`
Make a `POST` request with a JSON body containing your input
CURL Example:
```bash
curl --location 'http://127.0.0.1:8000/inference' \
--header 'Content-Type: application/json' \
--data-raw '{
"user_input": "YOU INPUT HERE"
}'
```
## Remote inference with a OpenAI compatible API
In `.env`, set inference mode to `remote`:
```
LLM_INFERENCE_MODE = 'remote'
```
Define the remote server configuration:
```
LLM_API_URL = 'https://api.openai.com/v1/'
LLM_API_KEY = 'YOUR_KEY'
LLM_MODEL = 'gpt-3.5-turbo'
```
## Test example
`system_prompt.txt`:
```
You are a code redactoring expert that takes user input code in any language, and refactors all function parameters called `user` to DEPRECATED_user.
---EXAMPLES---
Example input:
UpdateUser(user: newUserParams)
Example output:
UpdateUser(DEPRECATED_user: newUserParams)
---
Respond only with refactored code output and nothing else.
```
```bash
curl --location 'http://127.0.0.1:8000/inference' \
--header 'Content-Type: application/json' \
--data-raw '{
"user_input": "UpdateUser(user: newUserParams)\ndeleteUser(user: user)\ncreateUser({user: {...userParams}})"
}'
```
Response JSON:
```json
{
"inference_output": " UpdateUser(DEPRECATED_user: newUserParams)\ndeleteUser(DEPRECATED_user: DEPRECATED_user)\ncreateUser({DEPRECATED_user: {...newUserParams}})"
}
```
---
Author: Vatsaev Aslan (@avatsaev)