# Katwalk Server - LLM API Server and IaC

> This repository is under construction.
> Encouragement, bug reports, and PRs for enhancement are welcome!

## About:

Katwalk is an LLM model API server.
This repository contains all of the code required to build and deploy a containerized LLM API service.

## Deployment options:
- Locally with Docker
- Runpod.io GPU Cloud (alpha)
- Azure on ACI*
- More to come

> *Azure ACI containers take ~90 minutes to start, for reasons not yet diagnosed.
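The deployment target is selected with the `katwalk:runtime` config key documented in the Configuration Index below. A minimal sketch, assuming the Pulumi stack is already initialized:

```bash
# Select a deployment target (accepts: docker, runpod, azure)
pulumi config set katwalk:runtime docker

# Enable the deployment
pulumi config set katwalk:deploy True
```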

## Limitations:

- Katwalk server is a proof-of-concept (PoC) project at this time.
- Tested only with `hf`-format models from Hugging Face.
- Platform: `linux/amd64` only at this time.
- Requires CUDA support.

## Navigate to the [./pulumi](./pulumi) directory for instructions
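The [./pulumi](./pulumi) README holds the authoritative instructions; the rough flow below is a sketch of the assumed Pulumi workflow, using a hypothetical stack name:

```bash
# Assumed quickstart flow; see ./pulumi for the authoritative instructions
cd pulumi
pulumi stack init dev   # "dev" is a hypothetical stack name
pulumi up               # builds/deploys according to the config index below
```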

![Prompting a katwalk server api with Insomnia and a response generated by the meta-llama/Llama-2-7B-chat-hf](hack/img/chatbot-api-prompt.png)
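The screenshot above uses Insomnia, but any HTTP client can prompt the API. The sketch below is illustrative only: the port, the `/generate` path, and the request schema are assumptions modeled on a vLLM-style endpoint, not confirmed katwalk behavior.

```bash
# Hypothetical request: endpoint path, port, and payload schema are assumptions
curl -s http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Tell me about the history of the catwalk.", "max_tokens": 128}'
```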

# Configuration Index
```yaml
# secret encryption salt
encryptionsalt: v1:Ce8NM940=:v1:j+XGC73Zqp:t34XyjytvHcK+G1luA==
config:

#################################################################
# General Deploy config
#################################################################
# Usage:
# Plaintext keys:
#   - pulumi config set <key> <value>
# Secret keys:
#   - pulumi config set --secret <key> <value>

katwalk:deploy: "True" # Default: False, set to "True" to deploy
katwalk:runtime: "runpod" # Default: docker, accepts: docker, runpod, azure
katwalk:deploymentName: "katwalk" # Defaults to "katwalk"

#################################################################
# Docker Image Settings
#################################################################

# Docker Build & Push Settings
katwalk:dockerBuild: "False" # Default: False, set to "True" to build container
katwalk:dockerPush: "False" # Default: False, set to "True" to push container

# Docker Image Name & Registry Settings
katwalk:dockerTag: "20230829"
katwalk:dockerName: "katwalk"
katwalk:dockerProject: "usrbinkat" # Optional if same as dockerUser
katwalk:dockerRegistry: "ghcr.io" # accepts: any oci registry service (e.g. ghcr.io, docker.io, etc.)

# Registry Credentials
katwalk:dockerUser: "usrbinkat"

# Optional unless pushing to registry or deploying to Azure
katwalk:dockerRegistrySecret: # accepts: OCI registry API token or password, set via `pulumi config set --secret dockerRegistrySecret <token>`
secure: v1:UUt+00x+9Tz1:IVMKQ/2J+5ydq5R/kuFlfp+v5yYDF8HcL3Vy9Vz8nTNKPU=

#################################################################
# Huggingface Settings
#################################################################

# Huggingface Model ID string
katwalk:hfModel: "meta-llama/Llama-2-7b-chat-hf" # accepts: any `hf` format Huggingface model ID

# Huggingface Credentials
katwalk:hfUser: "usrbinkat" # accepts: Huggingface user name
katwalk:hfToken: # accepts: Huggingface API auth token, set via `pulumi config set --secret hfToken <token>`
secure: v1:5hwLBQ4KO:/DNzuZ7UfbMMOQGxF7d29a+nWm04YSspPDm11y79E=

#################################################################
# Docker Deploy Settings
#################################################################

# If deploying locally via Docker
# Default: create & use docker volume
# Optional: set global local host path to models directory
katwalk:modelsPath: /home/kat/models

#################################################################
# Azure Deploy Settings
#################################################################

katwalk:azureAciGpu: "V100" # accepts: V100, K80, P100
katwalk:azureAciGpuCount: "1" # accepts: 1, 2, 4, 8

#################################################################
# Runpod Deploy Settings
#################################################################

# Runpod GPU Type
# List of available GPU types in ./doc/runpod/README.md
katwalk:runpodGpuType: "NVIDIA RTX A6000" # accepts: any valid Runpod GPU type

# Runpod Credentials
katwalk:runpodToken:
secure: v1:2IqzPVRePwRwz:KbJzp+5L+khtSBbgW6FjPpdCQszP700xJAZVcrg/qBoo/pbgK=

```
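As the comment block above notes, plaintext keys are set with `pulumi config set` and credentials with `pulumi config set --secret` (secrets are stored encrypted in the stack config, as shown). A minimal sketch using keys from this index; the token values are placeholders:

```bash
# Plaintext keys
pulumi config set katwalk:hfModel "meta-llama/Llama-2-7b-chat-hf"
pulumi config set katwalk:hfUser "usrbinkat"

# Secret keys (stored encrypted in the stack config)
pulumi config set --secret katwalk:hfToken <huggingface-api-token>
pulumi config set --secret katwalk:runpodToken <runpod-api-token>
```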

# What's Next?
- Adding an API Gateway service?
- Adding vector database support?
- Add support for more types of models?
- https://vercel.com/templates/ai
- https://github.com/mckaywrigley/chatbot-ui
- https://medium.com/microsoftazure/custom-chatgpt-with-azure-openai-9bee437ef733
- [Container from scratch](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-installation)?

# ADDITIONAL RESOURCES:
- https://towardsdatascience.com/vllm-pagedattention-for-24x-faster-llm-inference-fdfb1b80f83
- https://kaitchup.substack.com
- https://hamel.dev
- https://github.com/michaelthwan/llm_family_chart/blob/master/2023_LLMfamily.drawio.png
- https://kaiokendev.github.io/context