https://github.com/outofai/chitchat
Modal llama.cpp-based LLM deployment, part of a series on Model as a Service (MaaS)
- Host: GitHub
- URL: https://github.com/outofai/chitchat
- Owner: OutofAi
- License: MIT
- Created: 2023-12-12T00:15:09.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-10T00:06:48.000Z (9 months ago)
- Last Synced: 2025-01-10T01:19:22.092Z (9 months ago)
- Topics: llamacpp, llm, llm-inference, machine-learning, mistral, mistral-7b, modelasservice, modeldeployment, openhermes, serverless
- Language: Python
- Homepage:
- Size: 40 KB
- Stars: 12
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
(Screenshots: the deployed example and its GPU variation)
This is the first part of a collection of templates we are working on to promote the concept of Model as a Service (MaaS), mainly revolving around Firebase/Modal/Stripe. Modal is one of the most user-friendly and cheapest ways to deploy your model and create an inference endpoint API. This example shows the simplicity of deploying Mistral 7B Instruct v0.1 - GGUF on Modal with only a few lines of code, but you can swap in any model supported by llama.cpp.
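For orientation, below is a minimal sketch of what such a deployment script can look like. It is not the exact contents of chitchat-cpu.py: the model repo and filename, resource sizes, and request/response fields are illustrative assumptions, and the Modal API shown (modal.App, modal.web_endpoint) may differ slightly between Modal versions.

```python
# A minimal sketch (not the exact chitchat-cpu.py) of a llama.cpp model served
# behind a Modal web endpoint. Model repo/filename, resource sizes, and the
# request/response fields are illustrative assumptions.
import modal

image = modal.Image.debian_slim().pip_install(
    "llama-cpp-python", "huggingface_hub", "fastapi[standard]"
)
app = modal.App("chitchat-cpu", image=image)


@app.function(cpu=8, memory=16384)  # the GPU variant would pass e.g. gpu="T4"
@modal.web_endpoint(method="POST")
def entrypoint(request: dict):
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    # Download an example quantised GGUF checkpoint; any model supported by
    # llama.cpp can be swapped in here.
    model_path = hf_hub_download(
        repo_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    )
    llm = Llama(model_path=model_path, n_ctx=4096)

    # A real deployment would cache the model across requests (e.g. with
    # @app.cls and @modal.enter) instead of loading it on every call.
    out = llm(request["prompt"], max_tokens=512)
    return {"text": out["choices"][0]["text"]}
```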
Follow us on X for updates regarding the other templates:
https://twitter.com/OutofAi
and also support our channel:
https://www.buymeacoffee.com/outofAI
Prerequisites
Make sure you have created an account on Modal.com and installed the required Python package:

`pip install modal`

The next command will automatically create a token, set everything up, and log you in to simplify deployment:

`python3 -m modal setup`

This is all you need to be able to generate an endpoint.
Deploy
There are two examples available here, and depending on cost you can choose which one you would like to deploy. We recommend deploying the CPU version first before attempting the GPU one. To deploy the model and create an inference endpoint API, you only need to run a single command.
CPU version:

`modal deploy chitchat-cpu.py`

GPU version (running on a T4):

`modal deploy chitchat-gpu.py`

After a successful deployment you will be given an entrypoint link in this format:

`Created entrypoint: https://[ORG_NAME]--[NAME]-entrypoint.modal.run`

Inference
We put together a website, https://chitchatsource.com/, to simplify and enhance the user experience; insert the link provided in the previous step on that page to run inference on your model.
After saving your deployment link you should be able to run inference on the model. You can also use this website with a local FastAPI inference endpoint; you just need to make sure the formatting and parameters it expects match the ones provided in this example. A separate repository covering that will follow.
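If you would rather call the deployed endpoint directly instead of going through the website, a request along these lines should work, assuming the endpoint accepts a JSON body with a prompt field (the exact field names depend on the deployed script):

```python
import requests

# Replace with the entrypoint URL printed by `modal deploy`.
ENDPOINT = "https://[ORG_NAME]--[NAME]-entrypoint.modal.run"

# The payload shape below is an assumption; match it to the parameters
# actually defined in chitchat-cpu.py / chitchat-gpu.py.
resp = requests.post(ENDPOINT, json={"prompt": "Write a haiku about serverless GPUs."})
resp.raise_for_status()
print(resp.json())
```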