https://github.com/neuralwork/build-cog-inference-container
- Host: GitHub
- URL: https://github.com/neuralwork/build-cog-inference-container
- Owner: neuralwork
- Created: 2024-02-13T11:43:11.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-13T15:00:15.000Z (over 1 year ago)
- Last Synced: 2024-04-23T10:25:07.475Z (about 1 year ago)
- Language: Python
- Size: 5.86 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Build a Dockerized Inference API using Cog
This repository contains the code and instructions to build a Dockerized inference API for an LLM using Cog. For a detailed tutorial on building the Docker image and deploying it to AWS EC2, please refer to [our blog](https://blog.neuralwork.ai/).
The LLM is Mistral 7B fine-tuned on the style instruct dataset and is named mistral-7b-style-instruct. Training code and instructions for the model can be found in the [instruct-finetune-mistral](https://github.com/neuralwork/instruct-finetune-mistral) repository, and a detailed tutorial is available in [our blog post](https://blog.neuralwork.ai/deploying-llms-on-aws-ec2-using-cog-a-complete-guide/).
## Prerequisites
- Nvidia GPU with CUDA support.
- [Docker](https://www.docker.com/) installed.
- [Cog](https://github.com/replicate/cog) installed.
- [Nvidia Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) installed.
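To verify the environment before building, you can run a few quick checks from the shell. This is only a minimal sanity check, not part of the repository's instructions; substitute a CUDA base image tag that is available locally.
```bash
# Host GPU driver is visible
nvidia-smi

# Docker and Cog are installed
docker --version
cog --version

# NVIDIA Container Toolkit lets containers see the GPU
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```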
## Build the Docker Image
To build the Docker image, run the following in the cloned directory:
```bash
cog build -t mistral-7b-style-instruct
```
This will build the Docker image with the name mistral-7b-style-instruct.
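`cog build` packages the model according to the repository's `cog.yaml` and predictor class. As a rough sketch of the interface Cog expects, a predictor might look like the following; this is only illustrative, and the model id, prompt template, and generation settings are assumptions rather than the repository's actual `predict.py`:
```python
from cog import BasePredictor, Input
from transformers import AutoModelForCausalLM, AutoTokenizer


class Predictor(BasePredictor):
    def setup(self):
        # Load the fine-tuned model and tokenizer once, at container start.
        # The model id and loading arguments below are assumptions.
        model_id = "neuralwork/mistral-7b-style-instruct"
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    def predict(
        self,
        prompt: str = Input(description="Self description of body type and style"),
        event: str = Input(description="Event to get an outfit suggestion for"),
    ) -> str:
        # Combine the two inputs into a single instruction (placeholder template).
        text = f"{prompt}\n{event}"
        inputs = self.tokenizer(text, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=256)
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```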
## Run the Docker Image
To run the Docker image, run the following in the cloned directory:
```bash
docker run -p 5000:5000 mistral-7b-style-instruct
```
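Since the prerequisites include the Nvidia Container Toolkit, you will likely want to give the container access to the GPU. A common variant of the command (an assumption, not from the original instructions) is:
```bash
# Expose all host GPUs to the container and run it in the background
docker run -d -p 5000:5000 --gpus all mistral-7b-style-instruct
```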
## Test the Inference API
To test the Inference API, you can use the following `curl` command:
```bash
curl http://localhost:5000/predictions -X POST -H "Content-Type: application/json" -d '{"input": {"prompt":"I am an athletic and 180cm tall man in my mid twenties, I have a rectangle shaped body with slightly broad shoulders and have a sleek, casual style. I usually prefer darker colors.", "event": "I am going to a wedding."}}'
```
Or you can use the following Python code:
```python
import requests

url = 'http://localhost:5000/predictions'
data = {"input": {"prompt":"I am an athletic and 180cm tall man in my mid twenties, I have a rectangle shaped body with slightly broad shoulders and have a sleek, casual style. I usually prefer darker colors.", "event": "I am going to a wedding."}}
response = requests.post(url, json=data)
print(response.json())
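# Cog's HTTP API wraps results in a JSON object; assuming the usual schema,
# the generated text itself is available under the "output" key.
print(response.json().get("output"))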
```
From [neuralwork](https://neuralwork.ai/) with :heart: