https://github.com/neuralwork/build-cog-inference-container
- Host: GitHub
- URL: https://github.com/neuralwork/build-cog-inference-container
- Owner: neuralwork
- Created: 2024-02-13T11:43:11.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-13T15:00:15.000Z (over 1 year ago)
- Last Synced: 2024-04-23T10:25:07.475Z (about 1 year ago)
- Language: Python
- Size: 5.86 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Build a Dockerized Inference API using Cog
This repository contains the code and instructions to build a Dockerized inference API for an LLM using Cog. For a detailed tutorial on building the Docker image and deploying it to AWS EC2, please refer to [our blog](https://blog.neuralwork.ai/).
The LLM is Mistral 7B fine-tuned on the style instruct dataset and is named mistral-7b-style-instruct. Training code and instructions for the model can be found in the [instruct-finetune-mistral](https://github.com/neuralwork/instruct-finetune-mistral) repository, and a detailed tutorial is available in [our blog post](https://blog.neuralwork.ai/deploying-llms-on-aws-ec2-using-cog-a-complete-guide/).
## Prerequisites
- Nvidia GPU with CUDA support.
- [Docker](https://www.docker.com/) installed.
- [Cog](https://github.com/replicate/cog) installed.
- [Nvidia Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) installed.
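To verify the environment before building, you can run a few quick checks from the shell. This is only a minimal sanity check, not part of the repository's instructions; substitute a CUDA base image tag that is available locally.
```bash
# Host GPU driver is visible
nvidia-smi

# Docker and Cog are installed
docker --version
cog --version

# NVIDIA Container Toolkit lets containers see the GPU
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```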
## Build the Docker Image
To build the Docker image, run the following in the cloned directory:
```bash
cog build -t mistral-7b-style-instruct
```
This will build the Docker image with the name mistral-7b-style-instruct.
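`cog build` packages the model according to the repository's `cog.yaml` and predictor class. As a rough sketch of the interface Cog expects, a predictor might look like the following; this is only illustrative, and the model id, prompt template, and generation settings are assumptions rather than the repository's actual `predict.py`:
```python
from cog import BasePredictor, Input
from transformers import AutoModelForCausalLM, AutoTokenizer


class Predictor(BasePredictor):
    def setup(self):
        # Load the fine-tuned model and tokenizer once, at container start.
        # The model id and loading arguments below are assumptions.
        model_id = "neuralwork/mistral-7b-style-instruct"
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    def predict(
        self,
        prompt: str = Input(description="Self description of body type and style"),
        event: str = Input(description="Event to get an outfit suggestion for"),
    ) -> str:
        # Combine the two inputs into a single instruction (placeholder template).
        text = f"{prompt}\n{event}"
        inputs = self.tokenizer(text, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=256)
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```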
## Run the Docker Image
To run the Docker image, run the following in the cloned directory:
```bash
docker run -p 5000:5000 mistral-7b-style-instruct
```
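Since the prerequisites include the Nvidia Container Toolkit, you will likely want to give the container access to the GPU. A common variant of the command (an assumption, not from the original instructions) is:
```bash
# Expose all host GPUs to the container and run it in the background
docker run -d -p 5000:5000 --gpus all mistral-7b-style-instruct
```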
## Test the Inference API
To test the Inference API, you can use the following `curl` command:
```bash
curl http://localhost:5000/predictions -X POST -H "Content-Type: application/json" -d '{"input": {"prompt":"I am an athletic and 180cm tall man in my mid twenties, I have a rectangle shaped body with slightly broad shoulders and have a sleek, casual style. I usually prefer darker colors.", "event": "I am going to a wedding."}}'
```
Or you can use the following Python code:
```python
import requests

url = 'http://localhost:5000/predictions'
data = {"input": {"prompt":"I am an athletic and 180cm tall man in my mid twenties, I have a rectangle shaped body with slightly broad shoulders and have a sleek, casual style. I usually prefer darker colors.", "event": "I am going to a wedding."}}
response = requests.post(url, json=data)
print(response.json())
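# Cog's HTTP API wraps results in a JSON object; assuming the usual schema,
# the generated text itself is available under the "output" key.
print(response.json().get("output"))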
```
From [neuralwork](https://neuralwork.ai/) with :heart: