https://github.com/marcellodesales/replicated-cog-server-docker
Custom Dockerized Replicated Cog Server, used for running Machine Learning models in a reusable Docker container that exposes your model as an API. Example: Run GFP-GAN restoration!
- Host: GitHub
- URL: https://github.com/marcellodesales/replicated-cog-server-docker
- Owner: marcellodesales
- License: gpl-3.0
- Created: 2022-01-18T02:32:01.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-01-19T00:16:08.000Z (over 3 years ago)
- Last Synced: 2025-02-03T10:46:22.737Z (3 months ago)
- Topics: base, cog, docker, docker-compose, machine-learning, ml, ops
- Language: Dockerfile
- Homepage:
- Size: 41 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# cog-replicated-docker
The Cog server from Replicate (https://github.com/replicate/cog) helps run Machine Learning applications behind an API server through a well-defined interface.

This is an optional image in case you need to customize your base image.

> **Base images**: The base image of this project provides the supporting binaries that are hard to install on a host.
> * https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker
>   * NVIDIA CUDA
>   * python

# How to Use

There are a few steps to run your Machine Learning model using cog:

* Create the driver `predict.py`
* Create the builder `cog.yaml`

> **DOCS**: More at https://github.com/replicate/cog/blob/main/docs/getting-started-own-model.md.

## Create a driver `predict.py`
* It will define your arguments, their respective types, etc.
* You will implement the call to your model library
* You will have an interface to return types such as images, text, etc.

## Create the builder `cog.yaml`

It describes your dependencies: system-level packages, Python packages, and others.

* System dependencies: what needs to be in the container to run your model.
  * For instance,
* Model dependencies: PyPI dependencies that are part of your implementation.
  * For instance, the correct versions should be properly described.
  * An example of failure: the version of PyTorch must match the Python version used: https://github.com/pytorch/vision#installation
* Pre-install dependencies: Those that are required to be installed after

# Example: Face improvement

* https://replicate.com/tencentarc/gfpgan/examples
## predict.py
* Define a cog input like `@cog.input("image", type=Path, help="input image")`
* https://github.com/TencentARC/GFPGAN/pull/67/files#diff-73c1982d8a085dc10fda2ac7b6f202ae3ff9530ee6a15991c5339051eb10a49aR79

```python
# import subprocess
# subprocess.call(['sh', './run_setup.sh'])

import cog
import tempfile
import os
from pathlib import Path
import argparse
import cv2
import shutil
from basicsr.utils import imwrite
import torch
from gfpgan import GFPGANer
import glob
import numpy as np


class Predictor(cog.Predictor):
    def setup(self):
        parser = argparse.ArgumentParser()

        parser.add_argument('--upscale', type=int, default=2)
        parser.add_argument('--arch', type=str, default='clean')
        parser.add_argument('--channel', type=int, default=2)
        parser.add_argument('--model_path', type=str, default='experiments/pretrained_models/GFPGANCleanv1-NoCE-C2.pth')
        parser.add_argument('--bg_upsampler', type=str, default='realesrgan')
        parser.add_argument('--bg_tile', type=int, default=400)
        parser.add_argument('--test_path', type=str, default='inputs/whole_imgs')
        parser.add_argument('--suffix', type=str, default=None, help='Suffix of the restored faces')
        parser.add_argument('--only_center_face', action='store_true')
        parser.add_argument('--aligned', action='store_true')
        parser.add_argument('--paste_back', action='store_false')
        parser.add_argument('--save_root', type=str, default='results')

        self.args = parser.parse_args(["--upscale", "2", "--test_path", "cog_temp", "--save_root", "results"])
        os.makedirs(self.args.test_path, exist_ok=True)
        # background upsampler
        if self.args.bg_upsampler == 'realesrgan':
            if not torch.cuda.is_available():  # CPU
                import warnings
                warnings.warn('The unoptimized RealESRGAN is very slow on CPU. We do not use it. '
                              'If you really want to use it, please modify the corresponding codes.')
                bg_upsampler = None
            else:
                from realesrgan import RealESRGANer
                bg_upsampler = RealESRGANer(
                    scale=2,
                    model_path='https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth',
                    tile=self.args.bg_tile,
                    tile_pad=10,
                    pre_pad=0,
                    half=True)  # need to set False in CPU mode
        else:
            bg_upsampler = None

        # set up GFPGAN restorer
        self.restorer = GFPGANer(
            model_path=self.args.model_path,
            upscale=self.args.upscale,
            arch=self.args.arch,
            channel_multiplier=self.args.channel,
            bg_upsampler=bg_upsampler)

    @cog.input("image", type=Path, help="input image")
    def predict(self, image):
        input_dir = self.args.test_path

        input_path = os.path.join(input_dir, os.path.basename(image))
        shutil.copy(str(image), input_path)

        os.makedirs(self.args.save_root, exist_ok=True)
        img_list = sorted(glob.glob(os.path.join(input_dir, '*')))
        out_path = Path(tempfile.mkdtemp()) / "output.png"
        for img_path in img_list:
            # read image
            img_name = os.path.basename(img_path)
            print(f'Processing {img_name} ...')
            basename, ext = os.path.splitext(img_name)
            input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)

            cropped_faces, restored_faces, restored_img = self.restorer.enhance(
                input_img, has_aligned=self.args.aligned, only_center_face=self.args.only_center_face, paste_back=self.args.paste_back)

            # save faces
            for idx, (cropped_face, restored_face) in enumerate(zip(cropped_faces, restored_faces)):
                # save cropped face
                save_crop_path = os.path.join(self.args.save_root, 'cropped_faces', f'{basename}_{idx:02d}.png')
                imwrite(cropped_face, save_crop_path)
                # save restored face
                if self.args.suffix is not None:
                    save_face_name = f'{basename}_{idx:02d}_{self.args.suffix}.png'
                else:
                    save_face_name = f'{basename}_{idx:02d}.png'
                save_restore_path = os.path.join(self.args.save_root, 'restored_faces', save_face_name)
                imwrite(restored_face, save_restore_path)
                # save cmp image
                cmp_img = np.concatenate((cropped_face, restored_face), axis=1)

            imwrite(restored_img, str(out_path))
            clean_folder(self.args.test_path)

        return out_path


def clean_folder(folder):
    for filename in os.listdir(folder):
        file_path = os.path.join(folder, filename)
        try:
            if os.path.isfile(file_path) or os.path.islink(file_path):
                os.unlink(file_path)
            elif os.path.isdir(file_path):
                shutil.rmtree(file_path)
        except Exception as e:
            print('Failed to delete %s. Reason: %s' % (file_path, e))
```
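
One detail of `setup()` is easy to miss: the parser never reads the real command line. `parse_args` receives a fixed list, so every option missing from that list keeps its default. The standalone sketch below rebuilds a subset of the parser with the standard library only (no GFPGAN imports; `build_args` is a hypothetical helper name, not part of cog or GFPGAN) to show the values the predictor ends up with.

```python
import argparse


def build_args():
    """Rebuild a subset of the predict.py parser and parse the same fixed list."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--upscale', type=int, default=2)
    parser.add_argument('--bg_upsampler', type=str, default='realesrgan')
    parser.add_argument('--test_path', type=str, default='inputs/whole_imgs')
    parser.add_argument('--only_center_face', action='store_true')
    parser.add_argument('--aligned', action='store_true')
    parser.add_argument('--paste_back', action='store_false')
    parser.add_argument('--save_root', type=str, default='results')
    # An explicit list means sys.argv is ignored entirely.
    return parser.parse_args(["--upscale", "2", "--test_path", "cog_temp", "--save_root", "results"])


if __name__ == "__main__":
    args = build_args()
    print(args.test_path)   # value taken from the fixed list
    print(args.paste_back)  # store_false flag that was not passed
    print(args.aligned)     # store_true flag that was not passed
```

Because `--paste_back` uses `store_false`, leaving it out of the list keeps it enabled, so restored faces are pasted back into the full image. The uploaded file is staged in `cog_temp` rather than in the default `inputs/whole_imgs`.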
## cog.yaml
```yaml
build:
  gpu: true
  python_version: "3.8"
  system_packages:
    - "libgl1-mesa-glx"
    - "libglib2.0-0"
    - "libtinfo5"
  python_packages:
    # Make sure to use the correct combination of https://github.com/pytorch/vision#installation
    - "torch==1.7.0"
    - "torchvision==0.8.1"
    - "numpy==1.21.1"
    - "ipython==7.21.0"
    - "lmdb==1.2.1"
    - "opencv-python==4.5.3.56"
    - "PyYAML==5.4.1"
    - "tqdm==4.62.2"
    - "yapf==0.31.0"
    - "tb-nightly==2.7.0a20210825"
  pre_install:
    - pip install facexlib==0.2.1.1
    - pip install realesrgan
predict: "predict.py:Predictor"
```

## Dockerfile
* You can define your dockerfile with the parent image from this repo
```dockerfile
ARG TENCENT_ARC_BASE_IMAGE
ARG TRAINING_FILE1
ARG TRAINING_FILE2

FROM ${TENCENT_ARC_BASE_IMAGE}

# weights
ARG TRAINING_FILE1
ENV TRAINING_FILE1=${TRAINING_FILE1:-v0.2.0/GFPGANCleanv1-NoCE-C2.pth}
RUN echo "Downloading training file '${TRAINING_FILE1}'" && \
    wget https://github.com/TencentARC/GFPGAN/releases/download/${TRAINING_FILE1} -P experiments/pretrained_models

ARG TRAINING_FILE2
ENV TRAINING_FILE2=${TRAINING_FILE2:-v0.1.0/GFPGANv1.pth}
# Report the second file here (the original echoed TRAINING_FILE1 again)
RUN echo "Downloading training file '${TRAINING_FILE2}'" && \
    wget https://github.com/TencentARC/GFPGAN/releases/download/${TRAINING_FILE2} -P experiments/pretrained_models
```

## Docker-Compose
```yaml
version: "3.8"

###
### Running in MacOS
### https://stackoverflow.com/questions/64439278/gpg-invalid-signature-error-while-running-apt-update-inside-arm32v7-ubuntu20-04/64553153#64553153
###
services:

  GFPGAN:
    image: marcellodesales/tencent-arc-gfpgan-runtime
    build:
      context: .
      args:
        TENCENT_ARC_BASE_IMAGE: marcellodesales/replicated-cog-server:python3.8_nvidea1.11.1
        TRAINING_FILE1: v0.2.0/GFPGANCleanv1-NoCE-C2.pth
        TRAINING_FILE2: v0.1.0/GFPGANv1.pth
```

## Building
> NOTE: Make sure you have enough disk space and memory (15GB).
> * The first build might take more than 10 minutes, depending on your location.
> * Subsequent builds take advantage of the Docker cache when specific layers aren't invalidated.
> * Problem running: "GPG invalid signature error while running `apt-get update`": on macOS this error can come from running out of disk space. Just make sure you have enough.
>   * https://stackoverflow.com/questions/64439278/gpg-invalid-signature-error-while-running-apt-update-inside-arm32v7-ubuntu20-04/64553153#64553153

```console
$ docker-compose build
Building GFPGAN
[+] Building 0.2s (18/18) FINISHED
=> [internal] load build definition from Dockerfile 0
=> => transferring dockerfile: 674B 0
=> [internal] load .dockerignore 0
=> => transferring context: 35B 0
=> [internal] load metadata for docker.io/marcellodesales/replicated-cog-server:python3.8_nvidea1.11.1 0
=> [1/3] FROM docker.io/marcellodesales/replicated-cog-server:python3.8_nvidea1.11.1 0
=> [internal] load build context 0
=> => transferring context: 4.33kB 0
=> CACHED [2/3] COPY cog.yaml . 0
=> CACHED [3/3] RUN cat cog.yaml | yq e . - -o json | jq -r -c '.build.system_packages[]' | sed -r 's/^([^,]*)(,?)$/ \1 \2/' | tr -d '\n' > cog.pkgs && 0
=> CACHED [4/3] RUN apt-get update -qq && apt-get install -qqy $(cat cog.pkgs) && rm -rf /var/lib/apt/lists/* # buildkit 85.8MB buildkit.dockerfile.v0 0
=> CACHED [5/3] RUN cat cog.yaml | yq e . - -o json | jq -r -c '.build.python_packages[]' | sed -r 's/^([^,]*)(,?)$/\1 \2/' | tr -d '\n' > cog.python-pkgs 0
=> CACHED [6/3] RUN pip install -f https://download.pytorch.org/whl/torch_stable.html $(cat cog.python-pkgs) 0
=> CACHED [7/3] RUN cat cog.yaml | yq e . - -o json | jq -r -c '.build.pre_install[]' > cog.pre-inst && echo "Installing the pre-install packages: $(ca 0
=> CACHED [8/3] RUN sh cog.pre-inst 0
=> CACHED [9/3] WORKDIR /src 0
=> CACHED [10/3] COPY predict.py . 0
=> CACHED [11/3] COPY . . 0
=> CACHED [12/3] RUN echo "Downloading training file 'v0.2.0/GFPGANCleanv1-NoCE-C2.pth'" && wget https://github.com/TencentARC/GFPGAN/releases/download 0
=> CACHED [13/3] RUN echo "Downloading training file 'v0.2.0/GFPGANCleanv1-NoCE-C2.pth'" && wget https://github.com/TencentARC/GFPGAN/releases/download 0
=> exporting to image 0
=> => exporting layers 0
=> => writing image sha256:71684982ed27156781c54ef5e2f7d18a110a7aa0e150bfb49b207e1709102ceb 0
=> => naming to docker.io/marcellodesales/tencent-arc-gfpgan-runtime 0
```

## Running
You can just create a container in the background
```console
$ docker-compose up -d
Recreating gfpgan_GFPGAN_1 ... done
```

* You can make sure that the container loaded your app and models...
```console
$ docker-compose logs -f
Attaching to gfpgan_GFPGAN_1
GFPGAN_1 | /root/.pyenv/versions/3.8.12/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
GFPGAN_1 | return torch._C._cuda_getDeviceCount() > 0
GFPGAN_1 | /src/predict.py:41: UserWarning: The unoptimized RealESRGAN is very slow on CPU. We do not use it. If you really want to use it, please modify the corresponding codes.
GFPGAN_1 | warnings.warn('The unoptimized RealESRGAN is very slow on CPU. We do not use it. '
GFPGAN_1 | Downloading: "https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth" to /root/.pyenv/versions/3.8.12/lib/python3.8/site-packages/facexlib/weights/detection_Resnet50_Final.pth
GFPGAN_1 |
100%|------| 104M/104M [00:04<00:00, 22.7MB/s]
GFPGAN_1 | * Serving Flask app 'http' (lazy loading)
GFPGAN_1 | * Environment: production
GFPGAN_1 | WARNING: This is a development server. Do not use it in a production deployment.
GFPGAN_1 | Use a production WSGI server instead.
GFPGAN_1 | * Debug mode: off
GFPGAN_1 | * Running on all addresses.
GFPGAN_1 | WARNING: This is a development server. Do not use it in a production deployment.
GFPGAN_1 | * Running on http://172.19.0.2:5000/ (Press CTRL+C to quit)
```

## Testing: HTTP POST image=PATH

* Choose an image as the input

> Using [viu](https://github.com/atanunq/viu) to open the image in the terminal

![Screen Shot 2022-01-17 at 10 42 52 PM](https://user-images.githubusercontent.com/131457/149880677-1c0b90e3-af58-484d-9eb7-1da56d55c74e.png)

* Execute the Machine Learning service using the interface built by cog, which exposes the user-defined parameters.
  * In this example, `image` is a parameter

```console
$ curl http://localhost:5000/predict -X POST -F image=@$(pwd)/inputs/whole_imgs/Blake_Lively.jpg -o $(pwd)/super.jpg
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2087k 100 1996k 100 93345 276k 12943 0:00:07 0:00:07 --:--:-- 499k
```

### Output

> Using [viu](https://github.com/atanunq/viu) to open the image in the terminal

# Development
## Build this image
* You can specify the python version, etc.
```console
docker-compose build
```

# Thoughts
# Research
* Expose Cog as a Kubernetes CRD