https://github.com/docsaidlab/capybara
OpenCV and ONNX Runtime Inference Toolkit
- Host: GitHub
- URL: https://github.com/docsaidlab/capybara
- Owner: DocsaidLab
- License: apache-2.0
- Created: 2024-12-18T08:57:30.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-11T06:15:41.000Z (about 1 year ago)
- Last Synced: 2025-02-11T06:34:45.586Z (about 1 year ago)
- Topics: onnxruntime, opencv, python, toolbox
- Language: Python
- Homepage:
- Size: 19.6 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
**[English](./README.md)** | [Chinese](./README_tw.md)
# Capybara

---
## Introduction
Capybara is designed with three goals:
1. **Lightweight default install**: `pip install capybara-docsaid` installs only the core `utils/structures/vision` modules, without forcing heavy inference dependencies.
2. **Inference backends as opt-in extras**: install ONNX Runtime / OpenVINO / TorchScript only when you need them via extras.
3. **Lower risk**: enforce quality gates with ruff/pyright/pytest and target **90%** line coverage for the core codebase.
What you get:
- **Image tools** (`capybara.vision`): I/O, color conversion, resize/rotate/pad/crop, and video frame extraction.
- **Geometry structures** (`capybara.structures`): `Box/Boxes`, `Polygon/Polygons`, `Keypoints`, plus helper functions like IoU.
- **Inference wrappers (optional)**: `capybara.onnxengine` / `capybara.openvinoengine` / `capybara.torchengine`.
- **Feature extras (optional)**: `visualization` (drawing tools), `ipcam` (simple web demo), `system` (system info tools).
- **Utilities** (`capybara.utils`): `PowerDict`, `Timer`, `make_batch`, `download_from_google`, and other common helpers.
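To make the role of a batching helper like `make_batch` concrete, here is a minimal stdlib sketch of the idea (`make_batch_sketch` is an illustrative stand-in, not Capybara's implementation):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def make_batch_sketch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch

# Batching 7 items into groups of 3 leaves a short final batch:
print(list(make_batch_sketch(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```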
## Quick Start
### Install and verify
```bash
pip install capybara-docsaid
python -c "import capybara; print(capybara.__version__)"
```
## Documentation
To learn more about installation and usage, see [**Capybara Documents**](https://docsaid.org/docs/capybara).
The documentation includes detailed guides and common FAQs for this project.
## Installation
### Core install (lightweight)
```bash
pip install capybara-docsaid
```
### Enable inference backends (optional)
```bash
# ONNX Runtime (CPU)
pip install "capybara-docsaid[onnxruntime]"
# ONNX Runtime (GPU)
pip install "capybara-docsaid[onnxruntime-gpu]"
# OpenVINO runtime
pip install "capybara-docsaid[openvino]"
# TorchScript runtime
pip install "capybara-docsaid[torchscript]"
# Install everything
pip install "capybara-docsaid[all]"
```
### Feature extras (optional)
```bash
# Visualization (matplotlib/pillow)
pip install "capybara-docsaid[visualization]"
# IPCam app (flask)
pip install "capybara-docsaid[ipcam]"
# System info (psutil)
pip install "capybara-docsaid[system]"
```
### Combine multiple extras
If you want OpenVINO inference and the IPCam features, install:
```bash
# OpenVINO + IPCam
pip install "capybara-docsaid[openvino,ipcam]"
```
### Install from Git
```bash
pip install git+https://github.com/DocsaidLab/Capybara.git
```
## System Dependencies (Install as needed)
Some features depend on OS-level codecs, image I/O libraries, or PDF tools:
- `PyTurboJPEG` (faster JPEG I/O): requires the TurboJPEG library.
- `pillow-heif` (HEIC/HEIF support): requires libheif.
- `pdf2image` (PDF to images): requires Poppler.
- Video frame extraction: installing `ffmpeg` is recommended (more stable OpenCV video decoding).
### Ubuntu
```bash
sudo apt install ffmpeg libturbojpeg libheif-dev poppler-utils
```
### macOS
```bash
brew install jpeg-turbo ffmpeg libheif poppler
```
### GPU Notes (ONNX Runtime CUDA)
If you're using `onnxruntime-gpu`, install the compatible CUDA/cuDNN version for your ORT version:
- See [**the ONNX Runtime documentation**](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements)
## Usage
### Image data conventions
- Capybara images are represented as `numpy.ndarray`. By default, they follow OpenCV conventions: **BGR**, and shape is typically `(H, W, 3)`.
- If you prefer working in RGB, use `imread(..., color_base="RGB")` or convert with `imcvtcolor(img, "BGR2RGB")`.
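Because images are plain `numpy.ndarray`s, a BGR↔RGB conversion is just a reversal of the channel axis. The numpy-only check below illustrates the convention (independent of Capybara's `imcvtcolor`):

```python
import numpy as np

bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255  # channel 0 is blue in BGR layout

rgb = bgr[..., ::-1]  # reversing the last axis swaps B and R
print(rgb[0, 0].tolist())  # [0, 0, 255]: blue now sits in the last channel
```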
### Image I/O
```python
from capybara import imread, imwrite
img = imread("your_image.jpg")
if img is None:
    raise RuntimeError("Failed to read image.")
imwrite(img, "out.jpg")
```
Notes:
- `imread` returns `None` when it fails to decode an image (if the path doesn't exist, it raises `FileExistsError`).
- `imread` also supports `.heic` (requires `pillow-heif` + OS-level libheif).
### Resize / pad
With `imresize`, you can pass `None` for one dimension of `size` to keep the aspect ratio; the missing dimension is inferred automatically.
```python
import numpy as np
from capybara import BORDER, imresize, pad
img = np.zeros((480, 640, 3), dtype=np.uint8)
img = imresize(img, (320, None)) # (height, width)
img = pad(img, pad_size=(8, 8), pad_mode=BORDER.REPLICATE)
```
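The aspect-ratio behavior reduces to simple proportion arithmetic. Assuming the missing side is inferred roughly like this (a sketch; the library's exact rounding may differ):

```python
def infer_size(h, w, target_h=None, target_w=None):
    """Infer the missing dimension while preserving aspect ratio."""
    if target_h is None:
        target_h = round(h * target_w / w)
    elif target_w is None:
        target_w = round(w * target_h / h)
    return target_h, target_w

# Resizing 480x640 to height 320 yields width 640 * 320 / 480 ~= 427:
print(infer_size(480, 640, target_h=320))  # (320, 427)
```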
### Color conversion
```python
import numpy as np
from capybara import imcvtcolor
img = np.zeros((240, 320, 3), dtype=np.uint8) # BGR
gray = imcvtcolor(img, "BGR2GRAY") # grayscale
rgb = imcvtcolor(img, "BGR2RGB") # RGB
```
### Rotation / perspective correction
```python
import numpy as np
from capybara import Polygon, imrotate, imwarp_quadrangle
img = np.zeros((240, 320, 3), dtype=np.uint8)
rot = imrotate(img, angle=15, expand=True) # Angle definition matches OpenCV: positive values rotate counterclockwise
poly = Polygon([[10, 10], [200, 20], [190, 120], [20, 110]])
patch = imwarp_quadrangle(img, poly) # 4-point perspective warp
```
### Cropping (Box / Boxes)
```python
import numpy as np
from capybara import Box, Boxes, imcropbox, imcropboxes
img = np.zeros((240, 320, 3), dtype=np.uint8)
crop1 = imcropbox(img, Box([10, 20, 110, 120]), use_pad=True)
crop_list = imcropboxes(
    img,
    Boxes([[0, 0, 10, 10], [100, 100, 400, 300]]),
    use_pad=True,
)
```
### Binarization + morphology
Morphology operators live in `capybara.vision.morphology` (not in the top-level `capybara` namespace).
```python
import numpy as np
from capybara import imbinarize
from capybara.vision.morphology import imopen
img = np.zeros((240, 320, 3), dtype=np.uint8)
mask = imbinarize(img) # OTSU + binary
mask = imopen(mask, ksize=3) # Opening to remove small noise
```
### Boxes / IoU
```python
import numpy as np
from capybara import Box, Boxes, pairwise_iou
boxes_a = Boxes([[10, 10, 20, 20], [30, 30, 60, 60]])
boxes_b = Boxes(np.array([[12, 12, 18, 18]], dtype=np.float32))
print(pairwise_iou(boxes_a, boxes_b))
box = Box([0.1, 0.2, 0.9, 0.8], is_normalized=True).convert("XYWH")
print(box.numpy())
```
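For reference, pairwise IoU over `(x1, y1, x2, y2)` boxes can be written directly in numpy. This standalone version only shows the math behind an IoU matrix, not Capybara's implementation:

```python
import numpy as np

def pairwise_iou_ref(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """IoU matrix of shape (len(a), len(b)) for XYXY boxes."""
    lt = np.maximum(a[:, None, :2], b[None, :, :2])  # intersection top-left
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])  # intersection bottom-right
    wh = np.clip(rb - lt, 0, None)                   # clamp to 0 when disjoint
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

a = np.array([[0, 0, 10, 10]], dtype=np.float64)
b = np.array([[0, 0, 10, 10], [5, 5, 15, 15]], dtype=np.float64)
# Identical boxes give 1.0; the half-offset box overlaps 25 / 175 ~= 0.1429.
print(pairwise_iou_ref(a, b))
```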
### Polygons / IoU
```python
from capybara import Polygon, polygon_iou
p1 = Polygon([[0, 0], [10, 0], [10, 10], [0, 10]])
p2 = Polygon([[5, 5], [15, 5], [15, 15], [5, 15]])
print(polygon_iou(p1, p2))
```
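General polygon IoU requires a polygon-clipping step, but for the two axis-aligned squares above the intersection is known by construction, so the expected value can be hand-checked with nothing more than the shoelace area formula (a sanity check, not the library's algorithm):

```python
def shoelace_area(pts):
    """Area of a simple polygon via the shoelace formula."""
    s = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

sq1 = [(0, 0), (10, 0), (10, 10), (0, 10)]
sq2 = [(5, 5), (15, 5), (15, 15), (5, 15)]
overlap = [(5, 5), (10, 5), (10, 10), (5, 10)]  # intersection, known by construction

iou = shoelace_area(overlap) / (shoelace_area(sq1) + shoelace_area(sq2) - shoelace_area(overlap))
print(iou)  # 25 / 175 ~= 0.142857
```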
### Base64 (image / ndarray)
```python
import numpy as np
from capybara import img_to_b64str, npy_to_b64str
from capybara.vision.improc import b64str_to_img, b64str_to_npy
img = np.zeros((32, 32, 3), dtype=np.uint8)
b64_img = img_to_b64str(img) # JPEG bytes -> base64 string
if b64_img is None:
    raise RuntimeError("Failed to encode image into base64.")
img2 = b64str_to_img(b64_img) # base64 string -> numpy image
vec = np.arange(8, dtype=np.float32)
b64_vec = npy_to_b64str(vec)
vec2 = b64str_to_npy(b64_vec, dtype="float32")
```
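The ndarray helpers amount to base64-encoding the raw buffer. A stdlib + numpy sketch of the same round trip (not Capybara's exact implementation, which is why `b64str_to_npy` also takes a `dtype` argument):

```python
import base64
import numpy as np

vec = np.arange(8, dtype=np.float32)
b64 = base64.b64encode(vec.tobytes()).decode("ascii")          # ndarray -> base64 str
back = np.frombuffer(base64.b64decode(b64), dtype=np.float32)  # base64 str -> ndarray
print(np.array_equal(vec, back))  # True
```

Note that the raw-buffer route loses shape and dtype, so the decoder must be told the dtype explicitly.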
### PDF to images
```python
from capybara.vision.improc import pdf2imgs
pages = pdf2imgs("file.pdf") # list[np.ndarray], each page is BGR image
if pages is None:
    raise RuntimeError("Failed to decode PDF.")
print(len(pages))
```
### Visualization (optional)
Install first: `pip install "capybara-docsaid[visualization]"`.
```python
import numpy as np
from capybara import Box
from capybara.vision.visualization.draw import draw_box
img = np.zeros((240, 320, 3), dtype=np.uint8)
img = draw_box(img, Box([10, 20, 100, 120]))
```
### IPCam (optional)
`IpcamCapture` itself does not depend on Flask; you only need the `ipcam` extra to use `WebDemo`.
```python
from capybara.vision.ipcam.camera import IpcamCapture
cap = IpcamCapture(url=0, color_base="BGR") # or provide an RTSP/HTTP URL
frame = next(cap)
```
Web demo (install first: `pip install "capybara-docsaid[ipcam]"`):
```python
from capybara.vision.ipcam.app import WebDemo
WebDemo("rtsp://").run(port=5001)
```
### System info (optional)
Install first: `pip install "capybara-docsaid[system]"`.
```python
from capybara.utils.system_info import get_system_info
print(get_system_info())
```
### Video frame extraction
```python
from capybara import video2frames_v2
frames = video2frames_v2("demo.mp4", frame_per_sec=2, max_size=1280)
print(len(frames))
```
## Inference Backends
Inference backends are optional; install the corresponding extras before importing the relevant engine modules.
### Runtime / backend matrix
Note: TorchScript runtime is named `Runtime.pt` in code (corresponding extra: `torchscript`).
| Runtime (`capybara.runtime.Runtime`) | Backend name | Provider / device |
| ------------------------------------ | --------------- | ----------------------------------------------------------------------------------------------------------- |
| `onnx` | `cpu` | `["CPUExecutionProvider"]` |
| `onnx` | `cuda` | `["CUDAExecutionProvider"(device_id), "CPUExecutionProvider"]` |
| `onnx` | `tensorrt` | `["TensorrtExecutionProvider"(device_id), "CUDAExecutionProvider"(device_id), "CPUExecutionProvider"]` |
| `onnx` | `tensorrt_rtx` | `["NvTensorRTRTXExecutionProvider"(device_id), "CUDAExecutionProvider"(device_id), "CPUExecutionProvider"]` |
| `openvino` | `cpu` | `device="CPU"` |
| `openvino` | `gpu` | `device="GPU"` |
| `openvino` | `npu` | `device="NPU"` |
| `pt` | `cpu` | `torch.device("cpu")` |
| `pt` | `cuda` | `torch.device("cuda")` |
### Runtime registry (auto backend selection)
```python
from capybara.runtime import Runtime
print(Runtime.onnx.auto_backend_name()) # Priority: cuda -> tensorrt_rtx -> tensorrt -> cpu
print(Runtime.openvino.auto_backend_name()) # Priority: gpu -> npu -> cpu
print(Runtime.pt.auto_backend_name()) # Priority: cuda -> cpu
```
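The priority chains printed above amount to a first-available fallback. Assuming each runtime simply probes its candidates in order, the selection logic can be sketched as follows (an illustration, not the actual registry code):

```python
def pick_backend(preferred, available):
    """Return the first preferred backend that is actually available."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError("no usable backend")

# ONNX priority from the docs above: cuda -> tensorrt_rtx -> tensorrt -> cpu
onnx_priority = ["cuda", "tensorrt_rtx", "tensorrt", "cpu"]
print(pick_backend(onnx_priority, available={"cpu"}))  # cpu (CPU-only host)
```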
### ONNX Runtime (`capybara.onnxengine`)
```python
import numpy as np
from capybara.onnxengine import EngineConfig, ONNXEngine
engine = ONNXEngine(
    "model.onnx",
    backend="cpu",
    config=EngineConfig(enable_io_binding=False),
)
outputs = engine.run({"input": np.ones((1, 3, 224, 224), dtype=np.float32)})
print(outputs.keys())
print(engine.summary())
```
### OpenVINO (`capybara.openvinoengine`)
```python
import numpy as np
from capybara.openvinoengine import OpenVINOConfig, OpenVINODevice, OpenVINOEngine
engine = OpenVINOEngine(
    "model.xml",
    device=OpenVINODevice.cpu,
    config=OpenVINOConfig(num_requests=2),
)
outputs = engine.run({"input": np.ones((1, 3), dtype=np.float32)})
print(outputs.keys())
```
### TorchScript (`capybara.torchengine`)
```python
import numpy as np
from capybara.torchengine import TorchEngine
engine = TorchEngine("model.pt", device="cpu")
outputs = engine.run({"image": np.zeros((1, 3, 224, 224), dtype=np.float32)})
print(outputs.keys())
```
### Benchmark (depends on hardware)
All engines provide `benchmark(...)` for quick throughput/latency measurements.
```python
import numpy as np
from capybara.onnxengine import ONNXEngine
engine = ONNXEngine("model.onnx", backend="cpu")
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
print(engine.benchmark({"input": dummy}, repeat=50, warmup=5))
```
### Advanced: Custom options (optional)
`EngineConfig` / `OpenVINOConfig` / `TorchEngineConfig` are passed through to the underlying runtime as-is.
```python
from capybara.onnxengine import EngineConfig, ONNXEngine
engine = ONNXEngine(
    "model.onnx",
    backend="cuda",
    config=EngineConfig(
        provider_options={
            "CUDAExecutionProvider": {
                "enable_cuda_graph": True,
            },
        },
    ),
)
```
## Quality Gates (Contributors)
Before merging, this project requires:
```bash
ruff check .
ruff format --check .
pyright
python -m pytest --cov=capybara --cov-config=.coveragerc --cov-report=term
```
Notes:
- Coverage gate is **90% line coverage** (rules defined in `.coveragerc`).
- Heavy / environment-dependent modules are excluded from the default coverage gate to keep CI reproducible and maintainable.
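A `.coveragerc` enforcing such a gate typically looks like the fragment below (illustrative only; module names under `omit` are assumptions, and the project's actual file is the source of truth):

```ini
[run]
source = capybara
# environment-dependent modules excluded from the default gate
omit =
    capybara/onnxengine/*
    capybara/openvinoengine/*
    capybara/torchengine/*

[report]
fail_under = 90
```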
## Docker (optional)
```bash
git clone https://github.com/DocsaidLab/Capybara.git
cd Capybara
bash docker/build.bash
```
Run:
```bash
docker run --rm -it capybara_docsaid bash
```
If you need GPU access inside the container, use the NVIDIA container runtime (e.g. `--gpus all`).
## Testing (local)
```bash
python -m pytest -vv
```
## License
Apache-2.0, see `LICENSE`.
## Citation
```bibtex
@misc{lin2025capybara,
  author       = {Kun-Hsiang Lin and Ze Yuan},
  title        = {Capybara: An Integrated Python Package for Image Processing and Deep Learning},
  year         = {2025},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/DocsaidLab/Capybara}},
  note         = {Both authors contributed equally}
}
```