{"id":30383538,"url":"https://github.com/acly/vision.cpp","last_synced_at":"2025-08-21T00:36:07.987Z","repository":{"id":310348902,"uuid":"1017209471","full_name":"Acly/vision.cpp","owner":"Acly","description":"Computer Vision ML inference in C++","archived":false,"fork":false,"pushed_at":"2025-08-17T13:10:32.000Z","size":681,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-17T15:10:36.251Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Acly.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-10T07:32:05.000Z","updated_at":"2025-08-14T18:11:13.000Z","dependencies_parsed_at":"2025-08-17T15:10:54.637Z","dependency_job_id":"2bfb0ce3-1009-4a28-bad8-cfe050738c3d","html_url":"https://github.com/Acly/vision.cpp","commit_stats":null,"previous_names":["acly/vision.cpp"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/Acly/vision.cpp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Acly%2Fvision.cpp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Acly%2Fvision.cpp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Acly%2Fvision.cpp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Acly%2Fvision.cpp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Acly","download_url":"https:/
/codeload.github.com/Acly/vision.cpp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Acly%2Fvision.cpp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271409734,"owners_count":24754730,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-20T02:00:09.606Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-21T00:36:05.482Z","updated_at":"2025-08-21T00:36:07.976Z","avatar_url":"https://github.com/Acly.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# _vision_.cpp\n\nComputer Vision ML inference in C++\n\n* Self-contained C++ library\n* Efficient inference on consumer CPU and GPUs (NVIDIA, AMD, Intel)\n* Lightweight deployment on many platforms (Windows, Linux, MacOS)\n* Growing number of supported models behind a simple API\n* Modular design for full control and implementing your own models\n\nBased on [ggml](https://github.com/ggml-org/ggml) similar to the [llama.cpp](https://github.com/ggml-org/llama.cpp) project.\n\n### Features\n\n| Model                       | Task             | Backends    |\n| :-------------------------- | :--------------- | :---------- |\n| [**MobileSAM**](#mobilesam) | Segmentation     | CPU, Vulkan |\n| [**BiRefNet**](#birefnet)   | Segmentation     | CPU, Vulkan |\n| [**MI-GAN**](#mi-gan) 
      | Inpainting       | CPU, Vulkan |\n| [**ESRGAN**](#real-esrgan)  | Super-resolution | CPU, Vulkan |\n| [_Implement a model [**Guide**]_](docs/model-implementation-guide.md) | | |\n\n## Get Started\n\nGet the library and executables:\n* Download a [release package](https://github.com/Acly/vision.cpp/releases) and extract it,\n* or [build from source](#building).\n\n### Example: Select an object in an image\n\nLet's use MobileSAM to generate a segmentation mask of the plushy on the right\nby passing in a box describing its approximate location.\n\n\u003cimg width=\"400\" height=\"256\" alt=\"Example image showing box prompt at pixel location (420, 120) - (650, 430), and the output mask\" src=\"https://github.com/user-attachments/assets/0b90ad96-c7d2-4c4c-b028-699433cef704\" /\u003e\n\nYou can download the model and input image here: [MobileSAM-F16.gguf](https://huggingface.co/Acly/MobileSAM-GGUF/resolve/main/MobileSAM-F16.gguf) | [input.jpg](docs/media/input.jpg)\n\n\n#### CLI\n\nFind the `vision-cli` executable in the `bin` folder and run it to generate the mask:\n\n```sh\nvision-cli sam -m MobileSAM-F16.gguf -i input.jpg -p 420 120 650 430 -o mask.png\n```\nPass `--composite output.png` to composite input and mask. Use `--help` for more options.\n\n#### API\n\n```c++\n#include \u003cvisp/vision.h\u003e\nusing namespace visp;\n\nint main() {\n  // Initialize the CPU backend and load the model weights\n  backend_device cpu = backend_init(backend_type::cpu);\n  sam_model sam = sam_load_model(\"MobileSAM-F16.gguf\", cpu);\n  \n  // Load the input image and run the image encoder on it\n  image_data input_image = image_load(\"input.jpg\");\n  sam_encode(sam, input_image);\n\n  // Prompt with a box in pixel coordinates and save the resulting mask\n  image_data object_mask = sam_compute(sam, box_2d{{420, 120}, {650, 430}});\n  image_save(object_mask, \"mask.png\");\n}\n```\nThis shows the high-level API. Internally it is composed of multiple smaller\nfunctions that handle model loading, pre-processing inputs, transferring data to\nbackend devices, post-processing output, etc. 
These can be used as building\nblocks for flexible functions which integrate with your existing data sources\nand infrastructure.\n\n\n\n## Models\n\n#### MobileSAM\n\n\u003cimg width=\"400\" height=\"256\" alt=\"example-sam\" src=\"https://github.com/user-attachments/assets/9c0fe151-9990-4bb1-b954-7caff560b110\" /\u003e\n\n[Model download](https://huggingface.co/Acly/MobileSAM-GGUF/tree/main) | [Paper (arXiv)](https://arxiv.org/pdf/2306.14289.pdf) | [Repository (GitHub)](https://github.com/ChaoningZhang/MobileSAM) | [Segment-Anything-Model](https://segment-anything.com/) | License: Apache-2\n\n```sh\nvision-cli sam -m MobileSAM-F16.gguf -i input.png -p 300 200 -o mask.png --composite comp.png\n```\n\n#### BiRefNet\n\n\u003cimg width=\"400\" height=\"256\" alt=\"example-birefnet\" src=\"https://github.com/user-attachments/assets/6fce086d-cb89-4717-92a6-9f4a20532b3c\" /\u003e\n\n[Model download](https://huggingface.co/Acly/BiRefNet-GGUF/tree/main) | [Paper (arXiv)](https://arxiv.org/pdf/2401.03407) | [Repository (GitHub)](https://github.com/ZhengPeng7/BiRefNet) | License: MIT\n\n```sh\nvision-cli birefnet -m BiRefNet-lite-F16.gguf -i input.png -o mask.png --composite comp.png\n```\n\n#### MI-GAN\n\n\u003cimg width=\"400\" height=\"256\" alt=\"example-migan\" src=\"https://github.com/user-attachments/assets/cadf1994-7677-4822-94e5-a2ee6c07621f\" /\u003e\n\n[Model download](https://huggingface.co/Acly/MIGAN-GGUF/tree/main) | [Paper (thecvf.com)](https://openaccess.thecvf.com/content/ICCV2023/papers/Sargsyan_MI-GAN_A_Simple_Baseline_for_Image_Inpainting_on_Mobile_Devices_ICCV_2023_paper.pdf) | [Repository (GitHub)](https://github.com/Picsart-AI-Research/MI-GAN) | License: MIT\n\n```sh\nvision-cli migan -m MIGAN-512-places2-F16.gguf -i image.png mask.png -o output.png\n```\n\n#### Real-ESRGAN\n\n\u003cimg width=\"400\" height=\"256\" alt=\"example-esrgan\" src=\"https://github.com/user-attachments/assets/a41312d6-836c-4b11-ab5d-2e299ffee10c\" /\u003e\n\n[Model 
download](https://huggingface.co/Acly/Real-ESRGAN-GGUF) | [Paper (arXiv)](https://arxiv.org/abs/2107.10833) | [Repository (GitHub)](https://github.com/xinntao/Real-ESRGAN) | License: BSD-3-Clause\n\n```sh\nvision-cli esrgan -m ESRGAN-4x-foolhardy_Remacri-F16.gguf -i input.png -o output.png\n```\n\n\n### Converting models\n\nModels need to be converted to GGUF before they can be used. This will also\nrearrange or precompute tensors for more efficient inference.\n\nTo convert a model, install [uv](https://docs.astral.sh/uv/) and run:\n```sh\nuv run scripts/convert.py \u003carch\u003e MyModel.pth\n```\nwhere `\u003carch\u003e` is one of `sam, birefnet, esrgan, ...`.\n\nThis will create `models/MyModel.gguf`. See `convert.py --help` for more options.\n\n## Building\n\nBuilding requires CMake and a compiler with C++20 support.\n\n**Get the sources**\n```sh\ngit clone https://github.com/Acly/vision.cpp.git --recursive\ncd vision.cpp\n```\n\n**Configure and build**\n```sh\ncmake . -B build -D CMAKE_BUILD_TYPE=Release\ncmake --build build --config Release\n```\n\n### Vulkan _(Optional)_\n\nBuilding with Vulkan GPU support requires the [Vulkan SDK](https://www.lunarg.com/vulkan-sdk/) to be installed.\n\n```sh\ncmake . -B build -D CMAKE_BUILD_TYPE=Release -D VISP_VULKAN=ON\n```\n\n### Tests _(Optional)_\n\nBuild with `-DVISP_TESTS=ON`. Run all C++ tests with the following command:\n```sh\ncd build\nctest -C Release\n```\n\nSome tests require a Python environment. It can be set up with [uv](https://docs.astral.sh/uv/):\n```sh\n# Set up venv and install dependencies (once only)\nuv sync\n\n# Run Python tests\nuv run pytest\n```\n\n## Performance\n\nPerformance optimization is an ongoing process. 
The aim is to be in the same ballpark\nas other frameworks for inference speed, but with:\n* much faster initialization and model loading time (\u003c100 ms)\n* lower memory overhead\n* tiny deployment size (\u003c5 MB for CPU, +30 MB for GPU)\n\n### Inference speed\n\n* CPU: AMD Ryzen 5 5600X (6 cores)\n* GPU: NVIDIA GeForce RTX 4070\n\n#### MobileSAM, 1024x1024\n\n| Device | Precision | _vision.cpp_ | PyTorch | ONNX Runtime |\n| :----- | :-------- | -----------: | ------: | -----------: |\n| cpu    | f32       |       669 ms |  601 ms |       805 ms |\n| gpu    | f16       |        19 ms |   16 ms |              |\n\n#### BiRefNet, 1024x1024\n\n| Model | Device | Precision | _vision.cpp_ |  PyTorch | ONNX Runtime |\n| :---- | :----- | :-------- | -----------: | -------: | -----------: |\n| Full  | cpu    | f32       |     16333 ms | 18800 ms |              |\n| Full  | gpu    | f16       |       243 ms |   140 ms |              |\n| Lite  | cpu    | f32       |      4505 ms | 10900 ms |      6978 ms |\n| Lite  | gpu    | f16       |        86 ms |    59 ms |              |\n\n#### MI-GAN, 512x512\n\n| Model       | Device | Precision | _vision.cpp_ | PyTorch |\n| :---------- | :----- | :-------- | -----------: | ------: |\n| 512-places2 | cpu    | f32       |       523 ms |  637 ms |\n| 512-places2 | gpu    | f16       |        21 ms |   17 ms |\n\n#### Setup\n\n* vision.cpp: using `vision-bench`, GPU via Vulkan, e.g. 
`vision-bench -m sam -b cpu`\n* PyTorch: v2.7.1+cu128, eager eval, GPU via CUDA, average n iterations after warm-up\n\n## Dependencies (integrated)\n\n* [ggml](https://github.com/ggml-org/ggml) - ML tensor library | MIT\n* [stb-image](https://github.com/nothings/stb) - Image load/save/resize | Public Domain\n* [fmt](https://github.com/fmtlib/fmt) - String formatting _(only if compiler doesn't support \u0026lt;format\u0026gt;)_ | MIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Facly%2Fvision.cpp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Facly%2Fvision.cpp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Facly%2Fvision.cpp/lists"}