{"id":15638966,"url":"https://github.com/cansik/deep-vision-processing","last_synced_at":"2025-04-15T16:42:55.540Z","repository":{"id":37087340,"uuid":"242500562","full_name":"cansik/deep-vision-processing","owner":"cansik","description":"Deep computer-vision algorithms for the Processing framework.","archived":false,"fork":false,"pushed_at":"2022-12-16T14:30:57.000Z","size":13052,"stargazers_count":94,"open_issues_count":9,"forks_count":23,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-28T22:35:12.592Z","etag":null,"topics":["classification","computer-vision","cuda-support","deep-neural-networks","inference-engine","machine-learning","pose-estimation","processing"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cansik.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-23T10:57:12.000Z","updated_at":"2025-01-28T11:09:09.000Z","dependencies_parsed_at":"2023-01-29T14:15:19.137Z","dependency_job_id":null,"html_url":"https://github.com/cansik/deep-vision-processing","commit_stats":null,"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cansik%2Fdeep-vision-processing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cansik%2Fdeep-vision-processing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cansik%2Fdeep-vision-processing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cansik%2Fdeep-vision-processing/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/cansik","download_url":"https://codeload.github.com/cansik/deep-vision-processing/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249110559,"owners_count":21214352,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","computer-vision","cuda-support","deep-neural-networks","inference-engine","machine-learning","pose-estimation","processing"],"created_at":"2024-10-03T11:24:12.911Z","updated_at":"2025-04-15T16:42:55.515Z","avatar_url":"https://github.com/cansik.png","language":"Java","funding_links":[],"categories":["Libraries"],"sub_categories":["Contributions"],"readme":"# Deep Vision Processing [![Build](https://github.com/cansik/deep-vision-processing/actions/workflows/build.yml/badge.svg)](https://github.com/cansik/deep-vision-processing/actions/workflows/build.yml)\nDeep computer-vision algorithms for [Processing](https://processing.org/).\n\nThe idea behind this library is to provide a simple way to run inference with machine learning algorithms for computer vision tasks inside Processing. Portability and ease of use are the primary goals of this library. 
Starting with version `0.6.0`, CUDA inference support is built into the library (Windows \u0026 Linux).\n\n_Caution_: The API is still in development and can change at any time.\n\n![Pose](readme/pose.jpg)\n\n*Lightweight OpenPose Example*\n\n## Install\nIt is recommended to use the contribution manager in the Processing app to install the library.\n\n![image](https://user-images.githubusercontent.com/5220162/118391536-05b1ea80-b635-11eb-9704-2c5b780008df.png)\n\n### Manual\nDownload the [latest](https://github.com/cansik/deep-vision-processing/releases/tag/v0.8.1-alpha) prebuilt version from the [releases](https://github.com/cansik/deep-vision-processing/releases) section and install it into your Processing library folder.\n\n## Usage\nThe base of the library is the `DeepVision` class. It is used to download the pre-trained models and create new networks.\n\n```java\nimport ch.bildspur.vision.*;\nimport ch.bildspur.vision.network.*;\nimport ch.bildspur.vision.result.*;\n\nDeepVision vision = new DeepVision(this);\n```\n\nUsually it makes sense to define the network globally for your sketch and create it in `setup()`. The `create` methods download the pre-trained weights if they do not already exist. The network first has to be created and then set up.\n\n```java\nYOLONetwork network;\n\nvoid setup() {\n  // create the network \u0026 download the pre-trained models\n  network = vision.createYOLOv3();\n\n  // load the model\n  network.setup();\n  \n  // set network settings (optional)\n  network.setConfidenceThreshold(0.2f);\n  \n  ...\n}\n```\n\nBy default, the weights are stored in the library folder of Processing. To choose where they are downloaded, use one of the following commands:\n\n```java\n// download to library folder\nvision.storeNetworksGlobal();\n\n// download to sketch/networks\nvision.storeNetworksInSketch();\n```\n\nEach network has a `run()` method, which takes an image as a parameter and outputs a result. 
You can pass in any `PImage` and the library starts processing it.\n\n```java\nPImage myImg = loadImage(\"hello.jpg\");\nResultList\u003cObjectDetectionResult\u003e detections = network.run(myImg);\n```\n\nPlease have a look at the specific networks for further information or at the [examples](examples).\n\n### OpenCL Backend Support\nSince version `0.8.1`, OpenCL is used as the backend by default if it is enabled. If CUDA is enabled too, CUDA is preferred. It is possible to force the CPU backend by setting the following option:\n\n```java\nDeepVision vision = new DeepVision(this);\nvision.setUseDefaultBackend(true);\n```\n\n### CUDA Backend Support\nSince version `0.6.0` it is possible to [download the CUDA bundled libraries](https://github.com/cansik/deep-vision-processing/releases/tag/v0.8.1-alpha). This makes it possible to run most of the DNNs on CUDA-enabled graphics cards. For most networks this is necessary to run them in real time. If you have the CUDA-bundled version installed and run deep-vision on Linux or Windows with an NVIDIA graphics card, you can enable the CUDA backend:\n\n```java\n// Second parameter (enableCUDABackend) enables CUDA\nDeepVision vision = new DeepVision(this, true);\n```\n\nIf the second parameter is omitted, the library checks whether a CUDA-enabled device is available and enables the backend accordingly. 
To check whether the CUDA backend has been enabled, use the following method:\n\n```java\nprintln(\"Is CUDA Enabled: \" + vision.isCUDABackendEnabled());\n```\n\nIf CUDA is enabled but the hardware does not support it, Processing will show you a warning and run the networks on the CPU.\n\n## Networks\n\nThe following networks are implemented:\n\n- Object Detection ✨\n\t- YOLOv3-tiny\n\t- YOLOv3-tiny-prn\n\t- EfficientNetB0-YOLOv3\n\t- YOLOv3 OpenImages Dataset\n\t- YOLOv3-spp ([spatial pyramid pooling](https://stackoverflow.com/a/55014630/1138326))\n\t- YOLOv3\n\t- YOLOv4\n\t- YOLOv4-tiny\n\t- [YOLOv5](https://github.com/ultralytics/yolov5/) (n, s, m, l, x)\n\t- [YOLO Fastest \u0026 XL](https://github.com/dog-qiuqiu/Yolo-Fastest)\n\t- SSDMobileNetV2\n\t- Handtracking based on SSDMobileNetV2\n\t- TextBoxes\n\t- Ultra-Light-Fast-Generic-Face-Detector-1MB RFB (~30 FPS on CPU)\n\t- Ultra-Light-Fast-Generic-Face-Detector-1MB Slim (~40 FPS on CPU)\n\t- Cascade Classifier\n- Object Segmentation\n\t- Mask R-CNN\n- Object Recognition 🚙\n\t- Tesseract LSTM\n- Keypoint Detection 🤾🏻‍♀️\n\t- Facial Landmark Detection\n\t- Single Human Pose Detection based on Lightweight OpenPose\n- Classification 🐈\n\t- MNIST CNN\n\t- FER+ Emotion\n\t- Age Net\n\t- Gender Net\n- Depth Estimation 🕶\n\t- MidasNet\n- Image Processing\n\t- Style Transfer\n\t- Multiple networks for x2, x3 and x4 super-resolution\n\nThe following networks are planned (⚡️ already in progress):\n\n* YOLO 9K (not supported by OpenCV)\n* Multi Human Pose Detection ⚡️ (currently struggling with the part affinity fields 🤷🏻‍♂️ help?)\n* TextBoxes++ ⚡️\n* [CRNN](https://github.com/bgshih/crnn) ⚡️\n* [PixelLink](https://github.com/ZJULearning/pixel_link)\n\n\n### Object Detection\nLocating one or multiple predefined objects in an image is the task of the object detection networks.\n\n![YOLO](readme/yolo.jpg)\n\n*YOLO Example*\n\nThe result of these networks is 
usually a list of `ObjectDetectionResult`.\n\n```java\nObjectDetectionNetwork net = vision.createYOLOv3();\nnet.setup();\n\n// detect new objects\nResultList\u003cObjectDetectionResult\u003e detections = net.run(image);\n\nfor (ObjectDetectionResult detection : detections) {\n    println(detection.getClassName() + \"\\t[\" + detection.getConfidence() + \"]\");\n}\n```\n\nEvery object detection result provides the following accessors:\n\n* `getClassId()` - id of the class the object belongs to\n* `getClassName()` - name of the class the object belongs to\n* `getConfidence()` - how confident the network is in this detection\n* `getX()` - x position of the bounding box\n* `getY()` - y position of the bounding box\n* `getWidth()` - width of the bounding box\n* `getHeight()` - height of the bounding box\n\n#### YOLO [[Paper](https://pjreddie.com/darknet/yolo/)]\nYOLO is a very fast and accurate single-shot network. The pre-trained models are trained on the 80-class COCO dataset. The following weights \u0026 models are available in the repository:\n\n- YOLOv3-tiny (very fast, but trading accuracy for speed)\n- YOLOv3-spp (original model using [spatial pyramid pooling](https://stackoverflow.com/a/55014630/1138326))\n- YOLOv3 (608)\n- YOLOv4 (608)\n- YOLOv4-tiny (416)\n- YOLOv5n (640)\n- YOLOv5s (640)\n- YOLOv5m (640)\n- YOLOv5l (640)\n- YOLOv5x (640)\n\n```java\n// setup the network (pick one of the following)\nYOLONetwork net = vision.createYOLOv4();\nYOLONetwork net = vision.createYOLOv4Tiny();\nYOLONetwork net = vision.createYOLOv3();\nYOLONetwork net = vision.createYOLOv3SPP();\nYOLONetwork net = vision.createYOLOv3Tiny();\nYOLONetwork net = vision.createYOLOv5n();\nYOLONetwork net = vision.createYOLOv5s();\nYOLONetwork net = vision.createYOLOv5m();\nYOLONetwork net = vision.createYOLOv5l();\nYOLONetwork net = vision.createYOLOv5x();\n\n// set confidence threshold\nnet.setConfidenceThreshold(0.2f);\n```\n\n* [Basic Example YOLO](examples/YOLODetectObjects)\n* [WebCam Example 
YOLO](examples/YOLOWebcamExample)\n* [RealSense Example YOLO](examples/RealSenseYoloDetector)\n\n#### YOLOv5\nSince version `0.9.0`, YOLOv5 is implemented as well. It uses the pre-trained models converted into the ONNX format. At the moment YOLOv5 does not work well with the implemented NMS. To adjust the NMS settings, use the following functions:\n\n```java\n// set confidence threshold\nnet.setConfidenceThreshold(0.2f);\n\n// set confidence threshold\nnet.set(0.2f);\n\n// set the IoU threshold (overlapping of the bounding boxes)\nnet.setNmsThreshold(0.4f);\n\n// set how many objects should be taken into account for nms\n// 0 means all objects\nnet.setTopK(100);\n```\n\n#### SSDMobileNetV2 [[Paper](https://arxiv.org/abs/1512.02325)]\nThis network is a single-shot detector based on the MobileNetV2 architecture. It is pre-trained on the 90-class COCO dataset and is really fast.\n\n```java\nSSDMobileNetwork net = vision.createMobileNetV2();\n```\n\n* [WebCam Example MobileNet](examples/MobileNetObjectDetectorWebcam)\n\n#### Handtracking [[Project](https://github.com/victordibia/handtracking)]\nThis is a pre-trained SSD MobileNetV2 network to detect hands.\n\n```java\nSSDMobileNetwork net = vision.createHandDetector();\n```\n\n* [Hand Detector WebCam Example](examples/HandDetectorWebcam)\n\n#### TextBoxes [[Paper](https://arxiv.org/abs/1611.06779)]\nTextBoxes is a detector for scene text in the wild, based on SSD MobileNet. It is able to detect text in a scene and return its location.\n\n```java\nTextBoxesNetwork net = vision.createTextBoxesDetector();\n```\n\n#### Ultra-Light-Fast-Generic-Face-Detector [[Project](https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB)]\nULFG Face Detector is a very fast CNN-based face detector which reaches up to 40 FPS on a MacBook Pro. 
The face detector comes with four different pre-trained weights:\n\n* RFB640 \u0026 RFB320 - More accurate but slower detector\n* Slim640 \u0026 Slim320 - Less accurate but faster detector\n  \n```java\n// pick one of the following\nULFGFaceDetectionNetwork net = vision.createULFGFaceDetectorRFB640();\nULFGFaceDetectionNetwork net = vision.createULFGFaceDetectorRFB320();\nULFGFaceDetectionNetwork net = vision.createULFGFaceDetectorSlim640();\nULFGFaceDetectionNetwork net = vision.createULFGFaceDetectorSlim320();\n```\n\nThe detector detects only the frontal part of the face and not the complete head. Most algorithms that run on results of face detections need a rectangular detection shape.\n\n* [Face Detector Example](examples/FaceDetectorExample)\n* [Face Detector WebCam Example](examples/FaceDetectorCNNWebcam)\n\n#### Cascade Classifier [[Paper](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.4868)]\nThe cascade classifier detector is based on boosting and is commonly used as a pre-processor for many classifiers.\n\n```java\nCascadeClassifierNetwork net = vision.createCascadeFrontalFace();\n```\n\n* [Face Detector Haar Webcam Example](examples/FaceDetectorHaarWebcam)\n\n### Object Recognition\ntbd\n\n### Keypoint Detection\ntbd\n\n### Classification\ntbd\n\n### Depth Estimation\n\n#### MidasNet\n\nTowards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer\n\n![](readme/midasnet.jpg)\n\n*MidasNet* \n\n### Image Processing\ntbd\n\n## Pipeline\nIt is possible to create network pipelines, for example to run a face-detection network followed by different classifiers on each face. 
This is not yet documented, so have a look at the test code:\n[HumanAttributesPipelineTest.java#L36-L41](https://github.com/cansik/deep-vision-processing/blob/master/src/test/java/ch/bildspur/vision/test/HumanAttributesPipelineTest.java#L36-L41)\n\n## Build\n- Install JDK 8 (because of Processing), or JDK 11 for Processing 4\n\nRun gradle to build a new release package under `/release/deepvision.zip`:\n\n```bash\n# windows\ngradlew.bat releaseProcessingLib\n\n# mac / unix\n./gradlew releaseProcessingLib\n```\n\n### CUDA Support\nTo build with CUDA support, enable the property `cuda`:\n\n```bash\ngradlew.bat releaseProcessingLib -Pcuda -Pdisable-fatjar\n```\n\n*This will take several minutes and result in a `5.3 GB` folder.*\n*`disable-fatjar` prevents the creation of a fatjar, which would be too big to be zipped.*\n\n### Platform Specific\nTo build only for specific platforms, use the property `javacppPlatform`:\n\n```bash\n# builds with support for all platforms\ngradlew.bat releaseProcessingLib -PjavacppPlatform=linux-x86_64,macosx-x86_64,macosx-arm64,windows-x86_64,linux-armhf,linux-arm64\n```\n\n## FAQ\n\n\u003e Why is xy network not implemented?\n\nPlease open an issue if you have a cool network that could be implemented or just contribute a PR.\n\n\u003e Why is it not possible to train my own network?\n\nThe idea was to give artists and makers a simple tool to run networks inside of Processing. Training a network requires a lot of specific knowledge about neural networks (CNNs in particular).\n\nOf course it is possible to train your own YOLO or SSDMobileNet and use the weights with this library. Check out the following example for detecting face masks: \n[cansik/yolo-mask-detection](https://github.com/cansik/yolo-mask-detection)\n\n\u003e Is it compatible with Greg Borenstein's [OpenCV for Processing](https://github.com/atduskgreg/opencv-processing)?\n\nNo, OpenCV for Processing uses the direct OpenCV Java bindings instead of JavaCV. 
Please include only one of the two libraries, because Processing gets confused if two OpenCV packages are imported.\n\n## About\nMaintained by [cansik](https://github.com/cansik) with the help of the following dependencies:\n\n- [bytedeco/javacv](https://github.com/bytedeco/javacv)\n- [atduskgreg/opencv-processing](https://github.com/atduskgreg/opencv-processing)\n\nStock images from the following people have been used:\n\n- yoga.jpg by Yogendra Singh from Pexels\n- office.jpg by [fauxels](https://www.pexels.com/@fauxels) from Pexels\n- faces.png by [shvetsa](https://www.pexels.com/@shvetsa) from Pexels\n- hand.jpg by Thought Catalog on Unsplash\n- sport.jpg by John Torcasio on Unsplash\n- sticker.jpg by 🇨🇭 Claudio Schwarz | @purzlbaum on Unsplash\n- children.jpg by [Sandeep Kr Yadav](https://unsplash.com/@fiftymm)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcansik%2Fdeep-vision-processing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcansik%2Fdeep-vision-processing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcansik%2Fdeep-vision-processing/lists"}