{"id":20951197,"url":"https://github.com/lingdong-/visionosc","last_synced_at":"2025-12-29T00:02:11.387Z","repository":{"id":62642566,"uuid":"544104254","full_name":"LingDong-/VisionOSC","owner":"LingDong-","description":"PoseOSC + FaceOSC + HandOSC + OcrOSC + CatOSC + DogOSC","archived":false,"fork":false,"pushed_at":"2023-11-02T11:13:09.000Z","size":3297,"stargazers_count":97,"open_issues_count":12,"forks_count":8,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-01-19T22:48:36.818Z","etag":null,"topics":["computer-vision","facial-landmarks-detection","hand-tracking","ocr","osc","pose-estimation","vision-framework"],"latest_commit_sha":null,"homepage":"","language":"Objective-C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LingDong-.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-01T16:53:27.000Z","updated_at":"2024-11-05T21:31:07.000Z","dependencies_parsed_at":"2025-01-19T22:42:44.513Z","dependency_job_id":"8278d165-8798-4fcb-b731-9ca70227dfc3","html_url":"https://github.com/LingDong-/VisionOSC","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LingDong-%2FVisionOSC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LingDong-%2FVisionOSC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LingDong-%2FVisionOSC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LingDong-%2FVisionOSC/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LingDong-","download_url":"https://codeload.github.com/LingDong-/VisionOSC/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243347385,"owners_count":20276176,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","facial-landmarks-detection","hand-tracking","ocr","osc","pose-estimation","vision-framework"],"created_at":"2024-11-19T00:57:27.201Z","updated_at":"2025-12-29T00:02:11.166Z","avatar_url":"https://github.com/LingDong-.png","language":"Objective-C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Vision OSC\n\n= PoseOSC + FaceOSC + HandOSC + OcrOSC + CatOSC + DogOSC\n\n**[Download](https://github.com/LingDong-/VisionOSC/releases)** | **[Example](demos/VisionOSCProcessingReceiver/VisionOSCProcessingReceiver.pde)**\n\nSend (almost) all Apple [Vision](https://developer.apple.com/documentation/vision) Framework's detection results via [OSC](https://opensoundcontrol.stanford.edu/). (You can pick which one(s) to detect \u0026 send). Written in [openFrameworks](https://openframeworks.cc/) using Objective-C++. macOS 11+ only. \n\n![](screenshots/screenshot000.png)\n\nInspired by [PoseOSC](https://github.com/LingDong-/PoseOSC), but faster, no more electron-bloat or horse neighing hacks. 
Compatible with the `ARR` format of PoseOSC.\n\n## How to re-build in Xcode\n\nDo not attempt to re-build (with projectGenerator) unless absolutely necessary, in which case follow these steps:\n\n- File \u003e Project Settings, Build System -\u003e New Build System\n- Left sidebar, click project name, General \u003e Frameworks, Libraries,... Add Vision, AVKit, Foundation, AVFoundation, CoreML\n- Build Phases, Link Binary with Libraries, Change \"mac catalyst\" to \"always\"\n- Change deployment target to 11.3\n- For each file in src folder, on right sidebar, change file \"Type\" to Objective C++ (not extension, just in the dropdown menu)\n\n\nIf you encounter `Undefined symbol: __objc_msgSend$identifier`, you might need to set `Excluded Architectures arm64`. See [this issue](https://github.com/LingDong-/VisionOSC/issues/2) for details.\n\nIf there are complaints about ARC, you might need to remove all mentions of `autorelease` in the `src` folder.\n\n## How to Use\n\nSettings in `settings.xml` will be loaded upon start.\n\nIn the packaged app, the `settings.xml` can be found in `Contents/Resources`.\n\nSee [demos/VisionOSCProcessingReceiver](demos/VisionOSCProcessingReceiver) for a [Processing](https://processing.org/) demo receiving all the detection types.\n\n### Receiving Poses from OSC\n\nThis is the same as [`ARR` format of PoseOSC](https://github.com/LingDong-/PoseOSC#method-4-arr), copied below:\n\nARR will be sent to `poses/arr` OSC Address as an array of values (OSC spec allows multiple values of different types for each address).\n\n- The first value (int) is width of the frame.\n- The second value (int) is height of the frame.\n- The third value (int) is the number of poses. (When you read this value, you'll know how many more values to read, i.e. `nPoses*(1+17*3)`. 
So if this number is 0 it means no pose is detected, so you can stop reading).\n- The next 52 values are data for the first pose, and the 52 values after that are data for the second pose (if there is), and so on...\n- For each pose, the first value (float) is the score for that pose, the rest 51 values (floats) can be divided into 17 groups of 3, with each group being (x,y,score) of a keypoint. For the ordering of keypoints, see [PoseNet spec](https://github.com/tensorflow/tfjs-models/tree/master/posenet).\n\n### Receiving Faces from OSC\n\nSimilar to pose format (see above); sent to `faces/arr` OSC Address:\n\n- The first value (int) is width of the frame.\n- The second value (int) is height of the frame.\n- The third value (int) is the number of faces.\n- The next 229 values are data for the first face, and the 229 values after that are data for the second face (if there is), and so on...\n- For each face, the first value (float) is the score for that face, the rest 228 values (floats) can be divided into 76 groups of 3, with each group being (x,y,score) of a keypoint.\n\n### Receiving Hands from OSC\n\nSimilar to pose format (see above); sent to `hands/arr` OSC Address:\n\n- The first value (int) is width of the frame.\n- The second value (int) is height of the frame.\n- The third value (int) is the number of hands.\n- The next 64 values are data for the first hand, and the 64 values after that are data for the second hand (if there is), and so on...\n- For each hand, the first value (float) is the score for that hand, the rest 63 values (floats) can be divided into 21 groups of 3, with each group being (x,y,score) of a keypoint. 
For the ordering of the keypoints, see [handpose spec](https://google.github.io/mediapipe/solutions/hands.html)\n\n\n### Receiving Texts (OCR) from OSC\n\nSent to `texts/arr` OSC Address:\n\n- The first value (int) is width of the frame.\n- The second value (int) is height of the frame.\n- The third value (int) is the number of text regions.\n- The next 6 values are data for the first text, and the 6 values after that are data for the second text (if there is), and so on...\n- For each text, the first value (float) is the score for that text, the next four values (float) are the (left,top,width,height) of the bounding box. The last value is what the text says (string).\n\n### Receiving Animal detections from OSC\n\nCurrently only cats and dogs are supported, per [Apple's documentation](https://developer.apple.com/documentation/vision/vnanimalidentifier).\n\nSimilar to texts format (see above); sent to `animals/arr` OSC Address:\n\n- The first value (int) is width of the frame.\n- The second value (int) is height of the frame.\n- The third value (int) is the number of animals.\n- The next 6 values are data for the first animal, and the 6 values after that are data for the second animal (if there is), and so on...\n- For each animal, the first value (float) is the score for that animal, the next four values (float) are the (left,top,width,height) of the bounding box. The last value is what the animal is (string): \"Cat\"/\"Dog\".\n\nThe `JSON` and `XML` formats supported by PoseOSC are now excluded because I've since realized it's a silly idea to add this sort of parsing overhead. 
Let me know if you have a case against this decision.\n\nI recommend [Protokol](https://hexler.net/protokol) for testing/inspecting OSC.\n\n\n## Framerates\n\nTested on MacBook Pro (13-inch, M1, 2020) with 16 GB memory.\n\n- Body: 60 FPS\n- Hand: 60 FPS\n- Face: 45 FPS\n- Text: 10 FPS\n- Animal: 60 FPS\n- Face, body, \u0026 hand: 25 FPS\n- Everything all on: 5 FPS\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flingdong-%2Fvisionosc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flingdong-%2Fvisionosc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flingdong-%2Fvisionosc/lists"}