# Visionaire

> Streamlined, ergonomic APIs around Apple's Vision framework

![Swift](https://img.shields.io/badge/Swift-5.8+-ec775c?style=flat)
![iOS](https://img.shields.io/badge/iOS-13+-549bf5?style=flat)
![macOS](https://img.shields.io/badge/macOS-10.15+-549bf5?style=flat)
![macCatalyst](https://img.shields.io/badge/macCatalyst-13.1+-549bf5?style=flat)
![tvOS](https://img.shields.io/badge/tvOS-13+-549bf5?style=flat)
![Swift Package Manager](https://img.shields.io/badge/Swift_Package_Manager-Compatible-347d39?style=flat)

The main goal of `Visionaire` is to reduce ceremony and provide a concise set of APIs for Vision tasks.

Some of its features include:

- **Centralized list of all tasks**, available via the `VisionTaskType` enum (with platform availability checks).
- **Automatic image handling** for all supported image sources.
- **Convenience APIs for all tasks**, along with all available parameters for each task (with platform availability checks).
- Support for **custom CoreML models** (Classification, Image-To-Image, Object Recognition, Generic `VNCoreMLFeatureValueObservation`s).
- Support for **multiple task execution**, maintaining task type information in the results.
- Support for raw `VNRequest`s.
- All calls are **synchronous** (just like the original calls) - **no extra 'magic', assumptions or hidden juggling**.
- **SwiftUI extensions** for helping you **rapidly visualize results** (great for evaluation).

## Installation
`Visionaire` is provided as a Swift Package. You can add it to your project via [this repository's address](https://github.com/alladinian/Visionaire).
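
For reference, the dependency entry in `Package.swift` looks like this (a minimal sketch; `branch: "main"` is an assumption, so pin a tagged version if one is available):

```swift
// Package.swift (excerpt)
dependencies: [
    // Assumption: tracking the main branch; use a version requirement if the repo is tagged.
    .package(url: "https://github.com/alladinian/Visionaire.git", branch: "main")
]
```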

## Supported Vision Tasks

**All** Vision tasks are supported, up to and including those introduced in **iOS 17** & **macOS 14** (the latest production releases at the time of writing).

Below is a detailed list of all the available tasks:

| **Task** | **Vision API** | **Visionaire Task** | **iOS** | **macOS** | **Mac Catalyst** | **tvOS** |
| ------------------------------------------ | --------------------------------------------- | ---------------------------------------- | -------:| ---------:| ---------------: | -------: |
| **Generate Feature Print** | VNGenerateImageFeaturePrintRequest | .featurePrintGeneration | 13.0 | 10.15 | 13.1 | 13.0 |
| **Person Segmentation** | VNGeneratePersonSegmentationRequest | .personSegmentation | 15.0 | 12.0 | 15.0 | 15.0 |
| **Document Segmentation** | VNDetectDocumentSegmentationRequest | .documentSegmentation | 15.0 | 12.0 | 15.0 | 15.0 |
| **Attention Based Saliency** | VNGenerateAttentionBasedSaliencyImageRequest | .attentionSaliency | 13.0 | 10.15 | 13.1 | 13.0 |
| **Objectness Based Saliency** | VNGenerateObjectnessBasedSaliencyImageRequest | .objectnessSaliency | 13.0 | 10.15 | 13.1 | 13.0 |
| **Track Rectangle** | VNTrackRectangleRequest | .rectangleTracking | 11.0 | 10.13 | 13.1 | 11.0 |
| **Track Object** | VNTrackObjectRequest | .objectTracking | 11.0 | 10.13 | 13.1 | 11.0 |
| **Detect Rectangles** | VNDetectRectanglesRequest | .rectanglesDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| **Detect Face Capture Quality** | VNDetectFaceCaptureQualityRequest | .faceCaptureQuality | 13.0 | 10.15 | 13.1 | 13.0 |
| **Detect Face Landmarks** | VNDetectFaceLandmarksRequest | .faceLandmarkDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| **Detect Face Rectangles** | VNDetectFaceRectanglesRequest | .faceDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| **Detect Human Rectangles** | VNDetectHumanRectanglesRequest | .humanRectanglesDetection | 13.0 | 10.15 | 13.1 | 13.0 |
| **Detect Human Body Pose** | VNDetectHumanBodyPoseRequest | .humanBodyPoseDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| **Detect Human Hand Pose** | VNDetectHumanHandPoseRequest | .humanHandPoseDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| **Recognize Animals** | VNRecognizeAnimalsRequest | .animalDetection | 13.0 | 10.15 | 13.1 | 13.0 |
| **Detect Trajectories** | VNDetectTrajectoriesRequest | .trajectoriesDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| **Detect Contours** | VNDetectContoursRequest | .contoursDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| **Generate Optical Flow** | VNGenerateOpticalFlowRequest | .opticalFlowGeneration | 14.0 | 11.0 | 14.0 | 14.0 |
| **Detect Barcodes** | VNDetectBarcodesRequest | .barcodeDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| **Detect Text Rectangles** | VNDetectTextRectanglesRequest | .textRectanglesDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| **Recognize Text** | VNRecognizeTextRequest | .textRecognition | 13.0 | 10.15 | 13.1 | 13.0 |
| **Detect Horizon** | VNDetectHorizonRequest | .horizonDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| **Classify Image** | VNClassifyImageRequest | .imageClassification | 13.0 | 10.15 | 13.1 | 13.0 |
| **Translational Image Registration** | VNTranslationalImageRegistrationRequest | .translationalImageRegistration | 11.0 | 10.13 | 13.1 | 11.0 |
| **Homographic Image Registration** | VNHomographicImageRegistrationRequest | .homographicImageRegistration | 11.0 | 10.13 | 13.1 | 11.0 |
| **Detect Human Body Pose (3D)** | VNDetectHumanBodyPose3DRequest | .humanBodyPoseDetection3D | 17.0 | 14.0 | 17.0 | 17.0 |
| **Detect Animal Body Pose** | VNDetectAnimalBodyPoseRequest | .animalBodyPoseDetection | 17.0 | 14.0 | 17.0 | 17.0 |
| **Track Optical Flow** | VNTrackOpticalFlowRequest | .opticalFlowTracking | 17.0 | 14.0 | 17.0 | 17.0 |
| **Track Translational Image Registration** | VNTrackTranslationalImageRegistrationRequest | .translationalImageRegistrationTracking | 17.0 | 14.0 | 17.0 | 17.0 |
| **Track Homographic Image Registration** | VNTrackHomographicImageRegistrationRequest | .homographicImageRegistrationTracking | 17.0 | 14.0 | 17.0 | 17.0 |
| **Generate Foreground Instance Mask** | VNGenerateForegroundInstanceMaskRequest | .foregroundInstanceMaskGeneration | 17.0 | 14.0 | 17.0 | 17.0 |

## Supported Image Sources
- `CGImage`
- `CIImage`
- `CVPixelBuffer`
- `CMSampleBuffer`
- `Data`
- `URL`
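
Conversion is handled automatically, so any of these sources can be passed straight to a task. A quick sketch (the file path is illustrative, and `faceDetection` follows the convenience-method pattern shown in the examples below):

```swift
import Foundation
import Visionaire

// Run a task directly on a file URL; any other supported source works the same way.
let url = URL(fileURLWithPath: "/path/to/photo.jpg") // illustrative path
let faces = try Visionaire.shared.faceDetection(imageSource: url)
```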

## Examples

The main interface is the `Visionaire` class.

It is an `ObservableObject` that reports processing state through a published `isProcessing` property.

You can execute tasks on the `shared` Visionaire singleton or on your own instance (useful when you want separate processors, each reporting its own state).
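
Since `Visionaire` is an `ObservableObject`, you can drive UI state directly from `isProcessing`. A minimal sketch, observing the shared singleton:

```swift
import SwiftUI
import Visionaire

struct AnalysisView: View {
    // Observing the shared singleton; a dedicated instance works the same way.
    @ObservedObject var visionaire = Visionaire.shared

    var body: some View {
        VStack {
            if visionaire.isProcessing {
                Text("Processing…")
            }
            // ... your image and results views
        }
    }
}
```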

There are two sets of APIs: convenience methods and task-based methods.

Convenience methods return typed results, while task-based methods let you submit multiple tasks at once.

### Single task execution (convenience APIs):

```swift
DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let horizon = try Visionaire.shared.horizonDetection(imageSource: image) // The result is a `VNHorizonObservation`
        let angle = horizon.angle
        // Do something with the horizon angle
    } catch {
        print(error)
    }
}
```

### Custom CoreML model (convenience APIs):

```swift
// Create an instance of your model
// (`YOLOv3` stands in for your generated Core ML model class)
let yolo: MLModel = {
    // Tell Core ML to use the Neural Engine if available.
    let config = MLModelConfiguration()
    config.computeUnits = .all
    // Load your custom model
    let model = try! YOLOv3(configuration: config)
    return model.model
}()

// Optionally create a feature provider to set up custom model attributes
class YoloFeatureProvider: MLFeatureProvider {
    var values: [String: MLFeatureValue] {
        [
            "iouThreshold": MLFeatureValue(double: 0.45),
            "confidenceThreshold": MLFeatureValue(double: 0.25)
        ]
    }

    var featureNames: Set<String> {
        Set(values.keys)
    }

    func featureValue(for featureName: String) -> MLFeatureValue? {
        values[featureName]
    }
}

// Perform the task
let detectedObjectObservations = try visionaire.customRecognition(imageSource: image,
                                                                  model: try! VNCoreMLModel(for: yolo),
                                                                  inputImageFeatureName: "image",
                                                                  featureProvider: YoloFeatureProvider(),
                                                                  imageCropAndScaleOption: .scaleFill)
```

### Single task execution (task-based APIs):

```swift
DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let result = try Visionaire.shared.perform(.horizonDetection, on: image) // The result is a `VisionTaskResult`
        let observation = result.observations.first as? VNHorizonObservation
        let angle = observation?.angle
        // Do something with the horizon angle
    } catch {
        print(error)
    }
}
```

### Multiple task execution (task-based APIs):

```swift
DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let results = try Visionaire.shared.perform([.horizonDetection, .personSegmentation(qualityLevel: .accurate)], on: image)
        for result in results {
            switch result.taskType {
            case .horizonDetection:
                let horizon = result.observations.first as? VNHorizonObservation
                // Do something with the observation
            case .personSegmentation:
                let segmentationObservations = result.observations as? [VNPixelBufferObservation]
                // Do something with the observations
            default:
                break
            }
        }
    } catch {
        print(error)
    }
}
```

## Task configuration

All tasks can be configured with "modifier" style calls for common options.

An example using all the available options:

```swift
let segmentation = VisionTask.personSegmentation(qualityLevel: .accurate)
    .preferBackgroundProcessing(true)
    .usesCPUOnly(false)
    .regionOfInterest(CGRect(x: 0, y: 0, width: 0.5, height: 0.5))
    .latestRevision() // You can also use .revision(n)

let results = try Visionaire.shared.perform([.horizonDetection, segmentation], on: image) // Each element is a `VisionTaskResult`
```

## SwiftUI Extensions

SwiftUI extensions are also available to help you visualize results for quick evaluation.

**Detected Object Observations**

```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawObservations(detectedObjectObservations) {
        Rectangle()
            .stroke(Color.blue, lineWidth: 2)
    }
```
![image](https://github.com/alladinian/Visionaire/assets/156458/70b4a0dd-dcf7-4c15-8ccb-cd37910e6a35)

**Rectangle Observations**

```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawQuad(rectangleObservations) { shape in
        shape
            .stroke(Color.green, lineWidth: 2)
    }
```
![image](https://github.com/alladinian/Visionaire/assets/156458/9cc38998-e069-414b-8fae-bb5584ee48ec)

**Face Landmarks**

Note: for Face Landmarks you can visualize individual characteristics or groups of them. The options are exposed through the `FaceLandmarks` OptionSet:

`constellation`, `contour`, `leftEye`, `rightEye`, `leftEyebrow`, `rightEyebrow`, `nose`, `noseCrest`, `medianLine`, `outerLips`, `innerLips`, `leftPupil`, `rightPupil`, `eyes`, `pupils`, `eyeBrows`, `lips` and `all`.

```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawFaceLandmarks(faceObservations, landmarks: .all) { shape in
        shape
            .stroke(.red, style: .init(lineWidth: 2, lineJoin: .round))
    }
```
![image](https://github.com/alladinian/Visionaire/assets/156458/f63e6646-a2ce-4f82-bcdd-1ef30160ddb6)

**Person Segmentation Mask**

```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizePersonSegmentationMask(pixelBufferObservations)
```
![image](https://github.com/alladinian/Visionaire/assets/156458/72536049-3547-4c89-994c-4b46aee4e295)

**Human Body Pose**

```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizeHumanBodyPose(humanBodyPoseObservations) { shape in
        shape
            .fill(.red)
    }
```
![image](https://github.com/alladinian/Visionaire/assets/156458/dc56da48-ac80-4723-8403-dea660c73c20)

**Contours**

```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizeContours(contoursObservations) { shape in
        shape
            .stroke(.red, style: .init(lineWidth: 2, lineJoin: .round))
    }
```
![image](https://github.com/alladinian/Visionaire/assets/156458/ee4d9e63-3e37-494e-94d4-63ae2c72dc0a)