Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Single Shot MultiBox Detector deployed on a OAK-D Lite cam via DepthAI
- Host: GitHub
- URL: https://github.com/andreimoraru123/oak-d-etector
- Owner: AndreiMoraru123
- License: mit
- Created: 2023-02-02T15:31:51.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-09-02T10:46:37.000Z (over 1 year ago)
- Last Synced: 2025-01-10T17:50:50.888Z (21 days ago)
- Topics: camera, computer-vision, deep-learning, deployment, depthai, hardware, hardware-acceleration, intel, luxonis, neural-compute-stick-2, oak-d, object-detection, onnx, opencv, openvino, pytorch, rgb-camera, single-shot-multibox-detector, ssd, vpu
- Language: Python
- Homepage:
- Size: 90.3 MB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# SSD Object Detection on OAK-D Lite via DepthAI
# Demo
# Intro
The original PyTorch implementation of the model, and the one that I am following here, is [this one](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection). It is a fantastic guide by itself, and I have not modified much of it so far. The model was trained on [Pascal VOC 2007-2012](http://host.robots.ox.ac.uk/pascal/VOC/). Here is a [mirror](https://pjreddie.com/projects/pascal-voc-dataset-mirror/).
The goal of this project is to deploy such a custom model on real hardware, rather than to design the neural network itself.
To that end, I am using a [Luxonis OAK-D Lite](https://shop.luxonis.com/products/oak-d-lite-1) and an [Intel Neural Compute Stick 2](https://www.intel.com/content/www/us/en/developer/articles/guide/get-started-with-neural-compute-stick.html). Funnily enough, just as I finished this, the NCS2 became outdated, as you can see on its main page, since [Intel will be discontinuing it](https://www.intel.com/content/www/us/en/developer/articles/tool/neural-compute-stick.html). But that is beside the point here, as the main focus is deployment on specific hardware, whatever that hardware may be.
Namely, we are looking at VPUs, or [vision processing units](https://en.wikipedia.org/wiki/Vision_processing_unit).
One such AI accelerator can be found in the [OAK-D camera](https://github.com/luxonis/depthai-hardware/blob/master/DM9095_OAK-D-LITE_DepthAI_USB3C/Datasheet/OAK-D-Lite_Datasheet.pdf) itself!
# Setup
In order to communicate with the hardware, I am using [DepthAI's API](https://docs.luxonis.com/en/latest/) to talk to the RGB camera of the OAK-D, and Intel's [OpenVINO](https://docs.openvino.ai/latest/home.html) (Open Visual Inference and Neural Network Optimization) for deployment, both of which are still very much state of the art in Edge AI.
Now, in order to use OpenVINO with the hardware, I have to [download the distribution toolkit](https://docs.openvino.ai/2021.1/openvino_docs_install_guides_installing_openvino_windows.html). For compiling and running apps, the toolkit prefers its environment variables to be set up temporarily, per command prompt, so we will do it that way.
For me, the path looks like this:
```bash
cd C:\Program Files (x86)\Intel\openvino_2022\w_openvino_toolkit_windows_2022.2.0.7713.af16ea1d79a_x86_64
```
where I can now set up the variables by running the batch file:
```bash
setupvars.bat
```
I will get a message back saying:
```bash
Python 3.7.7
[setupvars.bat] OpenVINO environment initialized
```
And now (and only now) can I open my Python editor from the same command prompt:
```bash
pycharm
```
Otherwise, the hardware will not be recognized.
# Hardware
I can run the following script to make sure the device(s) are detected:
```python
from openvino.runtime import Core

runtime = Core()
devices = runtime.available_devices

for device in devices:
    device_name = runtime.get_property(device, "FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")
```
If I set up the variables correctly (by running the batch script), I get this:
```
[E:] [BSL] found 0 ioexpander device
CPU: AMD Ryzen 7 4800H with Radeon Graphics
GNA: GNA_SW
MYRIAD: Intel Movidius Myriad X VPU
```
`BSL` here refers to a bootloader meant to initialize the firmware, and `found 0 ioexpander device` means no I/O expander devices (used to expand the number of pins) were found.
`GNA` refers to something called "Gaussian Neural Accelerator", which is another Intel accelerator we will not be dealing with here.
As you probably guessed, `MYRIAD` is the device I connected, and it is the same for both the OAK-D camera and the NCS2 stick, since they both use the same VPU.
Also, look, that's my `CPU`!
OpenVINO can also be custom-built with CUDA support, very much like OpenCV. I did not do that here, but in that case a `CUDA` device will also show up.
If I run this script with both devices connected, you can see that they get IDs corresponding to the USB positions they occupy in the connection (`.1` and `.3`):
```
[E:] [BSL] found 0 ioexpander device
CPU: AMD Ryzen 7 4800H with Radeon Graphics
GNA: GNA_SW
MYRIAD.6.1-ma2480: Intel Movidius Myriad X VPU
MYRIAD.6.3-ma2480: Intel Movidius Myriad X VPU
```
And if I connect nothing, or if I forget to initialize my OpenVINO environment, obviously I only get this:
```
[E:] [BSL] found 0 ioexpander device
CPU: AMD Ryzen 7 4800H with Radeon Graphics
GNA: GNA_SW
```
#### Question: Why use the NCS2 if the OAK-D can play the role of the VPU?
#### Answer: No reason to!
Honestly, I just had one lying around, but since it's double the fun this way, I can run the frames on the camera and compute on the stick.
I could use just the camera's VPU the exact same way, using OpenVINO.

# Deployment
See [deploy.py](https://github.com/AndreiMoraru123/ObjectDetection/blob/main/deploy.py)
In order to run the model on OpenVINO's inference engine, I must first convert it to [`onnx`](https://onnx.ai/) (Open Neural Network eXchange) format, as PyTorch models do not have their own deployment system like TensorFlow's frozen graphs. It's important here to also export input and output names, because in some cases, such as the object detector here, the forward pass may return multiple tensors:
```python
import torch
import onnx
from onnxsim import simplify

# Load model checkpoint
checkpoint = torch.load(model_path, map_location='cuda')
model = checkpoint['model']
model.eval()

# Export to ONNX
input_names = ['input']
output_names = ['boxes', 'scores']
dummy_input = torch.randn(1, 3, 300, 300).to('cuda')
torch.onnx.export(model, dummy_input, new_model+'.onnx', verbose=True,
                  input_names=input_names, output_names=output_names)

# Simplify ONNX
simple_model, check = simplify('ssd300.onnx')
assert check, "Simplified ONNX model could not be validated"
onnx.save(simple_model, new_model+'-sim'+'.onnx')
```
Notice the `output_names` list given as a parameter. In the case of SSD, the model outputs both predicted locations (8732 priors, 4 coordinates each) and class scores (8732 priors, 21 classes each), like all object detectors. It's important to separate the two, which is easy to do with `torch.onnx.export`, but also easy to forget.
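As a quick sanity check (not part of the repo's scripts), the exported graph can be inspected to confirm the two named outputs and their shapes. A minimal sketch, assuming the simplified model was saved as `ssd300-sim.onnx`:
```python
import onnx

# Inspect the exported graph: the two named outputs and their shapes should be there
model = onnx.load('ssd300-sim.onnx')
for output in model.graph.output:
    dims = [d.dim_value for d in output.type.tensor_type.shape.dim]
    print(output.name, dims)  # expecting something like: boxes [1, 8732, 4] and scores [1, 8732, 21]
```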
# Running the model
See [run.py](https://github.com/AndreiMoraru123/ObjectDetection/blob/main/run.py)
### There are three options for inference hardware, and I will go through all of them:
```python
switcher = {
"NCS2": generate_engine(args.new_model, args.device), # inference engine for the NCS2
"CUDA": model, # since tensors are already on CUDA, the model is just the loaded checkpoint
"OAK-D": None # since the model is already on the device, this is just using the blob boolean
}
```

## Running on CUDA via checkpoint
This is straightforward, and not very interesting here. Since the tensors are already on CUDA, I can just load the `checkpoint` in PyTorch and run the model using the forward call. I did put this option in here since it's the fastest, and best for showing demos. I could have also included `CPU` as an option that would have the same flow in the code, but why would anyone want that? Ha.
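For illustration, a minimal sketch of this path (the checkpoint file name is a placeholder, and the random tensor stands in for a preprocessed 300x300 camera frame):
```python
import torch

# Load the trained checkpoint and run a plain forward pass on the GPU
checkpoint = torch.load('checkpoint_ssd300.pth.tar', map_location='cuda')  # hypothetical file name
model = checkpoint['model']
model.eval()

with torch.no_grad():
    frame_tensor = torch.randn(1, 3, 300, 300).to('cuda')  # stand-in for a preprocessed camera frame
    predicted_locs, predicted_scores = model(frame_tensor)
```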
## Running on the NCS2/camera via OpenVINO's inference engine
After getting the model in `onnx` format, I can use OpenVINO's inference engine to load it:
```python
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model=new_model + '-sim' + '.onnx')

# Load model to the device
net = ie.load_network(network=net, device_name='MYRIAD')
```
Which one of the two `MYRIAD` devices is the inference engine using? Whichever it finds first. You can specify the exact ID if you want to.
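For example, to pin the network to one particular stick, the full device ID reported earlier can be passed instead; a sketch, where the exact ID depends on your USB topology:
```python
# Pin the network to one specific Myriad device instead of letting the engine pick
net = ie.load_network(network=net, device_name='MYRIAD.6.1-ma2480')
```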
Then I can use my `net` to infer on my input data:
```python
net.infer({'input': frame.unsqueeze(0).numpy()}) # inference on the camera frame.
```
And that's it! I can now configure the pipeline:
```python
import depthai as dai

# Start defining a pipeline
pipeline = dai.Pipeline()

# Define sources and outputs
cam_rgb = pipeline.createColorCamera()
xout_rgb = pipeline.createXLinkOut()

# Properties
cam_rgb.setPreviewSize(1000, 500)
cam_rgb.setInterleaved(False)
cam_rgb.setFps(35)

# Linking
xout_rgb.setStreamName("rgb")
cam_rgb.preview.link(xout_rgb.input)
```

> [!NOTE]
> I deliberately do not create a DepthAI Neural Network node here, because I am running the inference via the OpenVINO `ExecutableNetwork`.

### Parallelization & Outputs
OpenVINO has this cool feature where I can infer on multiple requests in parallel. In order to do this, I only have to change the loading to accommodate multiple requests:
```python
net = ie.load_network(network=net, device_name=device, num_requests=2)
```
which I can then start asynchronously:
```python
# Start the first inference request asynchronously
infer_request_handle1 = net.start_async(request_id=0, inputs={'input': image.unsqueeze(0).numpy()})

# Start the second inference request asynchronously
infer_request_handle2 = net.start_async(request_id=1, inputs={'input': image.unsqueeze(0).numpy()})

# Wait for the first inference request to complete
if infer_request_handle1.wait() == 0:
    # Get the results
    predicted_locs = np.array(infer_request_handle1.output_blobs['boxes'].buffer, dtype=np.float32)
    predicted_scores = np.array(infer_request_handle1.output_blobs['scores'].buffer, dtype=np.float32)

    # Send the results as tensors to the GPU
    predicted_locs = torch.from_numpy(predicted_locs).to('cuda')
    predicted_scores = torch.from_numpy(predicted_scores).to('cuda')

    # Detect objects
    det_boxes, det_labels, det_scores = detect_objects(predicted_locs, predicted_scores,
                                                       max_overlap=max_overlap,
                                                       min_score=min_score,
                                                       top_k=top_k)

# Wait for the second inference request to complete
if infer_request_handle2.wait() == 0:
    # Get the results
    predicted_locs = np.array(infer_request_handle2.output_blobs['boxes'].buffer, dtype=np.float32)
    predicted_scores = np.array(infer_request_handle2.output_blobs['scores'].buffer, dtype=np.float32)

    # Send the results as tensors to the GPU
    predicted_locs = torch.from_numpy(predicted_locs).to('cuda')
    predicted_scores = torch.from_numpy(predicted_scores).to('cuda')

    # Detect objects
    det_boxes, det_labels, det_scores = detect_objects(predicted_locs, predicted_scores,
                                                       max_overlap=max_overlap,
                                                       min_score=min_score,
                                                       top_k=top_k)
```
## Running on the OAK-D using `blob` format
I can skip OpenVINO completely and work with the neural network as a binary object.
I still need to get my model in `onnx` format, but then I need to convert it to a binary using `blobconverter`. Take a look at `deploy_blob` in [deploy.py](https://github.com/AndreiMoraru123/ObjectDetection/blob/main/deploy.py).
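For reference, a minimal conversion sketch using the `blobconverter` package; the shave count and file name here are assumptions, not necessarily what `deploy_blob` does:
```python
import blobconverter

# Convert the simplified ONNX model to a .blob the OAK-D's Myriad X can execute
blob_path = blobconverter.from_onnx(
    model="ssd300-sim.onnx",  # the simplified ONNX file exported earlier
    data_type="FP16",         # the Myriad X runs FP16
    shaves=6,                 # number of SHAVE cores to compile for (assumed)
)
print(blob_path)
```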
If I'm working with the blob, I need to create a `NeuralNetwork` node and link its input to the camera preview like this:
```python
if is_blob:
    cam_rgb.setPreviewSize(300, 300)  # Note the dimension here
    print("Creating Blob Neural Network...")
    nn = pipeline.createNeuralNetwork()
    xout_nn = pipeline.createXLinkOut()
    xout_nn.setStreamName("nn")
    nn.out.link(xout_nn.input)

    nn.setBlobPath(Path(blob_path))
    nn.setNumInferenceThreads(2)
    nn.input.setBlocking(False)
    nn.setNumPoolFrames(4)

    cam_rgb.preview.link(nn.input)
```

> **Warning**
> If the preview here is not in the shape of the input expected by the Neural Network node (300, 300), the predicted bounding boxes will be out of sight.

After that, by queuing my nn:
```python
q_nn = device.getOutputQueue(name="nn", maxSize=4, blocking=False)
```
I can get my predictions directly by using `get`:
```python
in_nn = q_nn.get()
```
And now I can obtain the outputs via the names I exported them with when deploying to `onnx`:
```python
predicted_locs = in_nn.getLayerFp16("boxes")
predicted_scores = in_nn.getLayerFp16("scores")

# Make numpy arrays
predicted_locs = np.array(predicted_locs, dtype=np.float32)
predicted_scores = np.array(predicted_scores, dtype=np.float32)

# Reshape locs into 8732 anchors * 4 coordinates
predicted_locs = np.reshape(predicted_locs, (8732, 4))

# Reshape scores into 8732 anchors * 21 classes
predicted_scores = np.reshape(predicted_scores, (8732, 21))

# Make torch tensors
predicted_locs = torch.from_numpy(predicted_locs).to('cuda')
predicted_scores = torch.from_numpy(predicted_scores).to('cuda')

# Add batch dimension
predicted_locs = predicted_locs.unsqueeze(0)
predicted_scores = predicted_scores.unsqueeze(0)
```
These, after a bit of tensor engineering, can be used for detecting the objects (see `detect_objects` in [detect.py](https://github.com/AndreiMoraru123/ObjectDetection/blob/main/detect.py)).
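To give an idea of what happens downstream, here is a rough visualization sketch, assuming `frame` is the BGR frame grabbed from the rgb queue and that `detect_objects` returns boxes in fractional coordinates (as in the tutorial this project follows); the thresholds are illustrative:
```python
import cv2
import torch

# Hypothetical drawing step: scale fractional boxes back to the frame size and draw them
det_boxes, det_labels, det_scores = detect_objects(predicted_locs, predicted_scores,
                                                   max_overlap=0.5, min_score=0.8, top_k=200)

h, w = frame.shape[:2]  # the BGR frame grabbed from the rgb output queue
for box in det_boxes[0]:
    # box is assumed to be fractional (x_min, y_min, x_max, y_max)
    x_min, y_min, x_max, y_max = (box * torch.tensor([w, h, w, h], device=box.device)).int().tolist()
    cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
```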
# Putting it all together
In [run.py](https://github.com/AndreiMoraru123/ObjectDetection/blob/main/run.py), which acts as the main file, the following command-line arguments are defined:
```python
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Run a PyTorch model on DepthAI')
    parser.add_argument('-usbs', type=str, default='usb2 usb3', help='the USB connection (usb2 or usb3)')
    parser.add_argument('--blob_path', type=str, default='models/ssd300-sim_openvino_2021.4_6shave.blob')
    parser.add_argument('--device', type=str, default="MYRIAD", help='the device to generate the engine for')
    parser.add_argument('--new_model', default="ssd300", type=str, help='the name of the ONNX model')
    parser.add_argument('--min_score', default=0.8, type=float, help='the minimum score for a box to be considered')
    parser.add_argument('--max_overlap', default=0.5, type=float, help='the maximum overlap for a box to be considered')
    parser.add_argument('--top_k', default=200, type=int, help='the maximum number of boxes to be considered')
    parser.add_argument('--hardware', type=str, default="CUDA", help='the hardware to run the model on')
    parser.add_argument('--is_blob', action='store_true', default=False, help='If the model is a blob')

    args = parser.parse_args()
```
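For instance, a run on the camera's own VPU might be kicked off like this (illustrative values, using the default blob path from above):
```bash
python run.py --hardware OAK-D --is_blob --blob_path models/ssd300-sim_openvino_2021.4_6shave.blob
```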
Here, `usbs` matters depending on what type of port your connection uses, so it is safe to leave both `usb2` and `usb3` in, which is what the default does.
The `device` and `hardware` arguments play different roles. With `device` I just tell the inference engine what to load the network on, because that function gets called anyway, regardless of the hardware choice. The actual choice is made by `hardware`, which can be either `NCS2`, `CUDA`, or `OAK-D`. The `is_blob` flag enables the pathway that creates a binary for the `OAK-D` to run on, so it should be left as `False` for `NCS2` and `CUDA`, and set to `True` when the `hardware` is `OAK-D`.

# Outro
### Ha