{"id":24330847,"url":"https://github.com/andreimoraru123/oak-d-etector","last_synced_at":"2025-06-25T19:39:12.045Z","repository":{"id":65727912,"uuid":"596621938","full_name":"AndreiMoraru123/OAK-D-etector","owner":"AndreiMoraru123","description":"Single Shot MultiBox Detector deployed on a OAK-D Lite cam via DepthAI","archived":false,"fork":false,"pushed_at":"2023-09-02T10:46:37.000Z","size":94734,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-10T17:50:50.888Z","etag":null,"topics":["camera","computer-vision","deep-learning","deployment","depthai","hardware","hardware-acceleration","intel","luxonis","neural-compute-stick-2","oak-d","object-detection","onnx","opencv","openvino","pytorch","rgb-camera","single-shot-multibox-detector","ssd","vpu"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AndreiMoraru123.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-02T15:31:51.000Z","updated_at":"2024-06-10T13:16:54.000Z","dependencies_parsed_at":"2024-11-30T06:40:04.403Z","dependency_job_id":null,"html_url":"https://github.com/AndreiMoraru123/OAK-D-etector","commit_stats":null,"previous_names":["andreimoraru123/oak-d-etector"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreiMoraru123%2FOAK-D-etector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreiMoraru123%2FOAK-D-etector/
tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreiMoraru123%2FOAK-D-etector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreiMoraru123%2FOAK-D-etector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AndreiMoraru123","download_url":"https://codeload.github.com/AndreiMoraru123/OAK-D-etector/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":234448340,"owners_count":18834214,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["camera","computer-vision","deep-learning","deployment","depthai","hardware","hardware-acceleration","intel","luxonis","neural-compute-stick-2","oak-d","object-detection","onnx","opencv","openvino","pytorch","rgb-camera","single-shot-multibox-detector","ssd","vpu"],"created_at":"2025-01-18T01:14:45.944Z","updated_at":"2025-01-18T01:14:46.540Z","avatar_url":"https://github.com/AndreiMoraru123.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SSD Object Detection on OAK-D Lite via DepthAI\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd rowspan=\"6\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/81184255/218581688-f960647d-d5d8-437a-bde7-335483a07478.jpg\" width=\"600\" height = \"650\"/\u003e\u003c/td\u003e\n    \u003ctd\u003e \u003cimg src=\"https://user-images.githubusercontent.com/81184255/218574879-86310f35-333c-4d3d-a9dc-7fe805b8b714.png\" width=\"370\" height = \"90\"/\u003e \u003c/td\u003e\n  \u003c/tr\u003e\n  
\u003ctr\u003e\n    \u003ctd\u003e \u003cimg src=\"https://user-images.githubusercontent.com/81184255/218578605-1b852fd0-8191-49e4-9568-7d45a7595f68.jpg\" width=\"370\" height = \"90\"/\u003e \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e \u003cimg src=\"https://user-images.githubusercontent.com/81184255/218580166-03e9d444-0357-42f5-9099-1446bc3514c7.jpg\" width=\"370\" height = \"90\"/\u003e \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e \u003cimg src=\"https://user-images.githubusercontent.com/81184255/218578649-531d0116-0d31-40f0-830b-f8bc38084ff6.jpg\" width=\"370\" height = \"90\"/\u003e \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e \u003cimg src=\"https://user-images.githubusercontent.com/81184255/218579553-fd3baf98-35fb-416a-9aad-6ff74623eca8.jpg\" width=\"370\" height = \"90\"/\u003e \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e \u003cimg src=\"https://user-images.githubusercontent.com/81184255/218582810-1d949a60-81f6-46f5-8364-1f14391d16e9.jpg\" width=\"370\" height = \"90\"/\u003e \u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n# Demo\n\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/AndreiMoraru123/ObjectDetection/assets/81184255/ba4943ef-f9a0-49f6-84fb-6753b9a50fa0\" alt=\"SSD\" width=\"700\" height=\"400\"\u003e\n\u003c/p\u003e\n\n\n# Intro\n\nThe original PyTorch implementation of the model, and the one that I am following here, is [this one](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection). This is a fantastic guide by itself and I did not modify much as of now. The model was trained on [Pascal VOC 2007-2012](http://host.robots.ox.ac.uk/pascal/VOC/). 
Here is a [mirror](https://pjreddie.com/projects/pascal-voc-dataset-mirror/).\n\nThe goal of this project is to deploy such a custom model on real hardware, rather than to design neural networks.\n\nIn this regard, I am using a [Luxonis OAK-D Lite](https://shop.luxonis.com/products/oak-d-lite-1) and an [Intel Neural Compute Stick 2](https://www.intel.com/content/www/us/en/developer/articles/guide/get-started-with-neural-compute-stick.html). Funnily enough, just as I finished this, the NCS2 became outdated, as you can see on the main page, since [Intel will be discontinuing it](https://www.intel.com/content/www/us/en/developer/articles/tool/neural-compute-stick.html). But that is beside the point here, as the main focus is deployment on specific hardware, whatever that hardware may be. \n\nNamely, we are looking at VPUs, or [vision processing units](https://en.wikipedia.org/wiki/Vision_processing_unit).\n\nOne such AI accelerator can be found in the [OAK-D camera](https://github.com/luxonis/depthai-hardware/blob/master/DM9095_OAK-D-LITE_DepthAI_USB3C/Datasheet/OAK-D-Lite_Datasheet.pdf) itself! \n\n# Setup \n\nIn order to communicate with the hardware, I am using both [DepthAI's API](https://docs.luxonis.com/en/latest/) to communicate with the RGB camera of the OAK-D, and Intel's [OpenVINO](https://docs.openvino.ai/latest/home.html) (Open Visual Inference and Neural network Optimization) for deployment, both of which are still very much state of the art in Edge AI.\n\nNow, in order to use OpenVINO with hardware, I have to [download the distribution toolkit](https://docs.openvino.ai/2021.1/openvino_docs_install_guides_installing_openvino_windows.html). For compiling and running apps, the toolkit prefers temporary environment variables set per session, so we will do it that way. 
\n\nFor me, the path looks like this:\n\n```bash\ncd C:\\Program Files (x86)\\Intel\\openvino_2022\\w_openvino_toolkit_windows_2022.2.0.7713.af16ea1d79a_x86_64\n```\n\nwhere I can now set up the variables by running the batch file:\n\n```bash\nsetupvars.bat\n```\n\nI will get a message back saying: \n\n```bash\nPython 3.7.7\n[setupvars.bat] OpenVINO environment initialized\n```\n\nAnd now (and only now) can I open my Python editor from the same command prompt:\n\n```bash\npycharm\n```\n\nOtherwise, the hardware will not be recognized.\n\n# Hardware\n\nI can run the following script to ensure the device(s) are detected:\n\n```python\nfrom openvino.runtime import Core\n\nruntime = Core()\ndevices = runtime.available_devices\n\nfor device in devices:\n    device_name = runtime.get_property(device, \"FULL_DEVICE_NAME\")\n    print(f\"{device}: {device_name}\")\n```\n\nIf I set up the variables correctly (by running the batch script), I get this:\n\n```\n[E:] [BSL] found 0 ioexpander device\nCPU: AMD Ryzen 7 4800H with Radeon Graphics         \nGNA: GNA_SW\nMYRIAD: Intel Movidius Myriad X VPU\n```\n\n`BSL` here refers to a bootloader meant to initialize the firmware, and 0 ioexpander means no (I/O) expander devices (used to expand the number of pins).\n\n`GNA` refers to the \"Gaussian \u0026 Neural Accelerator\", which is another Intel accelerator we will not be dealing with here.\n\nAs you probably guessed, `MYRIAD` is the device I connected, and it is the same for both the OAK-D camera and the NCS2 stick, since they both use the same VPU.\n\nAlso, look, that's my `CPU`!\n\nOpenVINO can also be custom built for CUDA, very much like OpenCV, which I did not do here, but in that case the `CUDA` device will also show up. 
\n\nIf I run this script with both devices connected, you can see they get IDs for the USB ports they are connected to (`.1` and `.3`):\n\n```\n[E:] [BSL] found 0 ioexpander device\nCPU: AMD Ryzen 7 4800H with Radeon Graphics         \nGNA: GNA_SW\nMYRIAD.6.1-ma2480: Intel Movidius Myriad X VPU\nMYRIAD.6.3-ma2480: Intel Movidius Myriad X VPU\n```\n\nAnd if I connect nothing, or if I forget to initialize my OpenVINO environment, obviously I only get this:\n\n```\n[E:] [BSL] found 0 ioexpander device\nCPU: AMD Ryzen 7 4800H with Radeon Graphics         \nGNA: GNA_SW\n```\n\n#### Question: Why use the NCS2 if the OAK-D can play the role of the VPU?\n#### Answer: No reason to! \nHonestly, I just had one lying around, but since it's double the fun this way, I can capture the frames on the camera, and compute on the stick.\nI could use just the camera's VPU the exact same way, using OpenVINO.\n\n# Deployment\n\nSee [deploy.py](https://github.com/AndreiMoraru123/ObjectDetection/blob/main/deploy.py)\n\nIn order to run the model on OpenVINO's inference engine, I must first convert it to [`onnx`](https://onnx.ai/) (Open Neural Network eXchange) format, as PyTorch models do not have their own deployment systems, such as TensorFlow's frozen graphs. 
It's important here to also export input and output names, because in some cases, such as the object detector here, the forward pass may return multiple tensors:\n\n```python\nimport onnx\nimport torch\nfrom onnxsim import simplify\n\n# Load model checkpoint\ncheckpoint = torch.load(model_path, map_location='cuda')\nmodel = checkpoint['model']\nmodel.eval()\n\n# Export to ONNX\ninput_names = ['input']\noutput_names = ['boxes', 'scores']\ndummy_input = torch.randn(1, 3, 300, 300).to('cuda')\ntorch.onnx.export(model, dummy_input, new_model+'.onnx', verbose=True,\n                  input_names=input_names, output_names=output_names)\n\n# Simplify the ONNX file that was just exported\nsimple_model, check = simplify(new_model+'.onnx')\nassert check, \"Simplified ONNX model could not be validated\"\nonnx.save(simple_model, new_model+'-sim'+'.onnx')\n```\n\nNotice the `output_names` list given as a parameter. In the case of SSD, the model outputs both predicted locations (8732=priors, 4=coordinates) and class scores (8732=priors, 21=classes), like all object detectors. It's important to separate the two, which is easy to do with `torch.onnx.export`, but also easy to forget.\n\n# Running the model\n\nSee [run.py](https://github.com/AndreiMoraru123/ObjectDetection/blob/main/run.py)\n\n### There are three options for inference hardware, and I will go through all of them:\n\n```python\nswitcher = {\n    \"NCS2\": generate_engine(args.new_model, args.device),  # inference engine for the NCS2\n    \"CUDA\": model,  # since tensors are already on CUDA, the model is just the loaded checkpoint\n    \"OAK-D\": None  # since the model is already on the device, this is just using the blob boolean\n}\n```\n\n## Running on CUDA via checkpoint\n\nThis is straightforward, and not very interesting here. Since the tensors are already on CUDA, I can just load the `checkpoint` in PyTorch and run the model using the forward call. I did put this option in here since it's the fastest, and best for showing demos. 
I could have also included `CPU` as an option that would have the same flow in the code, but why would anyone want that? Ha.\n\n## Running on the NCS2/camera via OpenVINO's inference engine\n\nAfter getting the model in `onnx` format, I can use OpenVINO's inference engine to load it:\n\n```python\nfrom openvino.inference_engine import IECore\n\nie = IECore()\nnet = ie.read_network(model=new_model + '-sim' + '.onnx')\n\n# Load model to the device\nnet = ie.load_network(network=net, device_name='MYRIAD')\n```\n\nWhich one of the two `MYRIAD` devices is the inference engine using? Whichever it finds first. You can specify the exact ID if you want to. \n\nThen I can use my `net` to infer on my input data:\n\n```python\nnet.infer({'input': frame.unsqueeze(0).numpy()})  # inference on the camera frame.\n```\n\nAnd that's it! I can now configure the pipeline:\n\n```python\n# Start defining a pipeline\npipeline = dai.Pipeline()\n\n# Define sources and outputs\ncam_rgb = pipeline.createColorCamera()\nxout_rgb = pipeline.createXLinkOut()\n\n# Properties\ncam_rgb.setPreviewSize(1000, 500)\ncam_rgb.setInterleaved(False)\ncam_rgb.setFps(35)\n\n# Linking\nxout_rgb.setStreamName(\"rgb\")\ncam_rgb.preview.link(xout_rgb.input)\n```\n\n\u003e [!NOTE]\\\n\u003e I deliberately do not create a DepthAI Neural Network node here, because I am running the inference via the OpenVINO ExecutableNetwork.\n\n### Parallelization \u0026 Outputs\n\nOpenVINO has this cool feature where I can infer on multiple threads. 
In order to do this, I only have to change the loading to accommodate multiple requests:\n\n```python\nnet = ie.load_network(network=net, device_name=device, num_requests=2)\n```\n\nwhich I can then start asynchronously:\n\n```python\n# Start the first inference request asynchronously\ninfer_request_handle1 = net.start_async(request_id=0, inputs={'input': image.unsqueeze(0).numpy()})\n\n# Start the second inference request asynchronously\ninfer_request_handle2 = net.start_async(request_id=1, inputs={'input': image.unsqueeze(0).numpy()})\n\n# Wait for the first inference request to complete\nif infer_request_handle1.wait() == 0:\n    # Get the results\n    predicted_locs = np.array(infer_request_handle1.output_blobs['boxes'].buffer, dtype=np.float32)\n    predicted_scores = np.array(infer_request_handle1.output_blobs['scores'].buffer, dtype=np.float32)\n\n    # Send the results as tensors to the GPU\n    predicted_locs = torch.from_numpy(predicted_locs).to('cuda')\n    predicted_scores = torch.from_numpy(predicted_scores).to('cuda')\n\n    # Detect objects\n    det_boxes, det_labels, det_scores = detect_objects(predicted_locs, predicted_scores,\n                                                       max_overlap=max_overlap,\n                                                       min_score=min_score,\n                                                       top_k=top_k)\n\n# Wait for the second inference request to complete\nif infer_request_handle2.wait() == 0:\n    # Get the results\n    predicted_locs = np.array(infer_request_handle2.output_blobs['boxes'].buffer, dtype=np.float32)\n    predicted_scores = np.array(infer_request_handle2.output_blobs['scores'].buffer, dtype=np.float32)\n\n    # Send the results as tensors to the GPU\n    predicted_locs = torch.from_numpy(predicted_locs).to('cuda')\n    predicted_scores = torch.from_numpy(predicted_scores).to('cuda')\n\n    # Detect objects\n    det_boxes, det_labels, det_scores = detect_objects(predicted_locs, 
predicted_scores,\n                                                       max_overlap=max_overlap,\n                                                       min_score=min_score,\n                                                       top_k=top_k)\n\n```\n\n## Running on the OAK-D using `blob` format\n\nI can skip OpenVINO completely and work with the neural network as a binary object.\n\nI still need to get my model in `onnx` format, but I need to convert it to binary using `blobconverter`. Take a look at `deploy_blob` in [deploy.py](https://github.com/AndreiMoraru123/ObjectDetection/blob/main/deploy.py).\n\nIf I'm working with the blob, I need to create a `NeuralNetwork` node and link its input to the camera preview like this:\n\n```python\nif is_blob:\n    cam_rgb.setPreviewSize(300, 300)  # Note the dimension here\n    print(\"Creating Blob Neural Network...\")\n    nn = pipeline.createNeuralNetwork()\n    xout_nn = pipeline.createXLinkOut()\n    xout_nn.setStreamName(\"nn\")\n    nn.out.link(xout_nn.input)\n\n    nn.setBlobPath(Path(blob_path))\n    nn.setNumInferenceThreads(2)\n    nn.input.setBlocking(False)\n    nn.setNumPoolFrames(4)\n\n    cam_rgb.preview.link(nn.input)\n```\n\n\u003e **Warning**\n\u003e If the preview here is not in the shape of the input expected by the Neural Network node (300,300), the predicted bounding boxes will be out of sight.\n\nAfter that, by creating an output queue for my nn:\n\n```python\nq_nn = device.getOutputQueue(name=\"nn\", maxSize=4, blocking=False)\n```\n\nI can get my predictions directly by using `get`:\n\n```python\nin_nn = q_nn.get()\n```\n\nAnd now I can obtain the outputs via the names I exported them with when previously deploying as `onnx`:\n\n```python\npredicted_locs = in_nn.getLayerFp16(\"boxes\")\npredicted_scores = in_nn.getLayerFp16(\"scores\")\n\n# Make numpy arrays\npredicted_locs = np.array(predicted_locs, dtype=np.float32)\npredicted_scores = np.array(predicted_scores, dtype=np.float32)\n\n# Reshape locs into 4 coordinates * 
8732 anchors\npredicted_locs = np.reshape(predicted_locs, (8732, 4))\n\n# Reshape scores into 21 classes * 8732 anchors\npredicted_scores = np.reshape(predicted_scores, (8732, 21))\n\n# Make torch tensors\npredicted_locs = torch.from_numpy(predicted_locs).to('cuda')\npredicted_scores = torch.from_numpy(predicted_scores).to('cuda')\n\n# Add batch dimension\npredicted_locs = predicted_locs.unsqueeze(0)\npredicted_scores = predicted_scores.unsqueeze(0)\n```\n\nWhich, after a bit of tensor engineering, can be used for detecting the objects (see `detect_objects` in [detect.py](https://github.com/AndreiMoraru123/ObjectDetection/blob/main/detect.py)).\n\n# Putting it all together\n\nIn [run.py](https://github.com/AndreiMoraru123/ObjectDetection/blob/main/run.py), which acts as the main file, the following command-line arguments are defined:\n\n ```python\nif __name__ == '__main__':\n\n    parser = argparse.ArgumentParser(description='Run a PyTorch model on DepthAI')\n    \n    parser.add_argument('-usbs', type=str, default='usb2 usb3', help='the USB connection (usb2 or usb3)')\n    parser.add_argument('--blob_path', type=str, default='models/ssd300-sim_openvino_2021.4_6shave.blob')\n    parser.add_argument('--device', type=str, default=\"MYRIAD\", help='the device to generate the engine for')\n    parser.add_argument('--new_model', default=\"ssd300\", type=str, help='the name of the ONNX model')\n    parser.add_argument('--min_score', default=0.8, type=float, help='the minimum score for a box to be considered')\n    parser.add_argument('--max_overlap', default=0.5, type=float, help='the maximum overlap for a box to be considered')\n    parser.add_argument('--top_k', default=200, type=int, help='the maximum number of boxes to be considered')\n    parser.add_argument('--hardware', type=str, default=\"CUDA\", help='the hardware to run the model on')\n    parser.add_argument('--is_blob', action='store_true', default=False, help='If the model is a blob')\n\n    args = 
parser.parse_args()\n ```\n \nHere, `usbs` depends on the type of port your connection uses, so it's safe to leave both `usb3` (accepted by default) and `usb2`. \n \nThe `device` and `hardware` arguments play different roles. With `device` I just tell the inference engine what to load the network on, because that function gets called anyway, regardless of the hardware choice. The actual choice is `hardware`, which can be either `NCS2`, `CUDA`, or `OAK-D`. For `CUDA` and `NCS2` the parameter `is_blob` should be set to `False`, as it generates a pathway that creates a binary for the `OAK-D` to run on. As a consequence, it should be set to `True` when the `hardware` is `OAK-D`.\n\n# Outro\n\n### Ha\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://user-images.githubusercontent.com/81184255/219082947-24ba1e97-6b24-4930-9a6c-87526d9d0494.jpg\" width = \"300\" height = \"500\" /\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreimoraru123%2Foak-d-etector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandreimoraru123%2Foak-d-etector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreimoraru123%2Foak-d-etector/lists"}