{"id":13643808,"url":"https://github.com/jveitchmichaelis/edgetpu-yolo","last_synced_at":"2025-10-25T15:16:31.605Z","repository":{"id":45624511,"uuid":"401781762","full_name":"jveitchmichaelis/edgetpu-yolo","owner":"jveitchmichaelis","description":"Minimal-dependency Yolov5 and Yolov8 export and inference demonstration for the Google Coral EdgeTPU","archived":false,"fork":false,"pushed_at":"2024-04-16T07:07:15.000Z","size":32545,"stargazers_count":91,"open_issues_count":13,"forks_count":31,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-08-02T01:21:11.785Z","etag":null,"topics":["deep-learning","edgetpu","edgetpu-exporter","google-coral","yolov5s"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jveitchmichaelis.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-08-31T17:04:05.000Z","updated_at":"2024-07-22T05:45:19.000Z","dependencies_parsed_at":"2024-08-02T01:27:32.341Z","dependency_job_id":null,"html_url":"https://github.com/jveitchmichaelis/edgetpu-yolo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jveitchmichaelis%2Fedgetpu-yolo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jveitchmichaelis%2Fedgetpu-yolo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jveitchmichaelis%2Fedgetpu-yolo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/jveitchmichaelis%2Fedgetpu-yolo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jveitchmichaelis","download_url":"https://codeload.github.com/jveitchmichaelis/edgetpu-yolo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223846530,"owners_count":17213206,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","edgetpu","edgetpu-exporter","google-coral","yolov5s"],"created_at":"2024-08-02T01:01:52.888Z","updated_at":"2025-10-25T15:16:31.587Z","avatar_url":"https://github.com/jveitchmichaelis.png","language":"Python","funding_links":[],"categories":["Lighter and Deployment Frameworks"],"sub_categories":[],"readme":"[TOC]\r\n\r\n## Introduction\r\n\r\nIn this repository we'll explore how to run a state-of-the-art object detection model, [Yolov5](https://github.com/ultralytics/yolov5), on the [Google Coral EdgeTPU](https://coral.ai/). \r\n\r\nThis project was submitted to, and won, Ultralytics' competition for edge device deployment in the EdgeTPU category. The notes for the competition are at the bottom of this file, for reference.\r\n\r\nProbably the most interesting aspect for people stumbling across this is that this project requires very few runtime dependencies (it doesn't even need PyTorch). 
It contains comprehensive benchmarking code, examples of how to compile and run a custom model on the EdgeTPU, and a discussion of how to test on real edge hardware.\r\n\r\n**TL;DR (see the Dockerfile):**\r\n\r\n```\r\nsudo apt-get update \u0026\u0026 sudo apt-get -y upgrade\r\nsudo apt-get install -y git curl gnupg\r\n\r\n# Install PyCoral (you don't need to do this on a Coral Board)\r\necho \"deb https://packages.cloud.google.com/apt coral-edgetpu-stable main\" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list\r\ncurl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -\r\nsudo apt-get update\r\nsudo apt-get install -y gasket-dkms libedgetpu1-std python3-pycoral\r\n\r\n# Get Python dependencies\r\nsudo apt-get install -y python3 python3-pip\r\npip3 install --upgrade pip setuptools wheel\r\npython3 -m pip install numpy\r\npython3 -m pip install opencv-python-headless\r\npython3 -m pip install tqdm pyyaml\r\n\r\n# Clone this repository\r\ngit clone https://github.com/jveitchmichaelis/edgetpu-yolo\r\ncd edgetpu-yolo\r\n\r\n# Run the test script\r\npython3 detect.py -m yolov5s-int8-224_edgetpu.tflite --bench_image\r\n```\r\n\r\nWasn't that easy? You can swap out different models and try other images if you like. You should see an inference speed of around 25 fps with a 224x224 px input model.\r\n\r\nNote that if you're using a PCIe accelerator, you will need to install an appropriate kernel driver. See the hardware notes for more information.\r\n\r\n## Dev/Further instructions\r\n\r\n1. Hardware setup (hardware.md)\r\n   * Briefly covers setup for the Coral Dev Board(s)\r\n   * Covers electrical and mechanical setup for the Jetson Nano, EdgeTPU driver installation, etc.\r\n2. On-device software setup (software.md)\r\n   * Setting up virtual environments and Docker\r\n   * Installing `pycoral` and related libraries\r\n   * Notes on installing PyTorch, OpenCV etc from source [for development and testing work]\r\n3. 
Model generation and export (export.md)\r\n   * Exporting a TFLite model from PyTorch\r\n   * Notes on the `edgetpu_compiler`\r\n\r\n## Running Inference\r\n\r\nAs the introduction says, all you need to do is install the dependencies and then run:\r\n\r\n```\r\npython3 detect.py -m yolov5s-int8-224_edgetpu.tflite --bench_speed\r\npython3 detect.py -m yolov5s-int8-224_edgetpu.tflite --bench_image\r\n```\r\n\r\nThis should first give you a speed benchmark (on 100 images - edit the file if you want to run more) and then run detection on the Zidane test image (you should get two detections for the 224 model).\r\n\r\nI've also included an (untested) option to run from a video stream.\r\n\r\nThe provided code is pretty much the minimum you need to get going with the TPU. It provides a simple class for loading the model and running inference. There are also a few utilities copied from Yolov5 for image annotation, but it's very basic at this stage.\r\n\r\nYou can also use the `EdgeTPUModel` class in your own software quite easily:\r\n\r\n```\r\nfrom edgetpumodel import EdgeTPUModel\r\nfrom utils import get_image_tensor\r\n\r\nmodel = EdgeTPUModel(\"model_name\", \"names.yaml\")\r\ninput_shape = model.get_input_shape()\r\n\r\nfull_image, net_image, pad = get_image_tensor(\"/path/to/image\", input_shape[0])\r\npred = model.predict(net_image)\r\n```\r\n\r\nIt's not yet ready for production(!) but you should find it easy to adapt.\r\n\r\n## Docker\r\n\r\nIf you want, you can run everything inside a Docker container. 
I've set it up so that you should mount this repository as an external volume (easier for experimenting/modifying files on the fly).\r\n\r\n```\r\ncd docker\r\ndocker build -t edgetpu .\r\n\r\ndocker run -it --rm --privileged -v /path/to/repo:/yolo edgetpu bash\r\n\u003e cd /yolo\r\n\u003e python3 detect.py -m yolov5s-int8-224_edgetpu.tflite --bench_speed\r\n```\r\n\r\nPerformance seems to be slightly faster in Docker, perhaps due to updated versions of some libraries?\r\n\r\n## Benchmarks/Performance\r\n\r\nHere are the results of running three different models. All benchmarks were performed using an M.2 accelerator on a Jetson Nano 4GB. Settings are `conf_thresh` of 0.25, `iou_thresh` of 0.45. If you adjust these so you get more bounding boxes, speed will decrease as NMS takes more time.\r\n\r\n* 96x96 input, runs fully on the TPU ~60-70 fps\r\n* 192x192 input, runs mostly on the TPU ~30-35 fps\r\n* 224x224 input, runs mostly on the TPU ~25-30 fps\r\n* \\\u003e= 256 px currently fails to compile due to large tensors. It's probable that the backbone alone would compile fine and then detection can run on CPU, but this is typically extremely slow - an order of magnitude slower. 
Better, I think, to explore options for Yolov5 models with smaller width/depth parameters.\r\n\r\n```\r\n(py36) josh@josh-jetson:~/code/edgetpu_yolo$ python3 detect.py -m yolov5s-int8-96_edgetpu.tflite --bench_speed\r\nINFO:EdgeTPUModel:Loaded 80 classes\r\nINFO:__main__:Performing test run\r\n100%|¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦| 100/100 [00:01\u003c00:00, 58.28it/s]\r\nINFO:__main__:Inference time (EdgeTPU): 13.40 +- 1.68 ms\r\nINFO:__main__:NMS time (CPU): 0.43 +- 0.39 ms\r\nINFO:__main__:Mean FPS: 72.30\r\n\r\n(py36) josh@josh-jetson:~/code/edgetpu_yolo$ python3 detect.py -m yolov5s-int8-192_edgetpu.tflite --bench_speed\r\nINFO:EdgeTPUModel:Loaded 80 classes\r\nINFO:__main__:Performing test run\r\n100%|¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦| 100/100 [00:03\u003c00:00, 30.85it/s]\r\nINFO:__main__:Inference time (EdgeTPU): 26.43 +- 4.09 ms\r\nINFO:__main__:NMS time (CPU): 0.77 +- 0.35 ms\r\nINFO:__main__:Mean FPS: 36.77\r\n\r\n(py36) josh@josh-jetson:~/code/edgetpu_yolo$ python3 detect.py -m yolov5s-int8-224_edgetpu.tflite --bench_speed\r\nINFO:EdgeTPUModel:Loaded 80 classes\r\nINFO:__main__:Performing test run\r\n100%|¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦| 100/100 [00:03\u003c00:00, 25.15it/s]\r\nINFO:__main__:Inference time (EdgeTPU): 33.31 +- 3.69 ms\r\nINFO:__main__:NMS time (CPU): 0.76 +- 0.12 ms\r\nINFO:__main__:Mean FPS: 29.35\r\n```\r\n\r\nI would say that 96x96 is probably unusable unless it was a model that was properly quantisation-aware trained and was for a very limited task (see accuracy results below).\r\n\r\n224px gives good results on standard images, e.g. `zidane`, but it might not always find the tie. 
This is quite normal for edge-based models with small inputs.\r\n\r\nYou could attempt to tile the model across larger images, which may give reasonable results.\r\n\r\n### MS COCO Benchmarking\r\n\r\n**Note that benchmarks use the same parameters as Ultralytics/yolov5: conf=0.001, iou=0.65**. These settings _significantly_ slow down performance due to the large number of bounding boxes created (and NMS'd). You will find that inference speed drops by up to 50%. There are sample prediction files in the repo for the default conf=0.25/iou=0.45 - these result in a slightly lower mAP but are much faster.\r\n\r\n* 96x96: mAP **6.3**, mAP50 **11.0**\r\n\r\n* 192x192: mAP **16.1**, mAP50 **26.7**\r\n\r\n* 224x224: mAP **18.4**, mAP50 **30.5**\r\n\r\nPerformance is considerably worse than the benchmarks on yolov5s.pt, _however_ this is a post-training quantised model on images 3x smaller.\r\n\r\nThere are `prediction.json` files for each model in the `coco_eval` folder. You can re-run with:\r\n\r\n```\r\npython3 detect.py -m yolov5s-int8-224_edgetpu.tflite --bench_coco --coco_path /home/josh/data/coco/images/val2017/ -q\r\n```\r\n\r\nThe `-q` option silences logging to stdout. You may wish to turn this off to see that stuff is being detected.\r\n\r\nOnce you've run this, you can run the `eval_coco.py` script to process the results. 
Run with something like:\r\n\r\n```\r\npython3 eval_coco.py --coco_path /home/josh/data/coco/images/val2017/ --pred_pat ./coco_eval/yolov5s-int8-192_edgetpu.tflite_predictions.json --gt_path /home/josh/data/coco/annotations/instances_val2017.json\r\n```\r\n\r\nand you should get out something like:\r\n\r\n```\r\n(py36) josh@josh-jetson:~/code/edgetpu_yolo$ python3 eval_coco.py --coco_path /home/josh/data/coco/images/val2017/ --pred_pat ./coco_eval/yolov5s-int8-224_edgetpu.tflite_predictions.json --gt_path /home/josh/data/coco/annotations/instances_val2017.json\r\nINFO:COCOEval:Looking for: /home/josh/data/coco/images/val2017/*.jpg\r\nloading annotations into memory...\r\nDone (t=1.92s)\r\ncreating index...\r\nindex created!\r\nLoading and preparing results...\r\nDONE (t=0.45s)\r\ncreating index...\r\nindex created!\r\nRunning per image evaluation...\r\nEvaluate annotation type *bbox*\r\nDONE (t=52.38s).\r\nAccumulating evaluation results...\r\nDONE (t=8.63s).\r\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.158\r\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.251\r\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.168\r\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.012\r\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.136\r\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.329\r\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.150\r\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.185\r\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.185\r\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.012\r\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.158\r\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.397\r\nINFO:COCOEval:mAP: 
0.15768057519574114\r\nINFO:COCOEval:mAP50: 0.25142469970806514\r\n```\r\n\r\n## Ultralytics Competition Notes\r\n\r\nThis repository is an entry into the Ultralytics export challenge for the EdgeTPU. It provides the following solution:\r\n\r\n* A minimal repository which has extremely few dependencies:\r\n  * `pycoral`, `opencv` for image handling (you could drop this using e.g. Pillow) and `numpy`\r\n  * Other \"light\" dependencies include `tqdm` for progress reporting, and `yaml` for parsing names files. `json` is also used for output logs (e.g. benchmarks)\r\n  * **No dependency on Torch**, _which means no building Torch_ - from clone to inference is extremely fast.\r\n  * Code has been selectively taken from the original Ultralytics repository and converted to use Numpy where necessary, for example non-max suppression. There is essentially no speed penalty for this on a CPU-only device.\r\n* I chose _not_ to fork ultralytics/yolov5 because the competition scoring was weighted by deployment simplicity. Installing Torch and various dependencies on non-desktop hardware can be a significant challenge - and there is no need for it when using the tflite-runtime.\r\n* **Accuracy benchmark** code is provided for running on COCO 2017. It's a slimmed down version of `val.py` and there is also a script for checking the output. mAP results are provided in this readme.\r\n  * For the 224x224 model: mAP **18.4**, mAP50 **30.5**\r\n* Packages are easily installable on embedded platforms such as the Google Coral Dev board and the Jetson Nano. **It should also work on any platform that an EdgeTPU can be connected to, e.g. 
Desktop.**\r\n* This repository uses the Jetson Nano as an example, but the code should be transferable given the few dependencies required\r\n  * Setup instructions are given for the Coral, but these are largely based on Google's guidelines and are not tested as I didn't have access to a dev board at the time of writing.\r\n* tflite export is taken from https://github.com/ultralytics/yolov5/blob/master/models/tf.py\r\n  * These models have the detection step built-in as a custom Keras layer. This provides a significant speed boost, but does mean that larger models are unable to compile.\r\n* **Speed benchmarks are good**: you can expect 24 fps using the EdgeTPU on a Jetson Nano for a 224 px input.\r\n  * You can easily swap in a different model/input size, but larger/smaller models are going to vary in runtime and accuracy.\r\n  * The workaround for exporting a 416 px model is to use an older runtime version where the transpose operation is not supported. This significantly slows model performance because then the `Detect` stage must be run as a CPU operation. See [bogdannedelcu](https://github.com/bogdannedelcu/yolov5-export-to-coraldevmini)'s solution for an example of this.\r\n    * Note this approach doesn't work any more because the compiler now supports the Transpose operation. I tried exporting with different model runtimes in an attempt to force the compiler to switch to CPU execution before these layers, but it didn't seem to help.\r\n* **Extensive documentation** is provided for hardware setup and library testing. 
This is more for the Jetson than anything else, as library setup on the Coral Dev Board should be minimal.\r\n* A **Dockerfile** is provided for a repeatable setup and test environment\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjveitchmichaelis%2Fedgetpu-yolo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjveitchmichaelis%2Fedgetpu-yolo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjveitchmichaelis%2Fedgetpu-yolo/lists"}