{"id":19038474,"url":"https://github.com/tonytw1/squirrel-detector","last_synced_at":"2025-10-15T23:08:22.995Z","repository":{"id":39895611,"uuid":"329863116","full_name":"tonytw1/squirrel-detector","owner":"tonytw1","description":"Retraining a TensorFlow object detection model to look for squirrels.","archived":false,"fork":false,"pushed_at":"2024-12-27T22:21:40.000Z","size":409201,"stargazers_count":16,"open_issues_count":1,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-23T19:51:26.859Z","etag":null,"topics":["detection-api","squirrels","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tonytw1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-01-15T09:13:43.000Z","updated_at":"2025-03-10T19:53:42.000Z","dependencies_parsed_at":"2025-04-17T16:10:43.166Z","dependency_job_id":null,"html_url":"https://github.com/tonytw1/squirrel-detector","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tonytw1/squirrel-detector","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tonytw1%2Fsquirrel-detector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tonytw1%2Fsquirrel-detector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tonytw1%2Fsquirrel-detector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tonytw1%2Fsquirrel-detector/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tonytw1","download_url":"https://codeload.github.com/tonytw1/squirrel-detector/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tonytw1%2Fsquirrel-detector/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279125743,"owners_count":26109204,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-15T02:00:07.814Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["detection-api","squirrels","tensorflow"],"created_at":"2024-11-08T22:03:32.500Z","updated_at":"2025-10-15T23:08:22.967Z","avatar_url":"https://github.com/tonytw1.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Squirrel detector\n\n![Mr Squirrel](images/squirrel.jpg)\n\nDuring lock down we were adopted by the squirrel who frequents our garden.\n\nWe wanted to be notified when a squirrel was outside the window,\nso I built a Raspberry Pi webcam and trained a TensorFlow model to recognise local wildlife.\n\nI learnt how to retrain an existing TensorFlow object detection model to recognise\nnew objects and how to use that model from my own code.\n\nThe [retrained model](models/squirrelnet_ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8) can be reused to detect [various local \nwildlife](models/squirrelnet_label_map.pbtxt) 
in webcam images.\n\n\n## Quick start\n\nGot squirrels? Check your own images using this model with Python and TensorFlow:\n\n```\npip3 install tensorflow\npip3 install numpy\npip3 install pillow\ngit clone https://github.com/tonytw1/squirrel-detector.git\ncd squirrel-detector\npython3 check.py test-images/*\n```\n\nOutput indicating confident detections:\n```\nImporting TensorFlow\nFinished importing TensorFlow\nLoading model\nFinished loading model\nLoading image test-images/double-fox.jpg\nConverting test-images/double-fox.jpg to input tensor\nDetecting in test-images/double-fox.jpg\nDetection completed for test-images/double-fox.jpg in 841.3801193237305ms\n2.0: 0.99968594\n2.0: 0.998834\n5.0: 0.032358527\n5.0: 0.02355782\n3.0: 0.0201727\n6.0: 0.017337278\n6.0: 0.01602341\n2.0: 0.015346697\n1.0: 0.013019669\n3.0: 0.010901747\ntest-images/double-fox.jpg took 885.329008102417ms end to end\nLoading image test-images/double-squirrel.jpg\nConverting test-images/double-squirrel.jpg to input tensor\nDetecting in test-images/double-squirrel.jpg\nDetection completed for test-images/double-squirrel.jpg in 58.42304229736328ms\n1.0: 0.9915343\n1.0: 0.5669721\n4.0: 0.08471444\n6.0: 0.05237192\n2.0: 0.032799907\n3.0: 0.032602694\n6.0: 0.029259607\n3.0: 0.027670832\n3.0: 0.025951618\n4.0: 0.024997206\ntest-images/double-squirrel.jpg took 91.85290336608887ms end to end\nLoading image test-images/mostly-fox.jpg\nConverting test-images/mostly-fox.jpg to input tensor\nDetecting in test-images/mostly-fox.jpg\nDetection completed for test-images/mostly-fox.jpg in 50.83417892456055ms\n2.0: 0.973943\n6.0: 0.6413315\n6.0: 0.15741917\n1.0: 0.10761013\n5.0: 0.054149233\n4.0: 0.05076366\n6.0: 0.017794535\n3.0: 0.015844222\n4.0: 0.015320973\n4.0: 0.015181341\ntest-images/mostly-fox.jpg took 57.17110633850098ms end to end\nLoading image test-images/squirrel.jpg\nConverting test-images/squirrel.jpg to input tensor\nDetecting in test-images/squirrel.jpg\nDetection completed for 
test-images/squirrel.jpg in 59.5860481262207ms\n1.0: 0.9953614\n1.0: 0.02479627\n6.0: 0.01990755\n6.0: 0.016859846\n1.0: 0.016725613\n1.0: 0.014010337\n2.0: 0.013183329\n2.0: 0.013044576\n4.0: 0.0127070565\n1.0: 0.011396781\ntest-images/squirrel.jpg took 92.68999099731445ms end to end\nLoading image test-images/standing-squirrel.jpg\nConverting test-images/standing-squirrel.jpg to input tensor\nDetecting in test-images/standing-squirrel.jpg\nDetection completed for test-images/standing-squirrel.jpg in 55.76896667480469ms\n1.0: 0.9933376\n6.0: 0.029534124\n2.0: 0.02659541\n6.0: 0.02409731\n1.0: 0.019093161\n6.0: 0.018654902\n1.0: 0.018603576\n1.0: 0.01789689\n1.0: 0.016497936\n1.0: 0.015996533\ntest-images/standing-squirrel.jpg took 85.83402633666992ms end to end\n```\n\n## Contents\n\n- [Hardware](#hardware)\n- [Detecting motion and capturing images](#detecting-motion-and-capturing-images)\n- [Not squirrel](https://github.com/tonytw1/squirrel-detector#not-squirrel)\n- [Categorisation or Detection](#categorisation-or-detection)\n- [Object detection APIs](#object-detection-apis)\n- [TensorFlow object detection models](#tensorflow-object-detection-models)\n- [Testing in Google Colab](#testing-in-google-colab)\n- [Running a model with TensorFlow Serving](#running-a-model-with-tensorflow-serving)\n- [Retraining](#retraining)\n- [Annotating images](#annotating-images)\n- [Splitting the data](#splitting-the-data)\n- [Training](#training)\n- [Exporting the model](#exporting-the-model)\n- [Evaluating the model](#evaluating-the-model)\n- [Inference speed](#inference-speed)\n- [Tweaking the model](#tweaking-the-model)\n\n- [Putting it all together](#putting-it-all-together)\n- [Results](#results)\n- [Local training](#local-training)\n\n## Hardware\n\nWe're using a [Raspberry Pi Zero W](https://www.raspberrypi.org/products/raspberry-pi-zero-w/) with the\n[Camera Module V2](https://www.raspberrypi.org/products/camera-module-v2/).\n\nThis gives us Wifi, 1 CPU core and 512Mb of 
memory.\n\nThe camera module appears as a Video4Linux device.\nYou can see device details with this command:\n```\nv4l2-ctl --all\n```\n\n## Detecting motion and capturing images\n\nThe camera needs to be motion sensitive.\n\n[Motion](https://motion-project.github.io) handles motion detection and is available as a Raspberry Pi package.\nIt does a good job of detecting movement and can output image files and bounding boxes.\n\nHere's an example of Motion detecting movement and generating a bounding box:\n![This is not a squirrel](images/not_squirrel.jpg)\n\nWe'd like Motion to detect bounding boxes but not draw them onto the saved image files.\nThis line in the Motion configuration file controls this:\n```\nlocate_motion_mode preview\n```\n\nMotion seems to have an issue with the Pi camera's auto exposure mode; the exposure will swing back and forth between\ntoo light and too dark.\n\nThis can be worked around by setting the capture resolution to 1024 x 640. I do not know why this works.\n\nThe Pi Zero which the camera is connected to is a small device with limited processing capability.\nWe'll want to send the captured images somewhere where a more capable machine can look at them.\n\nLet's use a Python script to catch the Motion events and publish them.\n\nThis script needs to capture the image file path and bounding box from Motion and encode them into a message.\n\nMotion has a callback named `on_picture_save` which can call one of our scripts with the image file path and bounding box\nevery time an image is captured.\n\nWe can hook these together with this configuration line:\n\n`on_picture_save python3 /home/pi/on_motion_detected.py %f %w %h %K %L %i %J`\n\nWe'll need to encode the image file for inclusion in a message. 
Base64 encoding should be enough.\n\nWe'll use [MQTT](https://mqtt.org) to transport the messages.\nMQTT is simple and really practical about message size limits.\n\nWe can publish the motion messages to an MQTT topic which other machines can subscribe to.\n\nThis all happens in the script `on_motion_detected.py`.\n\n[on_motion_detected.py](on_motion_detected.py)\n\n\n## Not squirrel\n\nIt quickly became apparent that there were more than squirrels in the garden.\n\n![Not squirrel](images/fox.jpg)\n\nThis is not a squirrel.\n\nJust alerting every time motion is detected is not going to be enough.\nWe'll need to identify the objects in the captured image files so that we can filter for squirrels.\n\n\n### Categorisation or Detection\n\nThere are 2 approaches we could use here to identify the contents of the images.\n\nWe have a bounding box for the area of the image which is in motion and triggered the camera.\nWe can choose to categorise the contents of the bounding box or try to detect all of the recognised objects in the entire image.\n\nCategorising the contents of the smaller motion area should be quicker but would depend a lot on the behaviour of the bounding box.\nIt could also omit results if there are multiple objects in the image. \n\nFull image object detection is probably a more general and reusable solution as well. \n\nI initially chose object detection. 
In hindsight this option has worked well.\n\nHere are some examples of detected motion bounding boxes and full image object detection.\nObject detection seems to do a better job of boxing the entire object when parts of it are in motion.\n\n![Motion bounding box compared to object detection result](images/motion1.jpg)\n![Motion bounding box compared to object detection result](images/motion2.jpg)\n\n\n## Object detection APIs\n\nWe have a message containing a still image with a bounding box enclosing an area of motion.\nWe want to know which object triggered the camera.\n\nWe can send the image to an object detection API to do this.\n\nObject detection APIs take an image and attempt to identify the visible objects.\n\nIf we can find an object detection API which can identify squirrels, we'll be done.\n\n\n### Google Vision\n\n[Google Vision](https://cloud.google.com/vision) seems to be the gold standard for object detection and has a nice Python API.\n\nIt also seems to know about squirrels.\n\nHere's a script to detect objects in an image file and its sample output:\n\n[google-vision.py](google-vision.py)\n\n![Google Vision output](google_vision.png)\n\nGoogle Vision clearly knows about squirrels.\n\n\n### Local alternatives?\n\nGoogle Vision's free tier is less suited to continuous use.\n\nIs there something we can run locally?\n\nPretrained TensorFlow object detection models are available and running one locally might be an interesting side quest.\n\nThere are 2 interesting problems here. 
Can we find a model which can detect the objects we're interested in (squirrels) \nand can we easily run it and call it locally?\n\nThe model will need to be wrapped in some sort of API so that we can call it from our own code.\n\n\n## TensorFlow object detection models\n\nThe [TensorFlow Detection Model Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md)\ncontains pretrained models which can be downloaded.\n\nThey can be used to run detections against our images.\n\nLet's pick a pretrained model and try to run it against one of our test images.\n\n\n### Object Detection API\n\nThe TensorFlow Object Detection API seems to be TensorFlow's high level wrapper around this type of problem.\n\nThe [installation instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2.md) seem to have suffered from Python and CUDA dependency rot.\n\nGetting a working GPU enabled installation of TensorFlow and the Object Detection API was difficult.\nMore recently it has become much easier thanks to the GPU enabled TensorFlow base images.\n\nMy current object detection training environment is documented in this Dockerfile: [retraining/container-images/training-image/Dockerfile](retraining/container-images/training-image/Dockerfile).\n\n\n### Testing in Google Colab\n\nWorking on a local machine I was blocked almost immediately with an error while trying to load the saved model.\n\nThis could have been a mismatch between TensorFlow 2.5 and the available examples.\n\nRather than get stuck trying to resolve dependencies we can retreat to a [Google Colab](https://colab.research.google.com) notebook.\nColab gives us a known good development environment to get started in.\n\n\nA lot of data development work happens in notebook environments like Jupyter and Colab.\n\nThe data community have discovered a really interesting way of working here.\nI'd encourage any software developer who hasn't seen this before to 
have a look.\n\nWith an existing object detection model imported into our Colab notebook we can load one of our test images and ask the model to detect (or predict) the visible objects.\n\n![Colab prediction](colab.png)\n\nRequesting a prediction:\n\n![Model predict](predict.png)\n\nThe prediction returns a large map of results.\n\n![Prediction results](predictions.png)\n\n`detection_classes` and `detection_scores` are interesting.\n\nThis turns out to mean a 73% confidence of a class 17 object.\n\nWhat does this mean? What are classes and why are the values all below 100?\n\nThe saved model was trained on the [COCO image set](https://cocodataset.org).\nThis is a well known set of training data containing 91 unique objects (or classes).\n\nEach class id refers to one of these COCO object types.\n\nThe COCO labels are available in the file [mscoco_label_map.pbtxt](https://github.com/tensorflow/models/blob/master/research/object_detection/data/mscoco_label_map.pbtxt):\n\n```\nitem {\n  name: \"/m/015qbp\"\n  id: 14\n  display_name: \"parking meter\"\n}\nitem {\n  name: \"/m/01yrx\"\n  id: 17\n  display_name: \"cat\"\n}\nitem {\n  name: \"/m/04dr76w\"\n  id: 44\n  display_name: \"bottle\"\n}\n```\n\n### Not cat\n\nPlotting the most confident prediction over the image:\n![Not cat](not_cat.png)\n\nLooking up class 17 in the label file we find `cat`.\n\nClose but not quite right. 
It looks like the model doesn't know about squirrels!\n\nLooking in the COCO labels file confirms that squirrels are not one of the classes this model was trained to detect.\nThe model can't identify squirrels because it has not been trained with examples of what squirrels look like.\n\n\n### Local detection script\n\nBack porting what we learnt in the Colab notebook we can create a local script which can make the same prediction\nas the Colab notebook.\n\nThere is plenty in here which I don't yet understand:\n\n[detect.py](detect.py)\n\n```\npython3 detect.py test-images/mostly-fox.jpg\n```\n```\nDetecting\n51.0: 0.57779586\n47.0: 0.5406496\n18.0: 0.38273856\n64.0: 0.3153658\n```\n\n### Resolving labels\n\nTensorFlow gives the impression that resolving class ids into labels (i.e. `17` -\u003e `cat`) is not its concern.\n\nWe'll need to spike out a way to use the labels file to resolve readable names for classes in the predictions returned from TensorFlow.\n\n[labels/labels.py](labels/labels.py)\n\n\n### Premature productionisation\n\nWe've now verified that we can use TensorFlow to run a pretrained model locally.\n\nThat pretrained model doesn't know about the specific animals we're interested in but it can probably be retrained.\n\nLet's move on to productionising what we have on the assumption we'll be able to improve the model later.\n\n\n## Running a model with TensorFlow Serving\n\n[TensorFlow Serving](https://www.tensorflow.org/tfx/serving/docker) claims to \"make it easy to deploy new algorithms and experiments,\nwhile keeping the same server architecture and APIs\".\n\nThis looks like exactly what we want; a way to deploy a saved model behind a REST API.\n\nA Docker container is provided. 
\nWe can use this as a base image to produce an image with our model baked into it:\n\n[serving/Dockerfile](serving/Dockerfile)\n\nTesting locally:\n\n`\ndocker run -p 8501:8501 -e MODEL_NAME=ssd_mobilenet_v2_320x320_coco17_tpu-8 eelpie/tensorflowserving\n`\n\nCheck the model is available at `http://localhost:8501/v1/models/ssd_mobilenet_v2_320x320_coco17_tpu-8`\n\n```\n{\n    \"model_version_status\": [\n        {\n            \"version\": \"1\",\n            \"state\": \"AVAILABLE\",\n            \"status\": {\n                \"error_code\": \"OK\",\n                \"error_message\": \"\"\n            }\n        }\n    ]\n}\n```\n\nNow we can ask for a prediction with an HTTP call rather than importing the TensorFlow model into our script:\n\n[detect_rest.py](detect_rest.py)\n\n\n## Retraining\n\nOur existing model doesn't know about squirrels. We need to retrain it.\n\n\n### Collecting training data\n\nWe can teach our object detection model about the objects we are interested in (squirrels) by\nshowing it lots of example images containing those objects.\n\nUnlike humans, animals won't generally give out personally identifying information for free.\n\nThey will trade for food though.\nLeaving some nuts outside the window and saving the image files captured by Motion provided some initial training images.\n\nCollecting several days' worth gave a collection of several hundred images with examples of most of the garden animals.\n\n\n### Annotating images\n\nTensorFlow wants a set of custom classes representing the objects we are interested in (e.g. squirrel, fox etc).\n\nIt also needs a set of example images with instances of these classes highlighted.\n\nIf the question is 'where is the squirrel in this picture?' 
then we need to provide many examples of the correct answer.\n\nAn image annotation tool like [VoTT (Visual Object Tagging Tool)](https://github.com/microsoft/VoTT) will help here.\n\n\nTagging with VoTT:\n![Squirrel tagged in VoTT](vott-squirrel.png)\n![Fox tagged in VoTT](vott-fox.png)\n\nThese tools are optimised for a smooth workflow.\nI managed to tag 230 images in 30 minutes on my first attempt.\nThis was much quicker than expected and somewhat cathartic.\n\nVoTT can export TensorFlow Records for direct import into TensorFlow.\n\n\n### Splitting the data\n\nWe need to reserve some of our data for testing (or evaluating) our retrained model.\n\nJust like a real exam it needs to be tested on questions it's not seen before.\nWe'll split the examples approximately 80% / 20% between training and evaluation data.\n\nThe evaluation data will be put aside until after training. It will then be used to \nscore the trained model on how well it performs on images it has never seen before.\n\n\n```\ncd Squirrels-TFRecords-export\nmkdir training\nmkdir eval\nmv *.tfrecord training\nmv training/*0-00.tfrecord eval\nmv training/*1-00.tfrecord eval\n```\n\nOr as a script which can shuffle the data before splitting:\n\n[retraining/split.py](retraining/split.py)\n\nA better split might be to get a representative spread of classes into the evaluation folder\n(so that we don't produce a model which is excellent at detecting foxes but poor at squirrels). 
\nCould this introduce a bias towards the classes with fewer examples?\n\n\n### Training\n\nWith our training data prepared we need to add an existing pretrained base model,\na training pipeline to describe the training task and a checkpoint to describe the training state the existing model has already reached.\n\nThe intuition here is that our existing model has been extensively trained (at great expense) on a general set of data \n(like COCO) and therefore has some general ability at detecting objects.\n\nBy preserving this existing training and introducing a new set of objects we should be able to generate a model\nwhich works for our objects far quicker than if we tried to train from scratch.\n\nThe TensorFlow checkpoint which came with the existing model encapsulates its existing training.\n\nWhen we start the retraining process, over the course of several hours TensorFlow will attempt to readjust the model\nparameters to minimise prediction error (or loss) against our new training images.\n\n[retraining/train.bash](retraining/train.bash)\n\nThe loss value would be expected to trend downwards during training; probably towards a value between 0.0 and 1.0.\n\n![Loss converging](loss-converging.png)\n\nWhile training TensorFlow will periodically log out a progress report.\n\n```\nI0524 17:03:34.777422 139797117065024 model_lib_v2.py:680] Step 400 per-step time 1.050s loss=1.978\n```\n\nFor a constant batch size the per-step time should give us a rough way to compare different hardware options.\n\nComparing some of our locally available hardware:\n\n```\n4 core 3.4 GHz CPU ~ 5.0s\n2 x 10 core 2.8 GHz CPU ~ 2.7s\nGTX 1050 Ti 4Gb ~ 1.0s\n```\n\nTraining appears to benefit from higher CPU frequency but less so from more CPU cores.\nGPUs are very beneficial but smaller desktop GPUs lack the RAM to load a large object detection model.\n\n\n### Checkpoints\n\nAs it trains, TensorFlow periodically drops check points.\nThese represent the current parameter settings for 
the model.\nTraining is about finding the model parameters which best fit our data.\n\n![Check points](checkpoints.png)\n\nCheck points can be used to pause and resume training.\n\nThey can also be used to resume training on a faster GPU enabled cloud instance.\n\n\n### Evaluating while training\n\nWe can use the evaluation images we reserved to continually evaluate the model's accuracy as it trains.\n\n```\nexport CUDA_VISIBLE_DEVICES=-1\npython3 /usr/local/lib/python3.8/dist-packages/object_detection/model_main_tf2.py  --pipeline_config_path=retraining/squirrelnet_pipeline.config --model_dir=retraining/pretrained-models/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 --checkpoint_dir=retraining/squirrelnet  --alsologtostderr\n```\n\nNote how we have to disable CUDA to prevent the training and evaluation processes competing for the GPU.\n\n\nThe evaluation process outputs files which the TensorBoard UI can use to show how the model's predictions change as it trains.\n\nTODO - where are these files put and tensorboard start up?\n\n![Evaluation standard output](eval_stdout.png)\n\n![TensorBoard image evaluation](eval.png)\n\n\n### Loss blow outs\n\nOccasionally the loss would explode like this:\n\n```\nINFO:tensorflow:Step 11300 per-step time 0.695s loss=0.727\nINFO:tensorflow:Step 11400 per-step time 0.703s loss=0.633\nINFO:tensorflow:Step 11500 per-step time 0.711s loss=0.619\nINFO:tensorflow:Step 11600 per-step time 0.698s loss=3.812\nINFO:tensorflow:Step 11700 per-step time 0.698s loss=5.212\nINFO:tensorflow:Step 11800 per-step time 0.703s loss=550625.562\nINFO:tensorflow:Step 11900 per-step time 0.704s loss=3951414016.000\nINFO:tensorflow:Step 12000 per-step time 0.696s loss=3848328704.000\nINFO:tensorflow:Step 12100 per-step time 0.712s loss=3739422208.000\n```\n\nReducing the training rate seems to help. 
This probably means there is a sharp cliff somewhere in the loss function and\nwe're falling over this edge.\n\nReducing the training rate from 0.8 to 0.2 seems to have mitigated this at the cost of much slower initial convergence.\n\nThis could be to do with small data counts for one of the classes.\n\n\n### To the Cloud\n\nGoogle Cloud lets us speed this process up by using a cloud instance with an expensive GPU attached (~ $10 per day).\n\n![GPU](gpu.png)\n\nStarting a Google Cloud instance with an Ubuntu 20.04 base image and an attached GPU,\nwe can apply all of the setup steps we worked out in [retraining/Dockerfile](retraining/Dockerfile).\n\nConfirming we have a working GPU:\n\n![Google Cloud GPU](google-cloud-gpu.png)\n\nWe can create a Google Cloud machine image of the setup instance for a faster restart next time.\n\nUploading the check points from our in house training we can resume where we left off.\n\nComparing the per-step time with our local hardware the K80 looks slightly quicker.\n\n```\nI0525 09:57:42.750132 140193608288064 model_lib_v2.py:680] Step 10400 per-step time 0.734s loss=737993.750\n```\n\n\n### Exporting the model\n\nAfter training, we can export a TensorFlow saved model.\n\n[retraining/export.bash](retraining/export.bash)\n\nThe exported model is output as a ~40MB folder containing a protobuf representation of the model and its training state.\n\n[models/squirrelnet_ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8](models/squirrelnet_ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8)\n\nThis is quite remarkable; we've been given a set of files which know how to detect animals.\n\nThis saved model can be loaded into TensorFlow Serving or imported directly into a Python script.\n\n\n\n### Evaluating the model\n\nTODO \n\n\n\n## Inference speed\n\nIf we were doing this at scale we'd probably be interested in the inference performance; how long does it take to \ndetect objects in each captured image?\n\nWill we be able to keep up with the flow 
of incoming images?\nHow do CPU and GPU compare for inference speed?\n\nUsing the Python API as per `detect.py` we can get some rough CPU and GPU timings:\n\n```\n3.4 GHz CPU ~ 99 ms\nGT 1030 GPU (2Gb RAM) ~ 79 ms\n```\n\nThis small GPU returns slightly quicker and probably uses less energy than the CPU.\n\nThe time difference is small compared to the total latency of our TensorFlow Serving call which is taking ~ 1400 ms.\n\nMost of the latency is probably marshalling the enormous JSON response (~ 6MB).\nAny tuning should probably happen there first.\n\nThe response always contains 100 detections and many fields which we are not using. \nThe long tail is mostly useless when we're only interested in the top 1 or 2 detections.\n\nWhere does this 100 sizing come from?\n\n```\nsaved_model_cli show --dir ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model/ --all\n```\n\nIt appears that the response format is very much baked into the model definition and can't be altered with query parameters.\n\nNot wanting to tackle this just now we can revert to using an in memory instance of the model in our message handler.\n\nThis was an order of magnitude faster than calling TensorFlow Serving.\n\n\n### Tweaking the model \n\n### New classes\n\nWho dis? 
This unknown animal was detected with high confidence as an instance of another similar class.\n\n![Not magpie](not_magpie.png)\n\nAdding this animal as a new class and retraining seems to resolve the issue.\n\n![Seagull](seagull.png)\n\nProviding the same image as a non tagged training image may also have worked but wasn't tried.\n\n\n### False positives\n\nBackground clutter such as washing and shadows was often detected as objects.\nThis can be improved by including images with none of the object classes in the training set.\n\nPresumably this applies a loss to these incorrect predictions which can nudge the training in a different direction.\n\n![Background with no interesting objects](images/background.jpg)\n\n\n### Putting it all together\n\nWe can now write some scripts to listen for motion messages, call the TensorFlow model for object detections\nand send notifications.\n\n[listener/listener.py](listener/listener.py)\n\nListens for motion messages on the motion MQTT topic.\nCalls the TensorFlow object detection model and annotates the image with the best detection. \nPublishes the detection results onto a separate detections MQTT topic.\n\n[notifications/notify.py](notifications/notify.py) \n\nListens for detection messages and publishes those which \nmeet a minimum confidence. The desired outcome is an alert on a mobile device.\nSlack proved to be more effective as its latency is a lot lower than email.\n\nCloud Build is used to package these scripts as Docker images. \nThis helps to isolate the difficult TensorFlow dependencies and makes these services easy to deploy locally.\n\n[listener/cloudbuild.yaml](listener/cloudbuild.yaml)\n\n\n### Results \n\nAfter training on ~800 images with 5 classes the model gave some surprisingly good results. \n\nSeparation of the classes was very good. 
Most detections with \u003e 90% confidence were correct.\n\n![Detected squirrel](images/detected_squirrel.jpg)\n\n![Detected fox](images/detected_fox.jpg)\n\n![Detected fox](images/detected_fox2.jpg)\n\n\n### Local training \n\nIn 2022 it became possible to purchase a GPU capable of local training and I needed to revisit the training pipeline.\n\nThe GPU and CUDA drivers are a large install but are included in the TensorFlow GPU base image.\nWe can install the TensorFlow dependencies on top of this base image to produce a training image.\n\nBuild the training image:\n```\ndocker build -t eelpie/training retraining/container-images/training-image/\n```\n\nThe version of the `nvidia-driver` in the Docker image must exactly match the version running on the host machine.\n\nWe want to run this training image with our project and training data mounted.\nRun the training image with the detector git project mounted as a volume at /home/retraining and the training data mounted at /home/training:\n```\ndocker run -it --gpus all --name train -v '/home/tony/git/squirrel-detector:/home' -v '/home/tony/training:/home/training' -w '/home/retraining' eelpie/training bash train.bash\n```\n\nTo reattach to the training container to monitor progress:\n```\ndocker exec -it train bash\n```\n\nTo run the evaluation process: \n\n```\ndocker exec -it train bash\nexport CUDA_VISIBLE_DEVICES=-1\npython3 /usr/local/lib/python3.8/dist-packages/object_detection/model_main_tf2.py  --pipeline_config_path=retraining/squirrelnet_pipeline.config --model_dir=retraining/pretrained-models/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 --checkpoint_dir=retraining/squirrelnet  --alsologtostderr\n```\n\nTODO 
tensorboard\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftonytw1%2Fsquirrel-detector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftonytw1%2Fsquirrel-detector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftonytw1%2Fsquirrel-detector/lists"}