# Custom One-Stage Object Detector from Scratch

Repository: https://github.com/zafarRehan/custom_OD_architecture_from_scratch (by zafarRehan). It walks you through creating your own one-stage object detection architecture in Keras, with a synthetic dataset generator for training and evaluation.

One-stage object detectors predict bounding boxes and class labels in a single pass over the image, which lets them combine good accuracy with high speed.
The best-known one-stage detectors are SSD and the YOLO family.
This article explains how they work and how they differ: https://machinethink.net/blog/object-detection.
I highly recommend that beginners and experienced readers alike refresh these concepts before moving on to the <a href="full_object_detection_code.ipynb">Notebook</a>.

The model is built with TensorFlow and Keras.


## Classes
The model classifies 16 categories: the 15 animals shown below plus 1 background class.
<img src="all_classes.png" width=700/>

## Architecture
The architecture is fairly simple, but it works well enough as a starting point.
<img src="model architecture.png" width=700/>

### Model Summary

    Model: "model_1"
    __________________________________________________________________________________________________
     Layer (type)                   Output Shape         Param #     Connected to
    ==================================================================================================
     image (InputLayer)             [(None, 320, 320, 3  0           []
                                    )]

     conv2d_1 (Conv2D)              (None, 320, 320, 16  448         ['image[0][0]']
                                    )

     maxpool2d_1 (MaxPooling2D)     (None, 160, 160, 16  0           ['conv2d_1[0][0]']
                                    )

     batchnorm_1 (BatchNormalizatio  (None, 160, 160, 16  64         ['maxpool2d_1[0][0]']
     n)                             )

     conv2d_2 (Conv2D)              (None, 160, 160, 32  4640        ['batchnorm_1[0][0]']
                                    )

     maxpool2d_2 (MaxPooling2D)     (None, 80, 80, 32)   0           ['conv2d_2[0][0]']

     batchnorm_2 (BatchNormalizatio  (None, 80, 80, 32)  128         ['maxpool2d_2[0][0]']
     n)

     conv2d_3 (Conv2D)              (None, 80, 80, 64)   18496       ['batchnorm_2[0][0]']

     maxpool2d_3 (MaxPooling2D)     (None, 40, 40, 64)   0           ['conv2d_3[0][0]']

     batchnorm_3 (BatchNormalizatio  (None, 40, 40, 64)  256         ['maxpool2d_3[0][0]']
     n)

     conv2d_4 (Conv2D)              (None, 40, 40, 128)  73856       ['batchnorm_3[0][0]']

     maxpool2d_4 (MaxPooling2D)     (None, 20, 20, 128)  0           ['conv2d_4[0][0]']

     batchnorm_4 (BatchNormalizatio  (None, 20, 20, 128)  512        ['maxpool2d_4[0][0]']
     n)

     conv2d_5 (Conv2D)              (None, 20, 20, 256)  295168      ['batchnorm_4[0][0]']

     maxpool2d_5 (MaxPooling2D)     (None, 10, 10, 256)  0           ['conv2d_5[0][0]']

     batchnorm_5 (BatchNormalizatio  (None, 10, 10, 256)  1024       ['maxpool2d_5[0][0]']
     n)

     conv2d_6 (Conv2D)              (None, 10, 10, 256)  590080      ['batchnorm_5[0][0]']

     maxpool2d_6 (MaxPooling2D)     (None, 5, 5, 256)    0           ['conv2d_6[0][0]']

     batchnorm_6 (BatchNormalizatio  (None, 5, 5, 256)   1024        ['maxpool2d_6[0][0]']
     n)

     conv2d_7 (Conv2D)              (None, 3, 3, 256)    590080      ['batchnorm_6[0][0]']

     conv2d_8 (Conv2D)              (None, 1, 1, 512)    1180160     ['conv2d_7[0][0]']

     box_20x20 (Conv2D)             (None, 20, 20, 16)   18448       ['maxpool2d_4[0][0]']

     box_10x10 (Conv2D)             (None, 10, 10, 16)   36880       ['maxpool2d_5[0][0]']

     box_5x5 (Conv2D)               (None, 5, 5, 16)     36880       ['maxpool2d_6[0][0]']

     box_3x3 (Conv2D)               (None, 3, 3, 16)     36880       ['conv2d_7[0][0]']

     box_1x1 (Conv2D)               (None, 1, 1, 16)     73744       ['conv2d_8[0][0]']

     class_20x20 (Conv2D)           (None, 20, 20, 64)   73792       ['maxpool2d_4[0][0]']

     class_10x10 (Conv2D)           (None, 10, 10, 64)   147520      ['maxpool2d_5[0][0]']

     class_5x5 (Conv2D)             (None, 5, 5, 64)     147520      ['maxpool2d_6[0][0]']

     class_3x3 (Conv2D)             (None, 3, 3, 64)     147520      ['conv2d_7[0][0]']

     class_1x1 (Conv2D)             (None, 1, 1, 64)     294976      ['conv2d_8[0][0]']

     box_20x20_reshape (Reshape)    (None, 1600, 4)      0           ['box_20x20[0][0]']

     box_10x10_reshape (Reshape)    (None, 400, 4)       0           ['box_10x10[0][0]']

     box_5x5_reshape (Reshape)      (None, 100, 4)       0           ['box_5x5[0][0]']

     box_3x3_reshape (Reshape)      (None, 36, 4)        0           ['box_3x3[0][0]']

     box_1x1_reshape (Reshape)      (None, 4, 4)         0           ['box_1x1[0][0]']

     class_20x20_reshape (Reshape)  (None, 1600, 16)     0           ['class_20x20[0][0]']

     class_10x10_reshape (Reshape)  (None, 400, 16)      0           ['class_10x10[0][0]']

     class_5x5_reshape (Reshape)    (None, 100, 16)      0           ['class_5x5[0][0]']

     class_3x3_reshape (Reshape)    (None, 36, 16)       0           ['class_3x3[0][0]']

     class_1x1_reshape (Reshape)    (None, 4, 16)        0           ['class_1x1[0][0]']

     box_out (Concatenate)          (None, 2140, 4)      0           ['box_20x20_reshape[0][0]',
                                                                      'box_10x10_reshape[0][0]',
                                                                      'box_5x5_reshape[0][0]',
                                                                      'box_3x3_reshape[0][0]',
                                                                      'box_1x1_reshape[0][0]']

     class_out (Concatenate)        (None, 2140, 16)     0           ['class_20x20_reshape[0][0]',
                                                                      'class_10x10_reshape[0][0]',
                                                                      'class_5x5_reshape[0][0]',
                                                                      'class_3x3_reshape[0][0]',
                                                                      'class_1x1_reshape[0][0]']

     final_output (Concatenate)     (None, 2140, 20)     0           ['box_out[0][0]',
                                                                      'class_out[0][0]']

    ==================================================================================================
    Total params: 3,770,096
    Trainable params: 3,768,592
    Non-trainable params: 1,504
    __________________________________________________________________________________________________


## Input and Output
For inference, the input is a single 320 x 320 x 3 image.
For training, each image also comes with 4 one-hot encoded class labels and 4 bounding boxes (4 is simply the value I used; it can be more or fewer).
The model output (final_output in the summary) has shape 2140 x 20 per image: for each of the 2140 anchor boxes (4 per cell on the 20x20, 10x10, 5x5, 3x3 and 1x1 grids), 4 box values followed by 16 class outputs.

### SAMPLE INPUT
<img src="sample_input_viz.png" width=700/>

### SAMPLE OUTPUT
<img src="sample_output_viz.png" width=700/>


## Performance

To evaluate the model I used https://github.com/Cartucho/mAP, which computes mean Average Precision (mAP). Here are the results for mAP@0.5, i.e. with an IoU threshold of 0.5:
<img src="mAP.png" width=700/>

The results are not great, but they come from a simple model with only about 3.77M trainable parameters, trained for 100 epochs of 100 iterations each.
Remember that this repo is not about building the best model (that may come later); it is meant to give you a starting point for testing your own object detection architecture. I learned a lot building it, and I am sure you will too.


## Usage

Go to the <a href="full_object_detection_code.ipynb">Notebook</a>; I have tried to document it as well as I can.
Open the notebook in Colab, click Runtime -> Run all, and watch a new model being trained from scratch.


## What Next?

If you really want to understand exactly how single-stage object detection works, or how object detection works in general, spend some time with this Notebook, then try your own architecture and find out how well it works.

The notebook already has these pieces in place:

1. Data generator
2. Anchor generator
3. Losses and metrics
4. Inference and visualization
5. Model evaluation

All you need to do now is dig into it and create your own object detection architecture.

Some tips for improving the model's performance:

1. Add more layers, i.e. make the architecture deeper.
2. Add Dropout layers.
3. Add skip connections, depthwise convolutions, etc.
4. Do some research of your own.

That's all, folks! I hope you learned something from it. Please leave a star if it helped in any way.
THANKS
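The parameter counts in the Model Summary can be reproduced by hand, which is a useful sanity check once you start changing the architecture. The sketch below assumes every Conv2D layer uses a 3x3 kernel with a bias. That is inferred from the reported numbers (for example, conv2d_1 has 3*3*3*16 + 16 = 448 parameters), not read from the notebook. BatchNormalization stores four values per channel, and only gamma and beta are trainable, which accounts for the 1,504 non-trainable parameters.

```python
from __future__ import annotations


def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Conv2D parameters: k*k*c_in weights per filter plus one bias per filter.

    Assumption: 3x3 kernels with bias, inferred from the Model Summary.
    """
    return k * k * c_in * c_out + c_out


def bn_params(channels: int) -> tuple[int, int]:
    """(trainable, non-trainable): gamma/beta vs. moving mean/variance."""
    return 2 * channels, 2 * channels


# (in_channels, out_channels) of conv2d_1 ... conv2d_8, read off the summary
BACKBONE = [(3, 16), (16, 32), (32, 64), (64, 128),
            (128, 256), (256, 256), (256, 256), (256, 512)]
# Channels of batchnorm_1 ... batchnorm_6
BN_CHANNELS = [16, 32, 64, 128, 256, 256]
# Input channels of the box_* and class_* heads (20x20, 10x10, 5x5, 3x3, 1x1)
HEAD_INPUTS = [128, 256, 256, 256, 512]
ANCHORS_PER_CELL = 4
BOX_VALUES = 4
NUM_CLASSES = 16  # 15 animals + background

trainable = sum(conv_params(i, o) for i, o in BACKBONE)
non_trainable = 0
for ch in BN_CHANNELS:
    t, n = bn_params(ch)
    trainable += t
    non_trainable += n
for c_in in HEAD_INPUTS:
    trainable += conv_params(c_in, ANCHORS_PER_CELL * BOX_VALUES)   # box_* head
    trainable += conv_params(c_in, ANCHORS_PER_CELL * NUM_CLASSES)  # class_* head

total = trainable + non_trainable
print(f"Total: {total:,}  Trainable: {trainable:,}  Non-trainable: {non_trainable:,}")
# Total: 3,770,096  Trainable: 3,768,592  Non-trainable: 1,504
```

The three totals match the summary, so the same function is a quick way to estimate how much a new layer or a wider head will cost before you build it.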
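The five prediction heads work on 20x20, 10x10, 5x5, 3x3 and 1x1 grids. Per cell, each head predicts 16 box values (4 anchors x 4 coordinates) and 64 class values (4 anchors x 16 classes), which is where the 2140 rows of final_output come from. The sketch below generates one possible set of matching anchors in SSD style. The values in SCALES and ASPECT_RATIOS are hypothetical placeholders, not the ones used by the notebook's Anchor Generator; only the grid sizes and the 4 anchors per cell come from the Model Summary.

```python
from __future__ import annotations

import math

# Grid sizes of the five heads, taken from the Model Summary.
FEATURE_MAPS = [20, 10, 5, 3, 1]
# Hypothetical anchor scales (fraction of image size): one per head, plus one
# extra entry used for the additional square anchor of the last head.
SCALES = [0.1, 0.25, 0.45, 0.65, 0.85, 1.0]
# Hypothetical aspect ratios (w / h). Together with one extra square anchor
# they give the 4 anchors per cell that each head predicts.
ASPECT_RATIOS = [1.0, 2.0, 0.5]


def generate_anchors() -> list[tuple[float, float, float, float]]:
    """Return (cx, cy, w, h) anchors in normalised image coordinates.

    Order: head by head, row by row, column by column, anchor by anchor,
    i.e. the row-major order in which Reshape flattens each head's output.
    """
    anchors = []
    for k, size in enumerate(FEATURE_MAPS):
        s, s_next = SCALES[k], SCALES[k + 1]
        for row in range(size):
            for col in range(size):
                cx, cy = (col + 0.5) / size, (row + 0.5) / size
                for ar in ASPECT_RATIOS:
                    anchors.append((cx, cy, s * math.sqrt(ar), s / math.sqrt(ar)))
                # Extra square anchor between this scale and the next (SSD style)
                extra = math.sqrt(s * s_next)
                anchors.append((cx, cy, extra, extra))
    return anchors


anchors = generate_anchors()
print(len(anchors))  # 2140 = 4 * (20*20 + 10*10 + 5*5 + 3*3 + 1*1)
```

Emitting anchors in the same order as the Reshape and Concatenate layers means row i of the anchor list lines up with row i of final_output. Large anchors on the coarsest grids can extend past the image border; many SSD implementations clip them to [0, 1], which this sketch does not do.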
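The model produces a prediction for all 2140 anchors, so several overlapping boxes usually fire on the same animal. Detectors of this kind typically keep the most confident box and suppress its neighbours with non-maximum suppression (NMS) before drawing results like the sample output. Below is a minimal greedy NMS sketch in plain Python. It assumes boxes have already been decoded to (xmin, ymin, xmax, ymax) corner form with one confidence score each; the function names and the thresholds (iou_threshold=0.5, score_threshold=0.3) are illustrative choices, not taken from the notebook.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-form boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5, score_threshold: float = 0.3) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    The default thresholds are illustrative, not values from the notebook.
    """
    # Drop low-confidence boxes, then sort the rest by score (descending).
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 150, 150), (11, 9, 49, 51)]
scores = [0.9, 0.75, 0.8, 0.2]
print(nms(boxes, scores))  # [0, 2]
```

With 16 classes it is common to run NMS separately for each non-background class, so that overlapping boxes of different animals do not suppress each other. TensorFlow also provides tf.image.non_max_suppression, which performs the same greedy procedure on tensors.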
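mAP@0.5 is the mean, over classes, of the average precision (AP) obtained when a detection counts as correct only if it overlaps a not-yet-matched ground-truth box of the same class with IoU >= 0.5. The sketch below computes AP for a single class using PASCAL VOC-style all-point interpolation. We believe https://github.com/Cartucho/mAP follows a similar procedure, but treat this as an illustration and check that tool's source for the exact details; the (image_id, score, box) input format and the function names are assumptions of this sketch.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Detection = Tuple[str, float, Box]       # (image_id, score, box): assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-form boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: Sequence[Detection],
                      ground_truths: Dict[str, List[Box]],
                      iou_threshold: float = 0.5) -> float:
    """AP for one class with all-point interpolation (PASCAL VOC style)."""
    n_gt = sum(len(v) for v in ground_truths.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(v) for img, v in ground_truths.items()}
    tp: List[int] = []
    # Walk detections from most to least confident.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truths.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best, best_iou = j, overlap
        # True positive only if the best match is good enough and not yet taken.
        if best >= 0 and best_iou >= iou_threshold and not used[img][best]:
            used[img][best] = True
            tp.append(1)
        else:
            tp.append(0)

    # Cumulative recall and precision after each detection.
    recalls, precisions, hits = [], [], 0
    for rank, hit in enumerate(tp, start=1):
        hits += hit
        recalls.append(hits / n_gt)
        precisions.append(hits / rank)

    # Make precision non-increasing from right to left, then integrate over recall.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


gts = {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]}
dets = [("img1", 0.9, (0, 0, 10, 10)),     # matches the first box
        ("img1", 0.8, (50, 50, 60, 60)),   # no overlap: false positive
        ("img1", 0.7, (21, 21, 31, 31))]   # IoU 81/119 with the second box
print(round(average_precision(dets, gts), 4))  # 0.8333
```

Averaging average_precision over the 15 animal classes gives mAP@0.5; the background class is not evaluated, since it has no ground-truth boxes.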