{"id":23399663,"url":"https://github.com/filipbasara0/simple-object-detection","last_synced_at":"2025-04-11T18:07:39.719Z","repository":{"id":154306321,"uuid":"566221502","full_name":"filipbasara0/simple-object-detection","owner":"filipbasara0","description":"A simple yet effective repo for object detection based on the FCOS architecture.","archived":false,"fork":false,"pushed_at":"2023-10-30T12:47:13.000Z","size":331,"stargazers_count":16,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-11T18:04:18.318Z","etag":null,"topics":["carla","carla-driving-simulator","carla-simulator","computer-vision","convolutional-neural-networks","deep-learning","object-detection","pascal-voc","pytorch","traffic-light","traffic-light-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/filipbasara0.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-11-15T08:09:31.000Z","updated_at":"2025-03-25T14:06:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"ed558e74-5f57-41fc-8d6b-896c46c243aa","html_url":"https://github.com/filipbasara0/simple-object-detection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/filipbasara0%2Fsimple-object-detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/filipbasara0%2Fsimple-object-detection/tags","releases_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/filipbasara0%2Fsimple-object-detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/filipbasara0%2Fsimple-object-detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/filipbasara0","download_url":"https://codeload.github.com/filipbasara0/simple-object-detection/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248456374,"owners_count":21106602,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["carla","carla-driving-simulator","carla-simulator","computer-vision","convolutional-neural-networks","deep-learning","object-detection","pascal-voc","pytorch","traffic-light","traffic-light-detection"],"created_at":"2024-12-22T10:15:33.637Z","updated_at":"2025-04-11T18:07:39.685Z","avatar_url":"https://github.com/filipbasara0.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Simple Object Detection\n\n![comb1](https://user-images.githubusercontent.com/29043871/201990619-639dc153-1dff-48c7-bd4b-518ebbc1c51e.png)\n\nA minimal object detection repository.\n\nWhile reading papers and browsing repos to refresh my computer vision knowledge, I noticed that most object detection repos are complicated and cluttered with code, which makes it difficult to understand how object detection works end to end.\n\nThis repo should provide a simple and clear understanding of how to tackle the object detection problem. 
It's like a minimal template for object detection problems.\n\nThe aim was to make it easy to use, understand and customize for your own problems or datasets.\n\nRepo is mostly based on the [FCOS architecture](https://arxiv.org/abs/1904.01355).\n\n**All training was done from scratch, without pretrained models or additional data.**\n\n## Setup\n\n1. `git clone git@github.com:filipbasara0/simple-object-detection.git`\n2. create virtual environment: `virtualenv -p python3.8 env`\n3. activate virtual environment: `source env/bin/activate`\n4. install requirements: `pip install -r requirements.txt`\n\n## Usage\n\n### Training\n\n```\npython train.py --resolution=480 --dataset=\"pascal_voc_2012\"   --output_dir=\"trained_models/model.pth\"   --train_batch_size=8 --eval_batch_size=8   --num_epochs=81 --learning_rate=1e-3 --save_model_epochs=1 --num_classes=19 --adam_weight_decay=5e-2\n```\n\n### Inference\n\n```python\nfrom inference.load import load_model, load_image\nfrom datasets import reverse_transform_classes\nfrom utils import draw_bboxes\n\n# load a model\npredictor = load_model(\"path/to/model.pth\", num_classes=19)\n\n# load an image\nimage = load_image(\"path/to/img.jpg\", image_size=480)\n\n# obtain results\npreds = predictor(image)\nbboxes = preds[\"predicted_boxes\"]\nscores = preds[\"scores\"]\nclasses = reverse_transform_classes(preds[\"pred_classes\"], \"pascal_voc_2012\")\n\n# optional - visualize predictions\nimage = image[0].permute(1, 2, 0).detach().cpu().numpy()\ndraw_bboxes(f\"./path/to/visualized.jpg\", image, bboxes[0], scores[0], classes[0])\n```\n\n### Create your own Dataset\n\nTo add a new dataset, create a file `datasets/my_dataset.py`. 
In `datasets/my_dataset.py`, you should create a class that contains two methods - `get_transforms` for training augmentations (can return `None` if you don't need them) and `load_data`:\n\n```python\nclass MyDataset:\n\n    def load_data(self, dataset_path, labels):\n        # load the dataset and return it in the format specified below\n        ...\n\n    def get_transforms(self):\n        # return transforms (just return None if you don't need any)\n        ...\n```\n\n`load_data` should return the dataset in the following format:\n\n```python\n[\n    ...,\n    {\n        \"image_path\": \"path/to/my/image.jpg\",\n        \"target\": [..., [x1,y1,x2,y2,C]]\n    }\n]\n```\n\n`x1, y1` and `x2, y2` represent the top left and bottom right corners of your target bboxes, while `C` represents a label encoding of your target class `(1,2,...len(C))`. Element 0 is reserved for the `__background__` class, which is used to filter negative samples when preparing the training labels.\n\nFinally, in `datasets/datasets.py` add a new entry to the `DATASETS` dict with the following fields:\n\n- `dataset_path` - path to your dataset metadata (`image_path` and `target`)\n- `class_name` - class name for your dataset\n- `labels` - list of labels - first element of the list should be the `__background__` class (see Pascal and Carla labels in `datasets/datasets.py`)\n\n## Results\n\n### PascalVOC 2012\n\nTraining used extensive data augmentation - random horizontal flipping, scaling, translation, rotation, shearing and HSV. Images were resized to maintain the aspect ratio, using the `letterbox` method.\n\nAdditional augmentation such as noise injection, blurring, cropping, (blocks/center) erasing, ... 
could result in better overall performance.\n\nBackbone architecture is the same as `ConvNext-Tiny`:\n\n- Patch size: `4`\n- Layer depths: `[3, 3, 9, 3]`\n- Block dims: `[96, 192, 384, 768]`\n- Image sizes: `384`, `416` and `480`\n- Model resulted in `25M` params\n\nIt was trained for 100 epochs and obtained a mAP of 40 on a small eval dataset.\nTraining took ~30 hours on a GTX1070Ti.\n\nTraining bigger models for longer would likely yield better results.\n\n![comb2](https://user-images.githubusercontent.com/29043871/201991539-072d7c45-faff-4c38-8731-5ce4330c72e1.png)\n![comb3](https://user-images.githubusercontent.com/29043871/201994865-4c88a2a7-74eb-4f14-86eb-cd26a951dee4.png)\n\n### Carla Traffic Lights\n\nA model with the same specification as above was trained for 50 epochs and obtained a mAP of 60 on a small eval dataset.\nTraining took 3 hours on a GTX1070Ti.\n\nThe dataset, which I collected in the CARLA simulator, can be found [here](https://drive.google.com/drive/folders/1TXkPLWlNgauPhQnKEoPDZsx7Px1MD9n_?usp=sharing); annotations can be found [here](https://github.com/affinis-lab/traffic-light-detection-module/blob/master/dataset/carla_all.csv).\n\nThe pretrained model can be found [here](https://drive.google.com/file/d/17mcQ-Ct6bUTS8BEpeDjaZMIFmHS2gptl/view?usp=share_link).\n\n![comb4](https://user-images.githubusercontent.com/29043871/201992324-4323166d-e207-417d-9fe9-8265b885d0fe.png)\n![comb5](https://user-images.githubusercontent.com/29043871/201992330-e6929134-b639-4744-9a75-108da64ed033.png)\n![comb6](https://user-images.githubusercontent.com/29043871/201992333-f6d32332-b7cd-40c9-a82d-049fe1c567ca.png)\n\nAmazingly, the model can even detect real-world traffic lights (although with lower confidence):\n\n![comb7](https://user-images.githubusercontent.com/29043871/201992833-011f521c-1acd-44bc-b372-135e44940dbb.png)\n![comb8](https://user-images.githubusercontent.com/29043871/201992839-ba3134f2-e86f-49f0-a872-77d4aba980d5.png)\n\n### Usage for Carla traffic 
light detection\n```python\nfrom inference.load import load_model, load_image\nfrom datasets import reverse_transform_classes\nfrom utils import draw_bboxes\n\n# load a model (download from link above - https://drive.google.com/file/d/17mcQ-Ct6bUTS8BEpeDjaZMIFmHS2gptl/view?usp=share_link)\npredictor = load_model(\"/path/to/fcos-carla-v01.pth\", num_classes=2)\n\n# load an image\nimage = load_image(\"path/to/img.jpg\", image_size=480)\n\n# obtain results\npreds = predictor(image)\nbboxes = preds[\"predicted_boxes\"]\nscores = preds[\"scores\"]\nclasses = reverse_transform_classes(preds[\"pred_classes\"], \"carla_traffic_lights\")\n\n# optional - visualize predictions\nimage = image[0].permute(1, 2, 0).detach().cpu().numpy()\ndraw_bboxes(f\"./path/to/visualized.jpg\", image, bboxes[0], scores[0], classes[0])\n```\n\n## To Do\n\n- Add support for segmentation\n- Add DETR\n- Train on COCO (once I manage to get some better hardware)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffilipbasara0%2Fsimple-object-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffilipbasara0%2Fsimple-object-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffilipbasara0%2Fsimple-object-detection/lists"}