https://github.com/filipbasara0/simple-object-detection

A simple yet effective repo for object detection based on the FCOS architecture.
carla carla-driving-simulator carla-simulator computer-vision convolutional-neural-networks deep-learning object-detection pascal-voc pytorch traffic-light traffic-light-detection

Last synced: about 1 year ago

Host: GitHub
URL: https://github.com/filipbasara0/simple-object-detection
Owner: filipbasara0
License: mit
Created: 2022-11-15T08:09:31.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-10-30T12:47:13.000Z (over 2 years ago)
Last Synced: 2025-04-11T18:04:18.318Z (about 1 year ago)
Topics: carla, carla-driving-simulator, carla-simulator, computer-vision, convolutional-neural-networks, deep-learning, object-detection, pascal-voc, pytorch, traffic-light, traffic-light-detection
Language: Python
Homepage:
Size: 323 KB
Stars: 16
Watchers: 1
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Simple Object Detection

![comb1](https://user-images.githubusercontent.com/29043871/201990619-639dc153-1dff-48c7-bd4b-518ebbc1c51e.png)

A minimal object detection repository.

While reading papers and browsing repos to refresh my computer vision knowledge, I noticed that most object detection repos are complicated and cluttered with code, which makes it difficult to understand how object detection works end to end.

This repo should provide a simple and clear understanding of how to tackle the object detection problem. It's like a minimal template for object detection problems.

The aim was to make it easy to use, understand and customize for your own problems or datasets.

Repo is mostly based on the [FCOS architecture](https://arxiv.org/abs/1904.01355).

**All training was done from scratch, without pretrained models or additional data.**

## Setup

1. `git clone git@github.com:filipbasara0/simple-object-detection.git`

2. create virtual environment: `virtualenv -p python3.8 env`

3. activate virtual environment: `source env/bin/activate`

4. install requirements: `pip install -r requirements.txt`

## Usage

### Training

```
python train.py --resolution=480 --dataset="pascal_voc_2012" \
  --output_dir="trained_models/model.pth" \
  --train_batch_size=8 --eval_batch_size=8 \
  --num_epochs=81 --learning_rate=1e-3 --save_model_epochs=1 \
  --num_classes=19 --adam_weight_decay=5e-2
```

### Inference

```python
from inference.load import load_model, load_image
from datasets import reverse_transform_classes
from utils import draw_bboxes

# load a model
predictor = load_model("path/to/model.pth", num_classes=19)

# load an image
image = load_image("path/to/img.jpg", image_size=480)

# obtain results
preds = predictor(image)
bboxes = preds["predicted_boxes"]
scores = preds["scores"]
classes = reverse_transform_classes(preds["pred_classes"], "pascal_voc_2012")

# optional - visualize predictions
# convert the first image of the batch from CHW to HWC for drawing
image = image[0].permute(1, 2, 0).detach().cpu().numpy()
draw_bboxes("./path/to/visualized.jpg", image, bboxes[0], scores[0], classes[0])
```
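Detections usually include some low-confidence boxes. Below is a minimal sketch of filtering them by score, assuming the per-image boxes, scores and classes are parallel sequences as in the snippet above. `filter_detections` and the `0.5` threshold are hypothetical and not part of the repo:

```python
def filter_detections(bboxes, scores, classes, min_score=0.5):
    """Keep detections whose score is at least `min_score`.

    Hypothetical helper: assumes the three inputs are parallel sequences
    for a single image (e.g. bboxes[0], scores[0], classes[0]).
    """
    kept = [(box, float(score), cls)
            for box, score, cls in zip(bboxes, scores, classes)
            if float(score) >= min_score]
    # Sort by descending confidence so the strongest detections come first.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


# Toy values standing in for real model output.
example = filter_detections(
    bboxes=[[10, 20, 110, 220], [5, 5, 50, 50], [30, 40, 90, 160]],
    scores=[0.91, 0.12, 0.57],
    classes=["person", "dog", "bicycle"],
)
for box, score, cls in example:
    print(f"{cls}: {score:.2f} {box}")
```

A suitable threshold depends on the dataset and on how the model was trained, so treat the default as a starting point.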

### Create your own Dataset

To add a new dataset, create a file `datasets/my_dataset.py`. In `datasets/my_dataset.py`, you should create a class that contains two methods - `get_transforms` for training augmentations (can be `None` if you don't need them) and `load_data`:

```python
class MyDataset:

    def load_data(self, dataset_path, labels):
        # load the dataset and return it in the format specified below
        ...

    def get_transforms(self):
        # return transforms (just return None if you don't need any)
        ...
```

`load_data` should return the dataset in the following format:

```python
[
    ...,
    {
        "image_path": "path/to/my/image.jpg",
        "target": [..., [x1, y1, x2, y2, C]]
    }
]
```

`x1, y1` and `x2, y2` are the top-left and bottom-right corners of each target bbox, while `C` is the integer label encoding of its class (`1, 2, ..., N` for `N` object classes). Index `0` is reserved for the `__background__` class, which is used to filter negative samples when preparing the training labels.
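To illustrate how the background class separates negatives from positives, here is a minimal sketch of FCOS-style target assignment for a single location, following the formulation in the FCOS paper. The function name is hypothetical, and the repo's implementation may differ (for example in how scale ranges per feature level are handled):

```python
import math


def fcos_target(x, y, boxes):
    """Return (label, (l, t, r, b), centerness) for location (x, y).

    `boxes` is a list of [x1, y1, x2, y2, C] entries as described above.
    A location outside every box gets label 0 (`__background__`).
    Hypothetical helper; not taken from the repo.
    """
    best = None
    for x1, y1, x2, y2, label in boxes:
        # Distances from the location to the four sides of the box.
        l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
        if min(l, t, r, b) <= 0:
            continue  # location is not strictly inside this box
        area = (x2 - x1) * (y2 - y1)
        # When boxes overlap, FCOS assigns the location to the smallest box.
        if best is None or area < best[0]:
            best = (area, label, (l, t, r, b))
    if best is None:
        return 0, None, 0.0  # negative sample
    _, label, (l, t, r, b) = best
    centerness = math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
    return label, (l, t, r, b), centerness


boxes = [[0, 0, 100, 100, 3], [40, 40, 80, 80, 7]]
center_hit = fcos_target(60, 60, boxes)  # inside both boxes, smaller one wins
miss = fcos_target(150, 150, boxes)      # outside every box
```

Here `center_hit` gets class `7` with a centerness of `1.0` because the location is the exact center of the smaller box, while `miss` is labeled as background.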

Finally, in `datasets/datasets.py` add a new entry to the `DATASETS` dict with the following fields:

- `dataset_path` - path to your dataset metadata (`image_path` and `target`)

- `class_name` - class name for your dataset

- `labels` - list of labels - first element of the list should be the `__background__` class (see Pascal and Carla labels in `datasets/datasets.py`)
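Putting these pieces together, a minimal sketch of a custom dataset could look like the following. It assumes a hypothetical CSV annotation file with the columns `image_path,x1,y1,x2,y2,label`. The file layout, the label names and the values in the `DATASETS` entry are illustrative only, so check the existing Pascal and Carla entries in `datasets/datasets.py` for the exact form the repo expects:

```python
import csv
from collections import defaultdict


class MyDataset:

    def load_data(self, dataset_path, labels):
        # Group all boxes by image; one CSV row is one bounding box.
        targets = defaultdict(list)
        with open(dataset_path, newline="") as f:
            for row in csv.DictReader(f):
                x1, y1, x2, y2 = (float(row[k]) for k in ("x1", "y1", "x2", "y2"))
                # Index in `labels` is the class encoding; 0 is `__background__`.
                class_idx = labels.index(row["label"])
                targets[row["image_path"]].append([x1, y1, x2, y2, class_idx])
        return [{"image_path": path, "target": boxes}
                for path, boxes in targets.items()]

    def get_transforms(self):
        return None  # no training augmentations


# Hypothetical entry for the DATASETS dict in datasets/datasets.py.
MY_DATASET_ENTRY = {
    "dataset_path": "path/to/annotations.csv",
    "class_name": "MyDataset",  # assumption: may need to be the class itself
    "labels": ["__background__", "cat", "dog"],
}
```

Deriving the class index from the position in `labels` keeps index `0` free for `__background__` as long as it stays the first element of the list.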

## Results

### PascalVOC 2012

Training used extensive data augmentation - random horizontal flipping, scaling, translation, rotation, shearing and HSV. Images were resized to maintain the aspect ratio, using the `letterbox` method.
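For reference, here is a minimal sketch of the geometry behind letterbox resizing: the image is scaled by a single factor so that it fits the target square, the remainder is padded, and the boxes are shifted accordingly. Centered padding and the helper names are assumptions; the repo's implementation may pad or round differently:

```python
def letterbox_params(width, height, target_size):
    """Return (scale, pad_x, pad_y) for fitting an image into a square."""
    scale = min(target_size / width, target_size / height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Split the leftover space evenly between both sides (assumed centering).
    pad_x = (target_size - new_w) / 2
    pad_y = (target_size - new_h) / 2
    return scale, pad_x, pad_y


def letterbox_box(box, scale, pad_x, pad_y):
    """Map an [x1, y1, x2, y2, C] box into letterboxed coordinates."""
    x1, y1, x2, y2, label = box
    return [x1 * scale + pad_x, y1 * scale + pad_y,
            x2 * scale + pad_x, y2 * scale + pad_y, label]


scale, pad_x, pad_y = letterbox_params(width=640, height=480, target_size=480)
print(scale, pad_x, pad_y)  # 0.75 0.0 60.0
print(letterbox_box([100, 100, 200, 200, 3], scale, pad_x, pad_y))
```

A 640x480 image is scaled to 480x360 and padded with 60 pixels at the top and bottom, so the aspect ratio of every object is preserved.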

Additional augmentation such as noise injection, blurring, cropping, (blocks/center) erasing, ... could result in better overall performance.

Backbone architecture is the same as `ConvNext-Tiny`:

- Patch size: `4`

- Layer depths: `[3, 3, 9, 3]`

- Block dims: `[96, 192, 384, 768]`

- Image sizes: `384`, `416` and `480`

- Model resulted in `25M` params
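As a quick sanity check on these numbers: a ConvNeXt-style backbone downsamples by the patch size in the stem and by 2 before each later stage, which gives strides of 4, 8, 16 and 32. The sketch below computes the resulting feature-map sizes for the listed image sizes. Which of these stages feed the detection head in this repo is not stated here, and the helper name is hypothetical:

```python
def stage_feature_sizes(image_size, patch_size=4, num_stages=4):
    """Spatial size of each backbone stage output for a square input."""
    sizes = []
    stride = patch_size
    for _ in range(num_stages):
        sizes.append(image_size // stride)
        stride *= 2  # each subsequent stage halves the resolution
    return sizes


for size in (384, 416, 480):
    print(size, stage_feature_sizes(size))
# 384 [96, 48, 24, 12]
# 416 [104, 52, 26, 13]
# 480 [120, 60, 30, 15]
```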

It was trained for 100 epochs and obtained a mAP of 40 on a small eval dataset.

Training took ~30 hours on a GTX1070Ti.

Training bigger models for longer would likely yield better results.

![comb2](https://user-images.githubusercontent.com/29043871/201991539-072d7c45-faff-4c38-8731-5ce4330c72e1.png)

![comb3](https://user-images.githubusercontent.com/29043871/201994865-4c88a2a7-74eb-4f14-86eb-cd26a951dee4.png)

### Carla Traffic Lights

A model with the same specification as above was trained for 50 epochs and obtained a mAP of 60 on a small eval dataset.

Training took 3 hours on a GTX1070Ti.

The dataset, which I collected in the CARLA simulator, can be found [here](https://drive.google.com/drive/folders/1TXkPLWlNgauPhQnKEoPDZsx7Px1MD9n_?usp=sharing). Annotations can be found [here](https://github.com/affinis-lab/traffic-light-detection-module/blob/master/dataset/carla_all.csv).

Pretrained model can be found [here](https://drive.google.com/file/d/17mcQ-Ct6bUTS8BEpeDjaZMIFmHS2gptl/view?usp=share_link).

![comb4](https://user-images.githubusercontent.com/29043871/201992324-4323166d-e207-417d-9fe9-8265b885d0fe.png)

![comb5](https://user-images.githubusercontent.com/29043871/201992330-e6929134-b639-4744-9a75-108da64ed033.png)

![comb6](https://user-images.githubusercontent.com/29043871/201992333-f6d32332-b7cd-40c9-a82d-049fe1c567ca.png)

Amazingly, the model can even detect real-world traffic lights (although with lower confidence):

![comb7](https://user-images.githubusercontent.com/29043871/201992833-011f521c-1acd-44bc-b372-135e44940dbb.png)

![comb8](https://user-images.githubusercontent.com/29043871/201992839-ba3134f2-e86f-49f0-a872-77d4aba980d5.png)

### Usage for Carla traffic light detection

```python
from inference.load import load_model, load_image
from datasets import reverse_transform_classes
from utils import draw_bboxes

# load a model (download from link above - https://drive.google.com/file/d/17mcQ-Ct6bUTS8BEpeDjaZMIFmHS2gptl/view?usp=share_link)
predictor = load_model("/path/to/fcos-carla-v01.pth", num_classes=2)

# load an image
image = load_image("path/to/img.jpg", image_size=480)

# obtain results
preds = predictor(image)
bboxes = preds["predicted_boxes"]
scores = preds["scores"]
classes = reverse_transform_classes(preds["pred_classes"], "carla_traffic_lights")

# optional - visualize predictions
# convert the first image of the batch from CHW to HWC for drawing
image = image[0].permute(1, 2, 0).detach().cpu().numpy()
draw_bboxes("./path/to/visualized.jpg", image, bboxes[0], scores[0], classes[0])
```

## To Do

- Add support for segmentation

- Add DETR

- Train on COCO (once I manage to get some better hardware)
