Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/manuelz/dlpt-semantic-segmentation
Project #4 for the OpenCV University course "Deep Learning with PyTorch".
deep-learning python pytorch semantic-segmentation
- Host: GitHub
- URL: https://github.com/manuelz/dlpt-semantic-segmentation
- Owner: ManuelZ
- Created: 2024-08-16T21:11:44.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-09-12T17:11:25.000Z (14 days ago)
- Last Synced: 2024-09-26T20:04:12.739Z (about 5 hours ago)
- Topics: deep-learning, python, pytorch, semantic-segmentation
- Language: Jupyter Notebook
- Homepage:
- Size: 202 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Semantic Segmentation: Drone aerial views
This is the fourth project of the OpenCV University course ["Deep Learning with PyTorch"](https://opencv.org/university/deep-learning-with-pytorch/).
It focuses on applying semantic segmentation to images taken from drones to differentiate between 12 classes.

## Introduction
Semantic segmentation is a computer vision task in which the objective is to assign a class label to every pixel in an
image. This project focuses on classifying the pixels of images taken from drones into 12 classes.

## Data
The project uses a dataset of 3269 images of size 1280 (W) x 720 (H), taken by drones, along with annotated image masks
for the following 12 classes: background, person, bike, car, drone, boat, animal, obstacle, construction, vegetation, road, sky.
Examples:
![img.png](media/img.png)

## The method used
Fine-tuning of a DeepLabV3 ResNet-101 pre-trained model using a custom PyTorch training loop. The objective was to learn
how to manually implement all the required steps, particularly those of the training loop.

- The dataset was split using a stratified shuffle split scheme into train and validation subsets with 80% and 20% of the
  available data, respectively. The stratification was based on the presence or absence of each class in each image.
- Various augmentation techniques were used to try to improve generalization.
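A split like this can be sketched with scikit-learn. The stratification key used here (a clipped count of the classes present in each image, computed on synthetic presence data) is a simplification; the project's exact encoding is in the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset: a 12-entry class-presence vector per
# image (1 if the class appears anywhere in that image's mask).
rng = np.random.default_rng(0)
presence = rng.integers(0, 2, size=(100, 12))  # 100 images, 12 classes

# Simplified stratification key: number of classes present in each image,
# clipped so every stratum has enough members to split.
key = np.clip(presence.sum(axis=1), 4, 8)

indices = np.arange(len(presence))
train_idx, val_idx = train_test_split(
    indices, test_size=0.2, random_state=42, shuffle=True, stratify=key
)
```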
- The loss function used was an equally weighted combination of the Focal Loss and the Soft Dice Loss:
  - The Focal Loss is a modification of the Cross-Entropy Loss, focused on learning from hard negative examples.
  - The Soft Dice Loss is effective in addressing the challenge of imbalanced foreground and background regions.
- An SGD optimizer with the setup from the YOLOv5 training script, where three parameter groups are defined with
  different weight decay configurations.
- A learning rate scheduler that implements the 1-cycle policy: it adjusts the learning rate from an initial rate to a
  maximum, then decreases it to a much lower minimum.
- The custom training loop includes:
  - updating the optimizer learning rate using an LR scheduler
  - gradient accumulation
  - evaluation on the validation set
  - tracking of training losses and scores
  - tracking of validation losses and scores
  - tracking of per-class scores

## Discussion
The model used was DeepLabV3, trained for 40 epochs, which resulted in a Dice score of `0.60363` on the test set.
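For reference, a macro-averaged Dice score over integer-labelled masks can be computed as in this NumPy sketch; the exact averaging behind the reported score is defined in the notebook.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, num_classes: int = 12,
               eps: float = 1e-6) -> float:
    """Macro-averaged Dice score for integer-labelled masks of shape (N, H, W)."""
    scores = []
    for c in range(num_classes):
        p = (pred == c).astype(float)
        t = (target == c).astype(float)
        denom = p.sum() + t.sum()
        if denom > 0:  # ignore classes absent from both prediction and target
            scores.append((2 * (p * t).sum() + eps) / (denom + eps))
    return float(np.mean(scores))

# A perfect prediction scores 1.0.
mask = np.random.default_rng(0).integers(0, 12, size=(2, 8, 8))
print(dice_score(mask, mask))  # → 1.0
```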
Further improvements to the data splitting process could incorporate the pixel count for each class in every image,
so that the images are distributed in a way that considers the occurrence of each class, weighted by the size of
the objects.

See the [notebook](project-4-deep-learning-with-pytorch-2024.ipynb).
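The training-loop ingredients described above (three SGD parameter groups, a 1-cycle schedule, gradient accumulation) can be combined roughly as in this sketch. The model, data, and hyperparameters are toy stand-ins, and plain Cross-Entropy replaces the combined Focal/Dice loss for brevity.

```python
import torch
from torch import nn

# Toy stand-ins for the real dataset/model (assumptions, not the project's code).
model = nn.Sequential(nn.Conv2d(3, 12, kernel_size=1))  # 12-class "segmenter"
data = [(torch.randn(2, 3, 8, 8), torch.randint(0, 12, (2, 8, 8))) for _ in range(8)]

# Three parameter groups (YOLOv5-style): weight decay only on multi-dim weights.
decay, no_decay, biases = [], [], []
for name, p in model.named_parameters():
    if name.endswith("bias"):
        biases.append(p)
    elif p.ndim > 1:
        decay.append(p)
    else:
        no_decay.append(p)
optimizer = torch.optim.SGD(
    [{"params": decay, "weight_decay": 5e-4},
     {"params": no_decay, "weight_decay": 0.0},
     {"params": biases, "weight_decay": 0.0}],
    lr=0.01, momentum=0.9, nesterov=True,
)

accum_steps = 4  # gradient accumulation: effective batch = 4 micro-batches
epochs = 2
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1,
    total_steps=epochs * (len(data) // accum_steps),  # one step per optimizer update
)

criterion = nn.CrossEntropyLoss()  # the project combines Focal and Soft Dice losses
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    for i, (images, masks) in enumerate(data):
        loss = criterion(model(images), masks) / accum_steps  # scale for accumulation
        loss.backward()
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            scheduler.step()  # 1-cycle policy: warm up to max_lr, then anneal
```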