
# Instance Segmentation Using ViT-based Mask R-CNN


Instance segmentation aims to assign every pixel to a distinct object instance in the scene. Mask R-CNN, which combines object detection and semantic segmentation, is one approach to this task, and a Vision Transformer (ViT) can be incorporated as its backbone. In this project, a pre-trained ViT-based Mask R-CNN model is fine-tuned and evaluated on the Penn-Fudan Database for Pedestrian Detection and Segmentation. The dataset is split into train, validation, and test sets with an 80:10:10 ratio.
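
As a rough illustration of how such a model can be assembled, the sketch below wraps torchvision's `vit_b_16` so that its patch tokens form a single spatial feature map and feeds it to `torchvision.models.detection.MaskRCNN`. This is a minimal sketch under assumptions, not the notebook's exact construction: the `ViTBackbone` wrapper, the weight name, the anchor sizes, and the RoI pooler settings are illustrative choices.

```python
import torch
import torch.nn.functional as F
from torch import nn
import torchvision
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign


class ViTBackbone(nn.Module):
    """Expose ViT-B/16 patch tokens as a spatial feature map for Mask R-CNN (illustrative)."""

    def __init__(self):
        super().__init__()
        vit = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")  # assumed pre-trained weights
        self.hidden_dim = vit.hidden_dim          # 768
        self.conv_proj = vit.conv_proj            # patch embedding (16x16 patches)
        self.class_token = vit.class_token
        self.encoder = vit.encoder
        self.out_channels = self.hidden_dim       # attribute required by MaskRCNN

    def _pos_embedding(self, gh, gw):
        # Interpolate the 14x14 positional grid to the current feature-map size.
        pos = self.encoder.pos_embedding          # (1, 1 + 14*14, C)
        cls_pos, patch_pos = pos[:, :1], pos[:, 1:]
        side = int(patch_pos.shape[1] ** 0.5)
        patch_pos = patch_pos.reshape(1, side, side, -1).permute(0, 3, 1, 2)
        patch_pos = F.interpolate(patch_pos, size=(gh, gw), mode="bicubic", align_corners=False)
        patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, gh * gw, -1)
        return torch.cat([cls_pos, patch_pos], dim=1)

    def forward(self, x):
        n = x.shape[0]
        feats = self.conv_proj(x)                 # (N, C, H/16, W/16)
        gh, gw = feats.shape[-2:]
        feats = feats.flatten(2).transpose(1, 2)  # (N, gh*gw, C)
        feats = torch.cat([self.class_token.expand(n, -1, -1), feats], dim=1)
        feats = feats + self._pos_embedding(gh, gw)
        feats = self.encoder.ln(self.encoder.layers(self.encoder.dropout(feats)))
        # Drop the class token and restore the 2-D layout the detection heads expect.
        return feats[:, 1:].transpose(1, 2).reshape(n, self.hidden_dim, gh, gw)


# Single-level feature map, so one anchor set and RoI poolers keyed on "0".
anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),),
)
model = MaskRCNN(
    ViTBackbone(),
    num_classes=2,  # background + pedestrian
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=MultiScaleRoIAlign(featmap_names=["0"], output_size=7, sampling_ratio=2),
    mask_roi_pool=MultiScaleRoIAlign(featmap_names=["0"], output_size=14, sampling_ratio=2),
)
```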

## Experiment

The Jupyter Notebook covering the entire experiment is available at this [link](https://github.com/reshalfahsi/instance-segmentation-vit-maskrcnn/blob/master/Instance_Segmentation_Using_ViT_based_Mask_RCNN.ipynb).

## Result

### Quantitative Result

The table below reports the quantitative performance of the ViT-based Mask R-CNN model on the test set.

Test Metric | Score
------------------------------ | -------------
mAP<sup>box</sup>@0.5:0.95 | 96.85%
mAP<sup>mask</sup>@0.5:0.95 | 79.58%
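
Box and mask mAP@0.5:0.95 are COCO-style metrics. One possible way to compute them, sketched below under the assumption of a `torchmetrics`-based evaluation loop (not necessarily what the notebook uses), treats `model` and `test_loader` as placeholders and assumes the targets contain `boxes`, `labels`, and binary `masks`.

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# Separate metrics for box and mask mAP@[0.5:0.95].
map_box = MeanAveragePrecision(iou_type="bbox")
map_mask = MeanAveragePrecision(iou_type="segm")

model.eval()
with torch.no_grad():
    for images, targets in test_loader:  # placeholder data loader
        preds = model(images)
        # torchmetrics expects binary masks of shape (N, H, W) for the "segm" IoU type.
        for p in preds:
            p["masks"] = (p["masks"] > 0.5).squeeze(1).to(torch.uint8)
        map_box.update(preds, targets)
        map_mask.update(preds, targets)

print("mAP^box @0.5:0.95:", map_box.compute()["map"].item())
print("mAP^mask@0.5:0.95:", map_mask.compute()["map"].item())
```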

### Loss Curve

Loss curves of ViT-based Mask R-CNN on the Penn-Fudan Database for Pedestrian Detection and Segmentation train and validation sets.
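
Since PyTorch Lightning appears in the credits, the train and validation losses can be logged with a `LightningModule` along the lines of the sketch below. The class name, learning rate, optimizer, and the train-mode trick for obtaining validation losses are assumptions, not the notebook's exact setup; torchvision's Mask R-CNN only returns its loss dictionary in training mode.

```python
import torch
import pytorch_lightning as pl


class MaskRCNNModule(pl.LightningModule):
    """Minimal fine-tuning module (illustrative); `model` is the ViT-based Mask R-CNN."""

    def __init__(self, model, lr=1e-4):  # learning rate is an assumption
        super().__init__()
        self.model = model
        self.lr = lr

    def training_step(self, batch, batch_idx):
        images, targets = batch
        # In training mode, torchvision's Mask R-CNN returns a dict of partial losses.
        loss_dict = self.model(images, targets)
        loss = sum(loss_dict.values())
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_idx):
        images, targets = batch
        # Temporarily switch to train mode so the model reports losses instead of predictions.
        self.model.train()
        with torch.no_grad():
            loss_dict = self.model(images, targets)
        self.model.eval()
        self.log("val_loss", sum(loss_dict.values()), prog_bar=True)

    def configure_optimizers(self):
        return torch.optim.AdamW(self.model.parameters(), lr=self.lr)
```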

### Qualitative Result

Below, the qualitative results are presented.

A few samples of qualitative results from the ViT-based Mask R-CNN model.
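
Overlays like these can be produced with torchvision's drawing utilities. The sketch below is illustrative: `model` is the fine-tuned Mask R-CNN, `"pedestrian.png"` stands in for any test image, and the 0.7 confidence and 0.5 mask thresholds are assumed values.

```python
import torch
from torchvision.io import read_image
from torchvision.utils import draw_segmentation_masks, draw_bounding_boxes
from torchvision.transforms.functional import to_pil_image

image = read_image("pedestrian.png")            # uint8 tensor, shape (3, H, W)
model.eval()
with torch.no_grad():
    pred = model([image.float() / 255.0])[0]    # model expects floats in [0, 1]

keep = pred["scores"] > 0.7                     # confidence threshold (assumed)
masks = (pred["masks"][keep] > 0.5).squeeze(1)  # boolean masks, shape (N, H, W)
boxes = pred["boxes"][keep]

vis = draw_segmentation_masks(image, masks, alpha=0.6)
vis = draw_bounding_boxes(vis, boxes, colors="red", width=2)
to_pil_image(vis).save("qualitative.png")
```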

## Credit

- [An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/pdf/2010.11929.pdf)
- [Mask R-CNN](https://arxiv.org/pdf/1703.06870.pdf)
- [Benchmarking Detection Transfer Learning with Vision Transformers](https://arxiv.org/pdf/2111.11429.pdf)
- [TorchVision's Mask R-CNN](https://github.com/pytorch/vision/blob/main/torchvision/models/detection/mask_rcnn.py)
- [TorchVision Object Detection Finetuning Tutorial](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html)
- [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/)
- [PyTorch Lightning](https://lightning.ai/docs/pytorch/latest/)