https://github.com/reshalfahsi/instance-segmentation-vit-maskrcnn
Instance Segmentation Using ViT-based Mask R-CNN
- Host: GitHub
- URL: https://github.com/reshalfahsi/instance-segmentation-vit-maskrcnn
- Owner: reshalfahsi
- Created: 2024-04-14T04:41:04.000Z (9 months ago)
- Default Branch: master
- Last Pushed: 2024-04-14T04:58:07.000Z (9 months ago)
- Last Synced: 2024-04-14T20:52:02.999Z (9 months ago)
- Topics: instance-segmentation, mask-rcnn, penn-fudan-database, penn-fudan-dataset, vision-transformer
- Language: Jupyter Notebook
- Homepage:
- Size: 2.57 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Instance Segmentation Using ViT-based Mask R-CNN
Instance segmentation assigns every pixel to the individual object instance it belongs to, so that distinct objects in the scene are separated from one another. Mask R-CNN is one approach to this task; it combines object detection and semantic segmentation. A Vision Transformer (ViT) can also serve as the backbone of Mask R-CNN. In this project, a pre-trained ViT-based Mask R-CNN model is fine-tuned and evaluated on the Penn-Fudan Database for Pedestrian Detection and Segmentation. The dataset is split into train, validation, and test sets at a ratio of 80:10:10.
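As an illustration of the 80:10:10 split, here is a minimal sketch in plain Python. It is not the notebook's code: the function name `split_indices`, the seed, and the choice to shuffle before splitting are assumptions made for this example. The sample count of 170 is the number of images in the Penn-Fudan dataset.

```python
import random


def split_indices(num_samples, ratios=(80, 10, 10), seed=0):
    """Shuffle sample indices and split them into train/val/test lists.

    `ratios` are integer weights (hypothetical default: 80:10:10).
    Any remainder left by integer division goes to the test split.
    """
    total = sum(ratios)
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    num_train = num_samples * ratios[0] // total
    num_val = num_samples * ratios[1] // total

    train = indices[:num_train]
    val = indices[num_train:num_train + num_val]
    test = indices[num_train + num_val:]
    return train, val, test


# Penn-Fudan contains 170 images.
train_idx, val_idx, test_idx = split_indices(170)
print(len(train_idx), len(val_idx), len(test_idx))  # 136 17 17
```

Fixing the seed keeps the three subsets identical across runs, which matters when validation and test scores are compared between experiments.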
## Experiment
The entire experiment is available in this [Jupyter Notebook](https://github.com/reshalfahsi/instance-segmentation-vit-maskrcnn/blob/master/Instance_Segmentation_Using_ViT_based_Mask_RCNN.ipynb).
## Result
## Quantitative Result
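The scores in this section use the `@0.5:0.95` notation, which conventionally (following the COCO benchmark) means average precision averaged over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The box metric measures IoU between bounding boxes, and the mask metric measures IoU between binary masks. The sketch below shows only these building blocks. It is not the notebook's evaluation code, and the helper names (`iou_thresholds`, `box_iou`, `mask_iou`, `thresholds_passed`) are hypothetical.

```python
def iou_thresholds():
    """Return the ten IoU thresholds 0.50, 0.55, ..., 0.95."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(a, b):
    """IoU of two binary masks given as equally sized 2D lists of 0/1."""
    pairs = [(p, q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    inter = sum(1 for p, q in pairs if p and q)
    union = sum(1 for p, q in pairs if p or q)
    return inter / union if union > 0 else 0.0


def thresholds_passed(iou):
    """Thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in iou_thresholds() if iou >= t]


# A predicted box that slightly undershoots the ground-truth box.
iou = box_iou((0, 0, 10, 10), (1, 1, 10, 10))
print(iou)                          # 0.81
print(len(thresholds_passed(iou)))  # 7 of the 10 thresholds
```

A prediction with IoU 0.81 counts as correct at the looser thresholds but as a miss at 0.85 and above. This is why mAP@0.5:0.95 rewards tight localization, and why mask scores, which depend on pixel-level overlap, are often lower than box scores.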
The following table reports the quantitative performance of the ViT-based Mask R-CNN on the test set.
Test Metric | Score
------------------------------ | -------------
mAP<sup>box</sup>@0.5:0.95 | 96.85%
mAP<sup>mask</sup>@0.5:0.95 | 79.58%

## Loss Curve
Loss curves of ViT-based Mask R-CNN on the train and validation sets of the Penn-Fudan Database for Pedestrian Detection and Segmentation.

## Qualitative Result
The qualitative results are presented below.

A few samples of qualitative results from the ViT-based Mask R-CNN model.

## Credit
- [An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/pdf/2010.11929.pdf)
- [Mask R-CNN](https://arxiv.org/pdf/1703.06870.pdf)
- [Benchmarking Detection Transfer Learning with Vision Transformers](https://arxiv.org/pdf/2111.11429.pdf)
- [TorchVision's Mask R-CNN](https://github.com/pytorch/vision/blob/main/torchvision/models/detection/mask_rcnn.py)
- [TorchVision Object Detection Finetuning Tutorial](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html)
- [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/)
- [PyTorch Lightning](https://lightning.ai/docs/pytorch/latest/)