https://github.com/yiyun-liang/cityscapes-segmentation
Experiments with different pretext tasks for semantic segmentation and future frame prediction
- Host: GitHub
- URL: https://github.com/yiyun-liang/cityscapes-segmentation
- Owner: Yiyun-Liang
- Created: 2020-04-27T16:08:03.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-06-28T18:03:48.000Z (over 1 year ago)
- Last Synced: 2025-01-28T20:11:57.417Z (9 months ago)
- Topics: deeplabv3, semantic-segmentation, triplet-loss, unet
- Language: Jupyter Notebook
- Homepage:
- Size: 58.5 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# cityscapes-segmentation
This repo explores different self-supervised pretext tasks for semantic segmentation on the Cityscapes dataset (original resolution 1024x2048).
### Pretext Tasks
Pretext tasks we are implementing (using 15,000 frames of video data from the Cityscapes dataset); a minimal sketch of the triplet setup follows this list:
* triplet loss on frame temporal location in `/triplet`
* frame order prediction
* video colorization
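The triplet pretext task pulls embeddings of temporally nearby frames together and pushes temporally distant frames apart. Below is a hedged, minimal sketch of that idea in PyTorch; the embedding network, input sizes, and margin are illustrative assumptions rather than the exact code in `/triplet`, and the torchvision >= 0.13 `weights=` API is assumed.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Hypothetical embedding network: a ResNet-18 trunk whose classifier is
# replaced by a projection head producing a 128-d embedding per frame.
class FrameEmbedder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, dim)
        self.net = backbone

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=1)

embedder = FrameEmbedder()
criterion = nn.TripletMarginLoss(margin=1.0)  # margin is an assumed value

# anchor frame, positive = temporally nearby frame, negative = distant frame
anchor   = torch.randn(4, 3, 128, 256)  # toy batch of downsampled frames
positive = torch.randn(4, 3, 128, 256)
negative = torch.randn(4, 3, 128, 256)

loss = criterion(embedder(anchor), embedder(positive), embedder(negative))
loss.backward()
```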
### Target Tasks
Our target tasks are semantic segmentation and future frame prediction.
* For semantic segmentation, we evaluate a UNet and a DeepLabv3 model on the Cityscapes dataset (5,000 examples with fine annotations); see the data-loading sketch after this list.
* For future frame prediction, we evaluate an encoder-decoder temporal network on the Cityscapes dataset.
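As a hedged illustration of loading the finely annotated split, here is a sketch using `torchvision.datasets.Cityscapes`; the root path, resize factor, and transforms are placeholders, and the repo's notebooks may load the data differently.

```python
from torchvision import datasets, transforms

CITYSCAPES_ROOT = "/data/cityscapes"  # placeholder; expects leftImg8bit/ and gtFine/

image_tf = transforms.Compose([
    transforms.Resize((512, 1024)),   # 2x downsample from the original 1024x2048
    transforms.ToTensor(),
])
# Nearest-neighbour resize keeps label ids intact for the segmentation masks.
target_tf = transforms.Resize((512, 1024),
                              interpolation=transforms.InterpolationMode.NEAREST)

train_set = datasets.Cityscapes(
    CITYSCAPES_ROOT,
    split="train",
    mode="fine",               # the 5,000 finely annotated examples
    target_type="semantic",
    transform=image_tf,
    target_transform=target_tf,
)

image, mask = train_set[0]
print(image.shape)             # torch.Size([3, 512, 1024])
```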
### Code Structure
* `/unet`: code for running UNet for semantic segmentation (work based on [Pytorch-UNet](https://github.com/milesial/Pytorch-UNet))
* `/DeepLabv3`: code for running DeepLabv3 for semantic segmentation (work based on [DeepLabv3.pytorch](https://github.com/chenxi116/DeepLabv3.pytorch))
* `/triplet`: pretext task to generate embeddings for video frames using triplet loss (work based on [uzkent/MMVideoPredictor](https://github.com/uzkent/MMVideoPredictor))
* `/spatioTemporal`: pretext task doing video frame order prediction (work based on [uzkent/MMVideoPredictor](https://github.com/uzkent/MMVideoPredictor))
* `/colorization`: pretext task doing video frame colorization
* `/MMVideoPredictor`: future frame generation using a custom temporal network (work based on [uzkent/MMVideoPredictor](https://github.com/uzkent/MMVideoPredictor))
### Preliminary Experiments
* UNet in `/unet` trained from scratch
* DeepLabv3 model in `/DeepLabv3` trained from scratch
* DeepLabv3 model fine-tuned from ImageNet-pretrained weights (a hedged fine-tuning sketch follows this list)
* DeepLabv3 model fine-tuned from pretrained weights produced by our pretext tasks
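Our DeepLabv3 code is based on chenxi116/DeepLabv3.pytorch. To illustrate the from-scratch vs. ImageNet-pretrained comparison, here is a hedged sketch using torchvision's DeepLabV3 constructor instead; the 19-class Cityscapes label set and the torchvision >= 0.13 `weights=` API are assumptions, and the pretext checkpoint path is hypothetical.

```python
from torchvision.models import ResNet101_Weights
from torchvision.models.segmentation import deeplabv3_resnet101

NUM_CLASSES = 19  # standard Cityscapes train classes

# From-scratch baseline: no pretrained weights anywhere.
scratch_model = deeplabv3_resnet101(
    weights=None, weights_backbone=None, num_classes=NUM_CLASSES)

# Fine-tuning baseline: backbone initialised from ImageNet weights,
# segmentation head trained on Cityscapes.
finetune_model = deeplabv3_resnet101(
    weights=None,
    weights_backbone=ResNet101_Weights.IMAGENET1K_V1,
    num_classes=NUM_CLASSES)

# Pretext variant (hypothetical checkpoint path): initialise the backbone
# from weights learned by one of the pretext tasks, e.g.
#   state = torch.load("triplet_pretext_backbone.pth")
#   finetune_model.backbone.load_state_dict(state, strict=False)
```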
Model | Setup | mIoU (acc)
------------- | ------------- | ---
**UNet** (lr=0.001, ReduceLROnPlateau, RMSprop(weight_decay=1e-8, momentum=0.9), CrossEntropyLoss) | downsample 2x, bs=1, 30 epochs | 0.5153 (0.8653)
**UNet** (same setup) | downsample 4x, bs=8, 30 epochs | 0.4613 (0.85)
**UNet** (same setup) | downsample 8x, bs=64, 30 epochs | 0.45 (0.85)
**DeepLabv3** (resnet101) | ImageNet pretrained, 10 epochs | 59.31
**DeepLabv3** (resnet101) | scratch, 10 epochs | 27.24
**DeepLabv3** (resnet101) | scratch, 100 epochs | 49.79
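For reference, a minimal sketch of the UNet training configuration listed in the table (RMSprop with lr=0.001, weight_decay=1e-8, momentum=0.9; ReduceLROnPlateau; CrossEntropyLoss). The model here is a stand-in, the scheduler mode/patience and data loader are assumptions, and the real code lives in `/unet`.

```python
import torch.nn as nn
import torch.optim as optim

NUM_CLASSES = 19

# Stand-in model: anything producing (N, NUM_CLASSES, H, W) logits works here;
# replace with the UNet from /unet (based on milesial/Pytorch-UNet).
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, NUM_CLASSES, 1),
)

# Hyperparameters from the table above.
optimizer = optim.RMSprop(model.parameters(), lr=1e-3,
                          weight_decay=1e-8, momentum=0.9)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", patience=2)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(loader):
    model.train()
    for images, masks in loader:      # masks: (N, H, W) int64 class ids
        optimizer.zero_grad()
        logits = model(images)        # (N, NUM_CLASSES, H, W)
        loss = criterion(logits, masks)
        loss.backward()
        optimizer.step()

# After validating each epoch, step the scheduler on the tracked metric,
# e.g. scheduler.step(val_miou)
```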