https://github.com/shreydan/monocular-depth-estimation
monocular depth estimation using UNet-style architecture trained on NYUv2 depth dataset
- Host: GitHub
- URL: https://github.com/shreydan/monocular-depth-estimation
- Owner: shreydan
- Created: 2023-09-02T10:15:32.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-09-02T10:32:34.000Z (over 2 years ago)
- Last Synced: 2024-03-15T14:16:24.149Z (almost 2 years ago)
- Language: Jupyter Notebook
- Homepage: https://www.kaggle.com/code/shreydan/monocular-depth-estimation-nyuv2
- Size: 5.98 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Monocular Depth Estimation
- Kaggle notebook: [shreydan/monocular-depth-estimation-nyuv2](https://www.kaggle.com/code/shreydan/monocular-depth-estimation-nyuv2)
- Model details:
  - encoder: ResNeXt-50 (ImageNet-pretrained)
  - decoder: UNet++ ([PyTorch Segmentation Models](https://github.com/qubvel/segmentation_models.pytorch))
- dataset: [NYUv2 depth](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html)
- metrics:
  - Structural Similarity Index Measure (SSIM): measures the structural similarity between predicted and ground-truth depth maps
  - Mean Squared Error (MSE)
- training parameters:
  - optimizer: AdamW
  - learning rate: 1e-3, scheduled with OneCycleLR
  - mixed-precision training (fp16)
## Metrics
| epoch | loss_train | loss_val | ssim_train | ssim_val | mse_train | mse_val |
|---|---|---|---|---|---|---|
| 0 | 0.095343 | 0.009658 | 0.575013 | 0.769732 | 0.095367 | 0.009678 |
| 1 | 0.010186 | 0.005739 | 0.841523 | 0.867149 | 0.010186 | 0.005754 |
| 2 | 0.010407 | 0.004536 | 0.872432 | 0.888866 | 0.010409 | 0.004553 |
| 3 | 0.006833 | 0.003201 | 0.897906 | 0.903832 | 0.006834 | 0.003213 |
| 4 | 0.005041 | 0.002800 | 0.910090 | 0.909110 | 0.005041 | 0.002806 |
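The SSIM values reported above come from the standard SSIM formula. As a minimal illustration, here is a simplified *global* (single-window) SSIM in NumPy; library implementations (e.g. torchmetrics or scikit-image) compute it over local sliding windows, so this sketch only shows the formula, not the exact metric used in the notebook.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over entire images (illustrative; real SSIM
    averages the same formula over local Gaussian windows)."""
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the SSIM paper
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(0)
pred = rng.random((64, 64))
ssim_identical = global_ssim(pred, pred)  # identical maps -> exactly 1.0
noisy = np.clip(pred + 0.1 * rng.random((64, 64)), 0.0, 1.0)
ssim_noisy = global_ssim(pred, noisy)     # perturbed map -> below 1.0
```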
## Results on the test dataset
- the brighter an area, the farther away it is in the scene: brighter => greater depth
- predicted depth maps are grayscale; a color gradient (colormap) is applied for visualization purposes only
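The colormapping step can be sketched in plain NumPy: normalize the grayscale depth map to [0, 1], then interpolate each pixel between a "near" and a "far" color. The two colors below are illustrative; the notebook most likely applies a standard matplotlib colormap instead.

```python
import numpy as np

def apply_gradient(depth, near=(0.0, 0.0, 0.3), far=(1.0, 0.9, 0.2)):
    """Map a grayscale depth map (H, W) to an RGB image (H, W, 3) by
    linearly interpolating between two colors. near/far are illustrative
    placeholders for a real colormap such as matplotlib's 'magma'."""
    # Normalize depth to [0, 1]; epsilon guards against a constant map.
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    near, far = np.asarray(near), np.asarray(far)
    # Brighter (larger d) pixels blend toward the "far" color.
    return d[..., None] * far + (1.0 - d[..., None]) * near

depth = np.linspace(0.0, 10.0, 16).reshape(4, 4)  # toy depth map
rgb = apply_gradient(depth)                        # (4, 4, 3), values in [0, 1]
```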
