https://github.com/tum-vision/scenedino
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion (ICCV 2025)
3d-reconstruction 3d-scene-understanding 3d-semantic-segmentation occupancy-prediction segmentation semantic-scene-completion single-image-reconstruction unsupervised-learning unsupervised-scene-understanding unsupervised-segmentation
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/tum-vision/scenedino
- Owner: tum-vision
- License: apache-2.0
- Created: 2025-05-26T12:24:44.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-09T01:09:58.000Z (11 months ago)
- Last Synced: 2025-07-09T02:26:46.012Z (11 months ago)
- Topics: 3d-reconstruction, 3d-scene-understanding, 3d-semantic-segmentation, occupancy-prediction, segmentation, semantic-scene-completion, single-image-reconstruction, unsupervised-learning, unsupervised-scene-understanding, unsupervised-segmentation
- Language: Python
- Homepage: https://visinf.github.io/scenedino
- Size: 28.5 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion
[**Aleksandar Jevtić**](https://jev-aleks.github.io/)<sup>\*1</sup>,
[**Christoph Reich**](https://christophreich1996.github.io/)<sup>\*1,2,4,5</sup>,
[**Felix Wimbauer**](https://fwmb.github.io/)<sup>1,4</sup>,
[**Oliver Hahn**](https://olvrhhn.github.io/)<sup>2</sup>,
[**Christian Rupprecht**](https://chrirupp.github.io/)<sup>3</sup>,
[**Stefan Roth**](https://www.visinf.tu-darmstadt.de/visual_inference/people_vi/stefan_roth.en.jsp)<sup>2,5,6</sup>,
[**Daniel Cremers**](https://cvg.cit.tum.de/members/cremers/)<sup>1,4,5</sup>

<sup>1</sup>TU Munich, <sup>2</sup>TU Darmstadt, <sup>3</sup>University of Oxford, <sup>4</sup>MCML, <sup>5</sup>ELIZA, <sup>6</sup>hessian.AI, <sup>\*</sup>equal contribution

ICCV 2025

**TL;DR:** SceneDINO is unsupervised and infers 3D geometry and features from a single image in a feed-forward manner. Distilling and clustering SceneDINO's 3D feature field results in unsupervised semantic scene completion predictions. SceneDINO is trained using multi-view self-supervision.
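The clustering step mentioned above can be illustrated with plain k-means over per-point feature vectors. This is a minimal sketch under stated assumptions: the `kmeans` helper, the toy 2-dimensional features, and the use of k-means itself are hypothetical stand-ins, not SceneDINO's actual distillation or clustering code. Each resulting cluster ID plays the role of an unsupervised pseudo-class.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features: List[Vector], k: int, iters: int = 20,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical helper: group feature vectors into k pseudo-classes."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct samples.
    centroids = [list(f) for f in rng.sample(features, k)]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each feature joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(f, centroids[c]))
                  for f in features]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy features of six 3D points forming two well-separated groups.
feats = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, _ = kmeans(feats, k=2)
print(labels)
```

Cluster IDs are arbitrary, so they have to be matched to ground-truth classes before segmentation metrics can be computed.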
## Abstract
Semantic scene completion (SSC) aims to infer both the 3D geometry and semantics of a scene from single images. In contrast to prior work on SSC that heavily relies on expensive ground-truth annotations, we approach SSC in an unsupervised setting. Our novel method, SceneDINO, adapts techniques from self-supervised representation learning and 2D unsupervised scene understanding to SSC. Our training exclusively utilizes multi-view consistency self-supervision without any form of semantic or geometric ground truth. Given a single input image, SceneDINO infers the 3D geometry and expressive 3D DINO features in a feed-forward manner. Through a novel 3D feature distillation approach, we obtain unsupervised 3D semantics. In both 3D and 2D unsupervised scene understanding, SceneDINO reaches state-of-the-art segmentation accuracy. Linear probing our 3D features matches the segmentation accuracy of a current supervised SSC approach. Additionally, we showcase the domain generalization and multi-view consistency of SceneDINO, taking the first steps towards a strong foundation for single image 3D scene understanding.
## News
- `09/07/2025`: [arXiv](https://arxiv.org/abs/2507.06230) preprint and code released. 🚀
## Setup (Installation & Datasets)
### Python Environment
Our Python environment is managed with **Conda**.
```shell
conda env create -f environment.yml
conda activate scenedino
```
### Datasets
We provide configuration files for the datasets on which SceneDINO is trained and evaluated. Adjust these files and, most importantly, set the data paths to your local copies of the datasets.
```bash
configs/dataset/kitti_360_sscbench.yaml
configs/dataset/cityscapes_seg.yaml
configs/dataset/bdd_seg.yaml
configs/dataset/realestate10k.yaml
```
#### KITTI-360
To download KITTI-360, create an account and follow the instructions on the [official website](https://www.cvlibs.net/datasets/kitti-360/index.php). We require the perspective images, fisheye images, raw Velodyne scans, calibrations, and vehicle poses.
### Checkpoints
Our pre-trained checkpoints are stored on the CVG webshare. Download a checkpoint with the `download_checkpoint.py` script. To replicate our results with ORB-SLAM3, we provide the estimated poses in `datasets/kitti_360/orb_slam_poses`.
```bash
# Download a model trained on KITTI-360 (SSCBench split)
python download_checkpoint.py ssc-kitti-360-dino           # DINO backbone
python download_checkpoint.py ssc-kitti-360-dino-orb-slam  # uses ORB-SLAM3 poses
python download_checkpoint.py ssc-kitti-360-dinov2         # DINOv2 backbone, best mIoU in Table 1
```
**Table 1. SSCBench-KITTI-360 results.** We compare SceneDINO to the STEGO + S4C baseline in unsupervised SSC using the mean intersection over union score (mIoU) in %.

| Method | Checkpoint | mIoU (12.8 m) | mIoU (25.6 m) | mIoU (51.2 m) |
| --- | --- | ---: | ---: | ---: |
| Baseline | - | 10.53 | 9.26 | 6.60 |
| SceneDINO | `ssc-kitti-360-dino` | 10.76 | 10.01 | 8.00 |
| SceneDINO (ORB-SLAM3 poses) | `ssc-kitti-360-dino-orb-slam` | 10.88 | 9.86 | 7.88 |
| SceneDINO (DINOv2) | `ssc-kitti-360-dinov2` | 13.76 | 11.78 | 9.08 |

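mIoU averages the per-class intersection over union, IoU_c = TP_c / (TP_c + FP_c + FN_c), over the semantic classes. The sketch below computes it from flat lists of predicted and ground-truth labels. The `mean_iou` helper, the `ignore_index` convention, and skipping classes absent from both prediction and ground truth are assumptions for illustration, not the SSCBench evaluation code behind Table 1.

```python
from typing import List, Optional, Sequence


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int,
             ignore_index: Optional[int] = None) -> float:
    """Hypothetical helper: mean IoU in %, over classes present in pred or gt."""
    # confusion[g][p] counts elements with ground truth g predicted as p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # e.g. unlabeled voxels or pixels
        confusion[g][p] += 1

    ious: List[float] = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                 # class c predicted as something else
        fp = sum(row[c] for row in confusion) - tp  # other classes predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes that occur in neither pred nor gt
            ious.append(tp / denom)
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Toy example: 3 classes, one ignored element (label 255).
pred = [0, 0, 1, 1, 2, 2, 0]
gt = [0, 1, 1, 1, 2, 0, 255]
print(round(mean_iou(pred, gt, num_classes=3, ignore_index=255), 2))  # prints 50.0
```

Read against Table 1, the DINOv2 variant improves over the baseline by 13.76 - 10.53 = 3.23 mIoU points at 12.8 m and by 9.08 - 6.60 = 2.48 points at 51.2 m.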
## Inference Demo Script
This simple demo script shows how to load a model and run inference both in 3D and on rendered 2D views. It can serve as a starting point for experimenting with SceneDINO feature fields.
```bash
python demo_script.py -h
# First image of the KITTI-360 test set
python demo_script.py --ckpt
# Custom image
python demo_script.py --ckpt --image
```
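Rendering 2D outputs from a 3D feature field is commonly done with NeRF-style volume rendering: densities sampled along a camera ray are turned into per-sample weights that composite the sampled features into a single pixel feature. The sketch below shows only that compositing step and assumes SceneDINO's 2D rendering follows this scheme; the `render_ray` helper, the toy densities, and the 2-dimensional features are hypothetical and not taken from `demo_script.py`.

```python
import math
from typing import List, Sequence, Tuple


def render_ray(densities: Sequence[float], deltas: Sequence[float],
               features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: alpha-composite per-sample features along one ray."""
    weights: List[float] = []
    transmittance = 1.0  # probability that the ray reaches the current sample
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    dim = len(features[0])
    # Weighted sum of the sampled features gives the rendered pixel feature.
    pixel_feature = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pixel_feature, weights


# Four samples along a ray: empty space, then a dense surface at sample 2.
densities = [0.0, 0.0, 50.0, 50.0]
deltas = [0.5, 0.5, 0.5, 0.5]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pixel_feature, weights = render_ray(densities, deltas, features)
print([round(w, 3) for w in weights])        # prints [0.0, 0.0, 1.0, 0.0]
print([round(x, 3) for x in pixel_feature])  # prints [0.0, 1.0]
```

The same weights can also yield an expected depth by compositing the sample distances instead of the features.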
## Training
For unsupervised SSC, training is performed in two stages. We provide training configurations for each stage in `configs/`.
**SceneDINO**
First, the 3D feature fields of SceneDINO are trained.
```bash
python train.py -cn train_scenedino_kitti_360
```
**Unsupervised SSC**
Based on a SceneDINO checkpoint, we train the unsupervised SSC head.
```bash
python train.py -cn train_semantic_kitti_360
```
**Logging**
We use TensorBoard to keep track of losses, metrics, and qualitative results.
```bash
tensorboard --port 8000 --logdir out/
```
## Evaluation
We further provide configurations to reproduce the evaluation results from the paper.
**Unsupervised 2D Segmentation**
```bash
# Unsupervised 2D Segmentation
python eval.py -cn evaluate_semantic_kitti_360
```
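Unsupervised segmentation predicts cluster IDs rather than class labels, so evaluation protocols in this area (e.g. STEGO) first match clusters to ground-truth classes one-to-one with the Hungarian algorithm, then compute accuracy or mIoU. We assume, without having checked `eval.py`, that a similar matching is used here. The sketch below brute-forces the assignment with `itertools.permutations`, which is only feasible for a handful of classes; larger evaluations typically use `scipy.optimize.linear_sum_assignment`. The `match_clusters` helper and the toy labels are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence


def match_clusters(clusters: Sequence[int], gt: Sequence[int], k: int) -> Dict[int, int]:
    """Hypothetical helper: one-to-one cluster -> class mapping maximizing overlap.

    Assumes the number of clusters equals the number of classes (k).
    """
    # overlap[c][g] counts elements in cluster c with ground-truth class g.
    overlap = [[0] * k for _ in range(k)]
    for c, g in zip(clusters, gt):
        overlap[c][g] += 1
    # Brute force over all k! assignments (the Hungarian algorithm needs O(k^3)).
    best = max(permutations(range(k)),
               key=lambda perm: sum(overlap[c][perm[c]] for c in range(k)))
    return {c: best[c] for c in range(k)}


clusters = [2, 2, 0, 0, 1, 1, 1]
gt = [0, 0, 1, 1, 2, 2, 0]
mapping = match_clusters(clusters, gt, k=3)
relabeled: List[int] = [mapping[c] for c in clusters]
accuracy = sum(p == g for p, g in zip(relabeled, gt)) / len(gt)
print(mapping, round(accuracy, 3))  # prints {0: 1, 1: 2, 2: 0} 0.857
```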
**Unsupervised SSC**
```bash
# Unsupervised SSC, adapted from S4C (https://github.com/ahayler/s4c)
python evaluate_model_sscbench.py -ssc -vgt -cp .pt -f -m scenedino -p
```
## Citation
If you find our work useful, please consider citing our paper.
```bibtex
@inproceedings{Jevtic:2025:SceneDINO,
author = {Aleksandar Jevti{\'c} and
Christoph Reich and
Felix Wimbauer and
Oliver Hahn and
Christian Rupprecht and
Stefan Roth and
Daniel Cremers},
title = {Feed-Forward {SceneDINO} for Unsupervised Semantic Scene Completion},
booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
year = {2025},
}
```
## Acknowledgements
This repository is based on the [Behind The Scenes (BTS)](https://github.com/Brummi/BehindTheScenes) code base.