
# From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting

Zhiwei Huang<sup>1,2</sup>, Hailin Yu<sup>2</sup>, Yichun Shentu<sup>2</sup>, Jin Yuan<sup>2</sup>, Guofeng Zhang<sup>1</sup>

<sup>1</sup>State Key Lab of CAD & CG, Zhejiang University&emsp;<sup>2</sup>SenseTime Research

Corresponding Authors

CVPR 2025
## 🏠 About



This paper presents STDLoc, a novel camera relocalization method that leverages Feature Gaussian Splatting (Feature GS) as the scene representation. STDLoc is a full relocalization pipeline that achieves accurate relocalization without relying on any pose prior. Unlike previous coarse-to-fine localization methods, which first perform image retrieval and then feature matching, we propose a novel sparse-to-dense localization paradigm. Building on this scene representation, we introduce a matching-oriented Gaussian sampling strategy and a scene-specific detector for efficient and robust initial pose estimation. Then, starting from the initial localization result, we align the query feature map to the Gaussian feature field via dense feature matching to achieve accurate localization. Experiments on indoor and outdoor datasets show that STDLoc outperforms current state-of-the-art localization methods in both accuracy and recall.
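To make the sparse stage concrete, here is a toy, pure-Python sketch of mutual nearest-neighbour matching between query keypoint descriptors and descriptors of sampled Gaussians (names and data are invented for illustration; this is not the repository's implementation):

```python
def mutual_nn_matches(desc_a, desc_b):
    """Return (i, j) pairs that are mutual nearest neighbours
    under squared Euclidean distance."""
    def nearest(src, dst):
        idxs = []
        for d in src:
            dists = [sum((x - y) ** 2 for x, y in zip(d, e)) for e in dst]
            idxs.append(dists.index(min(dists)))
        return idxs

    a2b = nearest(desc_a, desc_b)
    b2a = nearest(desc_b, desc_a)
    # Keep a pair only if each descriptor is the other's nearest neighbour.
    return [(i, j) for i, j in enumerate(a2b) if b2a[j] == i]

# Descriptors 0 and 1 match across the two sets; 2 has no partner.
query_desc = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
gaussian_desc = [(0.1, 0.0), (0.9, 1.1)]
print(mutual_nn_matches(query_desc, gaussian_desc))  # [(0, 0), (1, 1)]
```

The mutual-consistency check is what filters out unmatched distractors; the resulting 2D–3D correspondences can then feed a PnP-style pose solver for the initial estimate.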

## 🔍 Performance

The code in this repository achieves better performance than reported in our paper, thanks to a few adjustments:

1. Set `align_corners=False` in interpolation.
2. Use a smaller learning rate for the outdoor dataset.
3. Use the anti-aliasing feature of gsplat.
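For intuition on adjustment 1: `align_corners` controls how an output pixel maps back to a source coordinate during bilinear interpolation. A pure-Python sketch of the two conventions (mirroring PyTorch's semantics; not code from this repo):

```python
def src_coord(x_dst, in_size, out_size, align_corners):
    """Map an output pixel index to its sampling coordinate in the input,
    for the two bilinear-interpolation conventions."""
    if align_corners:
        # Corner pixels of input and output are pinned to each other.
        return x_dst * (in_size - 1) / (out_size - 1)
    # Pixel *centers* are offset by half a pixel (PyTorch's default).
    return (x_dst + 0.5) * in_size / out_size - 0.5

# Upsampling a 4-pixel row to 8 pixels:
print(src_coord(0, 4, 8, True))   # 0.0   (first corners coincide)
print(src_coord(7, 4, 8, True))   # 3.0   (last corners coincide)
print(src_coord(0, 4, 8, False))  # -0.25 (half-pixel-center offset)
```

With `align_corners=False`, sampling is consistent across resolutions, which matters when comparing feature maps rendered at different scales.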

### 7-Scenes

| Method | Chess | Fire | Heads | Office | Pumpkin | RedKitchen | Stairs | Avg.↓[cm/°] |
| -------------- | --------- | --------- | --------- | --------- | --------- | ---------- | --------- | ----------- |
| STDLoc (paper) | 0.46/0.15 | 0.57/0.24 | 0.45/0.26 | 0.86/0.24 | 0.93/0.21 | 0.63/0.19 | 1.42/0.41 | 0.76/0.24 |
| STDLoc (repo) | 0.43/0.13 | 0.49/0.20 | 0.41/0.24 | 0.72/0.21 | 0.91/0.23 | 0.59/0.14 | 1.19/0.36 | 0.67/0.22 |

### Cambridge Landmarks

| Method | Court | King’s | Hospital | Shop | St. Mary’s | Avg.↓[cm/°] |
| -------------- | --------- | --------- | --------- | -------- | ---------- | ----------- |
| STDLoc (paper) | 15.7/0.06 | 15.0/0.17 | 11.9/0.21 | 3.0/0.13 | 4.7/0.14 | 10.1/0.14 |
| STDLoc (repo) | 11.5/0.06 | 14.7/0.15 | 11.3/0.21 | 2.5/0.12 | 3.5/0.12 | 8.7/0.13 |
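The cm/° entries above are median translation and rotation errors per scene. As a reference, a minimal sketch of how such pose errors are commonly computed (assuming row-major 3×3 rotation matrices and world-frame translations; this is not the repository's evaluation code):

```python
import math

def pose_error(R_est, t_est, R_gt, t_gt):
    """Translation error (scene units) and rotation error (degrees)
    between an estimated and a ground-truth camera pose."""
    t_err = math.dist(t_est, t_gt)
    # For the relative rotation R_rel = R_est^T @ R_gt,
    # trace(R_rel) = 1 + 2*cos(theta).
    trace = sum(R_est[k][i] * R_gt[k][i] for i in range(3) for k in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return t_err, math.degrees(math.acos(cos_theta))

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rot_z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
print(pose_error(identity, (0, 0, 0), rot_z_90, (0.01, 0, 0)))
# ≈ a 1 cm / 90° error when the scene is in metres
```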

## 📦 Training and Evaluation

### Environment Setup

1. Clone this repository.

```bash
git clone --recursive https://github.com/zju3dv/STDLoc.git
```

2. Install packages:

```bash
conda create -n stdloc python=3.8 -y
conda activate stdloc
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
# install submodules
pip install submodules/simple-knn
pip install submodules/gsplat
```

### Data Preparation

We use two public datasets:

- [Microsoft 7-Scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/)
- [Cambridge Landmarks](https://www.repository.cam.ac.uk/handle/1810/251342/)

#### 7-Scenes Dataset

1. Download the images following [HLoc](https://github.com/cvg/Hierarchical-Localization):

```bash
export dataset=datasets/7scenes
for scene in chess fire heads office pumpkin redkitchen stairs; \
do wget http://download.microsoft.com/download/2/8/5/28564B23-0828-408F-8631-23B1EFF1DAC8/$scene.zip -P $dataset \
&& unzip $dataset/$scene.zip -d $dataset && unzip $dataset/$scene/'*.zip' -d $dataset/$scene; done
```

2. Download full reconstructions
from [visloc_pseudo_gt_limitations](https://github.com/tsattler/visloc_pseudo_gt_limitations/tree/main?tab=readme-ov-file#full-reconstructions):

```bash
pip install gdown
gdown 1ATijcGCgK84NKB4Mho4_T-P7x8LSL80m -O $dataset/7scenes_reference_models.zip
unzip $dataset/7scenes_reference_models.zip -d $dataset
# move sfm_gt to each dataset
for scene in chess fire heads office pumpkin redkitchen stairs; \
do mkdir -p $dataset/$scene/sparse && cp -r $dataset/7scenes_reference_models/$scene/sfm_gt $dataset/$scene/sparse/0 ; done
```

#### Cambridge Landmarks Dataset

1. Download images from PoseNet's project page:

```bash
export dataset=datasets/cambridge
export scenes=( "KingsCollege" "OldHospital" "StMarysChurch" "ShopFacade" "GreatCourt" )
export IDs=( "251342" "251340" "251294" "251336" "251291" )
for i in "${!scenes[@]}"; do
wget https://www.repository.cam.ac.uk/bitstream/handle/1810/${IDs[i]}/${scenes[i]}.zip -P $dataset \
&& unzip $dataset/${scenes[i]}.zip -d $dataset ; done
```

2. Install Mask2Former:

```bash
cd submodules/Mask2Former
pip install -r requirements.txt
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../..
# download model
wget https://dl.fbaipublicfiles.com/maskformer/mask2former/coco/panoptic/maskformer2_swin_large_IN21k_384_bs16_100ep/model_final_f07440.pkl
cd ../..
```

3. Preprocess data:

```bash
bash scripts/dataset_preprocess.sh
```

### Training Feature Gaussian

```bash
# For 7-Scenes:
bash scripts/train_7scenes.sh
# For Cambridge Landmarks
bash scripts/train_cambridge.sh
```

### Evaluation

We also provide pretrained models for the [7-Scenes](https://drive.google.com/file/d/1gxmmpYD-XjYT01cu0flfNHf0CuJDIXsh/view) and [Cambridge Landmarks](https://drive.google.com/file/d/1EbKx9NY2cgtIxkQQ7Spjpl90PG68GKgu/view) datasets:

```bash
gdown 1gxmmpYD-XjYT01cu0flfNHf0CuJDIXsh
gdown 1EbKx9NY2cgtIxkQQ7Spjpl90PG68GKgu
unzip map_7scenes.zip
unzip map_cambridge.zip
```

Reproduce the experimental results:

```bash
# For 7-Scenes:
bash scripts/evaluate_7scenes.sh
# For Cambridge Landmarks
bash scripts/evaluate_cambridge.sh
```

## 🔗 Citation

```bibtex
@inproceedings{huang2025stdloc,
  title={From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from {Feature Gaussian Splatting}},
  author={Huang, Zhiwei and Yu, Hailin and Shentu, Yichun and Yuan, Jin and Zhang, Guofeng},
  booktitle={CVPR},
  year={2025}
}
```

## 👏 Acknowledgements

- [Feature 3DGS](https://github.com/ShijieZhou-UCLA/feature-3dgs): Our codebase is built upon Feature 3DGS.
- [gsplat](https://github.com/nerfstudio-project/gsplat): We use gsplat as our rasterization backend.