https://github.com/youquanl/Segment-Any-Point-Cloud

[NeurIPS'23 Spotlight] Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
- Host: GitHub
- URL: https://github.com/youquanl/Segment-Any-Point-Cloud
- Owner: youquanl
- Created: 2023-05-25T11:45:50.000Z
- Default Branch: main
- Last Pushed: 2023-12-16T13:34:26.000Z
- Last Synced: 2024-10-28T05:12:39.517Z
- Language: Python
- Homepage: https://ldkong.com/Seal
- Size: 52.1 MB
- Stars: 563
- Watchers: 26
- Forks: 25
- Open Issues: 9
- Metadata Files: README.md
Awesome Lists containing this project:

- awesome-segment-anything-extensions
- Awesome-Segment-Anything
  
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

Youquan Liu<sup>1,\*</sup>, Lingdong Kong<sup>1,2,\*</sup>, Jun Cen<sup>3</sup>, Runnan Chen<sup>4</sup>, Wenwei Zhang<sup>1,5</sup>, Liang Pan<sup>5</sup>, Kai Chen<sup>1</sup>, Ziwei Liu<sup>5</sup>

<sup>1</sup>Shanghai AI Laboratory, <sup>2</sup>National University of Singapore, <sup>3</sup>The Hong Kong University of Science and Technology, <sup>4</sup>The University of Hong Kong, <sup>5</sup>S-Lab, Nanyang Technological University

# Seal :seal:

`Seal` is a versatile self-supervised learning framework capable of segmenting *any* automotive point clouds by leveraging off-the-shelf knowledge from vision foundation models (VFMs) and encouraging spatial and temporal consistency from such knowledge during the representation learning stage.


### :sparkles: Highlight

- :rocket: **Scalability:** `Seal` directly distills the knowledge from VFMs into point clouds, eliminating the need for annotations in either 2D or 3D during pretraining.

- :balance_scale: **Consistency:** `Seal` enforces the spatial and temporal relationships at both the camera-to-LiDAR and point-to-segment stages, facilitating cross-modal representation learning.

- :rainbow: **Generalizability:** `Seal` enables knowledge transfer in an off-the-shelf manner to downstream tasks involving diverse point clouds, including those from real/synthetic, low/high-resolution, large/small-scale, and clean/corrupted datasets.

### :oncoming_automobile: 2D-3D Correspondence


### :movie_camera: Video Demo

| Demo 1 | Demo 2 | Demo 3 |
| :-: | :-: | :-: |
| [Link](https://youtu.be/S0q2-nQdwSs) <sup>:arrow_heading_up:</sup> | [Link](https://youtu.be/yoon3uiRnY8) <sup>:arrow_heading_up:</sup> | [Link]() <sup>:arrow_heading_up:</sup> |

## Updates

- \[2023.12\] - We are hosting [The RoboDrive Challenge](https://robodrive-24.github.io/) at [ICRA 2024](https://2024.ieee-icra.org/). :blue_car:

- \[2023.09\] - `Seal` was selected as a :sparkles: spotlight :sparkles: at [NeurIPS 2023](https://neurips.cc/).

- \[2023.09\] - `Seal` was accepted to [NeurIPS 2023](https://neurips.cc/)! :tada:

- \[2023.07\] - We released the [code](docs/document/SUPERPOINT.md) for generating semantic superpixels & superpoints with [SLIC](https://scikit-image.org/docs/stable/api/skimage.segmentation.html#skimage.segmentation.slic), [SAM](https://github.com/facebookresearch/segment-anything), and [SEEM](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once). More VFMs are on the way!

- \[2023.06\] - Our paper is available on [arXiv](https://arxiv.org/abs/2306.09347). Code will be available later!

## Outline

- [Installation](#installation)

- [Data Preparation](#data-preparation)

- [Superpoint Generation](#superpoint-generation)

- [Getting Started](#getting-started)

- [Main Result](#main-result)

- [TODO List](#todo-list)

- [Citation](#citation)

- [License](#license)

- [Acknowledgement](#acknowledgement)

## Installation

Please refer to [INSTALL.md](docs/document/INSTALL.md) for the installation details.

## Data Preparation

| [**nuScenes**](https://www.nuscenes.org/nuscenes) | [**SemanticKITTI**](http://semantic-kitti.org/) | [**Waymo Open**](https://waymo.com/open) | [**ScribbleKITTI**](https://github.com/ouenal/scribblekitti) |
| :-: | :-: | :-: | :-: |
| [**RELLIS-3D**](http://www.unmannedlab.org/research/RELLIS-3D) | [**SemanticPOSS**](http://www.poss.pku.edu.cn/semanticposs.html) | [**SemanticSTF**](https://github.com/xiaoaoran/SemanticSTF) | [**DAPS-3D**](https://github.com/subake/DAPS3D) |
| [**SynLiDAR**](https://github.com/xiaoaoran/SynLiDAR) | [**Synth4D**](https://github.com/saltoricristiano/gipso-sfouda) | [**nuScenes-C**](https://github.com/ldkong1205/Robo3D) | |

Please refer to [DATA_PREPARE.md](docs/document/DATA_PREPARE.md) for details on preparing these datasets.

## Superpoint Generation

Figure: examples of raw point clouds (left), their semantic superpoints (middle), and the ground truth (right).

Please refer to [SUPERPOINT.md](docs/document/SUPERPOINT.md) for details on generating semantic superpixels and superpoints with vision foundation models.
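Superpoints are the 3D counterparts of superpixels. One common way to obtain them, assumed here because this README defers the details to SUPERPOINT.md, is to project every LiDAR point into the synchronized camera image and let it inherit the ID of the superpixel it lands on. The NumPy sketch below assumes a pinhole camera and a pre-composed 3 × 4 LiDAR-to-image matrix (`lidar_to_img`, a hypothetical input); it ignores occlusion and multi-camera rigs, and it is not the codebase's implementation.

```python
import numpy as np


def superpixels_to_superpoints(points_xyz, lidar_to_img, superpixel_map):
    """Give each LiDAR point the ID of the superpixel it projects onto.

    points_xyz:     (N, 3) points in the LiDAR frame.
    lidar_to_img:   (3, 4) matrix, assumed to be intrinsics @ [R | t]
                    (a hypothetical, pre-composed calibration).
    superpixel_map: (H, W) integer map of superpixel IDs, e.g. from SLIC or a VFM.
    Returns an (N,) array of superpoint IDs, with -1 for points that lie behind
    the camera or project outside the image.
    """
    n = points_xyz.shape[0]
    h, w = superpixel_map.shape
    homo = np.hstack([points_xyz, np.ones((n, 1))])  # (N, 4) homogeneous coords
    cam = homo @ lidar_to_img.T                      # (N, 3) = (u*z, v*z, z)

    labels = np.full(n, -1, dtype=np.int64)
    in_front = cam[:, 2] > 1e-6                      # drop points behind the camera
    uv = cam[in_front, :2] / cam[in_front, 2:3]      # perspective divide
    u = np.floor(uv[:, 0]).astype(np.int64)          # column index
    v = np.floor(uv[:, 1]).astype(np.int64)          # row index
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    idx = np.flatnonzero(in_front)[inside]           # original point indices
    labels[idx] = superpixel_map[v[inside], u[inside]]
    return labels


# Toy example: identity projection, so u = x / z and v = y / z.
P = np.hstack([np.eye(3), np.zeros((3, 1))])
spx = np.array([[0, 1], [2, 3]])
pts = np.array([[0.5, 0.5, 1.0], [1.5, 0.5, 1.0], [0.5, 1.5, 1.0],
                [1.5, 1.5, 1.0], [0.0, 0.0, -1.0], [5.0, 5.0, 1.0]])
print(superpixels_to_superpoints(pts, P, spx))  # [ 0  1  2  3 -1 -1]
```

Points that receive -1 simply carry no 2D supervision; keeping them in the array (rather than dropping them) preserves alignment with the original point order.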

## Getting Started

Please refer to [GET_STARTED.md](docs/document/GET_STARTED.md) for more details on using this codebase.

## Main Result

### :unicorn: Framework Overview

Figure: Overview of the **Seal :seal:** framework. For each {LiDAR, camera} pair at timestamp *t* and another LiDAR frame at timestamp *t + n*, we generate semantic superpixels and superpoints with VFMs. Two pretraining objectives are then formed: *spatial contrastive learning* between paired LiDAR and camera features, and *temporal consistency regularization* between segments at different timestamps.
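The two pretraining objectives in the overview can be illustrated with a minimal NumPy sketch. It assumes per-point LiDAR features and per-pixel image features have already been extracted, and that superpoint IDs (`point_seg`) index the same segments as superpixel IDs (`pixel_seg`). The spatial term is a superpixel-driven InfoNCE loss in the spirit of SLidR (which this codebase adapts code from); the temporal term assumes segment IDs are shared across frames *t* and *t + n*. The mean pooling, the temperature of 0.07, and the cosine form of the temporal term are illustrative choices, not the exact losses used in Seal.

```python
import numpy as np


def segment_mean(features, segment_ids, num_segments):
    """Average (N, D) features within each segment; IDs < 0 are ignored.
    Returns (num_segments, D) means and a boolean mask of non-empty segments."""
    sums = np.zeros((num_segments, features.shape[1]))
    counts = np.zeros(num_segments)
    valid = segment_ids >= 0
    np.add.at(sums, segment_ids[valid], features[valid])
    np.add.at(counts, segment_ids[valid], 1.0)
    return sums / np.maximum(counts, 1.0)[:, None], counts > 0


def l2_normalize(x, eps=1e-8):
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def spatial_contrastive_loss(point_feats, point_seg, pixel_feats, pixel_seg,
                             num_segments, temperature=0.07):
    """InfoNCE between superpoint and superpixel embeddings that share an ID:
    matching IDs are positives, all other segments in the scene are negatives."""
    p, p_ok = segment_mean(point_feats, point_seg, num_segments)
    q, q_ok = segment_mean(pixel_feats, pixel_seg, num_segments)
    keep = p_ok & q_ok                            # segments seen by both sensors
    if not keep.any():
        return 0.0
    p, q = l2_normalize(p[keep]), l2_normalize(q[keep])
    logits = p @ q.T / temperature                # (K, K); diagonal = positives
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def temporal_consistency_loss(feats_t, seg_t, feats_tn, seg_tn, num_segments):
    """Pull each point feature at t and t + n toward the mean feature of its
    segment, pooled over both frames (cosine distance). Assumes segment IDs
    are shared across the two frames."""
    feats = np.vstack([feats_t, feats_tn])
    segs = np.concatenate([seg_t, seg_tn])
    centers, _ = segment_mean(feats, segs, num_segments)
    valid = segs >= 0
    if not valid.any():
        return 0.0
    f = l2_normalize(feats[valid])
    c = l2_normalize(centers[segs[valid]])
    return float(np.mean(1.0 - np.sum(f * c, axis=1)))
```

Pooling to segments before contrasting means each superpixel contributes one positive pair regardless of how many points fall inside it, which keeps large segments such as road or vegetation from dominating the loss.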

### :car: Cosine Similarity

Figure: Cosine similarity between the feature of a query point (red dot) and the features of the other points, learned with SLIC and different VFMs in our **Seal :seal:** framework. The queried semantic classes in the examples, from top to bottom, are "car", "manmade", and "truck". Colors range from violet to yellow, denoting low and high similarity scores, respectively.
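Similarity maps like these come from comparing one query point's feature with every point's feature. A minimal NumPy sketch, assuming `point_feats` holds per-point features from a pretrained backbone; the min-max rescaling to [0, 1] for display is an assumption, since the README does not state how scores were mapped onto the violet-to-yellow colormap.

```python
import numpy as np


def query_similarity(point_feats, query_index):
    """Cosine similarity between one query point's feature and every point's feature."""
    norms = np.maximum(np.linalg.norm(point_feats, axis=1, keepdims=True), 1e-8)
    f = point_feats / norms
    return f @ f[query_index]  # (N,) scores in [-1, 1]


def to_unit_range(scores):
    """Min-max rescale scores to [0, 1] (0 = violet, 1 = yellow in a viridis-like map)."""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)


feats = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
scores = query_similarity(feats, query_index=0)
print(scores)                 # [ 1.  1.  0. -1.]
print(to_unit_range(scores))  # [1.  1.  0.5 0. ]
```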

### :blue_car: Benchmark

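| Method | nuScenes (LP) | nuScenes (1%) | nuScenes (5%) | nuScenes (10%) | nuScenes (25%) | nuScenes (Full) | KITTI (1%) | Waymo (1%) | Synth4D (1%) |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| Random | 8.10 | 30.30 | 47.84 | 56.15 | 65.48 | 74.66 | 39.50 | 39.41 | 20.22 |
| PointContrast | 21.90 | 32.50 | - | - | - | - | 41.10 | - | - |
| DepthContrast | 22.10 | 31.70 | - | - | - | - | 41.50 | - | - |
| PPKT | 35.90 | 37.80 | 53.74 | 60.25 | 67.14 | 74.52 | 44.00 | 47.60 | 61.10 |
| SLidR | 38.80 | 38.30 | 52.49 | 59.84 | 66.91 | 74.79 | 44.60 | 47.12 | 63.10 |
| ST-SLidR | 40.48 | 40.75 | 54.69 | 60.75 | 67.70 | 75.14 | 44.72 | 44.93 | - |
| **Seal :seal:** | 44.95 | 45.84 | 55.64 | 62.97 | 68.41 | 75.60 | 46.63 | 49.34 | 64.50 |

LP: linear probing. Percentages give the fraction of labeled data used for fine-tuning; Full uses all labels.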

### :bus: Linear Probing

Figure: Qualitative results of our **Seal :seal:** framework, pretrained on nuScenes (without using ground-truth labels) and linear-probed with a frozen backbone and a linear classification head. To highlight the differences, correct and incorrect predictions are painted in gray and red, respectively.
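Linear probing keeps the pretrained backbone frozen and fits only a linear classification head on its per-point features. The NumPy sketch below trains plain softmax regression with full-batch gradient descent; the feature extraction step is assumed to have happened already, and the learning rate and epoch count are illustrative values, not the settings used in the paper.

```python
import numpy as np


def train_linear_probe(frozen_feats, labels, num_classes, lr=0.5, epochs=200):
    """Fit a softmax linear head on fixed (N, D) features; the backbone that
    produced them is never updated."""
    n, d = frozen_feats.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = frozen_feats @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # d(mean cross-entropy)/d(logits)
        w -= lr * frozen_feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(frozen_feats, w, b):
    """Per-point class predictions from the linear head."""
    return np.argmax(frozen_feats @ w + b, axis=1)
```

Because only `w` and `b` are learned, linear-probing accuracy directly reflects how linearly separable the pretrained features already are, which is why the LP column is a common proxy for representation quality.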

### :articulated_lorry: Downstream Generalization

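| Method | ScribbleKITTI (1%) | ScribbleKITTI (10%) | RELLIS-3D (1%) | RELLIS-3D (10%) | SemanticPOSS (Half) | SemanticPOSS (Full) | SemanticSTF (Half) | SemanticSTF (Full) | SynLiDAR (1%) | SynLiDAR (10%) | DAPS-3D (Half) | DAPS-3D (Full) |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| Random | 23.81 | 47.60 | 38.46 | 53.60 | 46.26 | 54.12 | 48.03 | 48.15 | 19.89 | 44.74 | 74.32 | 79.38 |
| PPKT | 36.50 | 51.67 | 49.71 | 54.33 | 50.18 | 56.00 | 50.92 | 54.69 | 37.57 | 46.48 | 78.90 | 84.00 |
| SLidR | 39.60 | 50.45 | 49.75 | 54.57 | 51.56 | 55.36 | 52.01 | 54.35 | 42.05 | 47.84 | 81.00 | 85.40 |
| **Seal :seal:** | 40.64 | 52.77 | 51.09 | 55.03 | 53.26 | 56.89 | 53.46 | 55.36 | 43.58 | 49.26 | 81.88 | 85.90 |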

### :truck: Robustness Probing

| Init | Backbone | mCE | mRR | Fog | Wet | Snow | Motion | Beam | Cross | Echo | Sensor |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| Random | [PolarNet](https://github.com/edwardzhou130/PolarSeg) | 115.09 | 76.34 | 58.23 | 69.91 | 64.82 | 44.60 | 61.91 | 40.77 | 53.64 | 42.01 |
| Random | [CENet](https://github.com/huixiancheng/CENet) | 112.79 | 76.04 | 67.01 | 69.87 | 61.64 | 58.31 | 49.97 | 60.89 | 53.31 | 24.78 |
| Random | [WaffleIron](https://github.com/valeoai/WaffleIron) | 106.73 | 72.78 | 56.07 | 73.93 | 49.59 | 59.46 | 65.19 | 33.12 | 61.51 | 44.01 |
| Random | [Cylinder3D](https://github.com/xinge008/Cylinder3D) | 105.56 | 78.08 | 61.42 | 71.02 | 58.40 | 56.02 | 64.15 | 45.36 | 59.97 | 43.03 |
| Random | [SPVCNN](https://github.com/mit-han-lab/spvnas) | 106.65 | 74.70 | 59.01 | 72.46 | 41.08 | 58.36 | 65.36 | 36.83 | 62.29 | 49.21 |
| Random | [MinkUNet](https://github.com/NVIDIA/MinkowskiEngine) | 112.20 | 72.57 | 62.96 | 70.65 | 55.48 | 51.71 | 62.01 | 31.56 | 59.64 | 39.41 |
| PPKT | [MinkUNet](https://github.com/NVIDIA/MinkowskiEngine) | 105.64 | 76.06 | 64.01 | 72.18 | 59.08 | 57.17 | 63.88 | 36.34 | 60.59 | 39.57 |
| SLidR | [MinkUNet](https://github.com/NVIDIA/MinkowskiEngine) | 106.08 | 75.99 | 65.41 | 72.31 | 56.01 | 56.07 | 62.87 | 41.94 | 61.16 | 38.90 |
| **Seal :seal:** | [MinkUNet](https://github.com/NVIDIA/MinkowskiEngine) | 92.63 | 83.08 | 72.66 | 74.31 | 66.22 | 66.14 | 65.96 | 57.44 | 59.87 | 39.85 |
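The mCE and mRR columns follow the Robo3D benchmark, from which nuScenes-C is taken. The definitions below are our reading of Robo3D and an assumption rather than something this README states: per corruption, the corruption error CE is (1 − mIoU) of the model divided by (1 − mIoU) of a fixed baseline model, and the resilience rate RR is the corrupted mIoU divided by the clean mIoU; mCE and mRR average these over the eight corruptions, so lower mCE and higher mRR are better. The sketch treats the eight corruption columns as severity-averaged mIoU (%). Under these assumptions, taking the nuScenes (Full) score from the benchmark table as the clean mIoU reproduces the mRR reported for the Random and Seal MinkUNet rows. The baseline model used for mCE is not given here, so `miou_baseline` is left as a parameter.

```python
import numpy as np


def corruption_error(miou_model, miou_baseline):
    """Per-corruption CE: (1 - mIoU) of the model relative to a baseline model.
    Inputs are severity-averaged mIoU values in percent."""
    m = np.asarray(miou_model, dtype=float) / 100.0
    b = np.asarray(miou_baseline, dtype=float) / 100.0
    return (1.0 - m) / (1.0 - b)


def mce(miou_model, miou_baseline):
    """Mean corruption error in percent (lower is better)."""
    return 100.0 * float(np.mean(corruption_error(miou_model, miou_baseline)))


def mrr(miou_model, miou_clean):
    """Mean resilience rate in percent (higher is better)."""
    return 100.0 * float(np.mean(np.asarray(miou_model, dtype=float) / miou_clean))


# Fog, Wet, Snow, Motion, Beam, Cross, Echo, Sensor for Seal (MinkUNet).
seal = [72.66, 74.31, 66.22, 66.14, 65.96, 57.44, 59.87, 39.85]
print(round(mrr(seal, 75.60), 2))  # 83.08
```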

### :tractor: Qualitative Assessment

Figure: Qualitative results of **Seal :seal:** and prior methods, pretrained on nuScenes (without using ground-truth labels) and fine-tuned with 1% labeled data. To highlight the differences, correct and incorrect predictions are painted in gray and red, respectively.

## TODO List

- [x] Initial release. :rocket:

- [x] Add license. See [here](#license) for more details.

- [x] Add video demos :movie_camera:

- [x] Add installation details.

- [x] Add data preparation details.

- [x] Support semantic superpixel generation.

- [x] Support semantic superpoint generation.

- [ ] Add evaluation details.

- [ ] Add training details.

## Citation

If you find this work helpful, please consider citing our paper:

```bibtex
@inproceedings{liu2023segment,
  title = {Segment Any Point Cloud Sequences by Distilling Vision Foundation Models},
  author = {Liu, Youquan and Kong, Lingdong and Cen, Jun and Chen, Runnan and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2023},
}
```

```bibtex
@misc{liu2023segment_any_point_cloud,
  title = {The Segment Any Point Cloud Codebase},
  author = {Liu, Youquan and Kong, Lingdong and Cen, Jun and Chen, Runnan and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei},
  howpublished = {\url{https://github.com/youquanl/Segment-Any-Point-Cloud}},
  year = {2023},
}
```

## License

This work is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

## Acknowledgement

This work is developed based on the [MMDetection3D](https://github.com/open-mmlab/mmdetection3d) codebase.

> MMDetection3D is an open-source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D detection. It is a part of the OpenMMLab project developed by MMLab.

Part of this codebase has been adapted from [SLidR](https://github.com/valeoai/SLidR), [Segment Anything](https://github.com/facebookresearch/segment-anything), [X-Decoder](https://github.com/microsoft/X-Decoder), [OpenSeeD](https://github.com/IDEA-Research/OpenSeeD), [Segment Everything Everywhere All at Once](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once), [LaserMix](https://github.com/ldkong1205/LaserMix), and [Robo3D](https://github.com/ldkong1205/Robo3D).

:heart: We thank the above open-source repositories for their exceptional contributions!