# Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models (NeurIPS 2024)

Official PyTorch implementation of the method **OLIVINE**. More details can be found in the paper:

**Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models**, NeurIPS 2024 [[arXiv](https://arxiv.org/abs/2405.14271)], by Yifan Zhang and Junhui Hou.
![Poster](./assets/poster.png)
## Dependencies
Please install the required packages. Some libraries used in this project, including MinkowskiEngine and PyTorch Lightning, are known to behave differently across versions; please use the exact versions specified in `requirements.txt`.
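A minimal install sketch, assuming a CUDA-enabled Python environment with `pip` (the MinkowskiEngine command follows its upstream instructions; consult the MinkowskiEngine documentation for the build options required by your CUDA/PyTorch setup):

```
# Install the pinned dependencies of this project
pip install -r requirements.txt

# MinkowskiEngine is usually built from source; see its documentation for
# platform-specific build options (BLAS, CUDA architecture, etc.)
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine --no-deps
```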
## Datasets
The code provided is compatible with [nuScenes](https://www.nuscenes.org/lidar-segmentation) and [SemanticKITTI](http://www.semantic-kitti.org/tasks.html#semseg). Put the datasets you intend to use in the `datasets` folder (a symbolic link is accepted).
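For example, assuming the raw data already lives elsewhere on disk, symbolic links can be created as follows (paths are illustrative):

```
# Adjust the source paths to wherever your local copies of the datasets live
mkdir -p datasets
ln -s /path/to/nuscenes datasets/nuscenes
ln -s /path/to/semantic_kitti datasets/semantic_kitti
```

The expected layout is: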
```
datasets/
├── nuscenes
│   ├── camseg (semantic labels inferred by Grounded-SAM)
│   ├── lidarseg (decompress nuScenes-lidarseg-all-v1.0.tar)
│   ├── maps
│   ├── samples
│   ├── sweeps
│   ├── v1.0-mini
│   ├── v1.0-test
│   ├── v1.0-trainval
│   └── zip_files
├── semantic_kitti
│   └── dataset
│       ├── poses
│       └── sequences
└── other datasets...
```

## Reproducing the results
### Predict the weak semantic labels (required)
First, we use SEEM to obtain weak semantic labels for the RGB images. If you do not want to execute the following steps yourself, you can also obtain the labels by directly downloading the files we provide on [Baidu netdisk](https://pan.baidu.com/s/1QI6_3k30NS945pbbSHQr2Q?pwd=zed9) or [Google Drive](https://drive.google.com/file/d/1FGow9PFoH11M-g_eUg-7ZGZdT_8pUt4j/view?usp=drive_link); otherwise, follow the steps below (a condensed command sketch is given after the list).
1. Install the necessary libraries listed in `demo_code/requirements.txt`.
2. Link the nuScenes dataset to `demo_code/data/sets`. Command: `ln -s datasets/nuscenes demo_code/data/sets/`
3. Go to the `demo_code` directory and run the script: `bash semantic_label_generation.sh`
4. Organize the generated files and put them in `data/nuscenes/camseg`.
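A condensed sketch of the four steps above, run from the repository root (the absolute-path symlink and the final destination of the generated labels are assumptions made for illustration):

```
# 1. Install the libraries needed by the label-generation demo
pip install -r demo_code/requirements.txt

# 2. Expose the nuScenes dataset to the demo code (an absolute path avoids a dangling link)
ln -s "$(pwd)/datasets/nuscenes" demo_code/data/sets/

# 3. Generate the weak semantic labels with SEEM
cd demo_code && bash semantic_label_generation.sh && cd ..

# 4. Collect the generated label files into the camseg folder (destination assumed)
mkdir -p datasets/nuscenes/camseg
# mv <generated label files> datasets/nuscenes/camseg/
```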
### Pre-training a 3D backbone

To launch pre-training of the Minkowski SR-UNet (minkunet) backbone on nuScenes:
```python pretrain.py --cfg config/olivine_minkunet.yaml```
You can alternatively replace minkunet with voxelnet to pre-train a PV-RCNN backbone.
Weights of the pre-training can be found in the output folder, and can be re-used during a downstream task.
If you wish to use multiple GPUs, please scale the learning rate and batch size accordingly.

**Tip:** the pre-trained weights from the final epoch of pre-training are not always the best; consider also saving the weights from earlier epochs, such as the 40th.
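To pre-train the PV-RCNN (voxelnet) backbone mentioned above, the invocation is analogous; the config file name below is assumed from the minkunet naming convention, so check the `config/` folder for the actual file:

```
# Pre-train the voxelnet backbone used by PV-RCNN (config name assumed)
python pretrain.py --cfg config/olivine_voxelnet.yaml
```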
### Semantic segmentation
To launch a semantic segmentation, use the following command:
```python downstream.py --cfg_file="config/semseg_nuscenes.yaml" --pretraining_path="output/pretrain/[...]/model.pt"```
using the previously obtained weights and any config file. The default config fine-tunes on 1% of the nuScenes training set, with learning rates optimized for the provided pre-training.
To re-evaluate the score of any downstream network, run:
```python evaluate.py --resume_path="output/downstream/[...]/model.pt" --dataset="nuscenes"```
If you wish to re-evaluate linear probing, note that the experiments in the paper were obtained with `lr=0.05`, `lr_head=null`, and `freeze_layers=True`.
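A sketch of such a linear-probing run, assuming `lr`, `lr_head`, and `freeze_layers` are keys of the downstream config file (e.g. `config/semseg_nuscenes.yaml`):

```
# In the downstream config (assumed to expose these keys), set:
#   lr: 0.05
#   lr_head: null
#   freeze_layers: True
# then launch the downstream script with the pre-trained weights
python downstream.py --cfg_file="config/semseg_nuscenes.yaml" --pretraining_path="output/pretrain/[...]/model.pt"
```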
### Object detection
All experiments for object detection have been done using [OpenPCDet](https://github.com/open-mmlab/OpenPCDet).
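As a rough sketch of how such a fine-tuning is typically launched with OpenPCDet (the conversion of the pre-trained backbone into an OpenPCDet-compatible checkpoint is assumed to have been done beforehand, and the checkpoint path is illustrative):

```
# From the OpenPCDet repository, fine-tune PV-RCNN on KITTI starting from
# converted pre-trained weights (path is illustrative)
cd OpenPCDet/tools
python train.py --cfg_file cfgs/kitti_models/pv_rcnn.yaml \
    --pretrained_model /path/to/olivine_pretrained_pvrcnn.pth
```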
## Published results
All results are obtained with weights pre-trained on nuScenes.

### Few-shot semantic segmentation
#### Results on the validation set using Minkowski SR-Unet:
Method |nuScenes<br>lin. probing |nuScenes<br>Finetuning with 1% data |KITTI<br>Finetuning with 1% data
--- |:-: |:-: |:-:
Random init. |8.1 |30.3 |39.5
[PointContrast](https://arxiv.org/abs/2007.10985)|21.9 |32.5 |41.1
[DepthContrast](https://arxiv.org/abs/2101.02691)|22.1 |31.7 |41.5
[PPKT](https://arxiv.org/abs/2104.04687) |36.4 |37.8 |43.9
[SLidR](https://arxiv.org/abs/2203.16258) |38.8 |38.3 |44.6
OLIVINE |**50.0** |**50.5** |**49.3**

### Semantic Segmentation on nuScenes
#### Results on the validation set using Minkowski SR-Unet with a fraction of the training labels:
Method |1% |5% |10% |25% |100%
--- |:-: |:-: |:-: |:-: |:-:
Random init. |30.3 |47.7 |56.6 |64.8 |74.2
SLidR | 39.0 | 52.2 | 58.8 | 66.2 | 74.6
OLIVINE |**50.6**|**60.2**|**65.0**|**70.1**|**76.5**

### Object detection on KITTI
All results are obtained with a pre-training on nuScenes.
#### Results on the validation set using [PV-RCNN](https://arxiv.org/abs/1912.13192):
Method |Car |Pedestrian|Cyclist |mAP@40
--- |:-: |:-: |:-: |:-:
Random init. |84.5 |57.9 |71.3 |71.3
[STRL](https://arxiv.org/abs/2109.00179)*|84.7 |57.8 |71.9 |71.5
[PPKT](https://arxiv.org/abs/2104.04687) |83.2 |55.5 |73.8 |70.8
[SLidR](https://arxiv.org/abs/2203.16258)|84.4 |57.3 |74.2 |71.9
OLIVINE |84.8 |59.3 |74.2 |**72.8**

\*STRL has been pre-trained on KITTI, while SLidR and PPKT were pre-trained on nuScenes.
#### Results on the validation set using [SECOND](https://www.mdpi.com/1424-8220/18/10/3337):
Method |Car |Pedestrian|Cyclist |mAP@40
--- |:-: |:-: |:-: |:-:
Random init. |81.5 |50.9 |66.5 |66.3
[DeepCluster](https://arxiv.org/abs/1807.05520)*| | | |66.1
[SLidR](https://arxiv.org/abs/2203.16258) |81.9 |51.6 |68.5 |67.3
OLIVINE |82.0 |53.2 |69.8 |**68.3**

\*As reimplemented in [ONCE](https://arxiv.org/abs/2106.11037).
## Acknowledgment
We implement the method based on [SLidR](https://github.com/valeoai/SLidR).
Part of the codebase has been adapted from [PointContrast](https://github.com/facebookresearch/PointContrast).
Computation of the Lovász loss used in semantic segmentation follows the code of [PolarNet](https://github.com/edwardzhou130/PolarSeg).

## License
OLIVINE is released under the [Apache 2.0 license](./LICENSE).

## Citation
If you find OLIVINE useful in your research, please consider citing:
```
@inproceedings{zhang2024fine,
title={Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models},
author={Zhang, Yifan and Hou, Junhui},
booktitle={Advances in Neural Information Processing Systems},
year={2024}
}
```