# SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model [[Arxiv]](https://arxiv.org/abs/2306.02245)



## Motivation of this project
With the development of large language models, many remarkable language systems such as ChatGPT have emerged and achieved astonishing success on a wide range of tasks, showing the incredible power of foundation models. In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, was recently proposed and shows strong zero-shot ability on many downstream 2D tasks. However, whether SAM can be adapted to 3D vision tasks, especially 3D object detection, is still unknown.

## What we do in this project
In this project we explore adapting the zero-shot ability of SAM to 3D object detection. The project is still in progress.

![](./images/pipeline.png)

## Installation
We use `pytorch==1.12.1, cuda==11.3`. We build this project based on [MMDetection3D](https://github.com/open-mmlab/mmdetection3d) (ver. 1.1.0rc3) and [segment-anything](https://github.com/facebookresearch/segment-anything) (commit 6fdee8f).

1. install waymo-open-dataset:
```
pip install waymo-open-dataset-tf-2-6-0
```
2. install MMDetection3D:
```
pip install -U openmim
mim install 'mmengine==0.7.2'
mim install 'mmcv==2.0.0'
mim install 'mmdet==3.0.0'

git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout 341ff99 # mmdet3d 1.1.0rc3
pip install -v -e .
```
3. install segment-anything:
```
pip install git+https://github.com/facebookresearch/segment-anything.git
```
4. install other dependencies:
```
pip install -r requirements.txt
```
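
If you want to confirm that the environment matches the versions listed above, a small sanity check such as the following can be run (our own sketch, not part of the repo):
```python
# Environment sanity check (our own sketch, not part of the repo): verify that
# the core packages import and print their versions for comparison.
import torch
import mmengine
import mmcv
import mmdet
import mmdet3d
import segment_anything  # noqa: F401  (only checking that the import works)

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmengine:", mmengine.__version__)  # expected 0.7.2
print("mmcv:", mmcv.__version__)          # expected 2.0.0
print("mmdet:", mmdet.__version__)        # expected 3.0.0
print("mmdet3d:", mmdet3d.__version__)    # expected 1.1.0rc3
```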

## Data preparation
Since our project explores the _zero-shot_ setting, we do not need to pre-process the training data. We roughly follow the data preparation set-up in the [MMDetection3D Data Preparation Guide](https://github.com/open-mmlab/mmdetection3d/blob/main/docs/en/user_guides/dataset_prepare.md) but make some minor modifications.

1. organize the raw data like:
```
.
└── data
    └── waymo
        └── waymo_format
            ├── gt.bin (optional)
            └── validation
                └── *.tfrecord
```
2. run command:
```
CUDA_VISIBLE_DEVICES=-1 python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo
```
Note: since evaluation on the Waymo dataset requires the [ground truth bin](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) file for the validation set, you need to put the `.bin` file into `data/waymo/waymo_format`. If you do not have access to it, you can add the `--gen-gt-bin` argument to the above command:
```
CUDA_VISIBLE_DEVICES=-1 python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo --gen-gt-bin
```
This will automatically generate a `gt.bin` file (which may differ from the official version in some respects) in `data/waymo/waymo_format`.
3. after the pre-processing, the data folder will be organized as:
```
.
└── data
    └── waymo
        ├── kitti_format
        │   ├── ImageSets
        │   ├── training
        │   └── waymo_infos_val.pkl
        └── waymo_format
            ├── gt.bin
            └── validation
```
### Partial validation set preparation
Because it is time-consuming to evaluate on the whole Waymo validation set, we modify `create_data.py` to support pre-processing a partial validation set. You can put any number of `*.tfrecord` files into `data/waymo/waymo_format/validation/` and run the command above; it will automatically generate `ImageSets/val.txt` and the corresponding `gt.bin`.
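
After pre-processing (full or partial), you can quickly check how many validation frames ended up in the generated info file with a snippet like the one below (our own sketch, not part of the repo; the info file layout may vary slightly across MMDetection3D versions):
```python
# Sanity check (our own sketch): count the frames recorded in the generated
# info file after running tools/create_data.py.
import pickle

with open("data/waymo/kitti_format/waymo_infos_val.pkl", "rb") as f:
    infos = pickle.load(f)

# Depending on the MMDetection3D version, the pickle is either a plain list of
# per-frame dicts or a dict that stores that list under 'data_list'.
frames = infos["data_list"] if isinstance(infos, dict) else infos
print(f"{len(frames)} validation frames found")
```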

## Inference
### Pre-trained weights
We use the pre-trained SAM in our project, so download the weights from the [segment-anything model checkpoints](https://github.com/facebookresearch/segment-anything#model-checkpoints) page and put them into `projects/pretrain_weights`.
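
To verify that a downloaded checkpoint loads correctly before running the full pipeline, you can try something like the sketch below (our own example; the file name assumes the ViT-H checkpoint, so adjust the registry key and path for other variants):
```python
# Checkpoint smoke test (our own sketch): build SAM from a downloaded weight
# file using the segment-anything model registry.
import torch
from segment_anything import sam_model_registry

checkpoint = "projects/pretrain_weights/sam_vit_h_4b8939.pth"  # assumed ViT-H file
sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
sam.to("cuda" if torch.cuda.is_available() else "cpu")
print("SAM checkpoint loaded from", checkpoint)
```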

### Zero-shot inference
1. generate the fake weights for loading (just a trick so that `test.py` has a checkpoint to load; it only needs to be run once, see the sketch after this list):
```
python projects/generate_fake_pth.py
```
2. run the command below to perform inference and evaluation:
```
python tools/test.py projects/configs/sam3d_intensity_bev_waymo_car.py fake.pth
```
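
Conceptually, the fake-checkpoint trick from step 1 only has to produce a file that `test.py` can load, since SAM3D is zero-shot and has no trained detector weights of its own. A minimal sketch of the idea is shown below; the actual `projects/generate_fake_pth.py` may differ in its details:
```python
# Fake-checkpoint sketch (illustrative only; the repo's generate_fake_pth.py
# may differ): save an (almost) empty checkpoint so that the checkpoint-loading
# step in tools/test.py succeeds.
import torch

torch.save({"state_dict": {}, "meta": {}}, "fake.pth")
print("wrote fake.pth")
```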

### Results
- Quantitative results:
Tested on a single NVIDIA GeForce RTX 4090 with `pytorch==1.12.1, cuda==11.3` ([log](./logs/20230517_101842.log)):

| Metric | mAP | mAPH |
| ------ | ----- | ----- |
| RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_1 | 19.51 | 13.30 |
| RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_2 | 19.05 | 12.98 |

- Qualitative results:
![](images/paper_vis.png)

## What's next
Although our method is only an __initial attempt__, we believe it demonstrates a real opportunity to unleash the potential of foundation models like SAM on 3D vision tasks, especially 3D object detection. With techniques such as __few-shot learning__ and __prompt engineering__, vision foundation models can be exploited more effectively for 3D tasks, especially considering the vast gap between the scales of available 2D and 3D data.

## Citation
```
@article{zhang2023sam3d,
  title={SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model},
  author={Zhang, Dingyuan and Liang, Dingkang and Yang, Hongcheng and Zou, Zhikang and Ye, Xiaoqing and Liu, Zhe and Bai, Xiang},
  journal={Science China Information Sciences},
  year={2023}
}
```

## Acknowledgement
- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d)
- [segment-anything](https://github.com/facebookresearch/segment-anything)
- [OCR-SAM](https://github.com/yeungchenwa/OCR-SAM)