https://github.com/DYZhang09/SAM3D
[SCIS] SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
- Host: GitHub
- URL: https://github.com/DYZhang09/SAM3D
- Owner: DYZhang09
- Created: 2023-06-01T15:06:07.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-01-28T11:24:57.000Z (12 months ago)
- Last Synced: 2024-08-03T01:14:44.658Z (5 months ago)
- Language: Python
- Homepage:
- Size: 11.1 MB
- Stars: 197
- Watchers: 6
- Forks: 11
- Open Issues: 2
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-llm-and-aigc - SAM3D - "Zero-Shot 3D Object Detection via [Segment Anything](https://github.com/facebookresearch/segment-anything) Model". (**[arXiv 2023](https://arxiv.org/abs/2306.02245)**). (Summary)
- awesome-yolo-object-detection - SAM3D - "Zero-Shot 3D Object Detection via [Segment Anything](https://github.com/facebookresearch/segment-anything) Model". (**[arXiv 2023](https://arxiv.org/abs/2306.02245)**). (Applications)
- Awesome-Segment-Anything - [code
README
# SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model [[Arxiv]](https://arxiv.org/abs/2306.02245)
## Motivation of this project
With the development of large language models, many remarkable language systems like ChatGPT have thrived and achieved astonishing success on many tasks, showing the power of foundation models. In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, was recently proposed and shows strong zero-shot ability on many downstream 2D tasks. However, whether SAM can be adapted to 3D vision tasks, especially 3D object detection, is still unknown.

## What we do in this project
We explore adapting the zero-shot ability of SAM to 3D object detection in this project, and the project is still in progress.

![](./images/pipeline.png)
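The config name used in the inference step (`sam3d_intensity_bev_waymo_car.py`) suggests that LiDAR intensity is rendered as a bird's-eye-view (BEV) image before SAM is applied. The following is only a minimal sketch of such a rasterization, not the project's actual implementation: the point layout `(x, y, z, intensity)`, the function name `rasterize_bev`, the ranges, and the cell size are all illustrative assumptions, and plain Python lists are used instead of arrays to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

# Hypothetical point layout for this sketch: (x, y, z, intensity).
Point = Tuple[float, float, float, float]


def rasterize_bev(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (0.0, 30.0),    # assumed forward range in meters
    y_range: Tuple[float, float] = (-30.0, 30.0),  # assumed lateral range in meters
    cell_size: float = 0.5,                        # assumed meters per pixel
) -> List[List[float]]:
    """Return a 2D grid holding the maximum intensity seen in each BEV cell."""
    n_rows = int((x_range[1] - x_range[0]) / cell_size)
    n_cols = int((y_range[1] - y_range[0]) / cell_size)
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for x, y, _z, intensity in points:
        # Skip points outside the chosen region of interest.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        row = min(int((x - x_range[0]) / cell_size), n_rows - 1)
        col = min(int((y - y_range[0]) / cell_size), n_cols - 1)
        grid[row][col] = max(grid[row][col], intensity)
    return grid
```

A grid like this could then be converted to an image and passed to a 2D segmenter; how the masks are turned into 3D boxes is outside the scope of this sketch.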
## Installation
We use `pytorch==1.12.1, cuda==11.3`. We build this project based on [MMDetection3D](https://github.com/open-mmlab/mmdetection3d) (ver. 1.1.0rc3) and [segment-anything](https://github.com/facebookresearch/segment-anything) (commit 6fdee8f).

1. install waymo-open-dataset:
```
pip install waymo-open-dataset-tf-2-6-0
```
2. install MMDetection3D:
```
pip install -U openmim
mim install 'mmengine==0.7.2'
mim install 'mmcv==2.0.0'
mim install 'mmdet==3.0.0'

git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout 341ff99 # mmdet3d 1.1.0rc3
pip install -v -e .
```
3. install segment-anything:
```
pip install git+https://github.com/facebookresearch/segment-anything.git
```
4. install other dependencies:
```
pip install -r requirements.txt
```

## Data preparation
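Before preparing data, it may be worth confirming that the environment matches the versions pinned in the installation steps. The sketch below uses only the standard library's `importlib.metadata`; the helper names `installed_version` and `find_mismatches` are made up for this example, and it assumes PyTorch is installed under the distribution name `torch`.

```python
from importlib import metadata
from typing import Callable, Dict, List

# Versions pinned in the installation steps of this README.
PINNED: Dict[str, str] = {
    "torch": "1.12.1",
    "mmengine": "0.7.2",
    "mmcv": "2.0.0",
    "mmdet": "3.0.0",
}


def installed_version(name: str) -> str:
    """Return the installed version of a distribution, or '' if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return ""


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], str] = installed_version,
) -> List[str]:
    """Return one message per package whose version differs from the pin."""
    problems = []
    for name, wanted in pinned.items():
        found = lookup(name)
        # Drop local build tags such as "+cu113" before comparing.
        if found.split("+")[0] != wanted:
            problems.append(
                f"{name}: expected {wanted}, found {found or 'not installed'}"
            )
    return problems


if __name__ == "__main__":
    for line in find_mismatches(PINNED):
        print(line)
```

An empty output means every pinned package was found at the expected version; this does not check the CUDA toolkit or the MMDetection3D checkout itself.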
Since our project explores the _zero-shot_ setting, we do not need to pre-process the training data. We roughly follow the data preparation setup in the [MMDetection3D Data Preparation Guide](https://github.com/open-mmlab/mmdetection3d/blob/main/docs/en/user_guides/dataset_prepare.md), with some minor modifications.

1. organize the raw data like:
```
.
└── data
    └── waymo
        └── waymo_format
            ├── validation
            │   └── *.tfrecord
            └── gt.bin (optional)
```
2. run command:
```
CUDA_VISIBLE_DEVICES=-1 python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo
```
Note: Evaluation on the Waymo dataset needs the [ground truth bin](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0/validation/ground_truth_objects) file for the validation set, so you need to put the `.bin` file into `data/waymo/waymo_format`. If you do not have access to it, you can add the `--gen-gt-bin` argument to the above command:
```
CUDA_VISIBLE_DEVICES=-1 python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo --gen-gt-bin
```
This will automatically generate a `gt.bin` file (which may differ from the official version in some respects) in `data/waymo/waymo_format`.
3. after the pre-processing, the data folder will be organized as:
```
.
└── data
    └── waymo
        ├── kitti_format
        │   ├── ImageSets
        │   ├── training
        │   └── waymo_infos_val.pkl
        └── waymo_format
            ├── gt.bin
            └── validation
```
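To confirm that pre-processing produced the layout shown above, a small check along the following lines may help. It is only a sketch: the helper name `missing_entries` is made up for this example, the default root `data/waymo` assumes the script is run from the repository root, and it checks only that each entry exists, not that its contents are valid.

```python
from pathlib import Path
from typing import List

# Entries expected under data/waymo after pre-processing, per the tree above.
EXPECTED = [
    "kitti_format/ImageSets",
    "kitti_format/training",
    "kitti_format/waymo_infos_val.pkl",
    "waymo_format/gt.bin",
    "waymo_format/validation",
]


def missing_entries(waymo_root: str = "data/waymo") -> List[str]:
    """Return the expected entries that do not exist under waymo_root."""
    root = Path(waymo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("layout looks complete")
```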
### Partial validation set preparation
Because evaluating on the whole Waymo validation set is time-consuming, we modified `create_data.py` to support pre-processing a partial validation set. Put any number of `*.tfrecord` files into `data/waymo/waymo_format/validation/` and run the command above; it will automatically generate `ImageSets/val.txt` and the corresponding `gt.bin`.

## Inference
### Pre-trained weights
We use the pre-trained SAM in our project, so go to [segment-anything model checkpoints](https://github.com/facebookresearch/segment-anything#model-checkpoints) to download the weights and put them into `projects/pretrain_weights`.

### Zero-shot inference
1. generate the fake weights for loading (this is only a trick that lets `test.py` run with a placeholder checkpoint, and it only needs to be run once).
```
python projects/generate_fake_pth.py
```
2. run inference and evaluate the method:
```
python tools/test.py projects/configs/sam3d_intensity_bev_waymo_car.py fake.pth
```

### Results
- Quantitative results:

Tested on a single NVIDIA GeForce RTX 4090 with `pytorch==1.12.1, cuda==11.3` ([log](./logs/20230517_101842.log)).

| Metric | mAP | mAPH |
| ------ | ------- | ------- |
| RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_1 | 19.51 | 13.30 |
| RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_2 | 19.05 | 12.98 |

- Qualitative results:

![](images/paper_vis.png)

## What's next
Although our method is only an __initial attempt__, we believe it shows the great possibility and opportunity to unleash the potential of foundation models like SAM on 3D vision tasks, especially 3D object detection. With techniques like __few-shot learning__ and __prompt engineering__, we can take advantage of vision foundation models more effectively to better solve 3D tasks, especially considering the vast difference between the scales of 2D and 3D data.

## Citation
```
@article{zhang2023sam3d,
title={SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model},
author={Zhang, Dingyuan and Liang, Dingkang and Yang, Hongcheng and Zou, Zhikang and Ye, Xiaoqing and Liu, Zhe and Bai, Xiang},
journal={Science China Information Sciences},
year={2023}
}
```

## Acknowledgement
- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d)
- [segment-anything](https://github.com/facebookresearch/segment-anything)
- [OCR-SAM](https://github.com/yeungchenwa/OCR-SAM)