Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ZiyuGuo99/SAM2Point
The Most Faithful Implementation of Segment Anything (SAM) in 3D
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/ZiyuGuo99/SAM2Point
- Owner: ZiyuGuo99
- License: apache-2.0
- Created: 2024-08-25T09:59:17.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-09-11T22:07:15.000Z (5 months ago)
- Last Synced: 2024-11-19T23:45:14.849Z (3 months ago)
- Language: Python
- Homepage: https://sam2point.github.io/
- Size: 36.4 MB
- Stars: 273
- Watchers: 10
- Forks: 14
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Segment-Anything
README
# SAM2Point 🔥: Segment Any 3D as Videos
Official repository for the project "[SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners](https://github.com/ZiyuGuo99/SAM2Point/blob/main/SAM2Point.pdf)".
[[🌐 Webpage](https://sam2point.github.io/)] [[🤗 HuggingFace Demo](https://huggingface.co/spaces/ZiyuG/SAM2Point)] [[📖 arXiv Report](https://arxiv.org/pdf/2408.16768)]
## 💥 News
- **[2024.08.30]** We release the [paper](https://arxiv.org/pdf/2408.16768), [demo](https://huggingface.co/spaces/ZiyuG/SAM2Point), and [code](https://github.com/ZiyuGuo99/SAM2Point) of SAM2Point 🚀

## 👀 About SAM2Point
We introduce **SAM2Point**, a preliminary exploration adapting Segment Anything Model 2 (SAM 2) for zero-shot and promptable 3D segmentation. Our framework supports various prompt types, including ***3D points, boxes, and masks***, and can generalize across diverse scenarios, such as ***3D objects, indoor scenes, outdoor scenes, and raw LiDAR***.
To the best of our knowledge, SAM2Point presents ***the most faithful implementation of SAM in 3D***, demonstrating superior implementation efficiency, promptable flexibility, and generalization capabilities for 3D segmentation.
## 🎬 Multi-directional Videos from SAM2Point
We showcase the multi-directional videos generated by SAM2Point during segmentation, covering four settings: 3D objects, 3D indoor scenes, 3D outdoor scenes, and 3D raw LiDAR.
## 💪 Get Started
### Installation

Clone the repository:
```bash
git clone https://github.com/ZiyuGuo99/SAM2Point.git
cd SAM2Point
```

Create a conda environment:
```bash
conda create -n sam2point python=3.10
conda activate sam2point
```
SAM2Point requires Python >= 3.10, PyTorch >= 2.3.1, and TorchVision >= 0.18.1. Please follow the instructions [here](https://pytorch.org/get-started/locally/) to install both PyTorch and TorchVision dependencies.
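For reference, one way to satisfy these version requirements on a Linux machine with CUDA 12.1 is sketched below; this is only an example, and the exact command for your OS and CUDA version should be taken from the PyTorch page linked above.

```bash
# Example only: install PyTorch 2.3.1 and TorchVision 0.18.1 built against CUDA 12.1.
# Pick the command matching your platform from https://pytorch.org/get-started/locally/.
pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu121
```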
Install additional dependencies:
```bash
pip install -r requirements.txt
```

### Prepare SAM 2 and 3D Data Samples
Download the checkpoint of SAM 2:
```bash
cd checkpoints
bash download_ckpts.sh
cd ..
```

We provide 3D data samples from different datasets for testing SAM2Point:
```bash
gdown --id 1hIyjBCd2lsLnP_GYw-AMkxJnvNtyxBYq
unzip data.zip
```

Alternatively, you can download the samples directly from [this link](https://drive.google.com/file/d/1hIyjBCd2lsLnP_GYw-AMkxJnvNtyxBYq/view?usp=sharing).
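Depending on your `gdown` version, the `--id` flag may be deprecated; a possible alternative (assuming `gdown` >= 4.x) is to pass the full sharing URL with `--fuzzy`:

```bash
# Alternative download: let gdown extract the file ID from the sharing URL.
gdown --fuzzy "https://drive.google.com/file/d/1hIyjBCd2lsLnP_GYw-AMkxJnvNtyxBYq/view?usp=sharing"
unzip data.zip
```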
***Code for custom 3D input and prompts will be released soon.***

### Start Segmentation
Modify `DATASET`, `SAMPLE_IDX`, `PROMPT_TYPE`, and `PROMPT_IDX` in `run.sh` to specify the 3D input and prompt.
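The variable names come from the sentence above, but the specific values below are illustrative assumptions and must correspond to one of the provided data samples:

```bash
# Hypothetical settings inside run.sh -- these values are assumptions for illustration only.
DATASET=3d_indoor_scene   # which group of 3D samples to use
SAMPLE_IDX=0              # index of the sample within that group
PROMPT_TYPE=point         # one of the supported prompt types: point, box, or mask
PROMPT_IDX=0              # index of the prompt to apply
```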
Run the segmentation script:
```bash
bash run.sh
```

The segmentation results will be saved under `./results/`, and the corresponding multi-directional videos will be saved under `./video/`.
## :white_check_mark: Citation
If you find **SAM2Point** useful for your research or applications, please kindly cite using this BibTeX:
```latex
@article{guo2024sam2point,
  title={SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners},
  author={Guo, Ziyu and Zhang, Renrui and Zhu, Xiangyang and Tong, Chengzhuo and Gao, Peng and Li, Chunyuan and Heng, Pheng-Ann},
  journal={arXiv preprint arXiv:2408.16768},
  year={2024}
}
```

## 🧠 Related Work
Explore our additional research on **3D**, **SAM**, and **Multi-modal Large Language Models**:
- **[Point-Bind & Point-LLM]** [Multi-modality 3D Understanding, Generation, and Instruction Following](https://github.com/ZiyuGuo99/Point-Bind_Point-LLM)
- **[Personalized SAM]** [Personalize Segment Anything Model with One Shot](https://github.com/ZrrSkywalker/Personalize-SAM)
- **[Point-NN & Point-PN]** [Starting from Non-Parametric Networks for 3D Analysis](https://github.com/ZrrSkywalker/Point-NN)
- **[PointCLIP]** [3D Point Cloud Understanding by CLIP](https://github.com/ZrrSkywalker/PointCLIP)
- **[Any2Point]** [Empowering Any-modality Large Models for 3D](https://github.com/Ivan-Tang-3D/Any2Point)
- **[LLaVA-OneVision]** [Latest Generations of LLaVA Model](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/)
- **[LLaMA-Adapter]** [LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention](https://github.com/OpenGVLab/LLaMA-Adapter)
- **[MathVerse]** [MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?](https://mathverse-cuhk.github.io/)