EmbodiedSAM: Online Segment Any 3D Thing in Real Time
- Host: GitHub
- URL: https://xuxw98.github.io/ESAM/
- Owner: xuxw98
- Created: 2023-12-04T07:34:11.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-01T08:19:34.000Z (3 months ago)
- Last Synced: 2024-11-19T09:23:48.662Z (3 months ago)
- Topics: 3d-instance-segmentation, 3d-scene-understanding, embodied-vision, real-time, segment-anything, semi-supervised, streaming-video
- Language: Python
- Homepage: https://xuxw98.github.io/ESAM/
- Size: 59.8 MB
- Stars: 223
- Watchers: 4
- Forks: 14
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- Awesome-Segment-Anything
README
# EmbodiedSAM: Online Segment Any 3D Thing in Real Time
### [Paper](https://arxiv.org/abs/2408.11811) | [Project Page](https://xuxw98.github.io/ESAM/) | [Video](https://cloud.tsinghua.edu.cn/f/f75279f89bf64720b8ec/?dl=1)

> EmbodiedSAM: Online Segment Any 3D Thing in Real Time
>
> [Xiuwei Xu](https://xuxw98.github.io/), Huangxing Chen, [Linqing Zhao](https://scholar.google.com/citations?user=ypxt5UEAAAAJ&hl=zh-CN&oi=ao), [Ziwei Wang](https://ziweiwangthu.github.io/), [Jie Zhou](https://scholar.google.com/citations?user=6a79aPwAAAAJ&hl=en&authuser=1), [Jiwen Lu](http://ivg.au.tsinghua.edu.cn/Jiwen_Lu/)

In this work, we present ESAM, an efficient framework that leverages vision foundation models for online, real-time, fine-grained, generalized and open-vocabulary 3D instance segmentation.
## News
- [2024/10/01]: Demo code is available [here](https://github.com/xuxw98/ESAM/blob/main/docs/demo.md).
- [2024/8/27]: Fixed some bugs.
- [2024/8/22]: Code and demo released.

## Demo
### Bedroom:
![demo](./assets/demo2.gif)

### Office:
![demo](./assets/demo1.gif)

The demos are fairly large, so please allow a moment for them to load. More complete demos and a detailed introduction are available on the [project page](https://xuxw98.github.io/ESAM/).
## Method
Method Pipeline:
![overview](./assets/pipeline.png)

## Getting Started
For environment setup and dataset preparation, please follow:
* [Installation](./docs/installation.md)
* [Dataset Preparation](./docs/dataset_preparation.md)

For training and evaluation, please follow:
* [Train and Evaluation](./docs/run.md)
For the visualization demo, please follow:
* [Visualization Demo](./docs/demo.md)

## Main Results
We provide checkpoints for quickly reproducing the results reported in the paper.

**Class-agnostic 3D instance segmentation results on ScanNet200 dataset:**
| Method | Type | VFM | AP | AP@50 | AP@25 | Speed(ms) | Downloads |
|:--------:|:-------:|:-----------:|:----:|:-----:|:-----:|:---------:|:---------:|
| [SAMPro3D](https://github.com/GAP-LAB-CUHK-SZ/SAMPro3D) | Offline | [SAM](https://github.com/facebookresearch/segment-anything) | 18.0 | 32.8 | 56.1 | -- | -- |
| [SAI3D](https://github.com/yd-yin/SAI3D) | Offline | [SemanticSAM](https://github.com/UX-Decoder/Semantic-SAM) | 30.8 | 50.5 | 70.6 | -- | -- |
| [SAM3D](https://github.com/Pointcept/SegmentAnything3D) | Online | SAM | 20.6 | 35.7 | 55.5 | 1369+1518 | -- |
| ESAM | Online | SAM | 42.2 | 63.7 | 79.6 | 1369+**80** | [model](https://cloud.tsinghua.edu.cn/f/426d6eb693ff4b1fa04b/?dl=1) |
| ESAM-E | Online | [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM) | **43.4** | **65.4** | **80.9** | **20**+**80** | [model](https://cloud.tsinghua.edu.cn/f/7578d7e3d6764f6a93ee/?dl=1) |

**Dataset transfer results from ScanNet200 to SceneNN and 3RScan:**
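
| Method | Type | SceneNN AP | SceneNN AP@50 | SceneNN AP@25 | 3RScan AP | 3RScan AP@50 | 3RScan AP@25 |
|:--------:|:-------:|:----:|:-----:|:-----:|:----:|:-----:|:-----:|
| SAMPro3D | Offline | 12.6 | 25.8 | 53.2 | 3.9 | 8.0 | 21.0 |
| SAI3D | Offline | 18.6 | 34.7 | 65.7 | 5.4 | 11.8 | 27.4 |
| SAM3D | Online | 15.1 | 30.0 | 51.8 | 6.2 | 13.0 | 33.9 |
| ESAM | Online | 28.8 | 52.2 | 69.3 | 14.1 | 31.2 | 59.6 |
| ESAM-E | Online | 28.6 | 50.4 | 71.0 | 13.9 | 29.4 | 58.8 |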
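
The AP, AP@50 and AP@25 columns report instance-level average precision. Assuming the usual ScanNet-style definition, AP@50 and AP@25 count a predicted instance as correct when its mask IoU with a not-yet-matched ground-truth instance is at least 0.5 or 0.25, and AP averages over IoU thresholds 0.50, 0.55, ..., 0.95. The snippet below is a simplified, class-agnostic sketch on point-index masks; the function names are hypothetical, it uses a plain greedy matching and a step-wise precision-recall area, and the official ScanNet evaluation script differs in details (e.g. interpolation and handling of ignored regions), so it will not reproduce the numbers above exactly.

```python
from typing import List, Set, Tuple


def mask_iou(a: Set[int], b: Set[int]) -> float:
    """IoU between two instance masks given as sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds: List[Tuple[float, Set[int]]],
                      gts: List[Set[int]],
                      iou_thresh: float) -> float:
    """Class-agnostic AP at a single IoU threshold (simplified sketch)."""
    if not gts:
        return 0.0
    # Rank predictions by confidence score, highest first.
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = [False] * len(gts)
    tp, fp = 0, 0
    precisions, recalls = [], []
    for _, mask in ranked:
        # Greedily pick the best still-unmatched ground-truth instance.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = mask_iou(mask, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thresh:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(gts))
    # Area under the step-wise precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_ap(preds: List[Tuple[float, Set[int]]], gts: List[Set[int]]) -> float:
    """AP averaged over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


# Toy scene: two ground-truth instances, one perfect and one half-covered prediction.
gts = [{0, 1, 2, 3}, {10, 11, 12, 13}]
preds = [(0.9, {0, 1, 2, 3}), (0.8, {10, 11})]
print(average_precision(preds, gts, 0.25))  # 1.0
print(average_precision(preds, gts, 0.5))   # 1.0
print(mean_ap(preds, gts))                  # 0.55
```

In this toy case the second prediction has IoU exactly 0.5, so it counts at the 0.25 and 0.5 thresholds but not at stricter ones, which is why AP@25 and AP@50 can be much higher than AP.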
**3D instance segmentation results on ScanNet dataset:**
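
| Method | Type | ScanNet AP | ScanNet AP@50 | ScanNet AP@25 | SceneNN AP | SceneNN AP@50 | SceneNN AP@25 | FPS | Download |
|:--------:|:-------:|:----:|:-----:|:-----:|:----:|:-----:|:-----:|:---:|:---------:|
| TD3D | Offline | 46.2 | 71.1 | 81.3 | -- | -- | -- | -- | -- |
| Oneformer3D | Offline | 59.3 | 78.8 | 86.7 | -- | -- | -- | -- | -- |
| INS-Conv | Online | -- | 57.4 | -- | -- | -- | -- | -- | -- |
| TD3D-MA | Online | 39.0 | 60.5 | 71.3 | 26.0 | 42.8 | 59.2 | 3.5 | -- |
| ESAM-E | Online | 41.6 | 60.1 | 75.6 | 27.5 | 48.7 | 64.6 | 10 | model |
| ESAM-E+FF | Online | 42.6 | 61.9 | 77.1 | 33.3 | 53.6 | 62.5 | 9.8 | model |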
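
The Speed(ms) column of the ScanNet200 table appears to split per-frame latency into the vision foundation model (VFM) time plus the time of the rest of the pipeline: both SAM-based online methods share the same 1369 ms first term. Assuming that reading, and assuming the two stages run one after the other, total latency and frame rate follow from simple arithmetic. For ESAM-E, 20 + 80 = 100 ms per frame corresponds to 10 FPS, consistent with the FPS column here. The helper names below are illustrative only.

```python
from typing import Dict, Tuple


def latency_ms(vfm_ms: float, rest_ms: float) -> float:
    """Total per-frame latency, assuming the VFM and the rest run sequentially."""
    return vfm_ms + rest_ms


def fps(total_ms: float) -> float:
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / total_ms


# Speed(ms) entries from the ScanNet200 table, read as (VFM time, remaining time).
speeds: Dict[str, Tuple[float, float]] = {
    "SAM3D": (1369, 1518),
    "ESAM": (1369, 80),
    "ESAM-E": (20, 80),
}

for name, (vfm, rest) in speeds.items():
    total = latency_ms(vfm, rest)
    print(f"{name}: {total:.0f} ms/frame -> {fps(total):.2f} FPS")
```

Under these assumptions the VFM dominates the latency of SAM-based pipelines, which is why swapping SAM for FastSAM (ESAM-E) is what brings the method into the real-time range.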
**Open-Vocabulary 3D instance segmentation results on ScanNet200 dataset:**
| Method | AP | AP@50 | AP@25 |
|:------:|:----:|:-----:|:-----:|
| SAI3D | 9.6 | 14.7 | 19.0 |
| ESAM | **13.7** | **19.2** | **23.9** |

## TODO List
- [x] Release code and checkpoints.
- [x] Release the demo code to directly run ESAM on streaming RGB-D video.

## Contributors
The two students below contributed equally; the author order was determined by random draw.
- [Xiuwei Xu](https://xuxw98.github.io/)
- Huangxing Chen

Both are advised by [Jiwen Lu](https://ivg.au.tsinghua.edu.cn/Jiwen_Lu/).
## Acknowledgement
We thank the authors of [Oneformer3D](https://github.com/oneformer3d/oneformer3d) and [Online3D](https://github.com/xuxw98/Online3D) for their flexible codebases, and the creators of the valuable [ScanNet](https://github.com/ScanNet/ScanNet), [SceneNN](https://github.com/hkust-vgd/scenenn) and [3RScan](https://github.com/WaldJohannaU/3RScan) datasets.

## Citation
```
@article{xu2024esam,
title={EmbodiedSAM: Online Segment Any 3D Thing in Real Time},
author={Xiuwei Xu and Huangxing Chen and Linqing Zhao and Ziwei Wang and Jie Zhou and Jiwen Lu},
journal={arXiv preprint arXiv:2408.11811},
year={2024}
}
```