# awesome-interactive-embodiedai

A curated list of papers and code for interactive embodied AI: https://github.com/zh-plus/awesome-interactive-embodiedai
### 2D semantic segmentation
1. [Text4Seg: Reimagining Image Segmentation as Text Generation](https://mc-lan.github.io/Text4Seg/)
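Text4Seg's core move is to make segmentation a pure text-generation task: the image is split into a patch grid, every patch gets a class word, and the label grid is serialized with row-wise run-length encoding to keep the sequence short. A minimal sketch of that serialization step; the grid and class names below are made up for illustration:

```python
# Sketch of Text4Seg-style "semantic descriptors": serialize a patch-label
# grid as text with row-wise run-length encoding. Grid and labels are
# made up for illustration.

def encode_row(row: list[str]) -> str:
    """Run-length encode one row of patch labels, e.g. 'sky*3|tree*1'."""
    runs = []
    current, count = row[0], 1
    for label in row[1:]:
        if label == current:
            count += 1
        else:
            runs.append(f"{current}*{count}")
            current, count = label, 1
    runs.append(f"{current}*{count}")
    return "|".join(runs)

grid = [
    ["sky", "sky", "sky", "tree"],
    ["sky", "sky", "tree", "tree"],
    ["road", "road", "car", "car"],
    ["road", "road", "road", "car"],
]
print("\n".join(encode_row(row) for row in grid))
# sky*3|tree*1
# sky*2|tree*2
# road*2|car*2
# road*3|car*1
```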
### 3D semantic segmentation
1. SegmentAnything3D
- [Pointcept/SegmentAnything3D: SAM3D: Segment Anything in 3D Scenes (ICCV'23 Workshop)](https://github.com/Pointcept/SegmentAnything3D) (see the 2D-mask back-projection sketch after this list)
2. PTv3
- [Pointcept/Pointcept: a codebase for point cloud perception research; latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)](https://github.com/Pointcept/Pointcept)
3. ODIN
- [ayushjain1144/odin: Code for the paper: "ODIN: A Single Model for 2D and 3D Segmentation" (CVPR 2024)](https://github.com/ayushjain1144/odin)
4. [Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels](https://segment3d.github.io/)
5. [Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering](https://drprojects.github.io/supercluster)
- [drprojects/superpoint_transformer](https://github.com/drprojects/superpoint_transformer)

6. [Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation](https://arxiv.org/abs/2406.02548) [ICLR 2025 Oral]
- [aminebdj/OpenYOLO3D: state-of-the-art open-vocabulary 3D instance segmentation on ScanNet200 and Replica, with up to ∼16x speedup over the best prior method](https://github.com/aminebdj/OpenYOLO3D)
7. [EmbodiedSAM: Online Segment Any 3D Thing in Real Time](https://xuxw98.github.io/ESAM/) [ICLR 2025 Oral]
- [xuxw98/ESAM: EmbodiedSAM: Online Segment Any 3D Thing in Real Time (ICLR 2025 Oral)](https://github.com/xuxw98/ESAM)

8. [Search3D: Hierarchical Open-Vocabulary 3D Segmentation](https://arxiv.org/pdf/2409.18431) [Arxiv 2025.1]

9. [OpenNeRF: Open-Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views](https://arxiv.org/pdf/2404.03650) [ICLR 2024]
- [opennerf/opennerf: Open-Set 3D Neural Scene Segmentation with Pixel-wise Features and Rendered Novel Views (ICLR 2024)](https://github.com/opennerf/opennerf)
10. [Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant](https://arxiv.org/pdf/2408.10652) [Arxiv 2024.8]

11. [SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Instance Segmentation](https://github.com/GAP-LAB-CUHK-SZ/SAMPro3D) [3DV 2025]

12. [PLA: Language-Driven Open-Vocabulary 3D Scene Understanding](https://dingry.github.io/projects/PLA) [CVPR 2023]

13. [RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding](https://jihanyang.github.io/projects/RegionPLC) [CVPR 2024]

14. [AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation](https://ywyue.github.io/AGILE3D/) [ICLR 2024]
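Several entries above (SAM3D, EmbodiedSAM, SAMPro3D) share one geometric building block: lifting a 2D mask into the scene by back-projecting the masked depth pixels through the camera intrinsics. A minimal NumPy sketch of just that step, assuming a pinhole camera and metric depth; the cross-frame merging and voting each method adds on top is omitted:

```python
import numpy as np

def backproject_mask(depth: np.ndarray, mask: np.ndarray,
                     K: np.ndarray) -> np.ndarray:
    """Lift masked depth pixels to 3D camera-frame points.

    depth: (H, W) depth in meters; mask: (H, W) bool from a 2D
    segmenter such as SAM; K: (3, 3) pinhole intrinsics.
    """
    v, u = np.nonzero(mask & (depth > 0))   # pixel coords inside the mask
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]         # (u - cx) * z / fx
    y = (v - K[1, 2]) * z / K[1, 1]         # (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)      # (N, 3) points

# Toy example: 4x4 depth image, mask covering the top-left quadrant.
K = np.array([[500.0, 0, 2.0], [0, 500.0, 2.0], [0, 0, 1.0]])
depth = np.full((4, 4), 1.5)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
print(backproject_mask(depth, mask, K).shape)  # (4, 3)
```

A real pipeline would then transform these camera-frame points to the world frame with the camera pose and merge masks across frames.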
### 3D caption
1. [TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes](https://arxiv.org/pdf/2403.19589) [ECCV 2024]
- [jxbbb/TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes (ECCV 2024)](https://github.com/jxbbb/TOD3Cap)
### 2D part segmentation
1. VLPart: [facebookresearch/VLPart: Going Denser with Open-Vocabulary Part Segmentation (ICCV 2023)](https://github.com/facebookresearch/VLPart)
2. Semantic-SAM: [UX-Decoder/Semantic-SAM: official implementation of "Semantic-SAM: Segment and Recognize Anything at Any Granularity"](https://github.com/UX-Decoder/Semantic-SAM) [ECCV 2024]

3. Part-CLIPseg: [kaist-cvml/part-clipseg: official PyTorch implementation of PartCLIPSeg](https://github.com/kaist-cvml/part-clipseg) [NeurIPS 2024]
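Part-CLIPseg builds on CLIPSeg, which already takes free-form text prompts, so a quick way to get a feel for prompt-driven part segmentation is to run the public CLIPSeg checkpoint with part-level prompts. A hedged sketch using the Hugging Face CLIPSeg API; the fine-tuned PartCLIPSeg weights and the paper's extra components are not reproduced here, and the image path is a placeholder:

```python
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

# Vanilla CLIPSeg with part-level prompts; "CIDAS/clipseg-rd64-refined"
# is the public CLIPSeg checkpoint, not the PartCLIPSeg weights.
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("dog.jpg")  # placeholder path: any RGB image
prompts = ["dog head", "dog tail", "dog legs"]

inputs = processor(text=prompts, images=[image] * len(prompts),
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits      # (num_prompts, 352, 352)

masks = torch.sigmoid(logits)            # per-prompt part heatmaps
parts = masks.argmax(dim=0)              # crude per-pixel part assignment
```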
### 3D part segmentation
1. [3x2: 3D Object Part Segmentation by 2D Semantic Correspondences](https://rehg.org/publication/pub40/) [ECCV 2024]

2. [PartSTAD: 2D-to-3D Part Segmentation Task Adaptation](https://partstad.github.io/) [ECCV 2024]

3. [PartSLIP: Low-shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models](https://colin97.github.io/PartSLIP_page/) [CVPR 2023] (the 2D-to-3D lifting step shared by these methods is sketched after this list)

4. [SAMPart3D](https://yhyang-myron.github.io/SAMPart3D-website/) [Arxiv 2024.10]
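PartSLIP, PartSTAD, and 3x2 all follow a render-segment-lift recipe: render the shape from several views, run a 2D part segmenter per view, then aggregate the predictions back onto the points. A minimal sketch of the aggregation step as a per-point majority vote, assuming point-to-pixel projections and visibility are already handled upstream:

```python
import numpy as np

def lift_labels(point_pix: list[np.ndarray], view_labels: list[np.ndarray],
                num_points: int, num_parts: int) -> np.ndarray:
    """Majority-vote 2D part labels onto 3D points.

    point_pix[v]:   (N, 2) integer (u, v) pixel coords of each point in
                    view v (a real pipeline would also check occlusion).
    view_labels[v]: (H, W) predicted part-label map for view v.
    """
    votes = np.zeros((num_points, num_parts), dtype=np.int64)
    for pix, labels in zip(point_pix, view_labels):
        part = labels[pix[:, 1], pix[:, 0]]       # label under each point
        votes[np.arange(num_points), part] += 1
    return votes.argmax(axis=1)                   # (N,) per-point part id

# Toy example: 2 views, 3 points, 2 parts.
pix = [np.array([[0, 0], [1, 0], [1, 1]]), np.array([[0, 1], [0, 0], [1, 1]])]
labels = [np.array([[0, 1], [0, 1]]), np.array([[1, 1], [0, 0]])]
print(lift_labels(pix, labels, num_points=3, num_parts=2))  # [0 1 0]
```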
### 3D Physical Attributes Annotation
1. [NeRF2Physics: Physical Property Understanding from Language-Embedded Feature Fields](https://ajzhai.github.io/NeRF2Physics/) [CVPR 2024]

2. [PUGS: Zero-shot Physical Understanding with Gaussian Splatting](https://evernorif.github.io/PUGS/) [ICRA 2025]
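NeRF2Physics and PUGS both read physical properties off language-aligned features attached to the reconstruction: compare each point's feature to text embeddings of candidate material names, then map the best match through a table of property values. A minimal sketch of that lookup; the features are random stand-ins and the density table is made up:

```python
import numpy as np

# Candidate materials and a made-up density table (kg/m^3).
materials = ["wood", "metal", "ceramic"]
density = {"wood": 600.0, "metal": 7800.0, "ceramic": 2400.0}

def assign_density(point_feats: np.ndarray, mat_feats: np.ndarray) -> np.ndarray:
    """point_feats: (N, D) language-aligned per-point features (e.g. CLIP);
    mat_feats: (M, D) text embeddings of the material names."""
    pf = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    mf = mat_feats / np.linalg.norm(mat_feats, axis=1, keepdims=True)
    best = (pf @ mf.T).argmax(axis=1)      # cosine-sim nearest material
    return np.array([density[materials[m]] for m in best])

# Random stand-ins for real features, just to show the shapes.
rng = np.random.default_rng(0)
print(assign_density(rng.normal(size=(5, 8)), rng.normal(size=(3, 8))))
```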
### Embodied interactive/interactable segmentation
1. [FANG-Xiaolin/uncos](https://github.com/FANG-Xiaolin/uncos)

2. [Functionality understanding and segmentation in 3D scenes](https://jcorsetti.github.io/fun3du/)

3. [SceneFun3D](https://scenefun3d.github.io/) [CVPR 2024 Oral]
### 3D Affordance Detection
1. [3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds](https://arxiv.org/pdf/2502.20041) [ICLR 2025]
### 3D Intention Grounding (Detection)
1. [Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention](https://github.com/WeitaiKang/Intent3D) [ICLR 2025]
### Embodied Interactable 3D Generation
1. [PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI](https://physcene.github.io/)
### Others
### Survey
1. [When LLMs Step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models](https://arxiv.org/pdf/2405.10255) [2024.5]
### 3D Feature Extraction
1. [Duoduo CLIP: Efficient 3D Understanding with Multi-View Images](https://github.com/3dlg-hcvc/DuoduoCLIP) [ICLR 2025]
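Duoduo CLIP represents a 3D object by encoding multi-view renders with a CLIP image encoder and aggregating across views (the released model fine-tunes CLIP with multi-view attention). A rough approximation with stock CLIP from Hugging Face, mean-pooling per-view embeddings before comparing against text; the render filenames are placeholders:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Stock CLIP as a stand-in for the fine-tuned Duoduo CLIP encoder.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

views = [Image.open(f"render_{i}.png") for i in range(8)]  # multi-view renders
texts = ["a wooden chair", "a metal lamp"]

with torch.no_grad():
    img = model.get_image_features(**processor(images=views, return_tensors="pt"))
    txt = model.get_text_features(**processor(text=texts, return_tensors="pt",
                                              padding=True))

obj = img.mean(dim=0, keepdim=True)        # mean-pool views into one embedding
obj = obj / obj.norm(dim=-1, keepdim=True)
txt = txt / txt.norm(dim=-1, keepdim=True)
print(obj @ txt.T)                         # object-to-text similarity scores
```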
#### Navigation
1. [Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation](https://hovsg.github.io/)
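HOV-SG's hierarchy (floors, rooms, objects, each node carrying an open-vocabulary embedding) lets a language query be resolved top-down instead of searching every object. A toy sketch of that structure and a greedy top-down query, with a substring-match stand-in for the real CLIP similarity:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                              # e.g. "kitchen", "mug"
    children: list["Node"] = field(default_factory=list)

def score(node: Node, query: str) -> float:
    """Stand-in for CLIP similarity between node embedding and query text."""
    return sum(word in node.name for word in query.split())

def resolve(node: Node, query: str) -> Node:
    """Descend the hierarchy, greedily following the best-scoring child."""
    while node.children:
        node = max(node.children, key=lambda c: score(c, query))
    return node

house = Node("house", [
    Node("kitchen", [Node("mug"), Node("kettle")]),
    Node("bedroom", [Node("lamp"), Node("book")]),
])
print(resolve(house, "mug in the kitchen").name)  # mug
```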
### Depth Estimation
1. [Depth Anything V2](https://depth-anything-v2.github.io/) [NeurIPS 2024] (a minimal usage sketch follows this list)
- [DepthAnything/Depth-Anything-V2: A More Capable Foundation Model for Monocular Depth Estimation (NeurIPS 2024)](https://github.com/DepthAnything/Depth-Anything-V2)
2. [Video Depth Anything](https://videodepthanything.github.io/) [Arxiv 2025.1]
- [DepthAnything/Video-Depth-Anything: Video Depth Anything: Consistent Depth Estimation for Super-Long Videos](https://github.com/DepthAnything/Video-Depth-Anything)
3. [Prompt Depth Anything](https://promptda.github.io/) [Arxiv 2024.12]
- [DepthAnything/PromptDA: Prompt Depth Anything](https://github.com/DepthAnything/PromptDA)
- [rerun-io/prompt-da: PromptDepthAnything example](https://github.com/rerun-io/prompt-da)
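The Depth Anything V2 checkpoints are published on Hugging Face, so monocular depth can be tried in a few lines through the transformers depth-estimation pipeline, as sketched below; the Small checkpoint name is assumed from the DepthAnything organization, and the repo's own `DepthAnythingV2` class gives more control:

```python
from PIL import Image
from transformers import pipeline

# Depth Anything V2 via the Hugging Face depth-estimation pipeline;
# the checkpoint name is assumed from the DepthAnything organization.
depth = pipeline("depth-estimation",
                 model="depth-anything/Depth-Anything-V2-Small-hf")

image = Image.open("room.jpg")           # placeholder path: any RGB image
result = depth(image)
result["depth"].save("room_depth.png")   # PIL image of predicted depth
# result["predicted_depth"] holds the raw depth tensor for further use.
```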
