https://github.com/invictus717/pointlanguage

Host: GitHub
URL: https://github.com/invictus717/pointlanguage
Owner: invictus717
License: apache-2.0
Created: 2023-07-31T07:16:35.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2023-07-31T07:20:17.000Z (over 1 year ago)
Last Synced: 2023-07-31T08:29:24.506Z (over 1 year ago)
Size: 1.21 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

Yiyuan Zhang¹,²*
Kaixiong Gong¹,²*
Wanli Ouyang²
Xiangyu Yue¹,†

¹ Multimedia Lab, The Chinese University of Hong Kong

² OpenGVLab, Shanghai AI Laboratory

\* Equal Contribution

† Corresponding Author

-----------------

## Point as A Foreign Language, Let Large Language Models (LLMs) Perceive 3D Physical World as Reading Articles!

## 🌟 News
* **2023.7.31:** GitHub repository initialized. The paper will be released very soon.

## Motivation
We propose using pretrained language models for point cloud understanding. Unlike existing methods that use images as an intermediate modality, we found that language models can read point clouds as a foreign language. Benefiting from pretraining on large-scale corpora, language models perform better on long-tailed and out-of-distribution tasks in 3D vision.

### A Brief Summary

- 💡 **For multimodal research**, our method explores the **underlying representation relationship between different modalities**, specifically, language and 3D point cloud, and demonstrates that models pretrained on natural language can read 3D point clouds.
- 💡 **For 3D vision research**, our method performs **end-to-end point cloud understanding without hand-crafted structure designs**. It also demonstrates the feasibility of using **natural corpus text as pretraining data for 3D vision**.
- 💡 **For the vision-language area**, our method experimentally validates that **3D point clouds and text can be encoded by the same parameters**. This opens a promising new direction for tasks involving modality alignment between text and point clouds.
- 💡 With outstanding performance across benchmarks including ModelNet-40, S3DIS, and ShapeNetPart, our method demonstrates its effectiveness on both coarse-grained and fine-grained 3D point cloud tasks.
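
The idea that text and point clouds "can be encoded by the same parameters" can be illustrated with a toy example. The sketch below is not the authors' implementation, since this README does not describe the architecture. It assumes a common patch-style tokenization (farthest point sampling plus k-nearest-neighbour grouping) and uses randomly initialized matrices as stand-ins for pretrained language-model weights. All function and variable names are hypothetical.

```python
import numpy as np

def farthest_point_sampling(points, n_centers):
    """Greedily pick n_centers indices that are spread out over the cloud."""
    n = points.shape[0]
    chosen = np.zeros(n_centers, dtype=np.int64)  # chosen[0] = 0: start at the first point
    dist = np.full(n, np.inf)
    for i in range(1, n_centers):
        diff = points - points[chosen[i - 1]]
        # Track each point's squared distance to its nearest chosen center.
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(dist))
    return chosen

def point_cloud_to_tokens(points, n_tokens, k, proj):
    """Turn an (N, 3) cloud into (n_tokens, d) 'point tokens' (assumed scheme)."""
    centers = points[farthest_point_sampling(points, n_tokens)]      # (T, 3)
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(-1)   # (T, N)
    knn = np.argsort(d2, axis=1)[:, :k]                              # (T, k)
    patches = points[knn] - centers[:, None, :]                      # (T, k, 3), center-relative
    return patches.reshape(n_tokens, k * 3) @ proj                   # (T, d)

def shared_self_attention(tokens, wq, wk, wv):
    """Single-head self-attention; the same weights serve either modality."""
    q, k_, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k_.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

rng = np.random.default_rng(0)
d_model, n_tokens, k = 32, 16, 8

# Stand-ins for pretrained language-model weights (random here, an assumption).
wq, wk, wv = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
vocab_embedding = rng.normal(size=(100, d_model))
point_proj = rng.normal(scale=0.1, size=(k * 3, d_model))

text_tokens = vocab_embedding[np.array([5, 17, 42, 7])]   # 4 text tokens
cloud = rng.normal(size=(256, 3))                          # synthetic point cloud
point_tokens = point_cloud_to_tokens(cloud, n_tokens, k, point_proj)

# Both modalities pass through the very same attention parameters.
text_features = shared_self_attention(text_tokens, wq, wk, wv)
point_features = shared_self_attention(point_tokens, wq, wk, wv)
print(text_features.shape, point_features.shape)
```

Only the small projection `point_proj` is specific to point clouds in this sketch; the attention weights are shared, which is the property the bullet above refers to.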

# 🕙 ToDo
- [ ] Support Billion-scale Large Language Models.
- [ ] Large Language Model with More Modalities.
- [ ] Support Outdoor LiDAR Scenes.

# ✉️ Contact
If you are interested in this project, you are welcome to contribute!

To contact us, send an email to `[email protected]`, `[email protected]`, or `[email protected]`.

# License
This project is released under the [Apache 2.0 license](LICENSE).

# Acknowledgement
This code is built on the excellent open-source project [OpenPoints](https://github.com/guochengqian/openpoints).