Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
[CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation.
- Host: GitHub
- URL: https://github.com/Pointcept/GPT4Point
- Owner: Pointcept
- License: MIT
- Created: 2023-08-12T10:44:07.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-04-27T03:46:37.000Z (2 months ago)
- Last Synced: 2024-05-06T00:03:27.597Z (2 months ago)
- Topics: 3d-generation, llm, multimodal-learning
- Language: Python
- Homepage: https://gpt4point.github.io/
- Size: 114 MB
- Stars: 253
- Watchers: 23
- Forks: 18
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-stars - Pointcept/GPT4Point - [CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation. (Python)
README
# [CVPR2024] GPT4Point: A Unified Framework for Point-Language Understanding and Generation
## 🔥 News
🔥 2024/04/27: We have revised the point encoder section, and evaluation is now more functional; the training section still needs modification.
🔥 2024/04/13: We release **GPT4Point v1.0**, including training and 3D captioning evaluation code.
🔥 2024/04/05: Our paper **GPT4Point** is selected as a **CVPR'24 Highlight**, top 2.84% (324/11532)!
🔥 2024/02/27: Our paper **GPT4Point** is accepted by **CVPR'24**!
🔥 2024/01/19: We release the download and extraction instructions for **Objaverse-XL (Point Cloud Format)**.
🔥 2023/12/05: The paper [GPT4Point (arXiv)](https://arxiv.org/abs/2312.02980) has been released; it unifies point-language understanding and generation.
🔥 2023/08/13: The two-stage pre-training code of PointBLIP has been released.
🔥 2023/08/13: Part of the datasets used and the result files have been uploaded.
## 🏠 Overview
This project presents **GPT4Point**, a 3D multi-modality model that aligns **3D point clouds** with **language**. More details are available on the [project page](https://gpt4point.github.io/).
- **Unified Framework for Point-Language Understanding and Generation.** We present GPT4Point, a unified framework for point-language understanding and generation, comprising a 3D MLLM for point-text tasks and controlled 3D generation.
- **Automated Point-Language Dataset Annotation Engine Pyramid-XL.** We introduce Pyramid-XL, an automated point-language dataset annotation engine built on Objaverse-XL. It currently encompasses 1M pairs at varying levels of coarseness and can be extended cost-effectively.
- **Object-level Point Cloud Benchmark.** We establish a novel object-level point cloud benchmark with comprehensive evaluation metrics for 3D point cloud language tasks. This benchmark thoroughly assesses models' understanding capabilities and facilitates the evaluation of generated 3D objects.
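The benchmark's exact metric suite isn't enumerated in this README, but 3D captioning evaluation of this kind is commonly scored with standard captioning metrics. As an illustration only (not the repo's own evaluation code), here is how a CIDEr score could be computed with the `pycocoevalcap` package; the object IDs and captions are hypothetical:

```python
# Illustrative only: scoring object captions with CIDEr via pycocoevalcap.
# The metric suite actually used by GPT4Point's benchmark may differ.
from pycocoevalcap.cider.cider import Cider

# Keys are object IDs, values are lists of captions (all hypothetical).
references = {
    "obj_001": ["a small wooden chair with four legs"],
    "obj_002": ["a red sports car with black wheels"],
}
candidates = {
    "obj_001": ["a wooden chair"],
    "obj_002": ["a red car"],
}

cider = Cider()
corpus_score, per_object_scores = cider.compute_score(references, candidates)
print(f"CIDEr: {corpus_score:.3f}")  # corpus-level score
```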
## 🧭 Version
- **v1.0 (2024/04/13).** We release the training and evaluation (3D captioning) code.
Dataset and text annotation: **Cap3D**.
LLM: **OPT 2.7b**

## 🔧 Installation
1. (Optional) Creating conda environment
```bash
conda create -n gpt4point python=3.8
conda activate gpt4point
```
2. Install from [PyPI](https://pypi.org/project/salesforce-lavis/):
```bash
pip install salesforce-lavis
```
3. Or, for development, build from source:
```bash
git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
pip install -e .
```
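As a quick sanity check (not part of the official instructions), you can confirm that LAVIS imported correctly by printing its model zoo:

```python
# Sanity check: list the model architectures LAVIS ships with.
from lavis.models import model_zoo

print(model_zoo)  # table of available architectures and model types
```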
## 📦 Data Preparation
1. **Annotations**:
All annotations will be downloaded automatically through Hugging Face.
2. **Point Cloud**:
You can download the **Cap3D** point cloud dataset through this [Google Drive link](https://drive.google.com/drive/folders/18uqvjVeEqVIWsZFHxoIXjb1LkZ9ZNTh0?usp=sharing). Unzip the 10 tar.gz files and put their contents together (a loading sketch follows this section). The resulting folder structure is:
```bash
GPT4Point
├── data
│ ├── cap3d
│ │ ├── points
│ │ │ ├── Cap3D_pcs_8192_xyz_w_color
│ │ │ │ ├── .pkl
│ │ │ │ ├── ...
│ │ │ │ ├── .pkl
│ │ ├── annotations
│ │ │ ├── cap3d_caption_train.json
│ │ │ ├── cap3d_caption_val.json
│ │ │ ├── cap3d_real_and_chatgpt_caption_test.json
│ │ │ ├── cap3d_real_and_chatgpt_caption_test_gt.json (for evaluation)
```
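For convenience, here is a minimal sketch of fetching the annotations and inspecting one point cloud file. The Hugging Face repo ID is hypothetical (the README only says annotations come through Hugging Face), and the `.pkl` layout is inferred from the folder name `Cap3D_pcs_8192_xyz_w_color`; verify both against the actual release:

```python
# Hedged sketch: fetch annotations and peek at one Cap3D point cloud.
# ASSUMPTIONS: the repo_id below is hypothetical, and the .pkl layout
# (roughly an 8192 x 6 array of xyz + RGB) is inferred from the folder name.
import glob
import pickle

from huggingface_hub import snapshot_download

snapshot_download(repo_id="Pointcept/GPT4Point",  # hypothetical repo ID
                  repo_type="dataset",
                  local_dir="data/cap3d/annotations")

pkl_files = glob.glob("data/cap3d/points/Cap3D_pcs_8192_xyz_w_color/*.pkl")
with open(pkl_files[0], "rb") as f:
    points = pickle.load(f)
print(type(points), getattr(points, "shape", None))  # expect roughly (8192, 6)
```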
## 🚆 Training
1. For stage 1 training:
```bash
python -m torch.distributed.run --master_port=32339 --nproc_per_node=4 train.py --cfg-path lavis/projects/gpt4point/train/pretrain_stage1_cap3d.yaml
```
2. For stage 2 training:
```bash
python -m torch.distributed.run --master_port=32339 --nproc_per_node=4 train.py --cfg-path lavis/projects/gpt4point/train/pretrain_stage2_cap3d_opt2.7b.yaml
```

## 🏁 Evaluation
```bash
python -m torch.distributed.run --master_port=32239 --nproc_per_node=1 evaluate.py --cfg-path lavis/projects/gpt4point/eval/captioning3d_cap3d_opt2.7b_eval.yaml
```

## 📦 Point Dataset and Data Annotation Engine (Optional)
### Objaverse-XL Point Dataset Download
**Note that you should `cd` into the Objaverse-xl_Download directory:**
```bash
cd ./Objaverse-xl_Download
```
Then please see the folder [Objaverse-xl_Download](./Objaverse-xl_Download) for details.
### Objaverse-XL Point Cloud Data Generation
Please see the [Extract_Pointcloud](./Objaverse-xl_Download/shap-e/) for details.
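For reference, the upstream `objaverse` package (`pip install objaverse`) exposes an Objaverse-XL API. The sketch below shows the generic upstream download workflow; the scripts in ./Objaverse-xl_Download may differ in filtering and output paths:

```python
# Generic Objaverse-XL download via the upstream `objaverse` package.
# The repo's own scripts in ./Objaverse-xl_Download may differ in details.
import objaverse.xl as oxl

# Metadata table (a pandas DataFrame) of all Objaverse-XL objects.
annotations = oxl.get_annotations(download_dir="~/.objaverse")

# Download a small random sample of the source files.
sample = annotations.sample(5, random_state=0)
oxl.download_objects(objects=sample)
```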
## 📝 TODO List
Dataset and Data Engine
- [✔] Release the arXiv paper and the project page.
- [✔] Release the Objaverse-XL dataset download instructions.
- [✔] Release the Objaverse-XL point cloud rendering/extraction pipeline.
- [✔] Release the pretraining code and the 3D captioning validation code.
- [ ] Release dataset and data annotation engine (Pyramid-XL).
- [ ] Release more evaluation code.
- [ ] Release more training code.
- [ ] Release more models.

## 🔗 Citation
If you find our work helpful, please cite:
```bibtex
@inproceedings{GPT4Point,
title={GPT4Point: A Unified Framework for Point-Language Understanding and Generation},
author={Zhangyang Qi and Ye Fang and Zeyi Sun and Xiaoyang Wu and Tong Wu and Jiaqi Wang and Dahua Lin and Hengshuang Zhao},
booktitle={CVPR},
year={2024},
}
```

## 📄 License
This work is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

## 📚 Related Work
Together, let's make LLMs for 3D great!
- [Point-Bind & Point-LLM](https://arxiv.org/abs/2309.00615): Aligns point clouds with ImageBind to reason over multi-modality input without training on 3D instruction data.
- [3D-LLM](https://arxiv.org/abs/2307.12981): Employs 2D foundation models to encode multi-view images of 3D point clouds.
- [PointLLM](https://arxiv.org/abs/2308.16911): Integrates 3D point clouds with LLaVA.