ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model, a unified and user-friendly shape-language model
- Host: GitHub
- URL: https://github.com/OpenShapeLab/ShapeGPT
- Owner: OpenShapeLab
- Created: 2023-11-30T05:30:48.000Z
- Default Branch: main
- Last Pushed: 2023-12-01T12:48:58.000Z
- Last Synced: 2024-08-01T03:31:42.801Z
- Topics: 3d-generation, caption-generation, chatgpt, gpt, language-model, multi-modal, shape, unified
- Homepage: https://shapegpt.github.io
- Size: 1.2 MB
- Stars: 86
- Watchers: 16
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
## README

Official repo for **ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model**
Project Page • Arxiv Paper • Demo • FAQ • Citation
https://github.com/OpenShapeLab/ShapeGPT/assets/91652696/47cb697b-4778-4046-9e8e-0eafa54d0270
## Intro ShapeGPT
ShapeGPT is a **unified** and **user-friendly** shape-centric multi-modal language model that establishes a multi-modal corpus and develops shape-aware language models for **multiple shape tasks**.
### Technical details
The advent of large language models, enabling flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly ones that comprehensively handle 3D shapes alongside other modalities, remain under-explored. By achieving instruction-based shape generation, versatile multimodal generative shape models can significantly benefit fields like 3D virtual construction and network-aided design. In this work, we present ShapeGPT, a shape-included multi-modal framework that leverages strong pre-trained language models to address multiple shape-relevant tasks. Specifically, ShapeGPT employs a word-sentence-paragraph framework: it discretizes continuous shapes into shape words, assembles these words into shape sentences, and integrates shapes with instructional text into multi-modal paragraphs. To learn this shape-language model, we use a three-stage training scheme, consisting of shape representation, multimodal alignment, and instruction-based generation, to align shape-language codebooks and learn the intricate correlations among these modalities. Extensive experiments demonstrate that ShapeGPT achieves comparable performance across shape-relevant tasks, including text-to-shape, shape-to-text, shape completion, and shape editing.
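A minimal sketch of this word-sentence-paragraph idea, assuming a learned codebook and per-patch shape latents (all names, dimensions, and the token format below are illustrative assumptions, not the repository's actual API):

```python
# Hypothetical illustration of the word-sentence-paragraph framework:
# continuous shape latents are quantized against a codebook into discrete
# "shape words", the words are strung into a "shape sentence" of special
# tokens, and the sentence is embedded in instruction text to form a
# multi-modal "paragraph" that a T5-style model could consume.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))      # 512 learned shape words, 64-dim each
shape_latents = rng.normal(size=(16, 64))  # 16 patch latents from a shape encoder

def to_shape_words(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Nearest-neighbour quantization: one codebook index per latent."""
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

words = to_shape_words(shape_latents, codebook)
sentence = "".join(f"<shape_{w}>" for w in words)            # shape sentence
paragraph = f"Given the 3D object {sentence}, describe it."  # multi-modal paragraph
print(paragraph[:60])
```

In the three-stage scheme described above, such a codebook would be learned during the shape-representation stage, while the interleaved paragraphs would drive the multimodal-alignment and instruction-based-generation stages.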

## 🚩 News
- [2023/12/01] Uploaded paper and initialized the project 🔥🔥🔥
## ⚡ Quick Start
## ▶️ Demo
## 👀 Visualization
## ⚠️ FAQ
## 📖 Citation
If you find our code or paper helpful, please consider citing:
```bibtex
@misc{yin2023shapegpt,
  title={ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model},
  author={Fukun Yin and Xin Chen and Chi Zhang and Biao Jiang and Zibo Zhao and Jiayuan Fan and Gang Yu and Taihao Li and Tao Chen},
  year={2023},
  eprint={2311.17618},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
## Acknowledgments
Thanks to [T5 model](https://github.com/google-research/text-to-text-transfer-transformer), [Motion-GPT](https://github.com/OpenMotionLab/MotionGPT), [Perceiver-IO](https://github.com/krasserm/perceiver-io), and [SDFusion](https://yccyenchicheng.github.io/SDFusion/); our code partially borrows from them. Our approach is inspired by [Unified-IO](https://unified-io.allenai.org/), [Michelangelo](https://neuralcarver.github.io/michelangelo/), [ShapeCrafter](https://ivl.cs.brown.edu/research/shapecrafter.html), [Pix2Vox](https://github.com/hzxie/Pix2Vox), and [3DShape2VecSet](https://github.com/1zb/3DShape2VecSet).
## License
This code is distributed under an [MIT LICENSE](LICENSE).
Note that our code depends on other libraries, including [PyTorch3D](https://pytorch3d.org/) and [PyTorch Lightning](https://lightning.ai/), and uses datasets that each have their own licenses, which must also be followed.