{"id":13441975,"url":"https://github.com/Pointcept/GPT4Point","last_synced_at":"2025-03-20T13:31:47.168Z","repository":{"id":188053257,"uuid":"677699930","full_name":"Pointcept/GPT4Point","owner":"Pointcept","description":"[CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation.","archived":false,"fork":false,"pushed_at":"2024-04-27T03:46:37.000Z","size":119495,"stargazers_count":291,"open_issues_count":10,"forks_count":19,"subscribers_count":23,"default_branch":"main","last_synced_at":"2024-08-01T03:38:33.614Z","etag":null,"topics":["3d-generation","llm","multimodal-learning"],"latest_commit_sha":null,"homepage":"https://gpt4point.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Pointcept.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-08-12T10:44:07.000Z","updated_at":"2024-08-01T02:08:34.000Z","dependencies_parsed_at":"2023-08-13T14:45:17.337Z","dependency_job_id":"226840b9-069c-4c13-8de5-b4a5b8b9605a","html_url":"https://github.com/Pointcept/GPT4Point","commit_stats":null,"previous_names":["qi-zhangyang/pointblip","pointcept/gpt4point"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pointcept%2FGPT4Point","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pointcept%2FGPT4Point/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pointcept%2FGPT4Point/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pointcept%2FGPT
4Point/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Pointcept","download_url":"https://codeload.github.com/Pointcept/GPT4Point/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221768427,"owners_count":16877638,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-generation","llm","multimodal-learning"],"created_at":"2024-07-31T03:01:40.245Z","updated_at":"2025-03-20T13:31:47.161Z","avatar_url":"https://github.com/Pointcept.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# \u003cspan style=\"color:lightblue\"\u003e[CVPR2024]\u003c/span\u003e GPT4Point\u003ca\u003e \u003cimg src=\"./readme_figs/icon.png\" width=\"30\" /\u003e \u003c/a\u003e: A Unified Framework for Point-Language Understanding and Generation\n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"http://arxiv.org/abs/2312.02980\" target='_blank'\u003e\n    \u003cimg src=\"https://img.shields.io/badge/arXiv paper-2312.02980📖-blue?\"\u003e\n  \u003c/a\u003e \n  \u003ca href=\"https://gpt4point.github.io/\" target='_blank'\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Project-\u0026#x1F680-blue\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://gpt4point.github.io/\" target='_blank'\u003e\n    \u003cimg src=\"https://img.shields.io/badge/version-v1.0-green\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n## 🔥 News\n🔥 2024/04/27: We have modified the point encoder, and evaluation is now more functional, although the training code still needs 
modification.\n\n🔥 2024/04/13: We released **GPT4Point** \u003cspan style=\"color:red\"\u003e**v1.0**\u003c/span\u003e, including the training and 3D captioning evaluation code.\n\n🔥 2024/04/05: Our paper **GPT4Point** was selected as a **CVPR'24 Highlight** (2.84%, 324/11532)!\n\n🔥 2024/02/27: Our paper **GPT4Point** was accepted by **CVPR'24**!\n\n🔥 2024/01/19: We released the download and extraction method for **Objaverse-XL (Point Cloud Format)**.\n\n🔥 2023/12/05: The paper [GPT4Point (arXiv)](https://arxiv.org/abs/2312.02980) has been released; it unifies point-language understanding and generation.\n\n🔥 2023/08/13: The two-stage pre-training code of PointBLIP has been released.\n\n🔥 2023/08/13: Part of the datasets used and the result files have been uploaded.\n\n## 🏠 Overview\n\u003cp align=\"center\"\u003e  \u003ca\u003e  \u003cimg src=\"./readme_figs/fig1_teaser.png\"  width=\"1000\" /\u003e \u003c/a\u003e \u003c/p\u003e\n\nThis project presents **GPT4Point**\u003ca\u003e  \u003cimg src=\"./readme_figs/icon.png\"  width=\"20\" /\u003e \u003c/a\u003e, a 3D multimodal model that aligns **3D point clouds** with **language**. More details are available on the [project page](https://gpt4point.github.io/).\n\n- **Unified Framework for Point-language Understanding and Generation.** We present GPT4Point, a unified framework for point-language understanding and generation that includes a 3D MLLM for point-text tasks and controlled 3D generation.\n\n- **Automated Point-language Dataset Annotation Engine Pyramid-XL.** We introduce Pyramid-XL, an automated point-language dataset annotation engine built on Objaverse-XL. It currently covers 1M pairs at varying levels of coarseness and can be extended cost-effectively.\n\n- **Object-level Point Cloud Benchmark.** We establish a novel object-level point cloud benchmark with comprehensive evaluation metrics for 3D point-language tasks. 
This benchmark thoroughly assesses models' understanding capabilities and facilitates the evaluation of generated 3D objects.\n\n## 🧭 Version\n- **v1.0 (2024/04/13).** We release the training and evaluation (3D captioning) code.  \nDataset and text annotation: **Cap3D**.  \nLLM: **OPT 2.7b**\n\n\n## 🔧 Installation\n\n1. (Optional) Create a conda environment:\n\n```bash\nconda create -n gpt4point python=3.8\nconda activate gpt4point\n```\n\n2. Install from [PyPI](https://pypi.org/project/salesforce-lavis/):\n```bash\npip install salesforce-lavis\n```\n\n3. Or, for development, build from source:\n\n```bash\ngit clone https://github.com/salesforce/LAVIS.git\ncd LAVIS\npip install -e .\n```\n\n## 📦 Data Preparation\n1. **Annotations**:\nAll annotations will be downloaded automatically from Hugging Face.\n\n2. **Point Cloud**:\nDownload the **Cap3D** point cloud dataset from this [Google Drive link](https://drive.google.com/drive/folders/18uqvjVeEqVIWsZFHxoIXjb1LkZ9ZNTh0?usp=sharing). Unzip all 10 tar.gz files and merge their contents into a single folder.\nThe complete folder structure is:\n\n```bash\nGPT4Point\n├── data\n│   ├── cap3d\n│   │   ├── points\n│   │   │    ├── Cap3D_pcs_8192_xyz_w_color\n│   │   │    │    ├── \u003cpoint cloud id\u003e.pkl\n│   │   │    │    ├── ...\n│   │   │    │    ├── \u003cpoint cloud id\u003e.pkl\n│   │   ├── annotations\n│   │   │    ├── cap3d_caption_train.json\n│   │   │    ├── cap3d_caption_val.json\n│   │   │    ├── cap3d_real_and_chatgpt_caption_test.json\n│   │   │    ├── cap3d_real_and_chatgpt_caption_test_gt.json (for evaluation)\n```\n\n## 🚆 Training\n1. For stage 1 training:\n```bash\npython -m torch.distributed.run --master_port=32339 --nproc_per_node=4 train.py --cfg-path lavis/projects/gpt4point/train/pretrain_stage1_cap3d.yaml\n```\n\n2. 
For stage 2 training:\n```bash\npython -m torch.distributed.run --master_port=32339 --nproc_per_node=4 train.py --cfg-path lavis/projects/gpt4point/train/pretrain_stage2_cap3d_opt2.7b.yaml\n```\n\n## 🏁 Evaluation\n```bash\npython -m torch.distributed.run --master_port=32239 --nproc_per_node=1 evaluate.py --cfg-path lavis/projects/gpt4point/eval/captioning3d_cap3d_opt2.7b_eval.yaml\n```\n\n\n## 📦 Point Dataset and Data Annotation Engine (Optional)\n### Objaverse-XL Point Dataset Download\n\n**Note: first `cd` into the Objaverse-xl_Download directory.**\n\n```bash\ncd ./Objaverse-xl_Download\n```\n\nThen see the [Objaverse-xl_Download](./Objaverse-xl_Download) folder for details.\n\n\n### Objaverse-XL Point Cloud Data Generation\n\nSee [Extract_Pointcloud](./Objaverse-xl_Download/shap-e/) for details.\n\n## 📝 TODO List\nDataset and Data Engine\n- [✔] Release the arXiv paper and the project page.\n- [✔] Release the download method for the dataset (Objaverse-XL).\n- [✔] Release the rendering (point cloud) method for the dataset (Objaverse-XL).\n- [✔] Release the pre-training code and the 3D captioning validation code.\n- [ ] Release the dataset and data annotation engine (Pyramid-XL). 
\n- [ ] Release more evaluation code.\n- [ ] Release more training code.\n- [ ] Release more models.\n\n\n## 🔗 Citation\n\nIf you find our work helpful, please cite:\n\n```bibtex\n@inproceedings{GPT4Point,\n  title={GPT4Point: A Unified Framework for Point-Language Understanding and Generation},\n  author={Zhangyang Qi and Ye Fang and Zeyi Sun and Xiaoyang Wu and Tong Wu and Jiaqi Wang and Dahua Lin and Hengshuang Zhao},\n  booktitle={CVPR},\n  year={2024},\n}\n```\n\n\n## 📄 License\n\u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\"\u003e\u003cimg alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png\" /\u003e\u003c/a\u003e\n\u003cbr /\u003e\nThis work is licensed under the \u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\"\u003eCreative Commons Attribution-NonCommercial-ShareAlike 4.0 International License\u003c/a\u003e.\n\n\n\n## 📚 Related Work\nTogether, let's make LLMs for 3D great!\n- [Point-Bind \u0026 Point-LLM](https://arxiv.org/abs/2309.00615): aligns point clouds with ImageBind to reason over multi-modal input without training on 3D instruction data.\n- [3D-LLM](https://arxiv.org/abs/2307.12981): uses 2D foundation models to encode multi-view images of 3D point clouds.\n- [PointLLM](https://arxiv.org/abs/2308.16911): applies the LLaVA approach to 3D point clouds.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPointcept%2FGPT4Point","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPointcept%2FGPT4Point","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPointcept%2FGPT4Point/lists"}