{"id":13442330,"url":"https://github.com/qizekun/ReCon","last_synced_at":"2025-03-20T13:33:24.137Z","repository":{"id":111961573,"uuid":"598482835","full_name":"qizekun/ReCon","owner":"qizekun","description":"[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining","archived":false,"fork":false,"pushed_at":"2024-07-21T18:15:20.000Z","size":2062,"stargazers_count":128,"open_issues_count":1,"forks_count":13,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-10-28T05:13:06.750Z","etag":null,"topics":["3d-point-clouds","multi-modal-learning","representation-learning","self-supervised-learning"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2302.02318","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qizekun.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-07T07:47:32.000Z","updated_at":"2024-10-24T14:07:08.000Z","dependencies_parsed_at":"2024-01-16T02:46:28.459Z","dependency_job_id":"caddaf3a-6ea5-441c-8eed-fe769f22c224","html_url":"https://github.com/qizekun/ReCon","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qizekun%2FReCon","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qizekun%2FReCon/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qizekun%2FReCon/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qizeku
n%2FReCon/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qizekun","download_url":"https://codeload.github.com/qizekun/ReCon/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244619280,"owners_count":20482392,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-point-clouds","multi-modal-learning","representation-learning","self-supervised-learning"],"created_at":"2024-07-31T03:01:44.422Z","updated_at":"2025-03-20T13:33:23.594Z","avatar_url":"https://github.com/qizekun.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# 🪖 ReCon: Contrast with 
Reconstruct\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrast-with-reconstruct-contrastive-3d/3d-point-cloud-linear-classification-on)](https://paperswithcode.com/sota/3d-point-cloud-linear-classification-on?p=contrast-with-reconstruct-contrastive-3d)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrast-with-reconstruct-contrastive-3d/3d-point-cloud-classification-on-scanobjectnn)](https://paperswithcode.com/sota/3d-point-cloud-classification-on-scanobjectnn?p=contrast-with-reconstruct-contrastive-3d)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrast-with-reconstruct-contrastive-3d/3d-point-cloud-classification-on-modelnet40)](https://paperswithcode.com/sota/3d-point-cloud-classification-on-modelnet40?p=contrast-with-reconstruct-contrastive-3d)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrast-with-reconstruct-contrastive-3d/few-shot-3d-point-cloud-classification-on-1)](https://paperswithcode.com/sota/few-shot-3d-point-cloud-classification-on-1?p=contrast-with-reconstruct-contrastive-3d)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrast-with-reconstruct-contrastive-3d/zero-shot-transfer-3d-point-cloud)](https://paperswithcode.com/sota/zero-shot-transfer-3d-point-cloud?p=contrast-with-reconstruct-contrastive-3d)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrast-with-reconstruct-contrastive-3d/zero-shot-transfer-3d-point-cloud-1)](https://paperswithcode.com/sota/zero-shot-transfer-3d-point-cloud-1?p=contrast-with-reconstruct-contrastive-3d)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrast-with-reconstruct-contrastive-3d/zero-shot-transfer-3d-point-cloud-2)](https://paperswithcode.com/sota/zero-shot-transfer-3d-point-cloud-2?p=contrast-with-reconstruct-contrastive-3d)\n\n\u003e 
[**Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining**](https://arxiv.org/abs/2302.02318) **ICML 2023** \u003cbr\u003e\n\u003e [Zekun Qi](https://scholar.google.com/citations?user=ap8yc3oAAAAJ)\\*, [Runpei Dong](https://runpeidong.com/)\\*, [Guofan Fan](https://github.com/Asterisci), [Zheng Ge](https://scholar.google.com.hk/citations?user=hJ-VrrIAAAAJ\u0026hl=en\u0026oi=ao), [Xiangyu Zhang](https://scholar.google.com.hk/citations?user=yuB-cfoAAAAJ\u0026hl=en\u0026oi=ao), [Kaisheng Ma](http://group.iiis.tsinghua.edu.cn/~maks/leader.html) and [Li Yi](https://ericyi.github.io/) \u003cbr\u003e\n\n[OpenReview](https://openreview.net/forum?id=80IfYewOh1) | [arXiv](https://arxiv.org/abs/2302.02318) | [Models](https://drive.google.com/drive/folders/17Eoy5N96dcTQJplCOjyeeVjSYyjW5QEd?usp=share_link)\n\nThis repository contains the code release of ReCon: **Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining** (ICML 2023). ReCon is also short for *reconnaissance* 🪖.\n\n## Contrast with Reconstruct (ICML 2023)\n\n[//]: # (Mainstream 3D representation learning approaches are built upon contrastive or generative modeling pretext tasks, where great improvements in performance on various downstream tasks have been achieved. However, by investigating the methods of these two paradigms, we find that \u0026#40;i\u0026#41; contrastive models are data-hungry that suffer from a representation over-fitting issue; \u0026#40;ii\u0026#41; generative models have a data filling issue that shows inferior data scaling capacity compared to contrastive models. This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms. In this paper, we propose *contrast with reconstruct* \u0026#40;**ReCon**\u0026#41; that unifies these two paradigms. 
ReCon is trained to learn from both generative modeling teachers and cross-modal contrastive teachers through ensemble distillation, where the generative student is used to guide the contrastive student. An encoder-decoder style ReCon-block is proposed that transfers knowledge through cross attention with stop-gradient, which avoids pretraining over-fitting and pattern difference issues. ReCon achieves a new state-of-the-art in 3D representation learning, e.g., 91.26% accuracy on ScanObjectNN.)\n\n\u003cdiv  align=\"center\"\u003e    \n \u003cimg src=\"./figure/framework.png\" width = \"1100\"  align=center /\u003e\n\u003c/div\u003e\n\n\n## News\n\n- 🍾 July, 2024: [**ShapeLLM (ReCon++)**](https://qizekun.github.io/shapellm/) accepted by ECCV 2024, check out the [code](https://github.com/qizekun/ShapeLLM)\n- 💥 Mar, 2024: Check out our latest work [**ShapeLLM (ReCon++)**](https://qizekun.github.io/shapellm/), which achieves **95.25%** fine-tuned accuracy and **65.4** zero-shot accuracy on ScanObjectNN\n- 📌 Aug, 2023: Check out our exploration of efficient conditional 3D generation [**VPP**](https://arxiv.org/abs/2307.16605)\n- 📌 Jun, 2023: Check out our exploration of pre-training in 3D scenes [**Point-GCC**](https://arxiv.org/abs/2305.19623)\n- 🎉 Apr, 2023: [**ReCon**](https://arxiv.org/abs/2302.02318) accepted by ICML 2023\n- 💥 Feb, 2023: Check out our previous work [**ACT**](https://arxiv.org/abs/2212.08320), which has been accepted by ICLR 2023\n\n## 1. 
Requirements\nPyTorch \u003e= 1.7.0;\npython \u003e= 3.7;\nCUDA \u003e= 9.0;\nGCC \u003e= 4.9;\ntorchvision;\n\n```\n# Quick Start\nconda create -n recon python=3.10 -y\nconda activate recon\n\nconda install pytorch==2.0.1 torchvision==0.15.2 cudatoolkit=11.8 -c pytorch -c nvidia\n# pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/torch_stable.html\n```\n\n```\n# Install basic required packages\npip install -r requirements.txt\n# Chamfer Distance\ncd ./extensions/chamfer_dist \u0026\u0026 python setup.py install --user\n# PointNet++\npip install \"git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops\u0026subdirectory=pointnet2_ops_lib\"\n```\n\n## 2. Datasets\n\nWe use ShapeNet, ScanObjectNN, ModelNet40 and ShapeNetPart in this work. See [DATASET.md](./DATASET.md) for details.\n\n## 3. ReCon Models\n| Task              | Dataset        | Config                                                               | Acc.       | Checkpoints Download                                                                                     |\n|-------------------|----------------|----------------------------------------------------------------------|------------|----------------------------------------------------------------------------------------------------------|\n| Pre-training      | ShapeNet       | [pretrain_base.yaml](cfgs/pretrain/base.yaml)                        | N.A.       
| [ReCon](https://drive.google.com/file/d/1L-TlZUi7umBCDpZW-1F0Gf4X-9Wvf_Zo/view?usp=share_link)           |\n| Classification    | ScanObjectNN   | [finetune_scan_hardest.yaml](./cfgs/full/finetune_scan_hardest.yaml) | 91.26%     | [PB_T50_RS](https://drive.google.com/file/d/1kjKqvs8o6jiqZc4-srMFp2DOpIYDHdgf/view?usp=share_link)       |\n| Classification    | ScanObjectNN   | [finetune_scan_objbg.yaml](./cfgs/full/finetune_scan_objbg.yaml)     | 95.35%     | [OBJ_BG](https://drive.google.com/file/d/1qjohpaTCl-DzHaIv6Ilq0sLAGG2H3Z9I/view?usp=share_link)          |\n| Classification    | ScanObjectNN   | [finetune_scan_objonly.yaml](./cfgs/full/finetune_scan_objonly.yaml) | 93.80%     | [OBJ_ONLY](https://drive.google.com/file/d/1kvowgPbvlFxx3B5WSfL3LiKZ5--5s52b/view?usp=share_link)        |\n| Classification    | ModelNet40(1k) | [finetune_modelnet.yaml](./cfgs/full/finetune_modelnet.yaml)         | 94.5%      | [ModelNet_1k](https://drive.google.com/file/d/1UsRuIc7ND2n4PjYyF3n0tT3hf7alyOML/view?usp=share_link)     |\n| Classification    | ModelNet40(8k) | [finetune_modelnet_8k.yaml](./cfgs/full/finetune_modelnet_8k.yaml)   | 94.7%      | [ModelNet_8k](https://drive.google.com/file/d/1qUuT6sjhZw3gn0rFj-qDfYG6VMBXF0sT/view?usp=share_link)     |\n| Zero-Shot         | ModelNet10     | [zeroshot_modelnet10.yaml](./cfgs/zeroshot/modelnet10.yaml)          | 75.6%      | [ReCon zero-shot](https://drive.google.com/file/d/1Xz6lZn6MI2lJldPiSqLdAnjEm0dwQ4Mg/view?usp=share_link) |\n| Zero-Shot         | ModelNet10*    | [zeroshot_modelnet10.yaml](./cfgs/zeroshot/modelnet10.yaml)          | 81.6%      | [ReCon zero-shot](https://drive.google.com/file/d/1Xz6lZn6MI2lJldPiSqLdAnjEm0dwQ4Mg/view?usp=share_link) |\n| Zero-Shot         | ModelNet40     | [zeroshot_modelnet40.yaml](./cfgs/zeroshot/modelnet40.yaml)          | 61.7%      | [ReCon zero-shot](https://drive.google.com/file/d/1Xz6lZn6MI2lJldPiSqLdAnjEm0dwQ4Mg/view?usp=share_link) |\n| Zero-Shot         | ModelNet40*    | 
[zeroshot_modelnet40.yaml](./cfgs/zeroshot/modelnet40.yaml)          | 66.8%      | [ReCon zero-shot](https://drive.google.com/file/d/1Xz6lZn6MI2lJldPiSqLdAnjEm0dwQ4Mg/view?usp=share_link) |\n| Zero-Shot         | ScanObjectNN   | [zeroshot_scan_objonly.yaml](./cfgs/zeroshot/scan_objonly.yaml)      | 43.7%      | [ReCon zero-shot](https://drive.google.com/file/d/1Xz6lZn6MI2lJldPiSqLdAnjEm0dwQ4Mg/view?usp=share_link) |\n| Linear SVM        | ModelNet40     | [svm.yaml](./cfgs/svm/modelnet40.yaml)                               | 93.4%      | [ReCon svm](https://drive.google.com/file/d/1SvCfDzXM2QM7BfOd960z3759HY_c-eQv/view?usp=share_link)       |\n| Part Segmentation | ShapeNetPart   | [segmentation](./segmentation)                                       | 86.4% mIoU | [part seg](https://drive.google.com/file/d/13XuEsN7BDu-YX86ZSM1SpUHGMvDys2VH/view?usp=share_link)        |\n\n| Task              | Dataset    | Config                                   | 5w10s (%)  | 5w20s (%)  | 10w10s (%) | 10w20s (%) | Download                                                                                       |\n|-------------------|------------|------------------------------------------|------------|------------|------------|------------|------------------------------------------------------------------------------------------------|\n| Few-shot learning | ModelNet40 | [fewshot.yaml](./cfgs/full/fewshot.yaml) | 97.3 ± 1.9 | 98.9 ± 1.2 | 93.3 ± 3.9 | 95.8 ± 3.0 | [ReCon](https://drive.google.com/file/d/1L-TlZUi7umBCDpZW-1F0Gf4X-9Wvf_Zo/view?usp=share_link) |\n\nThe checkpoints and logs have been released on [Google Drive](https://drive.google.com/drive/folders/17Eoy5N96dcTQJplCOjyeeVjSYyjW5QEd?usp=share_link). You can use the voting strategy in classification testing to reproduce the performance reported in the paper.\nFor classification downstream tasks, we randomly select 8 seeds to obtain the best checkpoint. 
\nFor zero-shot learning, * means that we use all the train/test data for zero-shot transfer.\n\n## 4. ReCon Pre-training\nTo pre-train with the default configuration, run the script:\n```\nsh scripts/pretrain.sh \u003cGPU\u003e \u003cexp_name\u003e\n```\nTo try different models, masking ratios, etc., first create a new config file and pass its path to `--config`:\n```\nCUDA_VISIBLE_DEVICES=\u003cGPU\u003e python main.py --config \u003cconfig_path\u003e --exp_name \u003cexp_name\u003e\n```\n## 5. ReCon Classification Fine-tuning\nTo fine-tune with the default configuration, run the script:\n```\nbash scripts/cls.sh \u003cGPU\u003e \u003cexp_name\u003e \u003cpath/to/pre-trained/model\u003e\n```\nAlternatively, run the commands directly.\n\nTo fine-tune on ScanObjectNN, run:\n```\nCUDA_VISIBLE_DEVICES=\u003cGPUs\u003e python main.py --config cfgs/full/finetune_scan_hardest.yaml \\\n--finetune_model --exp_name \u003cexp_name\u003e --ckpts \u003cpath/to/pre-trained/model\u003e\n```\nTo fine-tune on ModelNet40, run:\n```\nCUDA_VISIBLE_DEVICES=\u003cGPUs\u003e python main.py --config cfgs/full/finetune_modelnet.yaml \\\n--finetune_model --exp_name \u003cexp_name\u003e --ckpts \u003cpath/to/pre-trained/model\u003e\n```\n## 6. ReCon Test\u0026Voting\nTo test with voting using the default configuration, run the script:\n```\nbash scripts/test.sh \u003cGPU\u003e \u003cexp_name\u003e \u003cpath/to/best/fine-tuned/model\u003e\n```\nor:\n```\nCUDA_VISIBLE_DEVICES=\u003cGPUs\u003e python main.py --test --config cfgs/full/finetune_modelnet.yaml \\\n--exp_name \u003coutput_file_name\u003e --ckpts \u003cpath/to/best/fine-tuned/model\u003e\n```\n## 7. 
ReCon Few-Shot\nTo run few-shot learning with the default configuration, run the script:\n```\nsh scripts/fewshot.sh \u003cGPU\u003e \u003cexp_name\u003e \u003cpath/to/pre-trained/model\u003e \u003cway\u003e \u003cshot\u003e \u003cfold\u003e\n```\nor\n```\nCUDA_VISIBLE_DEVICES=\u003cGPUs\u003e python main.py --config cfgs/full/fewshot.yaml --finetune_model \\\n--ckpts \u003cpath/to/pre-trained/model\u003e --exp_name \u003cexp_name\u003e --way \u003c5 or 10\u003e --shot \u003c10 or 20\u003e --fold \u003c0-9\u003e\n```\n## 8. ReCon Zero-Shot\nTo run zero-shot evaluation with the default configuration, run the script:\n```\nbash scripts/zeroshot.sh \u003cGPU\u003e \u003cexp_name\u003e \u003cpath/to/pre-trained/model\u003e\n```\n## 9. ReCon Part Segmentation\nTo train part segmentation on ShapeNetPart, run:\n```\ncd segmentation\nbash seg.sh \u003cGPU\u003e \u003cexp_name\u003e \u003cpath/to/pre-trained/model\u003e\n```\nor\n```\ncd segmentation\npython main.py --ckpts \u003cpath/to/pre-trained/model\u003e --log_dir \u003cpath/to/log/dir\u003e --learning_rate 0.0001 --epoch 300\n```\nTo test part segmentation on ShapeNetPart, run:\n```\ncd segmentation\nbash test.sh \u003cGPU\u003e \u003cexp_name\u003e \u003cpath/to/best/fine-tuned/model\u003e\n```\n## 10. ReCon Linear SVM\nTo run the linear SVM evaluation on ModelNet40, run:\n```\nsh scripts/svm.sh \u003cGPU\u003e \u003cexp_name\u003e \u003cpath/to/pre-trained/model\u003e \n```\n\n## 11. Visualization\nWe use the [PointVisualizaiton](https://github.com/qizekun/PointVisualizaiton) repo to render point cloud images, including specified-color rendering and attention-distribution rendering.\n\n\n## Contact\n\nIf you have any questions related to the code or the paper, feel free to email Zekun (`qizekun@gmail.com`) or Runpei (`runpei.dong@gmail.com`). \n\n## License\n\nReCon is released under the MIT License. See the [LICENSE](./LICENSE) file for more details. 
The licensing information for the `pointnet2` modules is available [here](https://github.com/erikwijmans/Pointnet2_PyTorch/blob/master/UNLICENSE).\n\n## Acknowledgements\n\nThis codebase is built upon [Point-MAE](https://github.com/Pang-Yatian/Point-MAE), [Point-BERT](https://github.com/lulutang0608/Point-BERT), [CLIP](https://github.com/openai/CLIP), [Pointnet2_PyTorch](https://github.com/erikwijmans/Pointnet2_PyTorch) and [ACT](https://github.com/RunpeiDong/ACT).\n\n## Citation\n\nIf you find our work useful in your research, please consider citing:\n\n```bibtex\n@inproceedings{qi2023recon,\n  title={Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining},\n  author={Qi, Zekun and Dong, Runpei and Fan, Guofan and Ge, Zheng and Zhang, Xiangyu and Ma, Kaisheng and Yi, Li},\n  booktitle={International Conference on Machine Learning (ICML) },\n  year={2023}\n}\n```\nand the closely related works [ACT](https://github.com/RunpeiDong/ACT) and [ShapeLLM](https://github.com/qizekun/ShapeLLM):\n```bibtex\n@inproceedings{dong2023act,\n  title={Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?},\n  author={Runpei Dong and Zekun Qi and Linfeng Zhang and Junbo Zhang and Jianjian Sun and Zheng Ge and Li Yi and Kaisheng Ma},\n  booktitle={The Eleventh International Conference on Learning Representations (ICLR) },\n  year={2023},\n  url={https://openreview.net/forum?id=8Oun8ZUVe8N}\n}\n@inproceedings{qi2024shapellm,\n  author = {Qi, Zekun and Dong, Runpei and Zhang, Shaochen and Geng, Haoran and Han, Chunrui and Ge, Zheng and Yi, Li and Ma, Kaisheng},\n  title = {ShapeLLM: Universal 3D Object Understanding for Embodied Interaction},\n  booktitle={European Conference on Computer Vision (ECCV) },\n  year = 
{2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqizekun%2FReCon","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqizekun%2FReCon","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqizekun%2FReCon/lists"}