{"id":13442790,"url":"https://github.com/POSTECH-CVLab/FastPointTransformer","last_synced_at":"2025-03-20T15:30:51.150Z","repository":{"id":40642938,"uuid":"475928299","full_name":"POSTECH-CVLab/FastPointTransformer","owner":"POSTECH-CVLab","description":"Official source code of Fast Point Transformer, CVPR 2022","archived":false,"fork":false,"pushed_at":"2023-02-01T09:49:50.000Z","size":579,"stargazers_count":283,"open_issues_count":5,"forks_count":41,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-16T13:09:34.501Z","etag":null,"topics":["3d-vision","computer-vision","cvpr2022","point-cloud","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/POSTECH-CVLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-30T14:57:51.000Z","updated_at":"2025-02-16T14:54:09.000Z","dependencies_parsed_at":"2023-02-17T04:15:55.323Z","dependency_job_id":null,"html_url":"https://github.com/POSTECH-CVLab/FastPointTransformer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/POSTECH-CVLab%2FFastPointTransformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/POSTECH-CVLab%2FFastPointTransformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/POSTECH-CVLab%2FFastPointTransformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/POSTECH-CVLab%2FFastPointTransformer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/POSTECH-CVLab","download_url":"https://codeload.github.com/POSTECH-CVLab/FastPointTransformer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244639916,"owners_count":20485951,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-vision","computer-vision","cvpr2022","point-cloud","transformer"],"created_at":"2024-07-31T03:01:50.925Z","updated_at":"2025-03-20T15:30:50.835Z","avatar_url":"https://github.com/POSTECH-CVLab.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Fast Point Transformer\n### [Project Page](http://cvlab.postech.ac.kr/research/FPT/) | [Paper](https://arxiv.org/abs/2112.04702)\nThis repository contains the official source code and data for our paper:\n\n\u003e[Fast Point Transformer](https://arxiv.org/abs/2112.04702)  \n\u003e [Chunghyun Park](https://chrockey.github.io/),\n\u003e [Yoonwoo Jeong](https://yoonwoojeong.medium.com/about),\n\u003e [Minsu Cho](http://cvlab.postech.ac.kr/~mcho/), and\n\u003e [Jaesik Park](http://jaesik.info/)\u003cbr\u003e\n\u003e POSTECH GSAI \u0026 CSE\u003cbr\u003e\n\u003e CVPR, New Orleans, 2022.\n\n\u003cdiv style=\"text-align:center\"\u003e\n\u003cimg src=\"assets/overview.png\" alt=\"An Overview of the proposed pipeline\"/\u003e\n\u003c/div\u003e\n\n## Overview\nThis work introduces *Fast Point Transformer* that consists of a new lightweight self-attention layer. Our approach encodes continuous 3D coordinates, and the voxel hashing-based architecture boosts computational efficiency. 
The proposed method is demonstrated with 3D semantic segmentation and 3D detection. The accuracy of our approach is competitive with the best voxel-based method, and our network runs inference 129 times faster than the state-of-the-art Point Transformer, with a reasonable accuracy trade-off, in 3D semantic segmentation on the S3DIS dataset.\n\n## Citation\nIf you find our code or paper useful, please consider citing our paper:\n\n ```BibTeX\n@inproceedings{park2022fast,\n  title={Fast Point Transformer},\n  author={Park, Chunghyun and Jeong, Yoonwoo and Cho, Minsu and Park, Jaesik},\n  booktitle={Proceedings of the {IEEE/CVF} Conference on Computer Vision and Pattern Recognition (CVPR)},\n  month={June},\n  year={2022},\n  pages={16949-16958}\n}\n```\n\n## Experiments\n### 1. S3DIS Area 5 test\nWe denote MinkowskiNet42 trained with this repository as MinkowskiNet42\u003csup\u003e\u0026dagger;\u003c/sup\u003e.\nWe use a voxel size of 4cm for both MinkowskiNet42\u003csup\u003e\u0026dagger;\u003c/sup\u003e and our Fast Point Transformer.\n\n| Model                             | Latency (sec) | mAcc (%) | mIoU (%) | Reference |\n|:----------------------------------|--------------------:|:--------:|:--------:|:---------:|\n| PointTransformer                  | 18.07 | 76.5 | 70.4 | [Codes from the authors](https://github.com/POSTECH-CVLab/point-transformer) |\n| MinkowskiNet42\u003csup\u003e\u0026dagger;\u003c/sup\u003e | 0.08  | 74.1 | 67.2 | [Checkpoint](https://postechackr-my.sharepoint.com/:u:/g/personal/p0125ch_postech_ac_kr/EZcO0DH6QeNGgIwGFZsmL-4BAlikmHAHlBs4JBcS5XfpVQ?download=1) |\n| \u0026nbsp;\u0026nbsp;+ rotation average    | 0.66  | 75.1 | 69.0 | - |\n| FastPointTransformer              | 0.14 | 76.6 | 69.2 | [Checkpoint](https://postechackr-my.sharepoint.com/:u:/g/personal/p0125ch_postech_ac_kr/ER8KwMTzqAxAvK9KeOZ9U_IBuCAuv4hP6zOWD-3HNO6Xeg?download=1) |\n| \u0026nbsp;\u0026nbsp;+ rotation average    | 1.13  | 77.6 | 71.0 | - |\n\n### 2. 
ScanNetV2 validation\n| Model                             | Voxel Size  | mAcc (%) | mIoU (%) | Reference |\n|:----------------------------------|:-----------:|:--------:|:--------:|:---------:|\n| MinkowskiNet42                    | 2cm | 80.4 | 72.2 | [Official GitHub](https://github.com/chrischoy/SpatioTemporalSegmentation) |\n| MinkowskiNet42\u003csup\u003e\u0026dagger;\u003c/sup\u003e | 2cm | 81.4 | 72.1 | [Checkpoint](https://postechackr-my.sharepoint.com/:u:/g/personal/p0125ch_postech_ac_kr/EXmE1pWDZ8lEtJU7SQMjkXcBnhSMXFTdHWXkMAAF7KeiuA?download=1) |\n| FastPointTransformer              | 2cm | 81.2 | 72.5 | [Checkpoint](https://postechackr-my.sharepoint.com/:u:/g/personal/p0125ch_postech_ac_kr/EX_xAyhoNXdJg4eSg2vS_bYB8eFAP7A8FPCYfKOS2T13LQ?download=1) |\n| MinkowskiNet42\u003csup\u003e\u0026dagger;\u003c/sup\u003e | 5cm | 76.3 | 67.0 | [Checkpoint](https://postechackr-my.sharepoint.com/:u:/g/personal/p0125ch_postech_ac_kr/EZLG00u5JXJDvOi3sYziOIMB1l6HNN5OW9gTQRFWc6EwzA?download=1) |\n| FastPointTransformer              | 5cm | 78.9 | 70.0 | [Checkpoint](https://postechackr-my.sharepoint.com/:u:/g/personal/p0125ch_postech_ac_kr/EXbXclfXZGtMpBZY93zi7M8B_tl8rwM65NK1cumN7QM_8g?download=1) |\n| MinkowskiNet42\u003csup\u003e\u0026dagger;\u003c/sup\u003e | 10cm | 70.8 | 60.7 | [Checkpoint](https://postechackr-my.sharepoint.com/:u:/g/personal/p0125ch_postech_ac_kr/EVLn0f5noY1Al6Kos9l-0yABM0qZLFt6d4a3yFgTcQ2Vmw?download=1) |\n| FastPointTransformer              | 10cm | 76.1 | 66.5 | [Checkpoint](https://postechackr-my.sharepoint.com/:u:/g/personal/p0125ch_postech_ac_kr/ESO1jLNHO89ApdjguUauqsMBCx_TijA26UOeGbF4XxQwoA?download=1) |\n\n## Installation\nThis repository is developed and tested on\n\n- Ubuntu 18.04 and 20.04\n- Conda 4.11.0\n- CUDA 11.1 and 11.3\n- Python 3.8.13\n- PyTorch 1.7.1, 1.10.0, and 1.12.1\n- MinkowskiEngine 0.5.4\n\n### Environment Setup\nYou can install the environment by using the provided shell script:\n```bash\n~$ git clone --recursive 
git@github.com:POSTECH-CVLab/FastPointTransformer.git\n~$ cd FastPointTransformer\n~/FastPointTransformer$ bash setup.sh fpt\n~/FastPointTransformer$ conda activate fpt\n```\n\n## Training \u0026 Evaluation\nFirst, download the datasets (ScanNetV2 and S3DIS) and preprocess them:\n```bash\n(fpt) ~/FastPointTransformer$ python src/data/preprocess_scannet.py # you need to modify the data path\n(fpt) ~/FastPointTransformer$ python src/data/preprocess_s3dis.py # you need to modify the data path\n```\nThen place the provided metadata of each dataset (`src/data/meta_data`) alongside the preprocessed dataset, following the structure below:\n\n```\n${data_dir}\n├── scannetv2\n│   ├── meta_data\n│   │   ├── scannetv2_train.txt\n│   │   ├── scannetv2_val.txt\n│   │   └── ...\n│   └── scannet_processed\n│       ├── train\n│       │   ├── scene0000_00.ply\n│       │   ├── scene0000_01.ply\n│       │   └── ...\n│       └── test\n└── s3dis\n    ├── meta_data\n    │   ├── area1.txt\n    │   ├── area2.txt\n    │   └── ...\n    └── s3dis_processed\n        ├── Area_1\n        │   ├── conferenceRoom_1.ply\n        │   ├── conferenceRoom_2.ply\n        │   └── ...\n        ├── Area_2\n        └── ...\n```\n\nAfter that, you can train and evaluate a model using the provided Python scripts (`train.py` and `eval.py`) with configuration files in the `config` directory.\nFor example, you can train and evaluate Fast Point Transformer with voxel size 4cm on the S3DIS dataset via the following commands:\n```bash\n(fpt) ~/FastPointTransformer$ python train.py config/s3dis/train_fpt.gin\n(fpt) ~/FastPointTransformer$ python eval.py config/s3dis/eval_fpt.gin {checkpoint_file} # use -r option for rotation averaging.\n```\n\n### Consistency Score\nYou need to generate predictions via the following command:\n```bash\n(fpt) ~/FastPointTransformer$ python -m src.cscore.prepare {checkpoint_file} -m {model_name} -v {voxel_size} # This takes hours.\n```\nThen, you can calculate 
the consistency score (CScore) with:\n```bash\n(fpt) ~/FastPointTransformer$ python -m src.cscore.calculate {prediction_dir} # This takes seconds.\n```\n\n### 3D Object Detection using VoteNet\nPlease refer to [this repository](https://github.com/chrockey/FastPointTransformer-VoteNet).\n\n## Acknowledgement\n\nOur code is based on the [MinkowskiEngine](https://github.com/NVIDIA/MinkowskiEngine).\nWe also thank [Hengshuang Zhao](https://hszhao.github.io/) for providing [the code](https://github.com/POSTECH-CVLab/point-transformer) of [Point Transformer](https://arxiv.org/abs/2012.09164).\nIf you use our model, please consider citing them as well.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPOSTECH-CVLab%2FFastPointTransformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPOSTECH-CVLab%2FFastPointTransformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPOSTECH-CVLab%2FFastPointTransformer/lists"}