{"id":29219691,"url":"https://github.com/dvlab-research/sphereformer","last_synced_at":"2025-08-22T02:04:06.959Z","repository":{"id":147939439,"uuid":"617026213","full_name":"dvlab-research/SphereFormer","owner":"dvlab-research","description":"The official implementation for \"Spherical Transformer for LiDAR-based 3D Recognition\" (CVPR 2023).","archived":false,"fork":false,"pushed_at":"2023-06-08T09:34:58.000Z","size":1761,"stargazers_count":331,"open_issues_count":22,"forks_count":38,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-20T13:33:25.981Z","etag":null,"topics":["3d-object-detection","3d-semantic-segmentation","cvpr2023","lidar-point-cloud","nuscenes","semantickitti","transformer","waymo"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dvlab-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"license","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-03-21T14:57:09.000Z","updated_at":"2025-03-17T11:30:43.000Z","dependencies_parsed_at":"2024-01-16T02:46:39.664Z","dependency_job_id":"16b7bc06-06e1-4c9d-ac29-7d816c4700bd","html_url":"https://github.com/dvlab-research/SphereFormer","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/dvlab-research/SphereFormer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FSphereFormer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FSphereFormer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2F
SphereFormer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FSphereFormer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dvlab-research","download_url":"https://codeload.github.com/dvlab-research/SphereFormer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FSphereFormer/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263245317,"owners_count":23436514,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-object-detection","3d-semantic-segmentation","cvpr2023","lidar-point-cloud","nuscenes","semantickitti","transformer","waymo"],"created_at":"2025-07-03T02:06:39.153Z","updated_at":"2025-07-03T02:06:41.720Z","avatar_url":"https://github.com/dvlab-research.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/spherical-transformer-for-lidar-based-3d/3d-semantic-segmentation-on-semantickitti)](https://paperswithcode.com/sota/3d-semantic-segmentation-on-semantickitti?p=spherical-transformer-for-lidar-based-3d)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/spherical-transformer-for-lidar-based-3d/lidar-semantic-segmentation-on-nuscenes)](https://paperswithcode.com/sota/lidar-semantic-segmentation-on-nuscenes?p=spherical-transformer-for-lidar-based-3d)\n\n# Spherical Transformer for LiDAR-based 3D Recognition (CVPR 2023)\n\nThis is the 
official PyTorch implementation of **SphereFormer** (CVPR 2023).\n\n**Spherical Transformer for LiDAR-based 3D Recognition** [\\[Paper\\]](https://arxiv.org/pdf/2303.12766.pdf)\n\nXin Lai, Yukang Chen, Fanbin Lu, Jianhui Liu, Jiaya Jia \n\n# Highlight \n1. **SphereFormer** is a plug-and-play transformer module. We develop **radial window attention**, which significantly boosts the segmentation performance of **distant points**, e.g., from 13.3% to 30.4% mIoU on the nuScenes lidarseg *val* set. \n2. It achieves superior performance on various **outdoor semantic segmentation benchmarks**, e.g., nuScenes, SemanticKITTI, and Waymo, and also shows competitive results on the **nuScenes detection** dataset.\n3. This repository employs a **fast** and **memory-efficient** library for sparse transformers with **varying token numbers**, [**SparseTransformer**](https://github.com/dvlab-research/SparseTransformer).\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"figs/figure.jpg\"/\u003e\n\u003c/div\u003e\n\n# Get Started\n\nFor *object detection*, please go to the `detection/` directory. 
(or click [Here](detection/README.md))\n\nThe guide below is for *semantic segmentation*.\n\n## Environment\n\nInstall dependencies (tested with python==3.7.9, pytorch==1.8.0, cuda==11.1, gcc==7.5.0):\n```\ngit clone https://github.com/dvlab-research/SphereFormer.git --recursive\npip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html\npip install torch_scatter==2.0.9\npip install torch_geometric==1.7.2\npip install spconv-cu114==2.1.21\npip install torch_sparse==0.6.12 cumm-cu114==0.2.8 torch_cluster==1.5.9\npip install tensorboard timm termcolor tensorboardX\n```\n\nInstall `sptr`:\n```\ncd third_party/SparseTransformer \u0026\u0026 python setup.py install\n```\n\nNote: Make sure `gcc` and `cuda` are installed and that `nvcc` works. (If you install CUDA through conda, it won't provide `nvcc`, so you should install CUDA manually.)\n\n## Datasets Preparation\n\n### nuScenes\nDownload the nuScenes dataset from [here](https://www.nuscenes.org/nuscenes#download). Unzip and arrange it as follows. Then fill in the `data_root` entry in the `.yaml` configuration file.\n```\nnuscenes/\n|--- v1.0-trainval/\n|--- samples/\n|------- LIDAR_TOP/\n|--- lidarseg/\n|------- v1.0-trainval/\n```\nThen fill in `data_path` and `save_dir` in `data/nuscenes_preprocess_infos.py`, and generate the infos with:\n```\npip install nuscenes-devkit pyquaternion\ncd data \u0026\u0026 python nuscenes_preprocess_infos.py\n```\n\n### SemanticKITTI\nDownload the SemanticKITTI dataset from [here](http://www.semantic-kitti.org/dataset.html#download). Unzip and arrange it as follows. Then fill in the `data_root` entry in the `.yaml` configuration file.\n```\ndataset/\n|--- sequences/\n|------- 00/\n|------- 01/\n|------- 02/\n|------- 03/\n|------- .../\n```\n\n### Waymo Open Dataset\nDownload the Waymo Open Dataset from [here](https://waymo.com/open/). Unzip and arrange it as follows. 
Then fill in the `data_root` entry in the `.yaml` configuration file.\n```\nwaymo/\n|--- training/\n|--- validation/\n|--- testing/\n```\nThen convert the raw files to the SemanticKITTI format as follows. (Note: do not use a GPU for this step; the CPU is sufficient.)\n```\ncd data/waymo_to_semanticKITTI\nCUDA_VISIBLE_DEVICES=\"\" python convert.py --load_dir [YOUR_DATA_ROOT] --save_dir [YOUR_SAVE_ROOT]\n```\n\n## Training\n\n### nuScenes\n```\npython train.py --config config/nuscenes/nuscenes_unet32_spherical_transformer.yaml\n```\n\n### SemanticKITTI\n```\npython train.py --config config/semantic_kitti/semantic_kitti_unet32_spherical_transformer.yaml\n```\n\n### Waymo Open Dataset\n```\npython train.py --config config/waymo/waymo_unet32_spherical_transformer.yaml\n```\n\n## Validation\nFor validation, modify the `.yaml` config file: (1) set `weight` to the path of the model weights (`.pth` file); (2) set `val` to `True`; (3) for test-time augmentation, set `use_tta` to `True` and set `vote_num` accordingly. Then run the following command. 
\n```\npython train.py --config [YOUR_CONFIG_PATH]\n```\n\n## Pre-trained Models\n\n\n| Dataset | Val mIoU (tta) | Val mIoU | mIoU_close | mIoU_medium | mIoU_distant |  Download  |\n|---------------|:----:|:----:|:----:|:----:|:----:|:-----------:|\n| [nuScenes](config/nuscenes/nuscenes_unet32_spherical_transformer.yaml) | 79.5 | 78.4 | 80.8 | 60.8 | 30.4 | [Model Weight](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155154502_link_cuhk_edu_hk/Ebj08nZvE5lPpRn1ALgkcKwBjEQ5lrQFhx-yR2cbi9Cy-A?e=D3N3ge) |\n| [SemanticKITTI](config/semantic_kitti/semantic_kitti_unet32_spherical_transformer.yaml) | 69.0 | 67.8 | 68.6 | 60.4 | 17.8 | [Model Weight](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155154502_link_cuhk_edu_hk/EXsr5RdFzd9Lj7_T8L0dCagBZCDmbe5DtcZ8ipf1CfC58w?e=KxGpLV) |\n| [Waymo Open Dataset](config/waymo/waymo_unet32_spherical_transformer.yaml) | 70.8 | 69.9 | 70.3 | 68.6 | 61.9 | N/A |\n\nNote: Pre-trained weights for the Waymo Open Dataset are not released due to the dataset's regulations. \n\n# SpTr Library\nThe `SpTr` library is highly recommended for sparse transformers, particularly for 3D point cloud attention. It is **fast**, **memory-efficient** and **easy-to-use**. 
The GitHub repository is https://github.com/dvlab-research/SparseTransformer.git.\n\n# Citation\nIf you find this project useful, please consider citing:\n\n```\n@inproceedings{lai2023spherical,\n  title={Spherical Transformer for LiDAR-based 3D Recognition},\n  author={Lai, Xin and Chen, Yukang and Lu, Fanbin and Liu, Jianhui and Jia, Jiaya},\n  booktitle={CVPR},\n  year={2023}\n}\n```\n\n# Our Works on 3D Point Cloud\n\n* **Spherical Transformer for LiDAR-based 3D Recognition (CVPR 2023)** [\\[Paper\\]](https://arxiv.org/pdf/2303.12766.pdf) [\\[Code\\]](https://github.com/dvlab-research/SphereFormer) : A plug-and-play transformer module that boosts performance in distant regions (for 3D LiDAR point clouds)\n\n* **Stratified Transformer for 3D Point Cloud Segmentation (CVPR 2022)**: [\\[Paper\\]](https://openaccess.thecvf.com/content/CVPR2022/papers/Lai_Stratified_Transformer_for_3D_Point_Cloud_Segmentation_CVPR_2022_paper.pdf) [\\[Code\\]](https://github.com/dvlab-research/Stratified-Transformer) : Point-based window transformer for 3D point cloud segmentation\n\n* **SparseTransformer (SpTr) Library** [\\[Code\\]](https://github.com/dvlab-research/SparseTransformer) : A fast, memory-efficient, and easy-to-use library for sparse transformers with varying token numbers.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdvlab-research%2Fsphereformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdvlab-research%2Fsphereformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdvlab-research%2Fsphereformer/lists"}