{"id":18726544,"url":"https://github.com/chrischoy/spatiotemporalsegmentation","last_synced_at":"2025-10-03T18:09:44.501Z","repository":{"id":35933805,"uuid":"211492027","full_name":"chrischoy/SpatioTemporalSegmentation","owner":"chrischoy","description":"4D Spatio-Temporal Semantic Segmentation on a 3D video (a sequence of 3D scans)","archived":false,"fork":false,"pushed_at":"2022-12-05T06:34:28.000Z","size":1272,"stargazers_count":304,"open_issues_count":31,"forks_count":50,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-05-19T20:08:56.267Z","etag":null,"topics":["3d","3d-vision","convolutional-neural-network","cvpr","minkowski-engine","scannet","semantic-segmentation","spatio-temporal-analysis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chrischoy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-28T11:45:58.000Z","updated_at":"2025-05-12T16:01:44.000Z","dependencies_parsed_at":"2023-01-16T09:25:46.298Z","dependency_job_id":null,"html_url":"https://github.com/chrischoy/SpatioTemporalSegmentation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chrischoy/SpatioTemporalSegmentation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrischoy%2FSpatioTemporalSegmentation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrischoy%2FSpatioTemporalSegmentation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrischoy%2FSpatioTemporalSegmentation/releases","manifests_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrischoy%2FSpatioTemporalSegmentation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chrischoy","download_url":"https://codeload.github.com/chrischoy/SpatioTemporalSegmentation/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrischoy%2FSpatioTemporalSegmentation/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263636477,"owners_count":23492267,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d","3d-vision","convolutional-neural-network","cvpr","minkowski-engine","scannet","semantic-segmentation","spatio-temporal-analysis"],"created_at":"2024-11-07T14:14:46.080Z","updated_at":"2025-10-03T18:09:44.446Z","avatar_url":"https://github.com/chrischoy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[pypi-url]: https://pypi.org/project/MinkowskiEngine/\n\n\n# Spatio-Temporal Segmentation\n\nThis repository contains the accompanying code for [4D-SpatioTemporal ConvNets: Minkowski Convolutional Neural Networks, CVPR'19](https://arxiv.org/abs/1904.08755).\n\n## Change Log\n\n- 2020-05-19 The latest Minkowski Engine since the commit [be5c3](https://github.com/StanfordVL/MinkowskiEngine/commit/be5c3c18b26d6a62380d613533b7a939a5458705), does not require explicit cache clear and can use the memory more efficiently.\n- 2020-05-04: As pointed out by Thomas Chaton on [Issue#30](https://github.com/chrischoy/SpatioTemporalSegmentation/issues/30), I also found out that the training 
script contains bugs that prevent models from reaching the target performance described in the Model Zoo with the latest MinkowskiEngine. I am debugging the script, but I am having some difficulty finding the bugs. So, I created another git repo [SpatioTemporalSegmentation-ScanNet](https://github.com/chrischoy/SpatioTemporalSegmentation-ScanNet) from my other private repo that reaches the target performance. Please refer to [SpatioTemporalSegmentation-ScanNet](https://github.com/chrischoy/SpatioTemporalSegmentation-ScanNet) for ScanNet training. I'll update this repo once I find the bugs and merge SpatioTemporalSegmentation-ScanNet with this repo. Sorry for the trouble.\n\n## Requirements\n\n- Ubuntu 14.04 or higher\n- CUDA 10.1 or higher\n- pytorch 1.3 or higher\n- python 3.6 or higher\n- GCC 6 or higher\n\n\n## Installation\n\nYou need to install `pytorch` and [`Minkowski Engine`][pypi-url] either with `pip` or with anaconda.\n\n### Pip\n\nThe MinkowskiEngine is distributed via [PyPI MinkowskiEngine][pypi-url] which can be installed simply with `pip`.\nFirst, install pytorch following the [instructions](https://pytorch.org). Next, install `openblas`.\n\n```\nsudo apt install libopenblas-dev\n\npip install torch torchvision\n\npip install -U git+https://github.com/StanfordVL/MinkowskiEngine\n```\n\nNext, clone the repository and install the rest of the requirements.\n\n```\ngit clone https://github.com/chrischoy/SpatioTemporalSegmentation/\n\ncd SpatioTemporalSegmentation\n\npip install -r requirements.txt\n```\n\n### Troubleshooting\n\nPlease visit the MinkowskiEngine [issue pages](https://github.com/StanfordVL/MinkowskiEngine/) if you have difficulties installing Minkowski Engine.\n\n\n## ScanNet Training\n\n1. Download the ScanNet dataset from [the official website](http://kaldir.vc.in.tum.de/scannet_benchmark/documentation). You need to sign the terms of use.\n\n2. 
Next, preprocess all raw ScanNet point clouds with the following command after you set the paths correctly.\n\n```\npython -m lib.datasets.preprocessing.scannet\n```\n\n3. Train the network with\n\n```\nexport BATCH_SIZE=N;\n./scripts/train_scannet.sh 0 \\\n\t-default \\\n\t\"--scannet_path /path/to/preprocessed/scannet\"\n```\n\nModify the `BATCH_SIZE` accordingly.\n\nThe first argument is the GPU id, the second argument is the path postfix,\nand the last argument is a string of miscellaneous arguments.\n\n\n### mIoU vs. Overall Accuracy\n\nThe official evaluation metric for ScanNet is mIoU.\nOverall Accuracy (OA) is not the official metric since it is not discriminative. This follows the convention from 2D semantic segmentation, where pixelwise overall accuracy does not capture the fidelity of the semantic segmentation.\nFor example, on 3D ScanNet semantic segmentation, an OA of 89.087 corresponds to mIoU 71.496, mAP 76.127, and mAcc 79.660 on the ScanNet v2 validation set.\n\nWhy is overall accuracy the least discriminative metric? Most scenes consist of large structures\nsuch as walls, floors, or background, and scores on these dominate the statistics if you use Overall Accuracy.\n\n\n## Synthia 4D Experiment\n\n1. Download the dataset from [download](http://cvgl.stanford.edu/data2/Synthia4D.tar)\n\n2. Extract\n\n```\ncd /path/to/extract/synthia4d\nwget http://cvgl.stanford.edu/data2/Synthia4D.tar\ntar -xf Synthia4D.tar\ntar -xvjf *.tar.bz2\n```\n\n3. Training\n\n```\nexport BATCH_SIZE=N; \\\n./scripts/train_synthia4d.sh 0 \\\n\t\"-default\" \\\n\t\"--synthia_path /path/to/extract/synthia4d\"\n```\n\nThe above script trains a network. You have to change the arguments accordingly. The first argument to the script is the GPU id. The second argument is the log directory postfix; change it to mark your experimental setup. The final argument is a series of miscellaneous arguments. You have to specify the synthia directory here. 
Also, you have to wrap all arguments with \" \".\n\n\n## Stanford 3D Dataset\n\n1. Download the stanford 3d dataset from [the website](http://buildingparser.stanford.edu/dataset.html)\n\n2. Preprocess\n\nModify the input and output directory accordingly in\n\n`lib/datasets/preprocessing/stanford.py`\n\nAnd run\n\n```\npython -m lib.datasets.preprocessing.stanford\n```\n\n3. Train\n\n```\n./scripts/train_stanford.sh 0 \\\n\t\"-default\" \\\n\t\"--stanford3d_path /PATH/TO/PREPROCESSED/STANFORD\"\n```\n\n## Model Zoo\n\n| Model            | Dataset             | Voxel Size | Conv1 Kernel Size | Performance              | Link   |\n|:----------------:|:-------------------:|:----------:|:-----------------:|:-------------------------------------------------------------------------------:|:------:|\n| Mink16UNet34C    | ScanNet train + val | 2cm        | 3                 | Test set 73.6% mIoU, no sliding window                                          | [download](https://node1.chrischoy.org/data/publications/minknet/Mink16UNet34C_ScanNet.pth) |\n| Mink16UNet34C    | ScanNet train       | 2cm        | 5                 | Val 72.219% mIoU, no rotation average, no sliding window [per class performance](https://github.com/chrischoy/SpatioTemporalSegmentation/issues/13) | [download](https://node1.chrischoy.org/data/publications/minknet/MinkUNet34C-train-conv1-5.pth) |\n| Mink16UNet18     | Stanford Area5 train | 5cm       | 5                 | Area 5 test 65.828% mIoU, no rotation average, no sliding window [per class performance](https://pastebin.com/Gj3PrPFr) | [download](https://node1.chrischoy.org/data/publications/minknet/Mink16UNet18-stanford-conv1-5.pth) |\n| Mink16UNet34     | Stanford Area5 train | 5cm       | 5                 | Area 5 test 66.348% mIoU, no rotation average, no sliding window [per class performance](https://pastebin.com/WzhfGMQG) | [download](https://node1.chrischoy.org/data/publications/minknet/Mink16UNet34-stanford-conv1-5.pth) |\n| 3D 
Mink16UNet14A | Synthia CVPR19 train | 15cm      | 3                 | CVPR19 test 81.903% mIoU, no rotation average, no sliding window [per class performance](https://pastebin.com/hN88iQqa) | [download](https://node1.chrischoy.org/data/publications/minknet/Mink16UNet14A-synthia-conv1-3.pth) |\n| 3D Mink16UNet18  | Synthia CVPR19 train | 15cm      | 3                 | CVPR19 test 82.762% mIoU, no rotation average, no sliding window [per class performance](https://pastebin.com/zvt0yLFV) | [download](https://node1.chrischoy.org/data/publications/minknet/Mink16UNet18-synthia-conv1-3.pth) |\n\n\nNote that sliding window style evaluation (cropping and stitching results) used in many related works effectively works as an ensemble (rotation averaging) which boosts the performance.\n\n\n## Demo\n\nThe demo code will download weights and an example scene first and then visualize prediction results.\n\n| Dataset  | Scannet                  | Stanford                  |\n|:--------:|:------------------------:|:-------------------------:|\n| Command  | `python -m demo.scannet` | `python -m demo.stanford` |\n| Result   | ![](imgs/scannet.png)    | ![](imgs/stanford.png)    |\n\n\n## Citing this work\n\nIf you use the Minkowski Engine, please cite:\n\n- [4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR'19](https://arxiv.org/abs/1904.08755), [[pdf]](https://arxiv.org/pdf/1904.08755.pdf)\n\n```\n@inproceedings{choy20194d,\n  title={4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks},\n  author={Choy, Christopher and Gwak, JunYoung and Savarese, Silvio},\n  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},\n  pages={3075--3084},\n  year={2019}\n}\n```\n\n## Related projects\n\n- [Minkowski Engine, a neural network library for sparse tensors](https://github.com/StanfordVL/MinkowskiEngine)\n- [Fully Convolutional Geometric Features, ICCV'19, fast and accurate 3D 
features](https://github.com/chrischoy/FCGF)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrischoy%2Fspatiotemporalsegmentation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchrischoy%2Fspatiotemporalsegmentation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrischoy%2Fspatiotemporalsegmentation/lists"}