{"id":13418605,"url":"https://github.com/facebookresearch/SparseConvNet","last_synced_at":"2025-03-15T03:31:38.477Z","repository":{"id":37318486,"uuid":"96553587","full_name":"facebookresearch/SparseConvNet","owner":"facebookresearch","description":"Submanifold sparse convolutional networks","archived":true,"fork":false,"pushed_at":"2024-01-09T01:47:27.000Z","size":918,"stargazers_count":2098,"open_issues_count":56,"forks_count":333,"subscribers_count":43,"default_branch":"main","last_synced_at":"2025-03-09T23:42:30.965Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/facebookresearch/SparseConvNet","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-07-07T15:48:30.000Z","updated_at":"2025-03-07T02:57:02.000Z","dependencies_parsed_at":"2024-02-07T13:01:22.761Z","dependency_job_id":"40882f1b-7a69-4658-8462-2f8f1f80cfdc","html_url":"https://github.com/facebookresearch/SparseConvNet","commit_stats":{"total_commits":136,"total_committers":12,"mean_commits":"11.333333333333334","dds":0.2647058823529411,"last_synced_commit":"cf251d058959a9dbaccb25fc919dc4f4548be232"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FSparseConvNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FSparseConvNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/facebookresearch%2FSparseConvNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FSparseConvNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresearch/SparseConvNet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243681024,"owners_count":20330152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T22:01:04.386Z","updated_at":"2025-03-15T03:31:33.467Z","avatar_url":"https://github.com/facebookresearch.png","language":"C++","readme":"# Submanifold Sparse Convolutional Networks\n\n[![Support Ukraine](https://img.shields.io/badge/Support-Ukraine-FFD500?style=flat\u0026labelColor=005BBB)](https://opensource.fb.com/support-ukraine)\n\nThis is the PyTorch library for training Submanifold Sparse Convolutional Networks.\n\n## Spatial sparsity\n\nThis library brings [Spatially-sparse convolutional networks](https://github.com/btgraham/SparseConvNet) to PyTorch. Moreover, it introduces **Submanifold Sparse Convolutions**, that can be used to build computationally efficient sparse VGG/ResNet/DenseNet-style networks.\n\nWith regular 3x3 convolutions, the set of active (non-zero) sites grows rapidly:\u003cbr /\u003e\n![submanifold](img/i.gif) \u003cbr /\u003e\nWith **Submanifold Sparse Convolutions**, the set of active sites is unchanged. 
Active sites look at their active neighbors (green); non-active sites (red) have no computational overhead: \u003cbr /\u003e\n![submanifold](img/img.gif) \u003cbr /\u003e\nStacking Submanifold Sparse Convolutions to build VGG- and ResNet-style ConvNets lets information flow along lines or surfaces of active points.\u003cbr /\u003e\n\nDisconnected components don't communicate at first, although they will merge due to the effect of strided operations, either pooling or convolutions. Additionally, adding ConvolutionWithStride2-SubmanifoldConvolution-DeconvolutionWithStride2 paths to the network allows disjoint active sites to communicate; see the 'VGG+' networks in the paper.\u003cbr /\u003e\n![Strided Convolution, convolution, deconvolution](img/img_stridedConv_conv_deconv.gif) \u003cbr /\u003e\n![Strided Convolution, convolution, deconvolution](img/img_stridedConv_conv_deconv.png) \u003cbr /\u003e\nFrom left: **(i)** an active point is highlighted; a convolution with stride 2 sees the green active sites **(ii)** and produces output **(iii)**, 'children' of the highlighted active point from (i) are highlighted; a submanifold sparse convolution sees the green active sites **(iv)** and produces output **(v)**; a deconvolution operation sees the green active sites **(vi)** and produces output **(vii)**.\n\n## Dimensionality and 'submanifolds'\n\nSparseConvNet supports input with different numbers of spatial/temporal dimensions.\nHigher dimensional input is more likely to be sparse because of the 'curse of dimensionality'. \u003cbr /\u003e\n\n  Dimension|Name in 'torch.nn'|Use cases\n  :--:|:--:|:--:\n  1|Conv1d| Text, audio\n  2|Conv2d|Lines in 2D space, e.g. 
handwriting\n  3|Conv3d|Lines and surfaces in 3D space or (2+1)D space-time\n  4| - |Lines, etc,  in (3+1)D space-time\n\nWe use the term 'submanifold' to refer to input data that is sparse because it has a lower effective dimension than the space in which it lives, for example a one-dimensional curve in 2+ dimensional space, or a two-dimensional surface in 3+ dimensional space.\n\nIn theory, the library supports up to 10 dimensions. In practice, ConvNets with size-3 SVC convolutions in dimension 5+ may be impractical as the number of parameters per convolution is growing exponentially. Possible solutions include factorizing the convolutions (e.g. 3x1x1x..., 1x3x1x..., etc), or switching to a hyper-tetrahedral lattice (see [Sparse 3D convolutional neural networks](http://arxiv.org/abs/1505.02890)).\n\n\n\n\n\n## Hello World\nSparseConvNets can be built either by [defining a function that inherits from torch.nn.Module](examples/Assamese_handwriting/VGGplus.py) or by stacking modules in a [sparseconvnet.Sequential](PyTorch/sparseconvnet/sequential.py):\n```\nimport torch\nimport sparseconvnet as scn\n\n# Use the GPU if there is one, otherwise CPU\ndevice = 'cuda:0' if torch.cuda.is_available() else 'cpu'\n\nmodel = scn.Sequential().add(\n    scn.SparseVggNet(2, 1,\n                     [['C', 8], ['C', 8], ['MP', 3, 2],\n                      ['C', 16], ['C', 16], ['MP', 3, 2],\n                      ['C', 24], ['C', 24], ['MP', 3, 2]])\n).add(\n    scn.SubmanifoldConvolution(2, 24, 32, 3, False)\n).add(\n    scn.BatchNormReLU(32)\n).add(\n    scn.SparseToDense(2, 32)\n).to(device)\n\n# output will be 10x10\ninputSpatialSize = model.input_spatial_size(torch.LongTensor([10, 10]))\ninput_layer = scn.InputLayer(2, inputSpatialSize)\n\nmsgs = [[\" X   X  XXX  X    X    XX     X       X   XX   XXX   X    XXX   \",\n         \" X   X  X    X    X   X  X    X       X  X  X  X  X  X    X  X  \",\n         \" XXXXX  XX   X    X   X  X    X   X   X  X  X  XXX   X    X   X 
\",\n         \" X   X  X    X    X   X  X     X X X X   X  X  X  X  X    X  X  \",\n         \" X   X  XXX  XXX  XXX  XX       X   X     XX   X  X  XXX  XXX   \"],\n\n        [\" XXX              XXXXX      x   x     x  xxxxx  xxx \",\n         \" X  X  X   XXX       X       x   x x   x  x     x  x \",\n         \" XXX                X        x   xxxx  x  xxxx   xxx \",\n         \" X     X   XXX       X       x     x   x      x    x \",\n         \" X     X          XXXX   x   x     x   x  xxxx     x \",]]\n\n\n# Create Nx3 and Nx1 vectors to encode the messages above:\nlocations = []\nfeatures = []\nfor batchIdx, msg in enumerate(msgs):\n    for y, line in enumerate(msg):\n        for x, c in enumerate(line):\n            if c == 'X':\n                locations.append([y, x, batchIdx])\n                features.append([1])\nlocations = torch.LongTensor(locations)\nfeatures = torch.FloatTensor(features).to(device)\n\ninput = input_layer([locations,features])\nprint('Input SparseConvNetTensor:', input)\noutput = model(input)\n\n# Output is 2x32x10x10: our minibatch has 2 samples, the network has 32 output\n# feature planes, and 10x10 is the spatial size of the output.\nprint('Output SparseConvNetTensor:', output)\n```\n\n\n## Examples\n\nExamples in the examples folder include\n* [Assamese handwriting recognition](https://archive.ics.uci.edu/ml/datasets/Online+Handwritten+Assamese+Characters+Dataset#)\n* [Chinese handwriting for recognition](http://www.nlpr.ia.ac.cn/databases/handwriting/Online_database.html)\n* [3D Segmentation](https://shapenet.cs.stanford.edu/iccv17/) using ShapeNet Core-55\n* [ScanNet](http://www.scan-net.org/) 3D Semantic label benchmark\n\nFor example:\n```\ncd examples/Assamese_handwriting\npython VGGplus.py\n```\n\n## Setup\n\nTested with PyTorch 1.3, CUDA 10.0, and Python 3.3 with [Conda](https://www.anaconda.com/).\n\n```\nconda install pytorch torchvision cudatoolkit=10.0 -c pytorch # See https://pytorch.org/get-started/locally/\ngit 
clone git@github.com:facebookresearch/SparseConvNet.git\ncd SparseConvNet/\nbash develop.sh\n```\nTo run the examples you may also need to install unrar:\n```\napt-get install unrar\n```\n\n## License\nSparseConvNet is BSD licensed, as found in the LICENSE file. [Terms of use](https://opensource.facebook.com/legal/terms). [Privacy](https://opensource.facebook.com/legal/privacy)\n\nCopyright © Meta Platforms, Inc\n\n## Links\n1. [ICDAR 2013 Chinese Handwriting Recognition Competition 2013](https://web.archive.org/web/20160418143451/http://www.nlpr.ia.ac.cn/events/CHRcompetition2013/competition/Home.html) First place in task 3, with test error of 2.61%. Human performance on the test set was 4.81%. [Report](https://web.archive.org/web/20160910012723/http://www.nlpr.ia.ac.cn/events/CHRcompetition2013/competition/ICDAR%202013%20CHR%20competition.pdf)\n2. [Spatially-sparse convolutional neural networks, 2014](http://arxiv.org/abs/1409.6070) SparseConvNets for Chinese handwriting recognition\n3. [Fractional max-pooling, 2014](http://arxiv.org/abs/1412.6071) A SparseConvNet with fractional max-pooling achieves an error rate of 3.47% for CIFAR-10.\n4. [Sparse 3D convolutional neural networks, BMVC 2015](http://arxiv.org/abs/1505.02890) SparseConvNets for 3D object recognition and (2+1)D video action recognition.\n5. [Kaggle plankton recognition competition, 2015](https://www.kaggle.com/c/datasciencebowl) Third place. The competition solution is being adapted for research purposes in [EcoTaxa](http://ecotaxa.obs-vlfr.fr/).\n6. [Kaggle Diabetic Retinopathy Detection, 2015](https://www.kaggle.com/c/diabetic-retinopathy-detection/) First place in the Kaggle Diabetic Retinopathy Detection competition.\n7. [SparseConvNet 'classic'](https://github.com/btgraham/SparseConvNet-archived) version\n8. [Submanifold Sparse Convolutional Networks, 2017](https://arxiv.org/abs/1706.01307) Introduces deep 'submanifold' SparseConvNets.\n9. 
[Workshop on Learning to See from 3D Data, 2017](https://shapenet.cs.stanford.edu/iccv17workshop/) First place in the [semantic segmentation](https://shapenet.cs.stanford.edu/iccv17/) competition. [Report](https://arxiv.org/pdf/1710.06104)\n10. [3D Semantic Segmentation with Submanifold Sparse Convolutional Networks, 2017](https://arxiv.org/abs/1711.10275) Semantic segmentation for the ShapeNet Core55 and NYU-DepthV2 datasets, CVPR 2018\n11. [Unsupervised learning with sparse space-and-time autoencoders](https://arxiv.org/abs/1811.10355) (3+1)D space-time autoencoders\n12. [ScanNet 3D semantic label benchmark 2018](http://kaldir.vc.in.tum.de/scannet_benchmark/semantic_label_3d) 0.726 average IOU for 3D semantic segmentation.\n13. [MinkowskiEngine](https://github.com/StanfordVL/MinkowskiEngine) is an alternative implementation of SparseConvNet; [0.736 average IOU for ScanNet]( https://github.com/chrischoy/SpatioTemporalSegmentation).\n14. [SpConv: PyTorch Spatially Sparse Convolution Library](https://github.com/traveller59/spconv) is an alternative implementation of SparseConvNet.\n15. [Live Semantic 3D Perception for Immersive Augmented Reality](https://ieeexplore.ieee.org/document/8998140) describes a way to optimize memory access for SparseConvNet.\n16. [OccuSeg](https://arxiv.org/abs/2003.06537) real-time object detection using SparseConvNets.\n17. [TorchSparse](https://github.com/mit-han-lab/torchsparse) implements 3D submanifold convolutions.\n18. [TensorFlow 3D](https://github.com/google-research/google-research/tree/master/tf3d) implements submanifold convolutions.\n19. [VoTr](https://github.com/PointsCoder/VOTR) implements submanifold [voxel transformers](https://openaccess.thecvf.com/content/ICCV2021/papers/Mao_Voxel_Transformer_for_3D_Object_Detection_ICCV_2021_paper.pdf) using [SpConv](https://github.com/traveller59/spconv).\n20. 
[Mix3D](https://github.com/kumuji/mix3d) brings [MixUp](https://openreview.net/forum?id=r1Ddp1-Rb) to the sparse setting\u0026mdash; 0.781 average IOU for ScanNet 3D semantic segmentation.\n21. [Point Transformer V3](https://arxiv.org/abs/2312.10035) uses sparse convolutions as an enhanced conditional positional encoding (xCPE); 0.794 average IOU for ScanNet 3D semantic segmentation.\n\n## Citations\n\nIf you find this code useful in your research then please cite:\n\n**[3D Semantic Segmentation with Submanifold Sparse Convolutional Networks, CVPR 2018](https://arxiv.org/abs/1711.10275)** \u003cbr /\u003e\n[Benjamin Graham](https://research.fb.com/people/graham-benjamin/), \u003cbr /\u003e\n[Martin Engelcke](http://ori.ox.ac.uk/mrg_people/martin-engelcke/), \u003cbr /\u003e\n[Laurens van der Maaten](https://lvdmaaten.github.io/), \u003cbr /\u003e\n\n```\n@article{3DSemanticSegmentationWithSubmanifoldSparseConvNet,\n  title={3D Semantic Segmentation with Submanifold Sparse Convolutional Networks},\n  author={Graham, Benjamin and Engelcke, Martin and van der Maaten, Laurens},\n  journal={CVPR},\n  year={2018}\n}\n```\n\nand/or\n\n**[Submanifold Sparse Convolutional Networks, https://arxiv.org/abs/1706.01307](https://arxiv.org/abs/1706.01307)** \u003cbr /\u003e\n[Benjamin Graham](https://research.fb.com/people/graham-benjamin/), \u003cbr /\u003e\n[Laurens van der Maaten](https://lvdmaaten.github.io/), \u003cbr /\u003e\n\n```\n@article{SubmanifoldSparseConvNet,\n  title={Submanifold Sparse Convolutional Networks},\n  author={Graham, Benjamin and van der Maaten, Laurens},\n  journal={arXiv preprint arXiv:1706.01307},\n  year={2017}\n}\n```\n","funding_links":[],"categories":["TODO scan for Android support in followings","C++","Pytorch \u0026 related libraries｜Pytorch \u0026 相关库","Pytorch \u0026 related 
libraries"],"sub_categories":["CV｜计算机视觉:","CV:"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FSparseConvNet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2FSparseConvNet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FSparseConvNet/lists"}