{"id":15036886,"url":"https://github.com/nvlabs/tiny-cuda-nn","last_synced_at":"2025-05-13T20:14:10.846Z","repository":{"id":37728857,"uuid":"358678977","full_name":"NVlabs/tiny-cuda-nn","owner":"NVlabs","description":"Lightning fast C++/CUDA neural network framework","archived":false,"fork":false,"pushed_at":"2025-04-29T08:32:28.000Z","size":19994,"stargazers_count":4000,"open_issues_count":241,"forks_count":495,"subscribers_count":49,"default_branch":"master","last_synced_at":"2025-05-06T23:52:57.364Z","etag":null,"topics":["cuda","deep-learning","gpu","mlp","nerf","neural-network","pytorch","real-time","rendering"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-04-16T17:53:10.000Z","updated_at":"2025-05-06T10:28:04.000Z","dependencies_parsed_at":"2023-10-14T17:34:26.531Z","dependency_job_id":"3e00f463-92a8-4b80-a883-8bf21c157b0f","html_url":"https://github.com/NVlabs/tiny-cuda-nn","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2Ftiny-cuda-nn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2Ftiny-cuda-nn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2Ftiny-cuda-nn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVlabs%2Ftiny-cuda-nn/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVlabs","download_url":"https://codeload.github.com/NVlabs/tiny-cuda-nn/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254020631,"owners_count":22000755,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","deep-learning","gpu","mlp","nerf","neural-network","pytorch","real-time","rendering"],"created_at":"2024-09-24T20:32:38.188Z","updated_at":"2025-05-13T20:14:10.826Z","avatar_url":"https://github.com/NVlabs.png","language":"C++","readme":"# Tiny CUDA Neural Networks ![](https://github.com/NVlabs/tiny-cuda-nn/workflows/CI/badge.svg)\n\nThis is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning fast [\"fully fused\" multi-layer perceptron](https://raw.githubusercontent.com/NVlabs/tiny-cuda-nn/master/data/readme/fully-fused-mlp-diagram.png) ([technical paper](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.pdf)), a versatile [multiresolution hash encoding](https://raw.githubusercontent.com/NVlabs/tiny-cuda-nn/master/data/readme/multiresolution-hash-encoding-diagram.png) ([technical paper](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.pdf)), as well as support for various other input encodings, losses, and optimizers.\n\n## Performance\n\n![Image](data/readme/fully-fused-vs-tensorflow.png)\n_Fully fused networks vs. TensorFlow v2.5.0 w/ XLA. 
Measured on multi-layer perceptrons 64 (solid line) and 128 (dashed line) neurons wide, on an RTX 3090. Generated by `benchmarks/bench_ours.cu` and `benchmarks/bench_tensorflow.py` using `data/config_oneblob.json`._\n\n\n## Usage\n\nTiny CUDA neural networks have a simple C++/CUDA API:\n\n```cpp\n#include \u003ctiny-cuda-nn/common.h\u003e\n\n// Configure the model\nnlohmann::json config = {\n\t{\"loss\", {\n\t\t{\"otype\", \"L2\"}\n\t}},\n\t{\"optimizer\", {\n\t\t{\"otype\", \"Adam\"},\n\t\t{\"learning_rate\", 1e-3},\n\t}},\n\t{\"encoding\", {\n\t\t{\"otype\", \"HashGrid\"},\n\t\t{\"n_levels\", 16},\n\t\t{\"n_features_per_level\", 2},\n\t\t{\"log2_hashmap_size\", 19},\n\t\t{\"base_resolution\", 16},\n\t\t{\"per_level_scale\", 2.0},\n\t}},\n\t{\"network\", {\n\t\t{\"otype\", \"FullyFusedMLP\"},\n\t\t{\"activation\", \"ReLU\"},\n\t\t{\"output_activation\", \"None\"},\n\t\t{\"n_neurons\", 64},\n\t\t{\"n_hidden_layers\", 2},\n\t}},\n};\n\nusing namespace tcnn;\n\nauto model = create_from_config(n_input_dims, n_output_dims, config);\n\n// Train the model (batch_size must be a multiple of tcnn::BATCH_SIZE_GRANULARITY)\nGPUMatrix\u003cfloat\u003e training_batch_inputs(n_input_dims, batch_size);\nGPUMatrix\u003cfloat\u003e training_batch_targets(n_output_dims, batch_size);\n\nfor (int i = 0; i \u003c n_training_steps; ++i) {\n\tgenerate_training_batch(\u0026training_batch_inputs, \u0026training_batch_targets); // \u003c-- your code\n\n\tfloat loss;\n\tmodel.trainer-\u003etraining_step(training_batch_inputs, training_batch_targets, \u0026loss);\n\tstd::cout \u003c\u003c \"iteration=\" \u003c\u003c i \u003c\u003c \" loss=\" \u003c\u003c loss \u003c\u003c std::endl;\n}\n\n// Use the model\nGPUMatrix\u003cfloat\u003e inference_inputs(n_input_dims, batch_size);\ngenerate_inputs(\u0026inference_inputs); // \u003c-- your code\n\nGPUMatrix\u003cfloat\u003e inference_outputs(n_output_dims, batch_size);\nmodel.network-\u003einference(inference_inputs, inference_outputs);\n```\n\n\n## 
Example: learning a 2D image\n\nWe provide a sample application where an image function _(x,y) -\u003e (R,G,B)_ is learned. It can be run via\n```sh\ntiny-cuda-nn$ ./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json\n```\nproducing an image every couple of training steps. Each 1000 steps should take a bit over 1 second with the default configuration on an RTX 4090.\n\n| 10 steps | 100 steps | 1000 steps | Reference image |\n|:---:|:---:|:---:|:---:|\n| ![10steps](data/readme/10.jpg) | ![100steps](data/readme/100.jpg) | ![1000steps](data/readme/1000.jpg) | ![reference](data/images/albert.jpg) |\n\n\n\n## Requirements\n\n- An __NVIDIA GPU__; tensor cores increase performance when available. All shown results come from an RTX 3090.\n- A __C++14__ capable compiler. The following choices are recommended and have been tested:\n  - __Windows:__ Visual Studio 2019 or 2022\n  - __Linux:__ GCC/G++ 8 or higher\n- A recent version of __[CUDA](https://developer.nvidia.com/cuda-toolkit)__. The following choices are recommended and have been tested:\n  - __Windows:__ CUDA 11.5 or higher\n  - __Linux:__ CUDA 10.2 or higher\n- __[CMake](https://cmake.org/) v3.21 or higher__.\n- The fully fused MLP component of this framework requires a __very large__ amount of shared memory in its default configuration. It will likely only work on an RTX 3090, an RTX 2080 Ti, or higher-end GPUs. 
Lower-end cards must reduce the `n_neurons` parameter or use the `CutlassMLP` (better compatibility but slower) instead.\n\nIf you are using Linux, install the following packages:\n```sh\nsudo apt-get install build-essential git\n```\n\nWe also recommend installing [CUDA](https://developer.nvidia.com/cuda-toolkit) in `/usr/local/` and adding the CUDA installation to your PATH.\nFor example, if you have CUDA 12.6.3, add the following to your `~/.bashrc`:\n```sh\nexport PATH=\"/usr/local/cuda-12.6.3/bin:$PATH\"\nexport LD_LIBRARY_PATH=\"/usr/local/cuda-12.6.3/lib64:$LD_LIBRARY_PATH\"\n```\n\n\n## Compilation (Windows \u0026 Linux)\n\nBegin by cloning this repository and all its submodules using the following command:\n```sh\n$ git clone --recursive https://github.com/NVlabs/tiny-cuda-nn\n$ cd tiny-cuda-nn\n```\n\nThen, use CMake to build the project (on Windows, this must be done in a [developer command prompt](https://docs.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-160#developer_command_prompt)):\n```sh\ntiny-cuda-nn$ cmake . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo\ntiny-cuda-nn$ cmake --build build --config RelWithDebInfo -j\n```\n\nIf compilation fails inexplicably or takes longer than an hour, you might be running out of memory. 
Try running the above command without `-j` in that case.\n\n\n## PyTorch extension\n\n__tiny-cuda-nn__ comes with a [PyTorch](https://github.com/pytorch/pytorch) extension that allows using the fast MLPs and input encodings from within a [Python](https://www.python.org/) context.\nThese bindings can be significantly faster than full Python implementations, in particular for the [multiresolution hash encoding](https://raw.githubusercontent.com/NVlabs/tiny-cuda-nn/master/data/readme/multiresolution-hash-encoding-diagram.png).\n\n\u003e The overheads of Python/PyTorch can nonetheless be extensive if the batch size is small.\n\u003e For example, with a batch size of 64k, the bundled `mlp_learning_an_image` example is __~2x slower__ through PyTorch than native CUDA.\n\u003e With a batch size of 256k and higher (default), the performance is much closer.\n\nBegin by setting up a Python 3.X environment with a recent, CUDA-enabled version of PyTorch. Then, invoke\n```sh\npip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch\n```\n\nAlternatively, if you would like to install from a local clone of __tiny-cuda-nn__, invoke\n```sh\ntiny-cuda-nn$ cd bindings/torch\ntiny-cuda-nn/bindings/torch$ python setup.py install\n```\n\nUpon success, you can use __tiny-cuda-nn__ models as in the following example:\n```py\nimport commentjson as json\nimport tinycudann as tcnn\nimport torch\n\nwith open(\"data/config_hash.json\") as f:\n\tconfig = json.load(f)\n\n# Option 1: efficient Encoding+Network combo.\nmodel = tcnn.NetworkWithInputEncoding(\n\tn_input_dims, n_output_dims,\n\tconfig[\"encoding\"], config[\"network\"]\n)\n\n# Option 2: separate modules. 
Slower but more flexible.\nencoding = tcnn.Encoding(n_input_dims, config[\"encoding\"])\nnetwork = tcnn.Network(encoding.n_output_dims, n_output_dims, config[\"network\"])\nmodel = torch.nn.Sequential(encoding, network)\n```\n\nSee `samples/mlp_learning_an_image_pytorch.py` for an example.\n\n\n\n## Components\n\nFollowing is a summary of the components of this framework. [The JSON documentation](DOCUMENTATION.md) lists configuration options.\n\n\n| Networks | \u0026nbsp; | \u0026nbsp;\n| :--- | :---------- | :-----\n| Fully fused MLP | `src/fully_fused_mlp.cu` | Lightning fast implementation of small multi-layer perceptrons (MLPs).\n| CUTLASS MLP     | `src/cutlass_mlp.cu`     | MLP based on [CUTLASS](https://github.com/NVIDIA/cutlass)' GEMM routines. Slower than fully-fused, but handles larger networks and still is reasonably fast.\n\n| Input encodings | \u0026nbsp; | \u0026nbsp;\n| :--- | :---------- | :-----\n| Composite | `include/tiny-cuda-nn/encodings/composite.h` | Allows composing multiple encodings. Can be, for example, used to assemble the Neural Radiance Caching encoding [[Müller et al. 2021]](https://tom94.net/).\n| Frequency | `include/tiny-cuda-nn/encodings/frequency.h` | NeRF's [[Mildenhall et al. 2020]](https://www.matthewtancik.com/nerf) positional encoding applied equally to all dimensions.\n| Grid | `include/tiny-cuda-nn/encodings/grid.h` | Encoding based on trainable multiresolution grids. Used for [Instant Neural Graphics Primitives [Müller et al. 2022]](https://nvlabs.github.io/instant-ngp/). The grids can be backed by hashtables, dense storage, or tiled storage.\n| Identity | `include/tiny-cuda-nn/encodings/identity.h` | Leaves values untouched.\n| Oneblob | `include/tiny-cuda-nn/encodings/oneblob.h` | From Neural Importance Sampling [[Müller et al. 2019]](https://tom94.net/data/publications/mueller18neural/mueller18neural-v4.pdf) and Neural Control Variates [[Müller et al. 
2020]](https://tom94.net/data/publications/mueller20neural/mueller20neural.pdf).\n| SphericalHarmonics | `include/tiny-cuda-nn/encodings/spherical_harmonics.h` | A frequency-space encoding that is more suitable to direction vectors than component-wise ones.\n| TriangleWave | `include/tiny-cuda-nn/encodings/triangle_wave.h` | Low-cost alternative to the NeRF's encoding. Used in Neural Radiance Caching [[Müller et al. 2021]](https://tom94.net/).\n\n| Losses | \u0026nbsp; | \u0026nbsp;\n| :--- | :---------- | :-----\n| L1 | `include/tiny-cuda-nn/losses/l1.h` | Standard L1 loss.\n| Relative L1 | `include/tiny-cuda-nn/losses/l1.h` | Relative L1 loss normalized by the network prediction.\n| MAPE | `include/tiny-cuda-nn/losses/mape.h` | Mean absolute percentage error (MAPE). The same as Relative L1, but normalized by the target.\n| SMAPE | `include/tiny-cuda-nn/losses/smape.h` | Symmetric mean absolute percentage error (SMAPE). The same as Relative L1, but normalized by the mean of the prediction and the target.\n| L2 | `include/tiny-cuda-nn/losses/l2.h` | Standard L2 loss.\n| Relative L2 | `include/tiny-cuda-nn/losses/relative_l2.h` | Relative L2 loss normalized by the network prediction [[Lehtinen et al. 2018]](https://github.com/NVlabs/noise2noise).\n| Relative L2 Luminance | `include/tiny-cuda-nn/losses/relative_l2_luminance.h` | Same as above, but normalized by the luminance of the network prediction. Only applicable when network prediction is RGB. Used in Neural Radiance Caching [[Müller et al. 2021]](https://tom94.net/).\n| Cross Entropy | `include/tiny-cuda-nn/losses/cross_entropy.h` | Standard cross entropy loss. Only applicable when the network prediction is a PDF.\n| Variance | `include/tiny-cuda-nn/losses/variance_is.h` | Standard variance loss. 
Only applicable when the network prediction is a PDF.\n\n| Optimizers | \u0026nbsp; | \u0026nbsp;\n| :--- | :---------- | :-----\n| Adam | `include/tiny-cuda-nn/optimizers/adam.h` | Implementation of Adam [[Kingma and Ba 2014]](https://arxiv.org/abs/1412.6980), generalized to AdaBound [[Luo et al. 2019]](https://github.com/Luolc/AdaBound).\n| Novograd | `include/tiny-cuda-nn/optimizers/novograd.h` | Implementation of Novograd [[Ginsburg et al. 2019]](https://arxiv.org/abs/1905.11286).\n| SGD | `include/tiny-cuda-nn/optimizers/sgd.h` | Standard stochastic gradient descent (SGD).\n| Shampoo | `include/tiny-cuda-nn/optimizers/shampoo.h` | Implementation of the second-order Shampoo optimizer [[Gupta et al. 2018]](https://arxiv.org/abs/1802.09568) with home-grown optimizations as well as those by [Anil et al. [2020]](https://arxiv.org/abs/2002.09018).\n| Average | `include/tiny-cuda-nn/optimizers/average.h` | Wraps another optimizer and computes a linear average of the weights over the last N iterations. The average is used for inference only (does not feed back into training).\n| Batched | `include/tiny-cuda-nn/optimizers/batched.h` | Wraps another optimizer, invoking the nested optimizer once every N steps on the averaged gradient. Has the same effect as increasing the batch size but requires only a constant amount of memory.\n| Composite | `include/tiny-cuda-nn/optimizers/composite.h` | Allows using several optimizers on different parameters.\n| EMA | `include/tiny-cuda-nn/optimizers/average.h` | Wraps another optimizer and computes an exponential moving average of the weights. The average is used for inference only (does not feed back into training).\n| Exponential Decay | `include/tiny-cuda-nn/optimizers/exponential_decay.h` | Wraps another optimizer and performs piecewise-constant exponential learning-rate decay.\n| Lookahead | `include/tiny-cuda-nn/optimizers/lookahead.h` | Wraps another optimizer, implementing the lookahead algorithm [[Zhang et al. 
2019]](https://arxiv.org/abs/1907.08610).\n\n\n## License and Citation\n\nThis framework is licensed under the BSD 3-clause license. Please see `LICENSE.txt` for details.\n\nIf you use it in your research, we would appreciate a citation via\n```bibtex\n@software{tiny-cuda-nn,\n\tauthor = {M\\\"uller, Thomas},\n\tlicense = {BSD-3-Clause},\n\tmonth = {4},\n\ttitle = {{tiny-cuda-nn}},\n\turl = {https://github.com/NVlabs/tiny-cuda-nn},\n\tversion = {1.7},\n\tyear = {2021}\n}\n```\n\nFor business inquiries, please visit our website and submit the form: [NVIDIA Research Licensing](https://www.nvidia.com/en-us/research/inquiries/)\n\n\n## Publications \u0026 Software\n\nAmong others, this framework powers the following publications:\n\n\u003e __Instant Neural Graphics Primitives with a Multiresolution Hash Encoding__  \n\u003e [Thomas Müller](https://tom94.net), [Alex Evans](https://research.nvidia.com/person/alex-evans), [Christoph Schied](https://research.nvidia.com/person/christoph-schied), [Alexander Keller](https://research.nvidia.com/person/alex-keller)  \n\u003e _ACM Transactions on Graphics (__SIGGRAPH__), July 2022_  \n\u003e __[Website](https://nvlabs.github.io/instant-ngp/)\u0026nbsp;/ [Paper](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.pdf)\u0026nbsp;/ [Code](https://github.com/NVlabs/instant-ngp)\u0026nbsp;/ [Video](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.mp4)\u0026nbsp;/ [BibTeX](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.bib)__\n\n\u003e __Extracting Triangular 3D Models, Materials, and Lighting From Images__  \n\u003e [Jacob Munkberg](https://research.nvidia.com/person/jacob-munkberg), [Jon Hasselgren](https://research.nvidia.com/person/jon-hasselgren), [Tianchang Shen](http://www.cs.toronto.edu/~shenti11/), [Jun Gao](http://www.cs.toronto.edu/~jungao/), [Wenzheng Chen](http://www.cs.toronto.edu/~wenzheng/), [Alex Evans](https://research.nvidia.com/person/alex-evans), [Thomas 
Müller](https://tom94.net), [Sanja Fidler](https://www.cs.toronto.edu/~fidler/)  \n\u003e __CVPR (Oral)__, June 2022  \n\u003e __[Website](https://nvlabs.github.io/nvdiffrec/)\u0026nbsp;/ [Paper](https://nvlabs.github.io/nvdiffrec/assets/paper.pdf)\u0026nbsp;/ [Video](https://nvlabs.github.io/nvdiffrec/assets/video.mp4)\u0026nbsp;/ [BibTeX](https://nvlabs.github.io/nvdiffrec/assets/bib.txt)__\n\n\u003e __Real-time Neural Radiance Caching for Path Tracing__  \n\u003e [Thomas Müller](https://tom94.net), [Fabrice Rousselle](https://research.nvidia.com/person/fabrice-rousselle), [Jan Novák](http://jannovak.info), [Alexander Keller](https://research.nvidia.com/person/alex-keller)  \n\u003e _ACM Transactions on Graphics (__SIGGRAPH__), August 2021_  \n\u003e __[Paper](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.pdf)\u0026nbsp;/ [GTC talk](https://gtc21.event.nvidia.com/media/Fully%20Fused%20Neural%20Network%20for%20Radiance%20Caching%20in%20Real%20Time%20Rendering%20%5BE31307%5D/1_liqy6k1c)\u0026nbsp;/ [Video](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.mp4)\u0026nbsp;/ [Interactive results viewer](https://tom94.net/data/publications/mueller21realtime/interactive-viewer/)\u0026nbsp;/ [BibTeX](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.bib)__\n\n\nAs well as the following software:\n\n\u003e __NerfAcc: A General NeRF Acceleration Toolbox__  \n\u003e [Ruilong Li](https://www.liruilong.cn/), [Matthew Tancik](https://www.matthewtancik.com/about-me), [Angjoo Kanazawa](https://people.eecs.berkeley.edu/~kanazawa/)  \n\u003e __https://github.com/KAIR-BAIR/nerfacc__\n\n\u003e __Nerfstudio: A Framework for Neural Radiance Field Development__  \n\u003e [Matthew Tancik*](https://www.matthewtancik.com/about-me), [Ethan Weber*](https://ethanweber.me/), [Evonne Ng*](http://people.eecs.berkeley.edu/~evonne_ng/), [Ruilong Li](https://www.liruilong.cn/), Brent Yi, Terrance Wang, Alexander Kristoffersen, 
Jake Austin, Kamyar Salahi, Abhik Ahuja, David McAllister, [Angjoo Kanazawa](https://people.eecs.berkeley.edu/~kanazawa/)  \n\u003e __https://github.com/nerfstudio-project/nerfstudio__\n\nPlease feel free to make a pull request if your publication or software is not listed.\n\n## Acknowledgments\n\nSpecial thanks go to the NRC authors for helpful discussions and to [Nikolaus Binder](https://research.nvidia.com/person/nikolaus-binder) for providing part of the infrastructure of this framework, as well as for help with utilizing TensorCores from within CUDA.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvlabs%2Ftiny-cuda-nn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnvlabs%2Ftiny-cuda-nn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvlabs%2Ftiny-cuda-nn/lists"}