{"id":21902525,"url":"https://github.com/dendenxu/fast-gaussian-rasterization","last_synced_at":"2025-04-10T01:10:25.424Z","repository":{"id":232371230,"uuid":"784043644","full_name":"dendenxu/fast-gaussian-rasterization","owner":"dendenxu","description":"A geometry-shader-based, globally CUDA-sorted, high-performance 3D Gaussian Splatting rasterizer. Can achieve a 5-10x speedup in rendering compared to the vanilla diff-gaussian-rasterization.","archived":false,"fork":false,"pushed_at":"2024-04-13T11:48:47.000Z","size":147,"stargazers_count":130,"open_issues_count":2,"forks_count":0,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-04-14T03:04:09.895Z","etag":null,"topics":["3dgs","4dgs","nerf","rasterization","shaders"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dendenxu.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"license","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-04-09T04:29:40.000Z","updated_at":"2024-08-07T09:20:35.717Z","dependencies_parsed_at":"2024-04-18T07:31:27.571Z","dependency_job_id":"30a1079b-73dc-4e6c-9eb7-a8b3c07cd729","html_url":"https://github.com/dendenxu/fast-gaussian-rasterization","commit_stats":null,"previous_names":["dendenxu/fast-gaussian-splatting","dendenxu/fast-gaussian-rasterization"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dendenxu%2Ffast-gaussian-rasterization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dendenxu%2Ffast-gaussian-rasterization/tags",
"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dendenxu%2Ffast-gaussian-rasterization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dendenxu%2Ffast-gaussian-rasterization/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dendenxu","download_url":"https://codeload.github.com/dendenxu/fast-gaussian-rasterization/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248137886,"owners_count":21053775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3dgs","4dgs","nerf","rasterization","shaders"],"created_at":"2024-11-28T15:19:30.848Z","updated_at":"2025-04-10T01:10:25.407Z","avatar_url":"https://github.com/dendenxu.png","language":"Python","funding_links":[],"categories":["Python","3D视觉生成重建","Other Resources"],"sub_categories":["资源传输下载"],"readme":"# Fast Gaussian Rasterization\n\n- **Can be 5-10x faster than the original software CUDA rasterizer ([diff-gaussian-rasterization](https://github.com/graphdeco-inria/diff-gaussian-rasterization)).**\n- **Can be 2-3x faster when using offline rendering. (Bottleneck: copying rendered images around; still thinking about improvements.)**\n- **Speedup is most visible with a high pixel-to-point ratio (large Gaussians, small point count, high-res rendering).**\n\nhttps://github.com/dendenxu/fast-gaussian-splatting/assets/43734697/f50afd6f-bbd5-4e18-aca6-a7356a5d3f75\n\nNo backward pass is supported yet. \nWill think of ways to add a backward pass. 
\nDepth peeling (as used in [4K4D](https://zju3dv.github.io/4k4d)) is too slow.\nDiscussion is welcome.\n\n## Installation\n\nInstall the latest release from PyPI:\n\n```shell\npip install fast_gauss\n```\n\nOr the latest commit from GitHub:\n\n```shell\npip install git+https://github.com/dendenxu/fast-gaussian-rasterization\n```\n\nNo CUDA compilation is required to build `fast_gauss` since it is purely shader-based for now.\n\n## Usage\n\nReplace the original import of `diff_gaussian_rasterization` with `fast_gauss`.\n\nFor example, replace this:\n\n```python\nfrom diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer\n```\n\nwith this:\n\n```python\nfrom fast_gauss import GaussianRasterizationSettings, GaussianRasterizer\n```\n\nAnd you're good to go.\n\n## Tips\n\n**Note: for the full 5-10x speedup, you'll need to let `fast_gauss`'s shader write directly to your target framebuffer.**\n\nCurrently, we try to automatically detect whether you're managing your own OpenGL context (i.e. 
opening up a GUI) by checking for the `OpenGL` module when `fast_gauss` is imported.\nIf it is detected, all rendering commands will return `None`s and we will write directly to the framebuffer bound at the time of the draw call.\nThus, if you're running in a GUI (OpenGL-based) environment, the outputs of our rasterizer will be `None`s and require no further processing.\n\n- [ ] TODO: Improve offline rendering performance.\n- [ ] TODO: Warn the user if they're performing further processing on the returned values.\n\n**Note: the speedup is most visible when the pixel-to-point ratio is high.**\n\nThat is, the speedup is more visible with large Gaussians and very high-resolution rendering.\nThe CUDA-based software implementation is more sensitive to resolution, and for some extremely dense point clouds (\u003e 1 million points), it might be faster than `fast_gauss`.\nThis is because the typical rasterization pipeline on modern graphics hardware is [not well-optimized for small triangles](https://www.youtube.com/watch?v=hf27qsQPRLQ\u0026list=WL).\n\n**Note: for best performance, cache persistent results (for example, the 6 unique elements of the symmetric covariance matrix).**\n\nThis is more of a general tip and not directly related to `fast_gauss`.\nHowever, the impact is more noticeable here since we haven't yet implemented a fast 3D covariance computation (from scales and rotations) in the shader.\nOnly a PyTorch implementation is available for now.\n\nAs the point count increases, even the smallest precomputation can help.\nFor example, concatenating the base 0-degree SH parameter with the rest might cost us 10ms on a 3060 with 5 million points.\nThus, store the concatenated tensors instead of concatenating them every frame.\n\n- [ ] TODO: Implement SH evaluation in the vertex shader.\n- [ ] TODO: Warn users if they're not properly precomputing the covariance matrix.\n- [ ] TODO: Implement a more optimized 
`OptimizedGaussians` for precomputing things and applying a cache, similar to the caching of vertex shader outputs (see [Invocation frequency](https://www.khronos.org/opengl/wiki/Vertex_Shader)).\n\n**Note: for even better performance, it's recommended to pass CPU tensors to `GaussianRasterizationSettings` to avoid explicit synchronizations.**\n\n- [ ] TODO: Warn the user if GPU tensors are detected.\n\n**Note: the second output of the `GaussianRasterizer` is no longer the radii (since we don't use them for a backward pass) but the alpha values of the rendered image.**\n\nThe alpha channel content currently seems to be buggy; we will debug it.\n\n- [ ] TODO: Debug alpha channel values.\n\n## TODOs\n\n- [ ] TODO: Apply more of the optimization techniques used by similar shaders, including packing the data into a texture and bit reduction during computation.\n- [ ] TODO: Think of ways to add a backward pass. Discussion welcome!\n- [ ] TODO: Compute covariance from scaling and rotation in the shader; currently it's done on the CUDA (PyTorch) side.\n- [ ] TODO: Compute SH in the shader; currently it's done on the CUDA (PyTorch) side.\n- [ ] TODO: Try to align the rendering results at the pixel level; small deviations currently exist.\n- [ ] TODO: Use indexed draw calls to minimize data passing and shuffling.\n- [ ] TODO: Do incremental sorting based on viewport changes; currently it's a full re-sort with CUDA (PyTorch).\n\n## Implementation\n\n**Guidelines**\n\n- Let the professionals do the work.\n  - Let the GPU do the large-scale sorting.\n  - Let the graphics pipeline do the rasterization for us, not the other way around.\n  - Let OpenGL write directly to your framebuffer.\n- Minimize repeated work.\n  - Compute the 3D-to-2D covariance projection only once per Gaussian instead of 4 times (once per quad vertex), enabled by the geometry shader.\n- Minimize stalls (i.e., explicit synchronizations between GPU and CPU).\n  - Enabled by using `non_blocking=True` data 
passing and moving sync points as early as possible.\n  - Boosted by sorting on the GPU, so no synchronized host-to-device copies are needed.\n\n**Why does a global sort work?**\n\nThe OpenGL specification is somewhat vague, but there's this passage (4th paragraph of section 2.1, chapter 2 of the OpenGL 4.4 core specification: https://registry.khronos.org/OpenGL/specs/gl/glspec44.core.pdf):\n\n\u003e Commands are always processed in the order in which they are received, although there may be an indeterminate delay before the effects of a command are realized. This means, for example, that one primitive must be drawn completely before any subsequent one can affect the framebuffer.\n\nThus, if the data in the vertex buffer (or the order specified by an index buffer) is sorted back-to-front and alpha blending is enabled, you can count on OpenGL to update the framebuffer in the correct back-to-front order.\n\n- [ ] TODO: Expand implementation details.\n\n## Environment\n\nThis project requires an NVIDIA GPU that supports CUDA-OpenGL interop.\nThus, WSL is [not supported](https://docs.nvidia.com/cuda/wsl-user-guide/index.html#features-not-yet-supported), and neither is macOS.\nTested on Linux and Windows.\n\nFor offline rendering (the drop-in replacement for the original CUDA rasterizer), a valid EGL environment is also required.\nIt can sometimes be hard to set up on virtualized machines. 
[Potential fix](https://github.com/zju3dv/4K4D/issues/27#issuecomment-2026747401).\n\n## Credits\n\nInspired by these insanely fast WebGL-based 3DGS viewers:\n\n- [GaussianSplats3D](https://github.com/mkkellogg/GaussianSplats3D), which inspired our vertex-geometry-fragment shader pipeline.\n- [gsplat.tech](https://gsplat.tech/).\n- [splat](https://github.com/antimatter15/splat).\n\nUses the algorithm and improvements from:\n\n- [diff-gaussian-rasterization](https://github.com/graphdeco-inria/diff-gaussian-rasterization) for the main Gaussian Splatting algorithm.\n- [diff_gauss](https://github.com/dendenxu/diff-gaussian-rasterization) for the fixed culling.\n\nCUDA-GL interop \u0026 EGL environment inspired by:\n\n- [4K4D](https://zju3dv.github.io/4k4d), where they (I) used the interop for depth peeling.\n- [EasyVolcap](https://github.com/zju3dv/EasyVolcap) for the collection of utilities, including EGL setup.\n- [nvdiffrast](https://nvlabs.github.io/nvdiffrast) for their EGL context and CUDA-GL interop setup.\n\n## Citation\n\n```bibtex\n@misc{fast_gauss,\n    title = {Fast Gaussian Rasterization},\n    howpublished = {GitHub},\n    year = {2024},\n    url = {https://github.com/dendenxu/fast-gaussian-rasterization}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdendenxu%2Ffast-gaussian-rasterization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdendenxu%2Ffast-gaussian-rasterization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdendenxu%2Ffast-gaussian-rasterization/lists"}