{"id":19817181,"url":"https://github.com/nvpro-samples/gl_dynamic_lod","last_synced_at":"2025-08-15T00:35:21.598Z","repository":{"id":30071082,"uuid":"33620571","full_name":"nvpro-samples/gl_dynamic_lod","owner":"nvpro-samples","description":"GPU classifies how to render millions of particles","archived":false,"fork":false,"pushed_at":"2024-01-17T16:43:55.000Z","size":2611,"stargazers_count":68,"open_issues_count":0,"forks_count":12,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-01-18T00:35:02.562Z","etag":null,"topics":["lod","nvidia","opengl"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nvpro-samples.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2015-04-08T17:08:19.000Z","updated_at":"2024-01-16T15:44:22.000Z","dependencies_parsed_at":"2024-01-17T18:30:25.414Z","dependency_job_id":"6eafaae3-e6e9-427b-90e2-27b866027330","html_url":"https://github.com/nvpro-samples/gl_dynamic_lod","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nvpro-samples%2Fgl_dynamic_lod","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nvpro-samples%2Fgl_dynamic_lod/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nvpro-samples%2Fgl_dynamic_lod/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nvpro-samples%2Fgl_dynamic_lod/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/nvpro-samples","download_url":"https://codeload.github.com/nvpro-samples/gl_dynamic_lod/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224253415,"owners_count":17280934,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lod","nvidia","opengl"],"created_at":"2024-11-12T10:11:56.002Z","updated_at":"2024-11-12T10:11:56.074Z","avatar_url":"https://github.com/nvpro-samples.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# gl dynamic lod\n\nWith the addition of indirect rendering (```ARB_draw_indirect``` and ```ARB_multi_draw_indirect```), OpenGL got an efficient mechanism that allows the GPU to create or modify its own work without stalling the pipeline. 
As the CPU and GPU are best used when working asynchronously, avoiding readbacks to the CPU to drive decision making is beneficial.\n\nIn this sample we use ```ARB_draw_indirect``` and ```ARB_shader_atomic_counters``` to build three distinct render lists for drawing particles as spheres, each using a different shader and representing a different level of detail (LOD):\n\n* Draw as point\n* Draw as instanced low resolution mesh\n* Draw as instanced adaptively tessellated mesh\n\n![sample screenshot](https://github.com/nvpro-samples/gl_dynamic_lod/blob/master/doc/sample.jpg)\n\nThis allows us to limit the total amount of geometry being rasterized, and still benefit from high geometric quality where needed.\n\n![sample screenshot](https://github.com/nvpro-samples/gl_dynamic_lod/blob/master/doc/wireframe.jpg)\n\nThe frame timeline is therefore split into two parts:\n\n1. **LOD Classification**:\n - Each particle is appended to the appropriate list using global atomics, based on its projected size in the viewport. Frustum culling is applied beforehand.\n - A single shader invocation manipulates the DrawIndirect commands based on the atomic counter values. This step is required because the sample uses an alternative to classic instancing.\n2. **Rendering**:\n - Every list is drawn by one or two ```glDrawElementsIndirect``` calls to render the particles.\n - Instancing is done via batching in two steps (see below).\n\n``` cpp\nstruct DrawElementsIndirect {\n  uint  elementCount;   // modified at runtime\n  uint  instanceCount;  // modified at runtime\n  uint  first;          // 0\n  uint  baseVertex;     // 0\n  uint  baseInstance;   // 0\n};\n```\n\n#### Batched low complexity mesh instancing\n\nWhen instancing meshes that have only very few triangles, the classic way of using the graphics API's instance counter may not be the most efficient for the hardware. We use batching to improve performance. 
Instead of drawing all particles at once, we draw them in two steps, whose parameters depend on how many particles we want to draw overall (*listSize*):\n\n 1. **elementCount** = batchSize * meshSize; **instanceCount** = *listSize* / batchSize;\n 2. **elementCount** = (*listSize* % batchSize) * meshSize; **instanceCount** = 1;\n\nWe first draw as many full batches of _batchSize_ meshes as possible via classic instancing, and then whatever is left.\n\nThe instanced mesh is replicated *batchSize* times in the source VBO/IBO, instead of storing it only once. That way each per-instance hardware drawcall does more work, which helps leverage GPU parallelism. The memory cost of this can typically be neglected, as we specifically target low-complexity meshes with just a few triangles \u0026 vertices; if we had a lot of triangles per mesh, then classic instancing would do the trick.\n\nWith classic instancing we would simply use **gl_InstanceID** to find out which instance we are rendering, but here we use an alternative formula:\n\n``` cpp\ninstanceID = batchedID + gl_InstanceID * MESH_BATCHSIZE;\n```\n\n**batchedID** represents which of the replicated batched meshes we are currently rendering. While it isn't a built-in vertex shader variable, we can derive it from **gl_VertexID**, as the index buffer accounts for the vertex data replication in the VBO. The index values (gl_VertexID) of a batched mesh are in the range [MESH_VERTICES * batchedID, (MESH_VERTICES * (batchedID+1)) -1], so\n\n``` cpp\ninstanceID = (gl_VertexID / MESH_VERTICES) + gl_InstanceID * MESH_BATCHSIZE;\n```\n\nWhen drawing the rest of the meshes with the second drawcall, one has to offset the instanceID by the number of meshes already drawn.\n\n``` cpp\ninstanceID +=   int(firstCmd.instanceCount) * \n              ( int(firstCmd.elementCount) / MESH_INDICES); \n```\n\n#### Performance\n\nThe UI can be used to modify the sample a bit. 
For example, \"invisible rendering\" via ```glEnable(GL_RASTERIZER_DISCARD)``` can be used to time the classification or compute shaders alone. The entire task can also be split into multiple jobs, which allows the program to decrease the size of temporary list buffers. Last but not least, one can experiment with recording either the particle data directly or only indices. The default configuration gives the best performance for large particle counts (compute, single job, indices).\n\nTimings in microseconds via GL timer query, taken on a Quadro M6000 with 1048574 particles:\n\n``` cpp\n Timer Frame;    GL    4206;\n  Timer Lod;     GL     151;\n   Timer Cont;   GL     139;  // Particle classification (content)\n   Timer Cmds;   GL       8;  // DrawIndirect struct (commands)\n  Timer Draw;    GL    3888;\n   Timer Tess;   GL     256;  // Adaptively-tessellated spheres\n   Timer Mesh;   GL    3586;  // Simple sphere mesh\n   Timer Pnts;   GL      39;  // Spheres drawn as points\n  Timer TwDraw;  GL     160;\n```\n\n#### Sample Highlights\n\nThe user can influence the classification based on the viewport size using the \"pixelsize\" parameters. The classification can also be paused and re-used while the camera is moved, which can be useful to see the frustum culling in action, or to inspect low-resolution representations.\n\nKey functionality is found in\n\n- Sample::drawLod()\n\nAs well as in helper functions\n\n- Sample::initParticleBuffer()\n- Sample::initLodBuffers()\n\nIn common.h, you can set ```USE_COMPACT_PARTICLE``` to 1 to reduce the size of the particles to a single vec4 by giving all particles the same world size. This mode allows rendering around 130 million particles on NVIDIA hardware, twice as many as with the default setting of 0.\n\n#### Building\nIdeally, clone this and other interesting [nvpro-samples](https://github.com/nvpro-samples) repositories into a common subdirectory. You will always need [nvpro_core](https://github.com/nvpro-samples/nvpro_core). 
nvpro_core is searched for either as a subdirectory of the sample or one directory up.\n\nIf you are interested in multiple samples, you can use the [build_all](https://github.com/nvpro-samples/build_all) CMake project as an entry point. It will also give you options to enable or disable individual samples when creating the solutions.\n\n#### Related Samples\n[gl_occlusion_culling](https://github.com/nvpro-samples/gl_occlusion_culling) makes use of similar OpenGL functionality to perform more accurate visibility culling.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvpro-samples%2Fgl_dynamic_lod","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnvpro-samples%2Fgl_dynamic_lod","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvpro-samples%2Fgl_dynamic_lod/lists"}