{"id":19817178,"url":"https://github.com/nvpro-samples/vk_idbuffer_rasterization","last_synced_at":"2025-05-01T11:30:33.122Z","repository":{"id":41090912,"uuid":"413786880","full_name":"nvpro-samples/vk_idbuffer_rasterization","owner":"nvpro-samples","description":"Vulkan sample to render efficient per-part IDs in CAD models","archived":false,"fork":false,"pushed_at":"2023-11-20T21:58:37.000Z","size":434,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":10,"default_branch":"main","last_synced_at":"2023-11-20T22:36:54.088Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nvpro-samples.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-10-05T11:20:59.000Z","updated_at":"2023-03-12T20:08:51.000Z","dependencies_parsed_at":"2023-11-20T22:46:00.943Z","dependency_job_id":null,"html_url":"https://github.com/nvpro-samples/vk_idbuffer_rasterization","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nvpro-samples%2Fvk_idbuffer_rasterization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nvpro-samples%2Fvk_idbuffer_rasterization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nvpro-samples%2Fvk_idbuffer_rasterization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nvpro-samples%2Fvk_idbuffer_rasterization/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/owners/nvpro-samples","download_url":"https://codeload.github.com/nvpro-samples/vk_idbuffer_rasterization/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224253415,"owners_count":17280934,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T10:11:55.888Z","updated_at":"2025-05-01T11:30:33.098Z","avatar_url":"https://github.com/nvpro-samples.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# vk_idbuffer_rasterization\n\nIn CAD rendering a surface is often created from multiple feature parts, for example NURBS surfaces that are joined together to create the solid shape.\nWhen rendering an object with material effects, we normally ignore such parts and attempt to render the shape as a whole. However, under some\ncircumstances we might want to have the ability to identify each part uniquely (see the individually-tinted parts in the screenshot).\n\n![sample screenshot - CAD model courtesy of PTC](doc/screenshot.png)\n\n(CAD model courtesy of PTC)\n\nIn this sample we leverage the unique part IDs to implement a very basic mouse selection highlight (animated inverted color effect over the part).\nAnother use-case for idbuffers / item buffers containing unique part IDs is screenspace-based outlining for feature- or silhouette-edges.\n\nOne issue we face is that, while an object might have many triangles as a whole, when it's made of many such parts, they tend to have few triangles each. 
Therefore, rendering the parts individually does not allow GPUs to run at full performance. Even when we remove the CPU bottlenecks from such small drawcalls, the GPU can become front-end bound, as the front-end is responsible for creating the drawcalls.\n\nA modern GPU tends to be wide, meaning it has lots of execution units that prefer big workloads, so a drawcall with just a few triangles (fewer than 500) tends not to saturate the hardware quickly enough. While the hardware is able to process multiple drawcalls in parallel, there tend to be limits on how much can run in parallel, and if the work per draw is small, the setup overhead of each draw can bite us in the end.\n\nSometimes objects with few triangles are unavoidable. Ideally those are not drawn in bulk, but interleaved with objects that have more triangles so we can hide this problem better.\n\nIn this sample we showcase a few rendering techniques to get a per-part ID within the fragment shader and still be efficient.\n\nTypical UI operations:\n\n- `renderer` changes between different techniques to render the part IDs\n- `per-draw parameters` alters the way per-draw parameters are passed and how drawcalls are submitted.\n- `search batch` sets the number of parts per drawcall to batch in the `search` renderers.\n- `part color weight` slider blends between the individual part colors and the material color\n- `colorize drawcalls` when active overrides the object's material color with a per-draw color (useful to show the batching)\n- `model copies` increases the number of instances of the model (recommended for fast GPUs and performance investigation)\n- `Render GPU [ms]`: milliseconds it took to render the scene. Please disable vsync (press V, see window title) and create meaningful loads for performance investigation.\n\n\n## CAD Model Setup\n\nSome assumptions about how we organized our CAD model / buffers:\n\n* geometry is a set of vertices and indices\n* 
geometry is made of multiple parts, each of which covers a range of the geometry's triangle indices.\n* geometry can be drawn as a single drawcall spanning all triangles / all parts.\n* one object references a geometry (so we can instance the same geometry for each wheel)\n* each object has a `uniquePartOffset` so that a final part that is rendered can be uniquely identified \n\n## Techniques to draw per-part IDs\n\nWe implemented a few different approaches to get a per-part ID in the fragment-shader stage at the end.\nNot all of them are as versatile as others. As always with graphics programming, your mileage will vary as\nthe outcome greatly depends on your typical input data.\n\n### **per-draw part index**\n\nEach drawcall sets the part index. This means if our model has lots of parts, we get a ton of drawcalls.\n\nFor CAD parts this often has the **risk of being the slowest technique**, but might be the simplest starting point.\n\nThe performance outcome of rendering techniques always depends on your data. If your \"parts\" tend to have plenty of triangles,\nthen this way is totally fine to use.\n\n``` cpp\n// drawcall setup\n  // We encode the partIndex in the \"baseInstance\" value of each drawcall.\n  // This also makes this technique friendly to a multi-draw-indirect structure.\n  // We don't really use instancing, so the value will always be available\n  // as-is in the drawcall.\n  vkCmdDrawIndexed(cmd, part.indexCount, 1, part.firstIndex, part.firstVertex,\n                        part.ID);\n\n// vertex shader simply passes through the partIndex\n// in Vulkan gl_InstanceIndex is the result of gl_InstanceID + gl_BaseInstance\n  // output\n  layout(location=0) out Interpolants {\n    ...\n    flat uint partIndex;\n  } OUT;\n\n  // code\n  ...\n  OUT.partIndex = gl_InstanceIndex;\n\n// fragment shader picks up the value from the vertex-shader\n  // input\n  layout(location=0) in Interpolants {\n    ...\n    flat uint partIndex;\n  } IN;\n\n  // code\n  ...\n  int 
partIndex = int(IN.partIndex);\n\n```\n\n### **per-triangle part index**\n\nIn this variant we store a `triangle partID buffer` with one entry for each triangle of the object's geometry. This maps each triangle to the part it belongs to. Depending on the number of parts we can use a `uint8_t`, `uint16_t` or `uint32_t` array. In this example we used `uint32_t` and in the UI you can see the memory cost under `triangle ids`.\n\nWhile this costs a bit of extra memory, it tends to be the fastest variant, as we can still render the entire object in one shot, independent of how many parts it contains. That is why we **recommend this setup**.\n\n``` cpp\n// drawcall setup\n  // As before, we hijack the \"baseInstance\" value to get a cheap per-draw\n  // value. This time we store the geometry's buffer offset into the `triangle partID buffer`.\n  vkCmdDrawIndexed(cmd, geometry.indexCount, 1, geometry.firstIndex, geometry.firstVertex, \n                        geometry.partTriCountsOffset);\n\n// fragment shader\n  // lookup each triangle's partId\n  // \"PUSH.idsAddr\" is a pushconstant that contains the buffer_reference address\n  //                of the array that contains the part ID per triangle.\n  // \"IN_ID.idsOffset\" is used a bit like \"firstIndex\" in a drawcall, it allows us \n  //                   to store the data of many geometries in the same buffer.\n  //                   It is piped through the vertex-shader as in the simple setup.\n  int partIndex = int(PUSH.idsAddr.d[gl_PrimitiveID + int(IN_ID.idsOffset)]);\n\n```\n\nThe easiest and often fastest option to do this lookup is inside the fragment-shader. The renderers with the `fs` suffix do the lookup there:\n- [drawid_primid.frag.glsl](drawid_primid.frag.glsl).\n\nAn alternative is to use a geometry-shader to compute a new gl_PrimitiveID that is passed to the fragment shader. The renderers with the `gs` suffix perform the lookup in the geometry-shader stage:\n- [drawid_primid_gs.geo.glsl](drawid_primid_gs.geo.glsl). 
\n\nUsing the geometry-shader version is typically slower than the fragment-shader version, and it also does the operation/fetching before early depth culling. So we don't recommend using the geometry-shader technique.\n\n### **per-triangle search part index**\n\nIn this technique we try to lower the memory footprint of the previous technique. Before we had one partID per triangle, but what if we just store how many triangles each part has and then figure out which part the current triangle belongs to? \n\nThis means we only need the number of triangles each part has, which is a lot less memory (see `part ids` in UI), as it no longer depends on the actual number of triangles.\n\nWe batch some parts together as a single drawcall and then search based on `gl_PrimitiveID`. \n\nThe drawcalls are batched to contain up to `SEARCH_COUNT` many parts and our shader-code is optimized for this.\nIn our sample a value of 16 worked well.\n\n``` cpp\n// drawcall setup\n  // We batch our geometry parts into a reduced number of drawcalls.\n  // This time we encode some crucial information about the batch in the \"baseInstance\"\n  // \"partCount\" tells us how many parts are in the batch (up to SEARCH_COUNT many)\n  // \"partOffset\" is the partID of the first part within this batch.\n  vkCmdDrawIndexed(cmd, batch.indexCount, 1, batch.firstIndex, batch.firstVertex,\n                        batch.partCount | (batch.partOffset \u003c\u003c 8));\n\n\n// fragment shader\n\n  // PUSH.idsAddr points to partTriCounts.\n  uints_in partTriCounts = PUSH.idsAddr;\n\n  // the batch meta info that we get per-draw\n  uint partOffset = IN_ID.idsOffset \u003e\u003e 8;\n  uint partCount  = IN_ID.idsOffset \u0026 0xFF;\n\n  int begin = 0;\n  int partIndex = 0;\n\n  // unroll support is provided via GL_EXT_control_flow_attributes\n  [[unroll]]\n  for (int i = 0; i \u003c SEARCH_COUNT; i++)\n  {\n    // for each part in the batch get number of triangles\n    // (we pad our partTriCounts buffer at the end so that this hardcoded 
search window never\n    //  creates out-of-bounds access)\n    \n    // don't make this load part of a dynamic condition, so that the compiler\n    // can batch-load all SEARCH_COUNT many loads in separate registers, which reduces\n    // memory latency.\n    int partTriangleCount = int(partTriCounts.d[partOffset + i]);\n\n    // we hardcoded this loop in the shader hence we add the `(i \u003c partCount)` condition\n    // which is dynamic per batch\n    // if the part is valid then check if the current gl_PrimitiveID fits in the range\n    [[flatten]]\n    if (i \u003c partCount \n      \u0026\u0026 gl_PrimitiveID \u003e= begin \n      \u0026\u0026 gl_PrimitiveID \u003c begin + partTriangleCount)\n    {\n      partIndex = i;\n    }\n    // shift begin of next range\n    begin += partTriangleCount;\n  }\n```\n\n### Passing per draw information\nThe sample implements different code paths to pass per-draw information, which can be switched using the `per-draw parameters` UI option.\nEspecially at very high draw frequencies (low number of triangles/work per draw) the approaches can make a difference.\n\nThe [`per_draw_inputs.glsl`](per_draw_inputs.glsl) file wraps these different methods.\n\n#### Push Constants\nWhen selecting `pushconstants` in the UI, the sample will supply per-draw\nparameters to the GPU via Vulkan's 'Push Constants'. These are comparable to\n'Uniforms' in OpenGL. Push Constants provide a convenient way to provide shaders\nwith data that changes dynamically between draw calls. Traditionally this data\nmight be things like the ModelViewProjection matrix and material or lighting\nparameters. Unlike updates to descriptor sets, push constant data updates are\nembedded into the command buffer. 
This way no additional synchronization or \ncache management is needed, which makes push constants very easy to handle.\nThe downside to push constants is the additional overhead - they should not\nbe used to update a lot of parameters for each draw call. Our code sample attempts\nto minimize the number of push constant updates by updating only the\nparameters that changed between draw calls.\n\n#### Multi-Draw Indirect and gl_BaseInstance\nAs the previous techniques sometimes rely on passing some information efficiently\nper draw, let's look at different possibilities. The UI option \n`MDI \u0026 gl_BaseInstance` offers a path that uses _Multi-Draw-Indirect_ to\nissue draw calls and a technique based around 'gl_BaseInstance' to pass per-draw\nparameters to the shader(s).\n\nMulti-Draw-Indirect (MDI) offers a way in Vulkan to submit many draws at once\nwith just one command, `vkCmdDraw[Indexed]Indirect`, thereby potentially\nincreasing performance considerably. With `vkCmdDraw[Indexed]Indirect`, the\nper-draw parameters that are typically provided as function parameters to\n`vkCmdDrawIndexed`, like number of indices, first vertex, base instance etc.,\nneed to be provided in an \"Indirect Buffer\" filled with\n`VkDrawIndexedIndirectCommand` objects that describe the individual draws to\nthe GPU. We do lose one important feature, though: the ability to change state\nin between these draws - in particular changes to the push constants. All draws\nexecuted as part of an MDI drawcall share the same state. Therefore we need to\nfind a different way to pass the required transform matrix, material ID, part\nidentifier etc. to the shaders. Currently, the only way in core Vulkan to \ncommunicate a per-draw identifier is to use the \n`VkDraw[Indexed]IndirectCommand::firstInstance` member. \nThis parameter is passed through to the vertex shader as `gl_BaseInstance` \n(available in GLSL 4.6) and has no effect on rendering if instancing is not\nused. 
Since our sample is not using instancing we are free to use this \nparameter to pass a user defined ID along to the vertex shader. When the draws\ndo not actually make use of instancing, gl_InstanceIndex can be used synonymously \nto gl_BaseInstance which may work better on some hardware. Using gl_BaseInstance,\nthe shader can identify which part the current draw belongs to and use it to do\na lookup into a storage buffer containing the actual draw parameters needed\nfor the current draw. In our sample's case, we keep a storage buffer containing\nan array of `DrawPushData` around that we index with gl_InstanceIndex.\n\n```\nstruct DrawPushData\n{\n  // Common to all vertex shaders\n  uint matrixIndex;\n\n  uint flexible;\n\n  // Simple per-part fragment push constants for MODE_PER_DRAW_BASEINST\n  uint materialIndex;\n\n  // Added to the part ID when shading() so the same ID for different objects is\n  // a different color.\n  uint uniquePartOffset;\n\n  // The bound address points to different content per mode:\n  // - MODE_PER_TRI_ID*: trianglePartIds - per-triangle part IDs\n  // - MODE_PER_TRI_*BATCH_PART_SEARCH*: partTriCounts - per-part triangle counts\n  // - MODE_PER_TRI_*GLOBAL_PART_SEARCH*: partTriOffsets - running per-part triangle offsets\n  BUFFER_REFERENCE(uints_in, idsAddr);\n};\n```\nIt provides the vertex shader with the transform matrix, and the fragment shader with\nthe material index and the means to identify the part the currently drawn triangle\nbelongs to with the algorithms described earlier.\n\nNotice that we don't pass these parameters individually as varyings (in/out parameters)\nbetween shader stages. Instead, the vertex shader only passes the current draw ID along in\na flat varying. This is done to minimize passing data between the shader stages. 
Saving\non inter-stage data in turn saves on-chip memory and thus allows for better utilization of\nthe GPU, in particular for large model rendering, when the number of rendered primitives\nreaches and surpasses the number of pixels on screen.\n\nEach shader stage that needs per-draw information from the per-draw buffer does its own\nlookup. This lookup will be highly uniform and thus cause the data to be kept in L1/L2\ncache with high likelihood.\n\nWhile passing the data as individual varyings is possible, it increases the amount of\non-chip memory each output vertex needs. This increased usage negatively impacts\noccupancy, meaning fewer vertex-shader threads can be run in parallel.\n\n#### Multi-Draw Indirect and instanced vertex attribute\nThe `MDI \u0026 instanced attribute` option in the UI chooses a renderer path which replaces the use of\nthe gl_BaseInstance shader built-in with an instanced vertex attribute.\n\nOn some hardware it is faster to emulate gl_BaseInstance via an instanced vertex attribute\nby providing a buffer which contains the draw index in the shape of `buffer[x] = x`.\nWhen binding this buffer as an instanced vertex attribute, this vertex attribute will then\nprovide the gl_BaseInstance identifier indirectly. 
Once the draw ID is fetched from the\ninstanced vertex attribute, the remaining handling of per-draw parameters is the same\nas in the _MDI \u0026 gl_BaseInstance_ option.\n\n### Performance\n\nSummarizing our three main techniques:\n\n```\nwe want to draw a geometry with 9 parts\na b c d e f g h i\n```\n\n**per-draw part index**: we get 9 drawcalls, each with one part\n```\n0 1 2 3 4 5 6 7 8\na b c d e f g h i\n```\n\n**per-triangle part index**: we get 1 drawcall, spanning all parts\n```\n0\nabcdefghi\n```\n\n**per-triangle search part index**: with this technique and SEARCH_COUNT=4 we get 3 drawcalls\n```\n0    1    2\nabcd efgh i\n```\n\nResults for an NVIDIA GeForce RTX 3080 with `model copies = 3`, 1440p + 4x MSAA, `search batch = 16` and `pushconstants`:\n\n| renderer                                      | drawcalls | time in milliseconds |\n|-----------------------------------------------|-----------|----------------------|\n| per-draw part index                           |   296 049 |               12.2   |\n| per-triangle part index fs                    |     6 777 |              **2.3** |\n| per-triangle part index gs                    |     6 777 |                4.3   |\n| per-triangle search part index fs             |    22 260 |                2.5   |\n| per-triangle search part index gs             |    22 260 |                4.6   |\n\nWe can see that the per-draw part index clearly is the worst option for this model, as it consists of lots of parts with few triangles.\nAs mentioned before, the easiest plug-in solution is typically a per-triangle buffer.\n\nFor the single car the `triangle partID buffer` was around 9 MB (32-bit per triangle) and the `partID buffer` used for searches just 268 KB (32-bit per part). 
So if you are tight on memory the `per-triangle search fs` method may be a good choice.\n\nAs a reminder, evaluate the techniques with your kind of rendering setup and typical content.\n\n## Selection Highlight\n\nThe selection highlight is done by figuring out the partIndex underneath the mouse directly in the fragment shader. Using the `atomicMin` instruction provided through `GL_EXT_shader_atomic_int64`, we store the globally unique `partIndex` in the lower 32 bits and the fragment depth in the upper 32 bits. At the end of the rendering the variable holds the closest hit. We copy this variable over and use it for the visual highlight of the next frame. See the following code, also found in [drawid_shading.glsl](drawid_shading.glsl)\n\n\n``` cpp\n  // simple ray selection highlight:\n  \n  // if this fragment coordinate matches the mouse cursor\n  // we do a 64-bit atomicMin to find the closest surface (lowest depth value)\n  // and we store the unique partIndex \n  if (all(equal(ivec2(gl_FragCoord.xy), scene.mousePos))) \n  {\n    // pack partIndex in lower  32-bit\n    //      depth     in higher 32-bit\n    atomicMin(ray.mouseHit, packUint2x32(uvec2(partIndex, floatBitsToUint(gl_FragCoord.z))) );\n  }\n  \n  // rayLast is the result of the above logic from last frame.\n  // We cannot use the same frame's result, because as we raster the various triangles\n  // the result will change.\n  // If the current partIndex matches the one that was the closest in the last\n  // frame, then alter the color for the selection highlight.\n  // The copying of the result is done after rendering\n  // (see the vkCmdCopyBuffer at end of RendererVK::draw)\n  if (partIndex == unpackUint2x32(rayLast.mouseHit).x)\n  {\n    color = mix(color, vec4(1) - color, sin(scene.time * 10) * 0.5 + 0.5);\n  }\n```\n\n**Tip for VR**\n\nWhile this sample does a simple ray-test along the mouse cursor, one can use the same basic setup for an arbitrary selection ray. 
By treating each fragment shader invocation as a small plane we can intersect that with the selection ray. Then test if the intersection point is close to the current gl_FragCoord and if so run the atomicMin above, but with the hit distance rather than depth (means only few fragment shader invocations hit the atomicMin). That would give you a very cheap arbitrary selection ray, say controlled by VR controllers, on any visible surface almost for free. It comes with the restriction that you must have clear vision on anything you want to select, but that is often okay.\n\n## Building\nMake sure to have installed the [Vulkan-SDK](http://lunarg.com/vulkan-sdk/). Always use 64-bit build configurations.\n\nIdeally, clone this and other interesting [nvpro-samples](https://github.com/nvpro-samples) repositories into a common subdirectory. You will always need [nvpro_core](https://github.com/nvpro-samples/nvpro_core). The nvpro_core is searched either as a subdirectory of the sample, or one directory up.\n\nIf you are interested in multiple samples, you can use [build_all](https://github.com/nvpro-samples/build_all) CMAKE as entry point, it will also give you options to enable/disable individual samples when creating the solutions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvpro-samples%2Fvk_idbuffer_rasterization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnvpro-samples%2Fvk_idbuffer_rasterization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvpro-samples%2Fvk_idbuffer_rasterization/lists"}