{"id":17445903,"url":"https://github.com/piellardj/water-webgpu","last_synced_at":"2025-06-20T00:04:24.781Z","repository":{"id":254598953,"uuid":"610321891","full_name":"piellardj/water-webgpu","owner":"piellardj","description":"WebGPU water simulation handling up to a million particles.","archived":false,"fork":false,"pushed_at":"2024-08-24T15:56:31.000Z","size":81671,"stargazers_count":42,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-12T19:53:42.194Z","etag":null,"topics":["compute-shader","gpgpu","gpgpu-computing","gpgpu-physics","lagrangian","shaders","typescript","webgpu"],"latest_commit_sha":null,"homepage":"https://piellardj.github.io/water-webgpu/","language":"TypeScript","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/piellardj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"license","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-06T14:41:33.000Z","updated_at":"2024-12-27T14:19:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"b922a306-eaa3-46f0-8a83-72bcac459be1","html_url":"https://github.com/piellardj/water-webgpu","commit_stats":null,"previous_names":["piellardj/water-webgpu"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/piellardj/water-webgpu","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piellardj%2Fwater-webgpu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piellardj%2Fwater-webgpu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piellardj%2Fwater-webgpu/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piellardj%2Fwater-webgpu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/piellardj","download_url":"https://codeload.github.com/piellardj/water-webgpu/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piellardj%2Fwater-webgpu/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260852095,"owners_count":23072587,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compute-shader","gpgpu","gpgpu-computing","gpgpu-physics","lagrangian","shaders","typescript","webgpu"],"created_at":"2024-10-17T18:16:49.576Z","updated_at":"2025-06-20T00:04:19.759Z","avatar_url":"https://github.com/piellardj.png","language":"TypeScript","funding_links":["https://www.paypal.com/donate/?hosted_button_id=AF7H7GEJTL95E"],"categories":[],"sub_categories":[],"readme":"# water-webgpu\n\n## Description\nThis is a water simulation where water is modeled as thousands of small balls colliding with each other. You can interact with it by adding objects such as a cup or a helix. You can also dynamically change the domain constraints and the engine settings. The smaller the timestep, the more precise the simulation.\n\nThis project runs fully on GPU and can handle up to a million balls. It is implemented with the experimental WebGPU API, which allows GPGPU in the browser for massively parallel computing. 
You might need to manually enable WebGPU in your browser.\n\nSee it live [here](https://piellardj.github.io/water-webgpu/).\n\n[![Donate](https://raw.githubusercontent.com/piellardj/piellardj.github.io/master/images/readme/donate-paypal.svg)](https://www.paypal.com/donate/?hosted_button_id=AF7H7GEJTL95E)\n\n## Preview\n\nhttps://user-images.githubusercontent.com/22922087/228359283-33cc019e-f49b-4865-9906-eb5b36fb7920.mp4\n\nhttps://user-images.githubusercontent.com/22922087/228642679-4031c3f1-6008-4f5b-bc70-46198698d09b.mp4\n\nhttps://user-images.githubusercontent.com/22922087/228644853-d2b4d2da-42aa-4678-a433-e3ecf7e17c6a.mp4\n\nhttps://user-images.githubusercontent.com/22922087/228642866-f7bef392-21a3-464e-99ca-78a793789c3b.mp4\n\nhttps://user-images.githubusercontent.com/22922087/228643120-e85d4b6d-4832-4d22-9152-ecd5484afe3c.mp4\n\n## Algorithms\n### Engine\nThe simulation runs fully on the GPU: the CPU is only used to compute the initial positions.\n\n#### Base principle\nThe physics are not the focus of this project, so they are pretty basic.\n\nAll particles in the scene share the same size. They evolve inside a delimited spatial domain (a unit cube). Particles collide with each other, so that there is no interpenetration. Particles are subject to gravity. I use Euler integration for each step.\n\nIn the scene there can be obstacles. They are modeled as special particles that do not move, so in case of collision between a mobile particle and an obstacle's particle, it is the mobile particle that absorbs the energy and bounces back. 
For the rendering, the obstacles are rendered as meshes, but this is only for display.\n\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Obstacle modeled as particles.\" src=\"src/readme/obstacle.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003eObstacle modeled as particles.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\n#### Spatial indexing\nA naive implementation of these equations would be, for each sphere, to check every other sphere for collision. However:\n- this does not scale, since the number of checks grows quadratically with the number of spheres;\n- this wastes a lot of resources, because if two spheres are too far from one another, there is no need to compute their interaction.\n\nThe solution to both these issues is to use spatial indexing, so that:\n- each particle only checks the particles near it and skips the ones far away;\n- particles that are spatially close are adjacent in the `GPUBuffer`, which is favorable for GPU cache.\n\nSpatial indexing works well on GPU, but requires an adapted implementation, different from a classical CPU one.\n\nIn my explanation, I will consider a 2D domain to make diagrams readable; however in my project I use 3D indexing. This technique can be generalized to higher dimensions if needed.\n\nLet's take the example of a scene where there are spheres of radius `r`. I divide the domain into a regular grid, where the cell size is at least `2r`. This way if a sphere is in a certain cell, then the only other spheres potentially colliding are the ones in the same cell, or in the 8 adjacent cells (26 in 3D).\n\nIn this example there are 7 spheres (in blue), and the domain is divided into 16 cells (in black). 
Each cell is given a unique scalar identifier.\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Spatial indexing: step 1\" width=\"512px\" src=\"src/readme/indexing-01.png\"/\u003e\n\u003c/div\u003e\n\nThen I count the number of spheres in each cell (`pCount` in these diagrams), and I assign a local id to each sphere (in blue).\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Spatial indexing: step 2\" src=\"src/readme/indexing-02.png\"/\u003e\n\u003c/div\u003e\n\nThen I compute a prefix sum (exclusive scan): the `cStart` of each cell is the sum of the `pCount` of the previous cells.\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Spatial indexing: step 3\" src=\"src/readme/indexing-03.png\"/\u003e\n\u003c/div\u003e\n\nThen to each particle, I assign a global id (in red) which is the sum of the cell's `cStart` and the particle's local id. This global id is unique to each particle, and is then used to reorder particles.\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Spatial indexing: step 4\" src=\"src/readme/indexing-04.png\"/\u003e\n\u003c/div\u003e\n\nFinally, I reorder the particles according to their global id.\n\nOnce this indexing is done:\n- the particles have been reordered;\n- I can easily get the list of particles in a cell: they are the ones with ids ranging from `cStart` (included) to `cStart + pCount` (excluded).\n\nIn this example, let's say I want to compute the collisions for particle 1.\n- I start by computing the cell id (cell #2)\n- I then look up particles in the same cell and in the adjacent ones (#1, #2, #3, #5, #6, #7): in cells #1,#3,#5,#6 `pCount=0` so no particles, in cell #2 particles `0` to `0+3` (0, 1, 2), in cell #7 particles `5` to `5+1` (5).\n\nHere is what indexing looks like (only the cells with `pCount\u003e0` are displayed):\n\nhttps://user-images.githubusercontent.com/22922087/228377674-6cc242ab-d291-4367-823d-86c9c04d8297.mp4\n\n#### Performance\nUnfortunately, at the time of writing I did not find an easy 
way of precisely monitoring performance. I don't know which of physics or spatial indexing takes the most computing time. The only metric I have is the iterations per second (one iteration being spatial indexing + physics computing).\n\nHere is the evolution of the iterations per second relative to the particle count. Measurements were made on my desktop computer with an nVidia 1660 Ti, with particles of radius 0.005, and a grid size of 98x98x98 (941192 cells of side 0.0102 each).\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Iterations-per-second by particles count.\" src=\"src/readme/performance.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003eIterations-per-second by particles count.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\nTweaking the grid size (as long as it respects the minimum cell size condition) will certainly affect performance too; however I did not perform any tests of this kind to determine the ideal size.\n\n### Rendering\nThe project supports two render modes:\n- balls, which is the cheaper one\n- water, which is the more expensive one\n\nThese modes are purely cosmetic and don't affect the simulation in any way. 
In both modes I use deferred rendering.\nBelow are examples of both rendering modes for the same scene, which contains a central obstacle partially submerged.\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"'Balls' rendering mode.\" src=\"src/readme/balls_shaded.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003e\"Balls\" rendering mode.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"'Water' rendering mode.\" src=\"src/readme/water_shaded.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003e\"Water\" rendering mode.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\n#### \"Balls\" rendering mode\nThis rendering mode is the most straightforward one.\n\nEach ball is first rendered as a 2D billboard, with each fragment containing the billboard-local position in the red/green channels. Since the balls are really close to one another, a simple 2D billboard is not enough: I have to manually compute the depth in the fragment shader to mimic the shape of the sphere. In my tests, for large amounts of spheres this is still cheaper than using actual 3D geometry.\nThe depth is stored in the alpha channel. Since it is only 8 bits, I need to carefully choose the camera near and far planes to maximize the useful range.\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"In RG channels, local position. In alpha channel, depth.\" src=\"src/readme/local-pos_depth.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003eIn RG channels, local position. 
In alpha channel, depth.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\nThen at composition-time, I compute the world normal by combining the billboard-local position and the camera properties:\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Computed world normals.\" src=\"src/readme/world_normals.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003eComputed world normals.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\nand from there it is easy to compute a basic diffuse shading:\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Final balls shading with diffuse lighting.\" src=\"src/readme/balls_final.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003eFinal balls shading with diffuse lighting.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\n#### \"Water\" rendering mode\nThis rendering mode is way more expensive but has a cartoonish water look that I like. Everything happens in screen-space: no additional geometry is required.\n\nThe first step is common with the \"balls\" rendering mode: I render each sphere as a billboard. This time however, I use all 4 channels of the texture and I store in the blue channel the water depth (computed in additive mode).\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"In RG channels, local position. In blue channel, water depth. In alpha channel, depth.\" src=\"src/readme/local-pos_water-depth_depth.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003eIn RG channels, local position. In blue channel, water depth. 
In alpha channel, depth.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\nIn a second step, I apply a blur to try to merge the spheres together.\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Same texture, with local position and water depth blurred.\" src=\"src/readme/local-pos_water-depth_depth_blurred.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003eSame texture, with local position and water depth blurred.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\nThis blur is applied in a compute shader. It is computed in two steps as a separable Gaussian blur: first vertical, then horizontal. For better performance, I first load the region into workgroup cache, then work on that cache. The blur takes depth into account, in order to keep edges sharp: if there is a discontinuity in depth, then no blur is applied. Otherwise, a sphere in the foreground would be merged with the water in the background, which makes no sense visually.\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Where there is a depth discontinuity, no blur is applied.\" src=\"src/readme/blur_depth-aware.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003eWhere there is a depth discontinuity, no blur is applied.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\nIn the last step, all this information is combined, and with a bit of Fresnel and specularity here is the result:\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Here is what the shaded water looks like.\" src=\"src/readme/water.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003eHere is what the shaded water looks like.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\nI am especially happy with the water depth information, which greatly improves the rendering since it makes it possible to see obstacles through the water. 
This effect is visible in the video below:\n\nhttps://user-images.githubusercontent.com/22922087/228359283-33cc019e-f49b-4865-9906-eb5b36fb7920.mp4\n\n## Implementation details\nI used this project to further learn about WebGPU (which, at the time of writing, is still in draft, so things could change in the future). Below are specific implementation details I think are notable.\n\n### Water depth: additive blending and write masks\nI was a bit worried about the computation of the water depth. However, it turned out to be easy because WebGPU supports:\n- additive blending\n```typescript\nconst additiveBlend: GPUBlendState = {\n    color: {\n        srcFactor: \"one\",\n        dstFactor: \"one\",\n        operation: \"add\",\n    },\n    alpha: {\n        srcFactor: \"one\",\n        dstFactor: \"one\",\n        operation: \"add\",\n    }\n};\n```\n- and write mask\n```typescript\nconst colorTarget: GPUColorTargetState = {\n    format: \"rgba8unorm\",\n    writeMask: GPUColorWrite.BLUE, // or GPUColorWrite.RED | GPUColorWrite.GREEN | GPUColorWrite.ALPHA\n};\n```\n\nThis is necessary because the RGA channels of the deferred texture store the scene rendered classically (with depth write) whereas the B channel stores the scene in additive mode without depth write: as a result I cannot compute all four channels at the same time, I have to perform 2 renders into the same texture. I first render the RGA channels with a RGA writeMask and `depthWriteEnabled=true`, and then I do a second render for the B channel, with a B writeMask and `depthWriteEnabled=false`.\n\n### Blur: compute shaders, workgroup address space and workgroupBarrier\nIn the \"Water\" render mode, I have to blur the deferred texture. In WebGL1, I would have used a fragment shader to do this. In WebGPU, there is support for compute shaders, which offer more control, flexibility and performance!\n\nThe blur is computed as a separable Gaussian blur: first a horizontal blur is applied, then a vertical one. 
The principle of a blur is to sample the neighbouring fragments and sum them, with normalized weights. A direct way of doing this would be, for each texel, to fetch the neighbouring texels directly from the texture. However this proves to be sub-optimal because then each texel would be fetched many times, and a fetch is expensive. A more performant way of doing this is to minimize texel fetches by using a cache in the workgroup address space. This is made possible with the use of the [`workgroupBarrier()`](https://www.w3.org/TR/WGSL/#workgroupBarrier-builtin) instruction.\n\nYou can find such an example in the [webgpu-samples](https://github.com/webgpu/webgpu-samples/blob/main/src/sample/imageBlur/blur.wgsl).\n\nBelow is a simplified example. In real life there is some more code to handle out-of-bounds reads.\n```glsl\n@group(0) @binding(0) var inputTexture: texture_2d\u003cf32\u003e; // input texture\n@group(0) @binding(1) var outputTexture: texture_storage_2d\u003crgba8unorm, write\u003e; // output texture\n\nstruct ComputeIn {\n    @builtin(workgroup_id) workgroupId: vec3\u003cu32\u003e,\n    @builtin(local_invocation_id) localInvocationId: vec3\u003cu32\u003e,\n    @builtin(global_invocation_id) globalInvocationId: vec3\u003cu32\u003e,\n};\n\nconst direction = vec2\u003ci32\u003e(1,0); // horizontal blur\nconst workgroupSize = 128;\nconst blurRadius = 6;\n\nvar\u003cworkgroup\u003e workgroupCache : array\u003cvec4\u003cf32\u003e, workgroupSize\u003e; // cache stored in the workgroup address space\n\n@compute @workgroup_size(workgroupSize)\nfn main(in: ComputeIn) {\n    let textureSize = vec2\u003ci32\u003e(textureDimensions(inputTexture));\n    let globalInvocationId = vec2\u003ci32\u003e(in.globalInvocationId.xy) - vec2\u003ci32\u003e(2 * blurRadius * i32(in.workgroupId.x), 0);\n\n    let texelId = globalInvocationId.x * direction + globalInvocationId.y * (vec2\u003ci32\u003e(1) - direction);\n    let indexInCache = i32(in.localInvocationId.x);\n\n    // 
first, load workgroup cache, one single texel fetch per invocation\n    let currentFragment = textureLoad(inputTexture, texelId, 0); // texel fetch (might be out of texture range)\n    workgroupCache[indexInCache] = currentFragment; // write in workgroup cache\n\n    // wait for every invocation in the workgroup\n    workgroupBarrier();\n    // at this point the workgroupCache stores a copy of a portion of the texture\n\n    // then compute blur by loading values from workgroupCache\n    if (texelId.x \u003c textureSize.x \u0026\u0026 texelId.y \u003c textureSize.y) {\n        var blurred = vec4\u003cf32\u003e(0);\n\n        // loop on items from indexInCache-blurRadius to indexInCache+blurRadius\n        // (the loop counter must be a `var` since it is incremented)\n        for (var i = indexInCache-blurRadius; i \u003c= indexInCache+blurRadius; i++) {\n            blurred += workgroupCache[i];\n        }\n        blurred /= 2.0 * f32(blurRadius) + 1.0;\n\n        // then store the result in the output texture\n        textureStore(outputTexture, texelId, blurred);\n    }\n}\n```\n\n### Uniforms buffer packing\nIn WGSL, when creating a structure, each attribute has requirements in terms of byte alignment (and stride in the case of arrays). Correctly packing a structure to minimize space requires a bit of care. 
Everything is explained in the [Alignment and Size](https://www.w3.org/TR/WGSL/#alignment-and-size) section of the spec.\n\nHere is an example of a structure that could be packed better:\n```glsl\nstruct Particle {            //            align(16) size(64)\n    position: vec3\u003cf32\u003e,     // offset(0)  align(16) size(12)\n    // 4 bytes wasted\n    velocity: vec3\u003cf32\u003e,     // offset(16) align(16) size(12)\n    // 4 bytes wasted\n    acceleration: vec3\u003cf32\u003e, // offset(32) align(16) size(12)\n    weight: f32,             // offset(44) align(4)  size(4) \n    foam: f32,               // offset(48) align(4)  size(4) \n    indexInCell: u32,        // offset(52) align(4)  size(4) \n    // 8 bytes wasted\n};\n```\nSince `SizeOf(vec3\u003cf32\u003e) = 12` and `AlignOf(vec3\u003cf32\u003e) = 16`, in this example there is a 4-byte gap after `position` and another one after `velocity`. This wastes 8 bytes in padding. Worse, it makes the structure 16 bytes too big because the `size` of a structure needs to be a multiple of its `align`: here, even though the attributes only occupy the first 56 bytes, 8 more bytes of padding are added at the end to reach 64.\nA better way to pack this structure would be to use this padding to store the 4-byte types like so:\n```glsl\nstruct Particle {            //            align(16) size(48)\n    position: vec3\u003cf32\u003e,     // offset(0)  align(16) size(12)\n    weight: f32,             // offset(12) align(4)  size(4) \n    velocity: vec3\u003cf32\u003e,     // offset(16) align(16) size(12)\n    foam: f32,               // offset(28) align(4)  size(4) \n    acceleration: vec3\u003cf32\u003e, // offset(32) align(16) size(12)\n    indexInCell: u32,        // offset(44) align(4)  size(4) \n};\n```\nThis way, no space is wasted and the structure is 25% smaller, down to 48 bytes instead of 64! 
This can seem anecdotal but has a huge impact in my project where I store hundreds of thousands of particles: it can save dozens of megabytes of GPU memory.\n\nFor large structures it is a bit tedious to tweak the attributes manually, so I created a helper class to automatically pack structures as much as possible.\n\n### Spatial indexing with atomicAdd\nWebGPU offers atomic operations described in the \"[Atomic Read-modify-write](https://www.w3.org/TR/WGSL/#atomic-rmw)\" section of the spec. These allow several invocations to work in parallel on the same data in storage or workgroup address space without risking race conditions.\n\nI use them in step 2 of the spatial indexing process, to increment `pCount`, the count of particles in each cell.\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Spatial indexing: step 2\" src=\"src/readme/indexing-02.png\"/\u003e\n\u003c/div\u003e\nIn this step:\n- each cell's `pCount` was previously reset to 0;\n- each invocation handles one particle.\n\nThe code snippet to increment the cells' `pCount` and get the particle's local id is:\n```glsl\n\nstruct Cell {\n    pCount: atomic\u003cu32\u003e,\n};\n\nstruct Particle {\n    position: vec3\u003cf32\u003e,\n    indexInCell: u32,\n};\n\nstruct ComputeIn {\n    @builtin(global_invocation_id) globalInvocationId : vec3\u003cu32\u003e,\n};\n\n@group(0) @binding(0) var\u003cstorage,read_write\u003e cellsBuffer: array\u003cCell\u003e;\n@group(0) @binding(1) var\u003cstorage,read_write\u003e particlesBuffer: array\u003cParticle\u003e;\n\noverride particlesCount: u32;\n\nfn computeCellIndex(position: vec3\u003cf32\u003e) -\u003e u32 {\n    // compute it\n}\n\n@compute @workgroup_size(128)\nfn main(in: ComputeIn) {\n    let particleId = in.globalInvocationId.x;\n\n    if (particleId \u003c particlesCount) { // particlesCount might not be a multiple of 128, so a few invocations are wasted\n        let position = particlesBuffer[particleId].position;\n        let cellIndex = 
computeCellIndex(position);\n        // atomicAdd increments the cell's pCount and returns its value pre-incrementation\n        particlesBuffer[particleId].indexInCell = atomicAdd(\u0026cellsBuffer[cellIndex].pCount, 1u);\n    }\n}\n```\n\n### Prefix sum\nImplementing prefix sum for parallel architectures is a well-known subject. I chose to implement a simple algorithm described below, which consists of two passes: the reduce pass and the down pass.\n\n\u003cdiv style=\"text-align:center\"\u003e\n    \u003cimg alt=\"Prefix sum (exclusive scan) algorithm I used.\" src=\"src/readme/prefix-sum.png\"/\u003e\n    \u003cp\u003e\n        \u003ci\u003ePrefix sum (exclusive scan) algorithm I used.\u003c/i\u003e\n    \u003c/p\u003e\n\u003c/div\u003e\n\n## Improvements\nThere are many ways this project could be improved.\nOn the engine side:\n- I could use a Smoothed-particle hydrodynamics (SPH) algorithm for more realistic physics;\n- I could use a better integration scheme than Euler; even Verlet would be an improvement;\n- currently, all obstacles are modeled by spheres. This works well but has a major disadvantage: it makes all obstacles bumpy. For instance, balls don't roll well on mild slopes. This leaves a lot of room for improvement.\n\nOn the rendering side:\n- another way to render water from spheres would be to reconstruct the water surface, for instance with marching cubes. This is a great subject in itself and implementing it on GPU would be interesting.\n- another way to compute water depth, without additive blending, would be to do a two-pass rendering: the first pass to retrieve the depth of the back faces, and the second one to compute the difference with the depth of the front faces. 
An advantage of this method is that it also works with mesh geometry.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpiellardj%2Fwater-webgpu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpiellardj%2Fwater-webgpu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpiellardj%2Fwater-webgpu/lists"}