{"id":23689798,"url":"https://github.com/pipecruz/cuda-flocking-sim","last_synced_at":"2026-05-07T07:39:53.134Z","repository":{"id":270175123,"uuid":"909540708","full_name":"PipeCruz/CUDA-Flocking-Sim","owner":"PipeCruz","description":"CPU and GPU (CUDA) implementations of naive/optimized flocking algorithms","archived":false,"fork":false,"pushed_at":"2025-02-05T09:00:16.000Z","size":1129,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-19T23:41:41.670Z","etag":null,"topics":["cuda"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PipeCruz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-29T02:39:06.000Z","updated_at":"2025-02-05T09:00:19.000Z","dependencies_parsed_at":"2024-12-29T04:20:41.947Z","dependency_job_id":"de6d5664-67d4-4f2d-8154-4863d9065fe0","html_url":"https://github.com/PipeCruz/CUDA-Flocking-Sim","commit_stats":null,"previous_names":["pipecruz/cuda-flocking-sim"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PipeCruz%2FCUDA-Flocking-Sim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PipeCruz%2FCUDA-Flocking-Sim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PipeCruz%2FCUDA-Flocking-Sim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PipeCruz%2FCUDA-Flocking-Sim/manifests","owner_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/owners/PipeCruz","download_url":"https://codeload.github.com/PipeCruz/CUDA-Flocking-Sim/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239753700,"owners_count":19691159,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda"],"created_at":"2024-12-30T01:39:43.514Z","updated_at":"2026-01-13T23:30:18.364Z","avatar_url":"https://github.com/PipeCruz.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CUDA Flocking Simulator\n\nMy final project for CS179 @ Caltech. This was built entirely from scratch; I had no prior experience with OpenGL beforehand.\n\n\u003c!-- code block --\u003e\n## Installation\nA few libraries are needed, as seen in the Makefile:\n\n`-lglfw -lGL -lGLU -lGLEW -lcudart`\n\nTo run locally, you just need to install `libglfw3`, GLEW, and GLFW on top of a regular CUDA dev environment.\n\nTo run this project, run `make crun` or `make clean run`. You can change any of the constant parameters in `config.h`; you'll want to focus on scaling factors, distances, and the number of particles to see different behaviors. Run with `USECPU = false` and `SIMTYPE = 2` to see the fastest algorithm (best frame rate). The default `NUM_BOIDS` value is best suited for the GPU; anything over a thousand had significant performance issues, at least on my machine.\n\n\n## Project Description\nWhat does the program do?\n1. 
Visualize lifelike flocking behavior of a massive number (\u003e1M) of particles in 3 dimensions\n    - Based on 3 rules, \"cohesion\", \"alignment\", and \"separation\", each with its own radius and scale factor\n2. Two algorithm implementations on CPU and GPU (4 total, technically 5)\n    1. Naive approach, O(n^2) on CPU, comparing every particle against every other\n    2. \"Flattened grid\" approach, closer to O(n log n) on CPU, only comparing particles in the same spatial neighborhood (\"buckets\")\n        - Technically there's another implementation that hashes particle coordinates and compares particles that fall into the same bucket (modulo), but that's basically the flattened grid approach\n3. You can change the parameters in the `config.h` file to your liking and see how the particles' behavior changes\n4. You can use your mouse to rotate around the projected visualization of the flocking sim.\n\nIn my original planning, I thought I'd be doing more graphics work, but just rendering the particles as points colored by the magnitude of their velocity gets the point across :)\n\nNOTE: all the GPU rendering was done on a laptop with a GTX 1050 Ti, Ubuntu 22.05.5 LTS x86_64, and an Intel i7-8750H (12) @ 4.1GHz.\n\n## Results\nWhat should we expect to see?\n\nSomething like this:\n\n\u003c!-- screenshot 100k_boids.png --\u003e\n\n![100k boids](100k_boids.png \"100k clustering boids\")\n\n[1 million boids](https://youtu.be/Hdiz2vlfsWM)\n\n[100k boids](https://youtu.be/BAFrjFGmaUk)\n\n## Performance Analysis\n\nAlmost all of this simulation can be parallelized, from computing the distances between boids, which drive their changes in velocity, to updating all boids at the same time once those changes are computed. Hence the use of the CUDA-OpenGL interoperability library, which updates Vertex Buffer Objects much faster than when the particle update is done on the CPU.
\n\nJust by running this program, it's clear that using a GPU to accelerate the simulation is highly effective.\n\nAre there things that could be improved? Yes. There are a lot of different collision algorithms that can be optimized for CUDA programming by implementing octrees and quadtrees. Other optimizations might include taking further advantage of CUDA's `thrust` library to get contiguous data accesses. Compute shaders are another option that could speed things up even further. That would come after learning OpenGL/GLSL, which took up a huge amount of my time.\n\nIn every case, the GPU implementation is better. Recording some frame rates, we see the following.\n\nConsider 100k boids with the default parameters. On the CPU, it starts at a steady 5 frames per second: ![100k cpu fps](100k_cpufps.png) For the CUDA-accelerated version, there's no delay and it runs at 60 fps: ![100k gpu fps](100k_gpu_fps.png)\n\nFurthermore, for 1 million boids on the GPU (pictured here about 15 seconds after starting), the frame rate is low, around 15 fps, but still three times faster: ![1 mil fps](1mil_gpu_fps.png)\n\nFor context, 10k boids on the CPU using the grid flattening method: ![10k cpu grid flat](10k_cpu_fps.png) hovers around half the fps of the CUDA 100k simulation, despite having 10 times fewer particles.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpipecruz%2Fcuda-flocking-sim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpipecruz%2Fcuda-flocking-sim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpipecruz%2Fcuda-flocking-sim/lists"}