{"id":13574930,"url":"https://github.com/rafbiels/sycl-collision-sim","last_synced_at":"2025-04-14T01:48:27.258Z","repository":{"id":154173355,"uuid":"624435375","full_name":"rafbiels/sycl-collision-sim","owner":"rafbiels","description":"Example SYCL implementation of collision detection","archived":false,"fork":false,"pushed_at":"2024-02-27T20:43:11.000Z","size":417,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-14T01:48:24.335Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rafbiels.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-06T13:12:48.000Z","updated_at":"2023-04-14T13:18:26.000Z","dependencies_parsed_at":null,"dependency_job_id":"b9544c7e-6764-4a78-97b7-13dc51a8506d","html_url":"https://github.com/rafbiels/sycl-collision-sim","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafbiels%2Fsycl-collision-sim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafbiels%2Fsycl-collision-sim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafbiels%2Fsycl-collision-sim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafbiels%2Fsycl-collision-sim/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rafbiels","download_url":"h
ttps://codeload.github.com/rafbiels/sycl-collision-sim/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248809026,"owners_count":21164895,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T15:00:56.709Z","updated_at":"2025-04-14T01:48:27.236Z","avatar_url":"https://github.com/rafbiels.png","language":"C++","funding_links":[],"categories":["Table of Contents"],"sub_categories":["Mathematics and Science"],"readme":"# SYCL Collision Simulation\nDemo 3D simulation of rigid body physics with different shapes bouncing off each\nother confined in a box. Two implementations are provided, one sequential with\nstandard C++ code compiled for CPU, and parallel\n[SYCL](https://www.khronos.org/sycl/) implementation which can be compiled for\nany target device (e.g. 
a GPU) supported by a SYCL compiler.\n\nThe main aim of this project is to showcase how SYCL can be used to port CPU\ncode to GPUs with standard C++ syntax, achieving a significant performance boost\nportable across devices from different vendors.\n\nThere is a video display with an FPS counter for demonstration purposes (presented\nin the clip below), as well as a headless mode for benchmarking.\n\n\u003ca href=\"https://player.vimeo.com/video/873721038\"\u003e\n\u003cimg src=\"https://i.vimeocdn.com/video/1737354781-487f33094af9fd551060d0b2df485ea656d513fdfe10a8e933ef2fc047aa14cd-d\" width=\"50%\" /\u003e\n\u003c/a\u003e\n\n\n## How to build\n### Requirements\n\n* [CMake](https://cmake.org/) 3.12 or newer.\n* [Intel oneAPI DPC++](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html)\ncompiler or its\n[open source version](https://github.com/intel/llvm). Although most, if not\nall, of the code should compile with other SYCL implementations, the CMake\nconfiguration assumes the DPC++ compiler driver CLI when setting up compilation flags.\n* [Magnum](https://doc.magnum.graphics/magnum/getting-started.html#getting-started-setup-install)\n(and its dependency\n[Corrade](https://doc.magnum.graphics/corrade/building-corrade.html#building-corrade-packages)) -\ngraphics middleware library used for the display as well as defining and\ntransforming actor body meshes.\n* [SDL2](https://wiki.libsdl.org/SDL2/Installation) for the graphical output\n(not needed when compiling for the headless mode).\n\n### Build with CMake\n\nLike any standard CMake project, the collision simulation can be built by\nsimply cloning the source, entering its directory, and typing:\n```sh\nmkdir build \u0026\u0026 cd build\nCXX=clang++ cmake ..\ncmake --build .\n```\nThe `CXX` variable should point to your DPC++ compiler, either the `clang++`\ndriver or `icpx` (they have a matching CLI, so both work).\n\nThere are several options you may want to set when configuring the project. 
Use\nthem by adding `-D\u003coption\u003e=\u003cvalue\u003e` in the first `cmake` command, with `\u003cvalue\u003e`\nbeing either `ON` or `OFF` for boolean options, and a number/string for others.\nThe available options are:\n* `HEADLESS` - boolean option switching to the headless mode without graphical\noutput, useful for benchmarking. The default is `OFF` (meaning graphical output\nis enabled).\n* `ACTOR_GRID_SIZE` - integer option. The simulation runs for a square grid of\nactors NxN where N=`ACTOR_GRID_SIZE`. The default value is 5 (meaning 25 actors\nare simulated). Due to some compile-time calculations based on this number, the\ncompiler may prevent you from setting it too high. The DPC++ version 2023.2.1\nallows 19 as the maximum value.\n* `ENABLE_CUDA` - boolean option enabling the compilation of SYCL code for\nNVIDIA GPU targets. The default value is `OFF`. Enabling this option requires\nthe CUDA toolkit to be installed (at least the ptx assembler and device bitcode library).\n* `ENABLE_HIP` - boolean option enabling the compilation of SYCL code for\nAMD GPU targets. The default value is `OFF`. Enabling this option requires\nthe ROCm toolkit to be installed (at least the device bitcode library).\n* `ENABLE_SPIR` - boolean option enabling the compilation of SYCL code for\nIntel devices. The compiled code can run both on Intel OpenCL and Level-Zero\nbackends. The default value is `ON`.\n\n## How to run\n\nThe application can be simply executed with `./collision-sim`. This executes the\nparallel simulation using SYCL. There is a single command-line option supported,\n`--cpu`, which causes the program to execute the sequential CPU implementation\ninstead.\n\nThe SYCL implementation uses the default device selector, thus, the DPC++ SYCL\nruntime\n[environment variables](https://intel.github.io/llvm-docs/EnvironmentVariables.html)\nmay be used to influence its behaviour. 
In particular, the\n`ONEAPI_DEVICE_SELECTOR` variable allows picking a specific device among all\navailable ones.\n\n\n## Algorithms\n\nThe physics simulation is implemented in `src/SequentialSimulation.cpp` for the\nsequential C++ code and in `src/ParallelSimulation.cpp` for the SYCL version.\nIn addition, `src/Util.cpp` implements some of the common computations. All other\nfiles in the project only define the data structures and deal with the graphical\ndisplay and application flow. The two simulation files implement the same\ngeneral algorithms, albeit with some adjustments relevant to either CPU or GPU\nprogramming.\n\n\n#### Rigid body motion\nThe implementation of Newton-Euler equations describing 3D rigid body motion\nbroadly follows:\n* D. Baraff, 2001, [*Rigid Body Simulation*](https://graphics.pixar.com/pbm2001/pdf/notesg.pdf)\n\nThe sequential simulation implements the algorithm in the\n`simulateMotionSequential` function, whereas the parallel simulation covers it\nin the `ActorKernel`, which updates the full-body kinematic properties, and the\n`VertexKernel`, which updates all body mesh vertex positions accordingly.\n\n#### World boundary collisions\nThe world boundary collision detection runs for every vertex of each actor body\nmesh and compares its location to the world edges. The first vertex detected to\nintersect with a world edge triggers a collision response implemented using the\nimpulse-based model described in:\n* Wikipedia article *Collision response*, section\n[*Impulse-based contact model*](https://en.wikipedia.org/wiki/Collision_response#Impulse-based_contact_model)\n[accessed 10/2023]\n\nwhere one of the colliding bodies is a wall with infinite mass.\n\nThe sequential simulation (function `collideWorldSequential`) loops over all\nactors, and for each actor loops over all vertices. An early-exit condition is\ntriggered in the inner loop on the first detected collision. 
The parallel\nsimulation implements the world collision detection as part of the\n`VertexKernel` and stores the collision information for every vertex in memory.\nThis is then copied to the host at the end of a simulation step, where the reduction\nto per-actor information and the impulse application are executed on the CPU.\n\n#### Broad-phase collision detection\nThe broad-phase collision detection is applied to actors' axis-aligned bounding\nboxes (AABB) following the basic sweep and prune algorithm described in chapter\n2 of:\n* D.J. Tracy, S.R. Buss, B.M. Woods, 2009,\n[Efficient Large-Scale Sweep and Prune Methods with AABB Insertion and Removal](https://mathweb.ucsd.edu/~sbuss/ResearchWeb/EnhancedSweepPrune/SAP_paper_online.pdf)\n\nwhich consists of three stages:\n* calculation of the axis-aligned bounding box (AABB) for each actor\n* sorting AABB edges along each axis\n* overlap detection among the sorted edges and flagging those overlapping along all three axes\n\nFor the sequential implementation, the first stage is part of the actor's\n`updateVertexPositions` procedure called in `simulateMotionSequential` and the\nother two stages are implemented in `collideBroadSequential`. The parallel\nversion implements the three stages in the kernels `AABBKernel`,\n`AABBSortKernel`, and `AABBOverlapKernel`, respectively.\n\nThe sequential implementation follows the paper closely, sorting AABB edges\nalong each axis using insertion sort. This approach is well suited to CPUs,\nbut challenging to implement efficiently for SIMT architectures like GPUs. For\nthis reason, the parallel simulation uses the odd-even merge-sort algorithm in\nthe sorting stage.\n\n#### Narrow-phase collision detection\nAll actor pairs flagged as overlapping by the broad-phase algorithm are subject\nto narrow-phase collision detection. 
For each such pair of actors (A, B), the\nalgorithm searches for the closest triangle-vertex pair between all triangles of\nactor A and all vertices of actor B, as well as between all triangles of actor B\nand all vertices of actor A.\n\nThe 3D triangle-vertex distance is calculated by transforming the problem into\na 2D space following the \"2D method\" described in:\n* M.W. Jones, 1995,\n[*3D Distance from a Point to a Triangle*](http://www-compsci.swan.ac.uk/~csmark/PDFS/1995_3D_distance_point_to_triangle)\n\nThe \"2D method\" functions are implemented in `src/Util.cpp` and the same code is\nused in the sequential and parallel implementations, exploiting the flexibility\nof SYCL to compile into both device and host code. The `triangleTransform`\nfunction transforms a triangle-vertex pair into the coordinate system described\nin the paper, whereas the `closestPointOnTriangle` function finds the point within\nthe triangle boundaries that is closest to the vertex, and returns the\nsquared distance between them. Finally, the `closestPointOnTriangleND` function\napplies this to an array of vertices for one triangle, and finds the vertex\nclosest to that triangle.\n\nA pair of actors is flagged as colliding whenever the closest triangle-vertex\ndistance between them is below a fixed threshold encoded in\n`Constants::NarrowPhaseCollisionThreshold`.\n\nThe triangle and vertex loop logic and the threshold application are implemented in\nthe `collideNarrowSequential` function for the sequential simulation. The\nparallel version starts with the `NarrowPhaseKernel` where each thread processes\na single triangle, pairing it with all vertices of the other actor. 
This is\nfollowed by the `TVReduceKernel` which reduces all the triangle-vertex pairs for\neach actor into a single one with the smallest squared distance.\n\nThe parallel algorithm kernels' iteration space is calculated dynamically from\nthe broad-phase results and the computation is launched only for the overlapping\nactor pairs. Because of this, there is a synchronisation point where the broad-phase\nresults are copied to the host and processed to define the narrow-phase iteration\nrange.\n\n#### Collision response\nThe collision response uses the impulse-based model described in:\n* Wikipedia article *Collision response*, section\n[*Impulse-based contact model*](https://en.wikipedia.org/wiki/Collision_response#Impulse-based_contact_model)\n[accessed 10/2023]\n\nThe sequential simulation implements it in the `impulseCollision` function and\nthe parallel version in the `ImpulseCollisionKernel`.\n\n## Known issues\n\n#### Sticky collisions\nIn some cases, when clipping occurs between the colliding shapes, the impulse\ncollision algorithm may result in the two objects \"sticking\" to each other for\nsome amount of time. This is a common flaw of the impulse-based collision\nresponse model; however, no workaround has been implemented so far in this\nproject.\n\n#### Segfault in HEADLESS mode without display (X11)\nEven though the headless mode doesn't produce any graphical output, a bug in\nthe underlying X11 library initialisation causes the application to crash on\nstartup when there is no display configured. 
This can be worked around on X11\nLinux with the X virtual framebuffer (`Xvfb`):\n```sh\nXvfb :1 -screen 0 1024x768x24 -fbdir $(mktemp -d) \u0026\nexport DISPLAY=:1\n./collision-sim\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frafbiels%2Fsycl-collision-sim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frafbiels%2Fsycl-collision-sim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frafbiels%2Fsycl-collision-sim/lists"}