{"id":15036092,"url":"https://github.com/projectphysx/fluidx3d","last_synced_at":"2025-05-13T23:09:24.368Z","repository":{"id":53210109,"uuid":"521191759","full_name":"ProjectPhysX/FluidX3D","owner":"ProjectPhysX","description":"The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.","archived":false,"fork":false,"pushed_at":"2025-05-13T04:49:17.000Z","size":22019,"stargazers_count":4416,"open_issues_count":30,"forks_count":382,"subscribers_count":62,"default_branch":"master","last_synced_at":"2025-05-13T05:29:25.241Z","etag":null,"topics":["benchmark","cfd","computational-fluid-dynamics","fluid-dynamics","fluid-simulation","fluid-solver","gpgpu","gpu","gpu-computing","high-performance-computing","hpc","interactive-visualization","lattice-boltzmann","lbm","opencl","physics","raytracing","scientific-computing","scientific-visualization","simulation"],"latest_commit_sha":null,"homepage":"https://youtube.com/@ProjectPhysX","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ProjectPhysX.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-08-04T08:49:44.000Z","updated_at":"2025-05-13T04:49:20.000Z","dependencies_parsed_at":"2023-10-03T12:19:55.714Z","dependency_job_id":"e5989ff1-6bd7-40b8-86bf-2d522c86fcc5","html_url":"https://github.com/ProjectPhysX/FluidX3D","commit_stats":{"total_commits":260,"total_committers":2,"mean_commits":130.0,"dds":"0.46923076923076923","last_synced_commit":"58ca271f2e91256f63fd0b24
204cec874e713cfd"},"previous_names":[],"tags_count":29,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProjectPhysX%2FFluidX3D","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProjectPhysX%2FFluidX3D/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProjectPhysX%2FFluidX3D/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProjectPhysX%2FFluidX3D/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ProjectPhysX","download_url":"https://codeload.github.com/ProjectPhysX/FluidX3D/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254042080,"owners_count":22004839,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","cfd","computational-fluid-dynamics","fluid-dynamics","fluid-simulation","fluid-solver","gpgpu","gpu","gpu-computing","high-performance-computing","hpc","interactive-visualization","lattice-boltzmann","lbm","opencl","physics","raytracing","scientific-computing","scientific-visualization","simulation"],"created_at":"2024-09-24T20:30:07.002Z","updated_at":"2025-05-13T23:09:19.354Z","avatar_url":"https://github.com/ProjectPhysX.png","language":"C++","readme":"# FluidX3D\n\nThe fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via [OpenCL](https://github.com/ProjectPhysX/OpenCL-Wrapper \"OpenCL-Wrapper\"). 
Free for non-commercial use.\n\n\u003ca href=\"https://youtu.be/-MkRBeQkLk8\"\u003e\u003cimg src=\"https://img.youtube.com/vi/o3TPN142HxM/maxresdefault.jpg\" width=\"50%\"\u003e\u003c/img\u003e\u003c/a\u003e\u003ca href=\"https://youtu.be/oC6U1M0Fsug\"\u003e\u003cimg src=\"https://img.youtube.com/vi/oC6U1M0Fsug/maxresdefault.jpg\" width=\"50%\"\u003e\u003c/img\u003e\u003c/a\u003e\u003cbr\u003e\n\u003ca href=\"https://youtu.be/XOfXHgP4jnQ\"\u003e\u003cimg src=\"https://img.youtube.com/vi/XOfXHgP4jnQ/maxresdefault.jpg\" width=\"50%\"\u003e\u003c/img\u003e\u003c/a\u003e\u003ca href=\"https://youtu.be/K5eKxzklXDA\"\u003e\u003cimg src=\"https://img.youtube.com/vi/K5eKxzklXDA/maxresdefault.jpg\" width=\"50%\"\u003e\u003c/img\u003e\u003c/a\u003e\n(click on images to show videos on YouTube)\n\n\u003cdetails\u003e\u003csummary\u003eUpdate History\u003c/summary\u003e\n\n- [v1.0](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v1.0) (04.08.2022) [changes](https://github.com/ProjectPhysX/FluidX3D/commit/768073501af725e392a4b85885009e2fa6400e48) (public release)\n  - public release\n- [v1.1](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v1.1) (29.09.2022) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v1.0...v1.1) (GPU voxelization)\n  - added solid voxelization on GPU (slow algorithm)\n  - added tool to print current camera position (key \u003ckbd\u003eG\u003c/kbd\u003e)\n  - minor bug fix (workaround for Intel iGPU driver bug with triangle rendering)\n- [v1.2](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v1.2) (24.10.2022) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v1.1...v1.2) (force/torque computation)\n  - added functions to compute force/torque on objects\n  - added function to translate Mesh\n  - added Stokes drag validation setup\n- [v1.3](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v1.3) (10.11.2022) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v1.2...v1.3) (minor bug fixes)\n  - added unit 
conversion functions for torque\n  - `FORCE_FIELD` and `VOLUME_FORCE` can now be used independently\n  - minor bug fix (workaround for AMD legacy driver bug with binary number literals)\n- [v1.4](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v1.4) (14.12.2022) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v1.3...v1.4) (Linux graphics)\n  - complete rewrite of C++ graphics library to minimize API dependencies\n  - added interactive graphics mode on Linux with X11\n  - fixed streamline visualization bug in 2D\n- [v2.0](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.0) (09.01.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v1.4...v2.0) (multi-GPU upgrade)\n  - added (cross-vendor) multi-GPU support on a single node (PC/laptop/server)\n- [v2.1](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.1) (15.01.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.0...v2.1) (fast voxelization)\n  - made solid voxelization on GPU lightning fast (new algorithm, from minutes to milliseconds)\n- [v2.2](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.2) (20.01.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.1...v2.2) (velocity voxelization)\n  - added option to voxelize moving/rotating geometry on GPU, with automatic velocity initialization for each grid point based on center of rotation, linear velocity and rotational velocity\n  - cells that are converted from solid-\u003efluid during re-voxelization now have their DDFs properly initialized\n  - added option to not auto-scale mesh during `read_stl(...)`, with negative `size` parameter\n  - added kernel for solid boundary rendering with marching-cubes\n- [v2.3](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.3) (30.01.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.2...v2.3) (particles)\n  - added particles with immersed-boundary method (either passive or 2-way-coupled, only supported with single-GPU)\n  - 
minor optimization to GPU voxelization algorithm (workgroup threads outside mesh bounding-box return after ray-mesh intersections have been found)\n  - displayed GPU memory allocation size is now fully accurate\n  - fixed bug in `write_line()` function in `src/utilities.hpp`\n  - removed `.exe` file extension for Linux/macOS\n- [v2.4](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.4) (11.03.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.3...v2.4) (UI improvements)\n  - added a help menu with key \u003ckbd\u003eH\u003c/kbd\u003e that shows keyboard/mouse controls, visualization settings and simulation stats\n  - improvements to keyboard/mouse control (\u003ckbd\u003e+\u003c/kbd\u003e/\u003ckbd\u003e-\u003c/kbd\u003e for zoom, \u003ckbd\u003emouseclick\u003c/kbd\u003e frees/locks cursor)\n  - added suggestion of largest possible grid resolution if resolution is set larger than memory allows\n  - minor optimizations in multi-GPU communication (insignificant performance difference)\n  - fixed bug in temperature equilibrium function for temperature extension\n  - fixed erroneous double literal for Intel iGPUs in skybox color functions\n  - fixed bug in make.sh where multi-GPU device IDs would not get forwarded to the executable\n  - minor bug fixes in graphics engine (free cursor not centered during rotation, labels in VR mode)\n  - fixed bug in `LBM::voxelize_stl()` size parameter standard initialization\n- [v2.5](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.5) (11.04.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.4...v2.5) (raytracing overhaul)\n  - implemented light absorption in fluid for raytracing graphics (no performance impact)\n  - improved raytracing framerate when camera is inside fluid\n  - fixed skybox pole flickering artifacts\n  - fixed bug where moving objects during re-voxelization would leave an erroneous trail of solid grid cells behind\n- 
[v2.6](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.6) (16.04.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.5...v2.6) (Intel Arc patch)\n  - patched OpenCL issues of Intel Arc GPUs: now VRAM allocations \u003e4GB are possible and correct VRAM capacity is reported\n- [v2.7](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.7) (29.05.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.6...v2.7) (visualization upgrade)\n  - added slice visualization (key \u003ckbd\u003e2\u003c/kbd\u003e / key \u003ckbd\u003e3\u003c/kbd\u003e modes, then switch through slice modes with key \u003ckbd\u003eT\u003c/kbd\u003e, move slice with keys \u003ckbd\u003eQ\u003c/kbd\u003e/\u003ckbd\u003eE\u003c/kbd\u003e)\n  - made flag wireframe / solid surface visualization kernels toggleable with key \u003ckbd\u003e1\u003c/kbd\u003e\n  - added surface pressure visualization (key \u003ckbd\u003e1\u003c/kbd\u003e when `FORCE_FIELD` is enabled and `lbm.calculate_force_on_boundaries();` is called)\n  - added binary `.vtk` export function for meshes with `lbm.write_mesh_to_vtk(Mesh* mesh);`\n  - added `time_step_multiplicator` for `integrate_particles()` function in PARTICLES extension\n  - made correction of wrong memory reporting on Intel Arc more robust\n  - fixed bug in `write_file()` template functions\n  - reverted back to separate `cl::Context` for each OpenCL device, as the shared Context otherwise would allocate extra VRAM on all other unused Nvidia GPUs\n  - removed Debug and x86 configurations from Visual Studio solution file (one less complication for compiling)\n  - fixed bug that particles could get too close to walls and get stuck, or leave the fluid phase (added boundary force)\n- [v2.8](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.8) (24.06.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.7...v2.8) (documentation + polish)\n  - finally added more [documentation](DOCUMENTATION.md)\n  - cleaned up 
all sample setups in `setup.cpp` for more beginner-friendliness, and added required extensions in `defines.hpp` as comments to all setups\n  - improved loading of composite `.stl` geometries, by adding an option to omit automatic mesh repositioning, added more functionality to `Mesh` struct in `utilities.hpp`\n  - added `uint3 resolution(float3 box_aspect_ratio, uint memory)` function to compute simulation box resolution based on box aspect ratio and VRAM occupation in MB\n  - added `bool lbm.graphics.next_frame(...)` function to export images for a specified video length in the `main_setup` compute loop\n  - added `VIS_...` macros to ease setting visualization modes in headless graphics mode in `lbm.graphics.visualization_modes`\n  - simulation box dimensions are now automatically made equally divisible by domains for multi-GPU simulations\n  - fixed Info/Warning/Error message formatting for loading files and made Info/Warning/Error message labels colored\n  - added Ahmed body setup as an example on how body forces and drag coefficient are computed\n  - added Cessna 172 and Bell 222 setups to showcase loading composite .stl geometries and revoxelization of moving parts\n  - added optional semi-transparent rendering mode (`#define GRAPHICS_TRANSPARENCY 0.7f` in `defines.hpp`)\n  - fixed flickering of streamline visualization in interactive graphics\n  - improved smooth positioning of streamlines in slice mode\n  - fixed bug where `mass` and `massex` in `SURFACE` extension were also allocated in CPU RAM (not required)\n  - fixed bug in Q-criterion rendering of halo data in multi-GPU mode, reduced gap width between domains\n  - removed shared memory optimization from mesh voxelization kernel, as it crashes on Nvidia GPUs with new GPU drivers and is incompatible with old OpenCL 1.0 GPUs\n  - fixed raytracing attenuation color when no surface is at the simulation box walls with periodic boundaries\n- [v2.9](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.9) 
(31.07.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.8...v2.9) (multithreading)\n  - added cross-platform `parallel_for` implementation in `utilities.hpp` using `std::thread`\n  - significantly (\u003e4x) faster simulation startup with multithreaded geometry initialization and sanity checks\n  - faster `calculate_force_on_object()` and `calculate_torque_on_object()` functions with multithreading\n  - added total runtime and LBM runtime to `lbm.write_status()`\n  - fixed bug in voxelization ray direction for re-voxelizing rotating objects\n  - fixed bug in `Mesh::get_bounding_box_size()`\n  - fixed bug in `print_message()` function in `utilities.hpp`\n- [v2.10](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.10) (05.11.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.9...v2.10) (frustum culling)\n  - improved rasterization performance via frustum culling when only part of the simulation box is visible\n  - improved switching between centered/free camera mode\n  - refactored OpenCL rendering library\n  - unit conversion factors are now automatically printed in console when `units.set_m_kg_s(...)` is used\n  - faster startup time for FluidX3D benchmark\n  - minor bug fix in `voxelize_mesh(...)` kernel\n  - fixed bug in `shading(...)`\n  - replaced slow (in multithreading) `std::rand()` function with standard C99 LCG\n  - more robust correction of wrong VRAM capacity reporting on Intel Arc GPUs\n  - fixed some minor compiler warnings\n- [v2.11](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.11) (07.12.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.10...v2.11) (improved Linux graphics)\n  - interactive graphics on Linux are now in fullscreen mode too, fully matching Windows\n  - made CPU/GPU buffer initialization significantly faster with `std::fill` and `enqueueFillBuffer` (overall ~8% faster simulation startup)\n  - added operating system info to OpenCL device driver version printout\n 
 - fixed flickering with frustum culling at very small field of view\n  - fixed bug where rendered/exported frame was not updated when `visualization_modes` changed\n- [v2.12](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.12) (18.01.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.11...v2.12) (faster startup)\n  - ~3x faster source code compiling on Linux using multiple CPU cores if [`make`](https://www.gnu.org/software/make/) is installed\n  - significantly faster simulation initialization (~40% single-GPU, ~15% multi-GPU)\n  - minor bug fix in `Memory_Container::reset()` function\n- [v2.13](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.13) (11.02.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.12...v2.13) (improved .vtk export)\n  - data in exported `.vtk` files is now automatically converted to SI units\n  - ~2x faster `.vtk` export with multithreading\n  - added unit conversion functions for `TEMPERATURE` extension\n  - fixed graphical artifacts with axis-aligned camera in raytracing\n  - fixed `get_exe_path()` for macOS\n  - fixed X11 multi-monitor issues on Linux\n  - workaround for Nvidia driver bug: `enqueueFillBuffer` is broken for large buffers on Nvidia GPUs\n  - fixed slow numeric drift issues caused by `-cl-fast-relaxed-math`\n  - fixed wrong Maximum Allocation Size reporting in `LBM::write_status()`\n  - fixed missing scaling of coordinates to SI units in `LBM::write_mesh_to_vtk()`\n- [v2.14](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.14) (03.03.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.13...v2.14) (visualization upgrade)\n  - coloring can now be switched between velocity/density/temperature with key \u003ckbd\u003eZ\u003c/kbd\u003e\n  - uniform improved color palettes for velocity/density/temperature visualization\n  - color scale with automatic unit conversion can now be shown with key \u003ckbd\u003eH\u003c/kbd\u003e\n  - slice mode for field 
visualization now draws fully filled-in slices instead of only lines for velocity vectors\n  - shading in `VIS_FLAG_SURFACE` and `VIS_PHI_RASTERIZE` modes is smoother now\n  - `make.sh` now automatically detects operating system and X11 support on Linux and only runs FluidX3D if last compilation was successful\n  - fixed compiler warnings on Android\n  - fixed `make.sh` failing on some systems due to nonstandard interpreter path\n  - fixed that `make` would not compile with multiple cores on some systems\n- [v2.15](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.15) (09.04.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.14...v2.15) (framerate boost)\n  - eliminated one frame memory copy and one clear frame operation in rendering chain, for 20-70% higher framerate on both Windows and Linux\n  - enabled `g++` compiler optimizations for faster startup and higher rendering framerate\n  - fixed bug in multithreaded sanity checks\n  - fixed wrong unit conversion for thermal expansion coefficient\n  - fixed density to pressure conversion in LBM units\n  - fixed bug that raytracing kernel could lock up simulation\n  - fixed minor visual artifacts with raytracing\n  - fixed that console sometimes was not cleared before `INTERACTIVE_GRAPHICS_ASCII` rendering starts\n- [v2.16](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.16) (02.05.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.15...v2.16) (bug fixes)\n  - simplified 10% faster marching-cubes implementation with 1D interpolation on edges instead of 3D interpolation, which removes the need for an edge table\n  - added faster, simplified marching-cubes variant for solid surface rendering where edges are always halfway between grid cells\n  - refactoring in OpenCL rendering kernels\n  - fixed that voxelization failed in Intel OpenCL CPU Runtime due to array out-of-bounds access\n  - fixed that voxelization did not always produce binary identical results in multi-GPU compared 
to single-GPU\n  - fixed that velocity voxelization failed for free surface simulations\n  - fixed terrible performance on ARM GPUs by macro-replacing fused-multiply-add (`fma`) with `a*b+c`\n  - fixed that \u003ckbd\u003eY\u003c/kbd\u003e/\u003ckbd\u003eZ\u003c/kbd\u003e keys were incorrect for `QWERTY` keyboard layout in Linux\n  - fixed that free camera movement speed in help overlay was not updated in stationary image when scrolling\n  - fixed that cursor would sometimes flicker when scrolling on trackpads with Linux-X11 interactive graphics\n  - fixed flickering of interactive rendering with multi-GPU when camera is not moved\n  - fixed missing `XInitThreads()` call that could crash Linux interactive graphics on some systems\n  - fixed z-fighting between `graphics_rasterize_phi()` and `graphics_flags_mc()` kernels\n- [v2.17](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.17) (05.06.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.16...v2.17) (unlimited domain resolution)\n  - domains are no longer limited to 4.29 billion (2³², 1624³) grid cells or 225 GB memory; if more are used, the OpenCL code will automatically compile with 64-bit indexing\n  - new, faster raytracing-based field visualization for single-GPU simulations\n  - added [GPU Driver and OpenCL Runtime installation instructions](DOCUMENTATION.md#0-install-gpu-drivers-and-opencl-runtime) to documentation\n  - refactored `INTERACTIVE_GRAPHICS_ASCII`\n  - fixed memory leak in destructors of `floatN`, `floatNxN`, `doubleN`, `doubleNxN` (all unused)\n  - made camera movement/rotation/zoom behavior independent of framerate\n  - fixed that `smart_device_selection()` would print a wrong warning if device reports 0 MHz clock speed\n- [v2.18](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.18) (21.07.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.17...v2.18) (more bug fixes)\n  - added support for high refresh rate monitors on Linux\n  - more compact 
OpenCL Runtime installation scripts in Documentation\n  - driver/runtime installation instructions will now be printed to console if no OpenCL devices are available\n  - added domain information to `LBM::write_status()`\n  - added `LBM::index` function for `uint3` input parameter\n  - fixed that very large simulations sometimes wouldn't render properly by increasing maximum render distance from 10k to 2.1M\n  - fixed mouse input stuttering at high screen refresh rate on Linux\n  - fixed graphical artifacts in free surface raytracing on Intel CPU Runtime for OpenCL\n  - fixed runtime estimation printed in console for setups with multiple `lbm.run(...)` calls\n  - fixed density oscillations in sample setups (too large `lbm_u`)\n  - fixed minor graphical artifacts in `raytrace_phi()`\n  - fixed minor graphical artifacts in `ray_grid_traverse_sum()`\n  - fixed wrong printed time step count on raindrop sample setup\n- [v2.19](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.19) (07.09.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.18...v2.19) (camera splines)\n  - the camera can now fly along a smooth path through a list of provided keyframe camera placements, [using Catmull-Rom splines](https://github.com/ProjectPhysX/FluidX3D/blob/master/DOCUMENTATION.md#video-rendering)\n  - more accurate remaining runtime estimation that includes time spent on rendering\n  - enabled FP16S memory compression by default\n  - printed camera placement using key \u003ckbd\u003eG\u003c/kbd\u003e is now formatted for easier copy/paste\n  - added benchmark chart in Readme using mermaid gantt chart\n  - placed memory allocation info during simulation startup at better location\n  - fixed threading conflict between `INTERACTIVE_GRAPHICS` and `lbm.graphics.write_frame();`\n  - fixed maximum buffer allocation size limit for AMD GPUs and in Intel CPU Runtime for OpenCL\n  - fixed wrong `Re\u003cRe_max` info printout for 2D simulations\n  - minor fix in 
`bandwidth_bytes_per_cell_device()`\n- [v3.0](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.0) (16.11.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.19...v3.0) (larger CPU/iGPU simulations)\n  - reduced memory footprint on CPUs and iGPU from 72 to 55 Bytes/cell (fused OpenCL host+device buffers for `rho`/`u`/`flags`), allowing 31% higher resolution in the same RAM capacity\n  - faster hardware-supported and faster fallback emulation atomic floating-point addition for `PARTICLES` extension\n  - hardened `calculate_f_eq()` against bad user input for `D2Q9`\n  - fixed velocity voxelization for overlapping geometry with different velocity\n  - fixed Remaining Time printout during paused simulation\n  - fixed CPU/GPU memory printout for CPU/iGPU simulations\n- [v3.1](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.1) (08.02.2025) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v3.0...v3.1) (more bug fixes)\n  - faster `enqueueReadBuffer()` on modern CPUs with 64-Byte-aligned `host_buffer`\n  - hardened ray intersection functions against planar ray edge case\n  - updated OpenCL headers\n  - better OpenCL device specs detection using vendor ID and Nvidia compute capability\n  - better VRAM capacity reporting correction for Intel dGPUs\n  - improved styling of performance mermaid gantt chart in Readme\n  - added multi-GPU performance mermaid gantt chart in Readme\n  - updated driver install guides\n  - fixed voxelization being broken on some GPUs\n  - added workaround for compiler bug in Intel CPU Runtime for OpenCL that causes Q-criterion isosurface rendering corruption\n  - fixed TFlops estimate for Intel Battlemage GPUs\n  - fixed wrong device name reporting for AMD GPUs\n- [v3.2](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v3.2) (09.03.2025) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v3.1...v3.2) (fast force/torque summation)\n  - implemented GPU-accelerated force/torque summation (~20x 
faster than CPU-multithreaded implementation before)\n  - simplified calculating object force/torque in setups\n  - improved coloring in `VIS_FIELD`/`ray_grid_traverse_sum()`\n  - updated OpenCL-Wrapper now compiles OpenCL C code with `-cl-std=CL3.0` if available\n  - fixed compiling on macOS with new OpenCL headers\n\n\u003c/details\u003e\n\n\n\n## How to get started?\n\nRead the [FluidX3D Documentation](DOCUMENTATION.md)!\n\n\n\n## Compute Features - Getting the Memory Problem under Control\n\n- \u003cdetails\u003e\u003csummary\u003e\u003ca name=\"cfd-model\"\u003e\u003c/a\u003eCFD model: lattice Boltzmann method (LBM)\u003c/summary\u003e\n\n  - streaming (part 2/2)\u003cp align=\"center\"\u003e\u003ci\u003ef\u003c/i\u003e\u003csub\u003e0\u003c/sub\u003e\u003csup\u003etemp\u003c/sup\u003e(\u003ci\u003ex\u003c/i\u003e,\u003ci\u003et\u003c/i\u003e) = \u003ci\u003ef\u003c/i\u003e\u003csub\u003e0\u003c/sub\u003e(\u003ci\u003ex\u003c/i\u003e, \u003ci\u003et\u003c/i\u003e)\u003cbr\u003e\u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e\u003csup\u003etemp\u003c/sup\u003e(\u003ci\u003ex\u003c/i\u003e,\u003ci\u003et\u003c/i\u003e) = \u003ci\u003ef\u003c/i\u003e\u003csub\u003e(\u003ci\u003et\u003c/i\u003e%2 ? \u003ci\u003ei\u003c/i\u003e : (\u003ci\u003ei\u003c/i\u003e%2 ? \u003ci\u003ei\u003c/i\u003e+1 : \u003ci\u003ei\u003c/i\u003e-1))\u003c/sub\u003e(\u003ci\u003ei\u003c/i\u003e%2 ? 
\u003ci\u003ex\u003c/i\u003e : \u003ci\u003ex\u003c/i\u003e-\u003ci\u003ee\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e, \u003ci\u003et\u003c/i\u003e) \u0026nbsp; for \u0026nbsp; \u003ci\u003ei\u003c/i\u003e \u0026isin; [1, \u003ci\u003eq\u003c/i\u003e-1]\u003c/p\u003e\n  - collision\u003cp align=\"center\"\u003e\u003ci\u003e\u0026rho;\u003c/i\u003e(\u003ci\u003ex\u003c/i\u003e,\u003ci\u003et\u003c/i\u003e) = (\u0026Sigma;\u003csub\u003e\u003ci\u003ei\u003c/i\u003e\u003c/sub\u003e \u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e\u003csup\u003etemp\u003c/sup\u003e(\u003ci\u003ex\u003c/i\u003e,\u003ci\u003et\u003c/i\u003e)) + 1\u003cbr\u003e\u003cbr\u003e\u003ci\u003eu\u003c/i\u003e(\u003ci\u003ex\u003c/i\u003e,\u003ci\u003et\u003c/i\u003e) = \u003csup\u003e1\u003c/sup\u003e\u0026#8725;\u003csub\u003e\u003ci\u003e\u0026rho;\u003c/i\u003e(\u003ci\u003ex\u003c/i\u003e,\u003ci\u003et\u003c/i\u003e)\u003c/sub\u003e \u0026Sigma;\u003csub\u003e\u003ci\u003ei\u003c/i\u003e\u003c/sub\u003e \u003ci\u003ec\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e \u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e\u003csup\u003etemp\u003c/sup\u003e(\u003ci\u003ex\u003c/i\u003e,\u003ci\u003et\u003c/i\u003e)\u003cbr\u003e\u003cbr\u003e\u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e\u003csup\u003eeq-shifted\u003c/sup\u003e(\u003ci\u003ex\u003c/i\u003e,\u003ci\u003et\u003c/i\u003e) = \u003ci\u003ew\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e \u003ci\u003e\u0026rho;\u003c/i\u003e · (\u003csup\u003e(\u003ci\u003eu\u003c/i\u003e\u003csub\u003e°\u003c/sub\u003e\u003ci\u003ec\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e)\u003csup\u003e2\u003c/sup\u003e\u003c/sup\u003e\u0026#8725;\u003csub\u003e(2\u003ci\u003ec\u003c/i\u003e\u003csup\u003e4\u003c/sup\u003e)\u003c/sub\u003e - 
\u003csup\u003e(\u003ci\u003eu\u003c/i\u003e\u003csub\u003e°\u003c/sub\u003e\u003ci\u003eu\u003c/i\u003e)\u003c/sup\u003e\u0026#8725;\u003csub\u003e(2c\u003csup\u003e2\u003c/sup\u003e)\u003c/sub\u003e + \u003csup\u003e(\u003ci\u003eu\u003c/i\u003e\u003csub\u003e°\u003c/sub\u003e\u003ci\u003ec\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e)\u003c/sup\u003e\u0026#8725;\u003csub\u003e\u003ci\u003ec\u003c/i\u003e\u003csup\u003e2\u003c/sup\u003e\u003c/sub\u003e) + \u003ci\u003ew\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e (\u003ci\u003e\u0026rho;\u003c/i\u003e-1)\u003cbr\u003e\u003cbr\u003e\u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e\u003csup\u003etemp\u003c/sup\u003e(\u003ci\u003ex\u003c/i\u003e, \u003ci\u003et\u003c/i\u003e+\u0026Delta;\u003ci\u003et\u003c/i\u003e) = \u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e\u003csup\u003etemp\u003c/sup\u003e(\u003ci\u003ex\u003c/i\u003e,\u003ci\u003et\u003c/i\u003e) + \u003ci\u003e\u0026Omega;\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e(\u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e\u003csup\u003etemp\u003c/sup\u003e(\u003ci\u003ex\u003c/i\u003e,\u003ci\u003et\u003c/i\u003e), \u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e\u003csup\u003eeq-shifted\u003c/sup\u003e(\u003ci\u003ex\u003c/i\u003e,\u003ci\u003et\u003c/i\u003e), \u003ci\u003e\u0026tau;\u003c/i\u003e)\u003c/p\u003e\n  - streaming (part 1/2)\u003cp align=\"center\"\u003e\u003ci\u003ef\u003c/i\u003e\u003csub\u003e0\u003c/sub\u003e(\u003ci\u003ex\u003c/i\u003e, \u003ci\u003et\u003c/i\u003e+\u0026Delta;\u003ci\u003et\u003c/i\u003e) = \u003ci\u003ef\u003c/i\u003e\u003csub\u003e0\u003c/sub\u003e\u003csup\u003etemp\u003c/sup\u003e(\u003ci\u003ex\u003c/i\u003e, \u003ci\u003et\u003c/i\u003e+\u0026Delta;\u003ci\u003et\u003c/i\u003e)\u003cbr\u003e\u003ci\u003ef\u003c/i\u003e\u003csub\u003e(\u003ci\u003et\u003c/i\u003e%2 ? (\u003ci\u003ei\u003c/i\u003e%2 ? 
\u003ci\u003ei\u003c/i\u003e+1 : \u003ci\u003ei\u003c/i\u003e-1) : \u003ci\u003ei\u003c/i\u003e)\u003c/sub\u003e(\u003ci\u003ei\u003c/i\u003e%2 ? \u003ci\u003ex\u003c/i\u003e+\u003ci\u003ee\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e : \u003ci\u003ex\u003c/i\u003e, \u003ci\u003et\u003c/i\u003e+\u0026Delta;\u003ci\u003et\u003c/i\u003e) = \u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e\u003csup\u003etemp\u003c/sup\u003e(\u003ci\u003ex\u003c/i\u003e, \u003ci\u003et\u003c/i\u003e+\u0026Delta;\u003ci\u003et\u003c/i\u003e) \u0026nbsp; for \u0026nbsp; \u003ci\u003ei\u003c/i\u003e \u0026isin; [1, \u003ci\u003eq\u003c/i\u003e-1]\u003c/p\u003e\n  - \u003cdetails\u003e\u003csummary\u003evariables and \u003ca href=\"https://doi.org/10.15495/EPub_UBT_00005400\"\u003enotation\u003c/a\u003e\u003c/summary\u003e\n\n    | variable             | SI units                            | defining equation                                   | description                                                                     |\n    | :------------------: | :---------------------------------: | :-------------------------------------------------: | :------------------------------------------------------------------------------ |\n    |                      |                                     |                                                     |                                                                                 |\n    | \u003ci\u003ex\u003c/i\u003e             | m                                   | \u003ci\u003ex\u003c/i\u003e = (x,y,z)\u003csup\u003eT\u003c/sup\u003e                      | 3D position in Cartesian coordinates                                            |\n    | \u003ci\u003et\u003c/i\u003e             | s                                   | -                                                   | time                                                                            |\n    | \u003ci\u003e\u0026rho;\u003c/i\u003e         | 
\u003csup\u003ekg\u003c/sup\u003e\u0026#8725;\u003csub\u003em³\u003c/sub\u003e   | \u003ci\u003e\u0026rho;\u003c/i\u003e = (\u0026Sigma;\u003csub\u003e\u003ci\u003ei\u003c/i\u003e\u003c/sub\u003e \u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e)+1 | mass density of fluid                                            |\n    | \u003ci\u003ep\u003c/i\u003e             | \u003csup\u003ekg\u003c/sup\u003e\u0026#8725;\u003csub\u003em\u0026nbsp;s²\u003c/sub\u003e | \u003ci\u003ep\u003c/i\u003e = \u003ci\u003ec\u003c/i\u003e² \u003ci\u003e\u0026rho;\u003c/i\u003e              | pressure of fluid                                                               |\n    | \u003ci\u003eu\u003c/i\u003e | \u003csup\u003em\u003c/sup\u003e\u0026#8725;\u003csub\u003es\u003c/sub\u003e | \u003ci\u003eu\u003c/i\u003e = \u003csup\u003e1\u003c/sup\u003e\u0026#8725;\u003csub\u003e\u003ci\u003e\u0026rho;\u003c/i\u003e\u003c/sub\u003e \u0026Sigma;\u003csub\u003e\u003ci\u003ei\u003c/i\u003e\u003c/sub\u003e \u003ci\u003ec\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e \u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e | velocity of fluid        |\n    | \u003ci\u003e\u0026nu;\u003c/i\u003e          | \u003csup\u003em²\u003c/sup\u003e\u0026#8725;\u003csub\u003es\u003c/sub\u003e    | \u003ci\u003e\u0026nu;\u003c/i\u003e = \u003csup\u003e\u003ci\u003e\u0026mu;\u003c/i\u003e\u003c/sup\u003e\u0026#8725;\u003csub\u003e\u003ci\u003e\u0026rho;\u003c/i\u003e\u003c/sub\u003e | kinematic shear viscosity of fluid                               |\n    | \u003ci\u003e\u0026mu;\u003c/i\u003e          | \u003csup\u003ekg\u003c/sup\u003e\u0026#8725;\u003csub\u003em\u0026nbsp;s\u003c/sub\u003e | \u003ci\u003e\u0026mu;\u003c/i\u003e = \u003ci\u003e\u0026rho;\u003c/i\u003e \u003ci\u003e\u0026nu;\u003c/i\u003e          | dynamic viscosity of fluid                                                      |\n    |                      |                                     |                                 
                    |                                                                                 |\n    | \u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e | \u003csup\u003ekg\u003c/sup\u003e\u0026#8725;\u003csub\u003em³\u003c/sub\u003e   | -                                                   | shifted density distribution functions (DDFs)                                   |\n    | \u0026Delta;\u003ci\u003ex\u003c/i\u003e      | m                                   | \u0026Delta;\u003ci\u003ex\u003c/i\u003e = 1                                 | lattice constant (in LBM units)                                                 |\n    | \u0026Delta;\u003ci\u003et\u003c/i\u003e      | s                                   | \u0026Delta;\u003ci\u003et\u003c/i\u003e = 1                                 | simulation time step (in LBM units)                                             |\n    | \u003ci\u003ec\u003c/i\u003e | \u003csup\u003em\u003c/sup\u003e\u0026#8725;\u003csub\u003es\u003c/sub\u003e | \u003ci\u003ec\u003c/i\u003e = \u003csup\u003e1\u003c/sup\u003e\u0026#8725;\u003csub\u003e\u0026radic;3\u003c/sub\u003e \u003csup\u003e\u0026Delta;\u003ci\u003ex\u003c/i\u003e\u003c/sup\u003e\u0026#8725;\u003csub\u003e\u0026Delta;\u003ci\u003et\u003c/i\u003e\u003c/sub\u003e | lattice speed of sound (in LBM units) |\n    | \u003ci\u003ei\u003c/i\u003e             | 1                                   | 0 \u0026le; \u003ci\u003ei\u003c/i\u003e \u003c \u003ci\u003eq\u003c/i\u003e                          | LBM streaming direction index                                                   |\n    | \u003ci\u003eq\u003c/i\u003e             | 1                                   | \u003ci\u003eq\u003c/i\u003e \u0026isin; {\u0026nbsp;9,15,19,27\u0026nbsp;}            | number of LBM streaming directions                                              |\n    | \u003ci\u003ee\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e | m                                   | D2Q9 / D3Q15/19/27      
                            | LBM streaming directions                                                        |\n    | \u003ci\u003ec\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e | \u003csup\u003em\u003c/sup\u003e\u0026#8725;\u003csub\u003es\u003c/sub\u003e     | \u003ci\u003ec\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e = \u003csup\u003e\u003ci\u003ee\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e\u003c/sup\u003e\u0026#8725;\u003csub\u003e\u0026Delta;\u003ci\u003et\u003c/i\u003e\u003c/sub\u003e | LBM streaming velocities                    |\n    | \u003ci\u003ew\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e | 1                                   | \u0026Sigma;\u003csub\u003e\u003ci\u003ei\u003c/i\u003e\u003c/sub\u003e \u003ci\u003ew\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e = 1 | LBM velocity set weights                                                        |\n    | \u003ci\u003e\u0026Omega;\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e | \u003csup\u003ekg\u003c/sup\u003e\u0026#8725;\u003csub\u003em³\u003c/sub\u003e | SRT or TRT                                      | LBM collision operator                                                          |\n    | \u003ci\u003e\u0026tau;\u003c/i\u003e         | s                                  | \u003ci\u003e\u0026tau;\u003c/i\u003e = \u003csup\u003e\u003ci\u003e\u0026nu;\u003c/i\u003e\u003c/sup\u003e\u0026#8725;\u003csub\u003e\u003ci\u003ec\u003c/i\u003e²\u003c/sub\u003e + \u003csup\u003e\u0026Delta;\u003ci\u003et\u003c/i\u003e\u003c/sup\u003e\u0026#8725;\u003csub\u003e2\u003c/sub\u003e | LBM relaxation time |\n\n    \u003c/details\u003e\n  - velocity sets: D2Q9, D3Q15, D3Q19 (default), D3Q27\n  - collision operators: single-relaxation-time (SRT/BGK) (default), two-relaxation-time (TRT)\n  - [DDF-shifting](https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats) and other algebraic optimization to minimize 
round-off error\n\n  \u003c/details\u003e\n\n\u003c!-- markdown equations don't render properly in mobile browser\n  - streaming (part 2/2):\n$$j=0\\\\ \\textrm{for}\\\\ i=0$$\n$$j=t\\\\%2\\\\ ?\\\\ i\\\\ :\\\\ (i\\\\%2\\\\ ?\\\\ i+1\\\\ :\\\\ i-1)\\\\ \\textrm{for}\\\\ i\\in[1,q-1]$$\n$$f_i^\\textrm{temp}(\\vec{x},t)=f_j(i\\\\%2\\\\ ?\\\\ \\vec{x}\\\\ :\\\\ \\vec{x}-\\vec{e}_i,\\\\ t)$$\n  - collision:\n$$\\rho(\\vec{x},t)=\\left(\\sum_i f_i^\\textrm{temp}(\\vec{x},t)\\right)+1$$\n$$\\vec{u}(\\vec{x},t)=\\frac{1}{\\rho(\\vec{x},t)}\\sum_i\\vec{c}_i f_i^\\textrm{temp}(\\vec{x},t)$$\n$$f_i^\\textrm{eq-shifted}(\\vec{x},t)=w_i \\rho \\cdot\\left(\\frac{(\\vec{u} _{^{^\\circ}}\\vec{c}_i)^2}{2 c^4}-\\frac{\\vec{u} _{^{^\\circ}}\\vec{u}}{2 c^2}+\\frac{\\vec{u} _{^{^\\circ}}\\vec{c}_i}{c^2}\\right)+w_i (\\rho-1)$$\n$$f_i^\\textrm{temp}(\\vec{x},\\\\ t+\\Delta t)=f_i^\\textrm{temp}(\\vec{x},t)+\\Omega_i(f_i^\\textrm{temp}(\\vec{x},t),\\\\ f_i^\\textrm{eq-shifted}(\\vec{x},t),\\\\ \\tau)$$\n  - streaming (part 1/2):\n$$j=0\\\\ \\textrm{for}\\\\ i=0$$\n$$j=t\\\\%2\\\\ ?\\\\ (i\\\\%2\\\\ ?\\\\ i+1\\\\ :\\\\ i-1)\\\\ :\\\\ i\\\\ \\textrm{for}\\\\ i\\in[1,q-1]$$\n$$f_j(i\\\\%2\\\\ ?\\\\ \\vec{x}+\\vec{e}_i\\\\ :\\\\ \\vec{x},\\\\ t+\\Delta t)=f_i^\\textrm{temp}(\\vec{x},\\\\ t+\\Delta t)$$\n --\u003e\n\n- \u003cdetails\u003e\u003csummary\u003e\u003ca name=\"vram-footprint\"\u003e\u003c/a\u003eoptimized to minimize VRAM footprint to 1/6 of other LBM codes\u003c/summary\u003e\n\n  - traditional LBM (D3Q19) with FP64 requires ~344 Bytes/cell\u003cbr\u003e\n    - 🟧🟧🟧🟧🟧🟧🟧🟧🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟨🟨🟨🟨🟨🟨🟨🟨🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥\u003cbr\u003e(density 🟧, velocity 🟦, flags 🟨, 2 copies of DDFs 🟩/🟥; each square 
= 1 Byte)\n    - allows for 3 Million cells per 1 GB VRAM\n  - FluidX3D (D3Q19) requires only 55 Bytes/cell with [Esoteric-Pull](https://doi.org/10.3390/computation10060092)+[FP16](https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats)\u003cbr\u003e\n    - 🟧🟧🟧🟧🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟨🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩\u003cbr\u003e(density 🟧, velocity 🟦, flags 🟨, DDFs 🟩; each square = 1 Byte)\n    - allows for 19 Million cells per 1 GB VRAM\n    - in-place streaming with [Esoteric-Pull](https://doi.org/10.3390/computation10060092): eliminates redundant copy of density distribution functions (DDFs) in memory; almost cuts memory demand in half and slightly increases performance due to implicit bounce-back boundaries; offers optimal memory access patterns for single-cell in-place streaming\n    - [decoupled arithmetic precision (FP32) and memory precision (FP32 or FP16S or FP16C)](https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats): all arithmetic is done in FP32 for compatibility on all hardware, but DDFs in memory can be compressed to FP16S or FP16C: almost cuts memory demand in half again and almost doubles performance, without impacting overall accuracy for most setups\n    - \u003cdetails\u003e\u003csummary\u003eonly 8 flag bits per lattice point (can be used independently / at the same time)\u003c/summary\u003e\n\n      - `TYPE_S` (stationary or moving) solid boundaries\n      - `TYPE_E` equilibrium boundaries (inflow/outflow)\n      - `TYPE_T` temperature boundaries\n      - `TYPE_F` free surface (fluid)\n      - `TYPE_I` free surface (interface)\n      - `TYPE_G` free surface (gas)\n      - `TYPE_X` remaining for custom use or further extensions\n      - `TYPE_Y` remaining for custom use or further extensions\n\n      \u003c/details\u003e\n  - 
large cost saving: comparison of maximum single-GPU grid resolution for D3Q19 LBM\n\n    | GPU\u0026nbsp;VRAM\u0026nbsp;capacity      | 1\u0026nbsp;GB | 2\u0026nbsp;GB | 3\u0026nbsp;GB | 4\u0026nbsp;GB | 6\u0026nbsp;GB | 8\u0026nbsp;GB | 10\u0026nbsp;GB | 11\u0026nbsp;GB | 12\u0026nbsp;GB | 16\u0026nbsp;GB | 20\u0026nbsp;GB | 24\u0026nbsp;GB | 32\u0026nbsp;GB | 40\u0026nbsp;GB | 48\u0026nbsp;GB | 64\u0026nbsp;GB | 80\u0026nbsp;GB | 94\u0026nbsp;GB | 128\u0026nbsp;GB | 192\u0026nbsp;GB | 256\u0026nbsp;GB |\n    | :------------------------------- | --------: | --------: | --------: | --------: | --------: | --------: | ---------: | ---------: | ---------: | ---------: | ---------: | ---------: | ---------: | ---------: | ---------: | ---------: | ---------: | ---------: | ----------: | ----------: | ----------: |\n    | approximate\u0026nbsp;GPU\u0026nbsp;price  | $25\u003cbr\u003eGT\u0026nbsp;210 | $25\u003cbr\u003eGTX\u0026nbsp;950 | $12\u003cbr\u003eGTX\u0026nbsp;1060 | $50\u003cbr\u003eGT\u0026nbsp;730 | $35\u003cbr\u003eGTX\u0026nbsp;1060 | $70\u003cbr\u003eRX\u0026nbsp;470 | $500\u003cbr\u003eRTX\u0026nbsp;3080 | $240\u003cbr\u003eGTX\u0026nbsp;1080\u0026nbsp;Ti | $75\u003cbr\u003eTesla\u0026nbsp;M40 | $75\u003cbr\u003eInstinct\u0026nbsp;MI25 | $900\u003cbr\u003eRX\u0026nbsp;7900\u0026nbsp;XT | $205\u003cbr\u003eTesla\u0026nbsp;P40 | $600\u003cbr\u003eInstinct\u0026nbsp;MI60 | $5500\u003cbr\u003eA100 | $2400\u003cbr\u003eRTX\u0026nbsp;8000 | $10k\u003cbr\u003eInstinct\u0026nbsp;MI210 | $11k\u003cbr\u003eA100 | \u003e$40k\u003cbr\u003eH100\u0026nbsp;NVL | ?\u003cbr\u003eGPU\u0026nbsp;Max\u0026nbsp;1550 | ~$10k\u003cbr\u003eMI300X | - |\n    | traditional\u0026nbsp;LBM\u0026nbsp;(FP64) |      144³ |      182³ |      208³ |      230³ |      262³ |      288³ |       312³ |       322³ |       330³ |       364³ |       392³ |       418³ |       460³ |       494³ |       526³ |       578³ |       624³ |       658³ |        730³ |        836³ |        920³ |\n    | 
FluidX3D\u0026nbsp;(FP32/FP32)        |      224³ |      282³ |      322³ |      354³ |      406³ |      448³ |       482³ |       498³ |       512³ |       564³ |       608³ |       646³ |       710³ |       766³ |       814³ |       896³ |       966³ |      1018³ |       1130³ |       1292³ |       1422³ |\n    | FluidX3D\u0026nbsp;(FP32/FP16)        |      266³ |      336³ |      384³ |      424³ |      484³ |      534³ |       574³ |       594³ |       610³ |       672³ |       724³ |       770³ |       848³ |       912³ |       970³ |      1068³ |      1150³ |      1214³ |       1346³ |       1540³ |       1624³ |\n\n  \u003c/details\u003e\n- \u003cdetails\u003e\u003csummary\u003e\u003ca name=\"multi-gpu\"\u003e\u003c/a\u003ecross-vendor multi-GPU support on a single computer/server\u003c/summary\u003e\n\n  - domain decomposition allows pooling VRAM from multiple GPUs for much larger grid resolution\n  - GPUs don't have to be identical, not even from the same vendor - \u003ca href=\"https://youtu.be/_8Ed8ET9gBU\"\u003eany combination of AMD+Intel+Nvidia GPUs will work\u003c/a\u003e - but similar VRAM capacity/bandwidth is recommended\n  - domain communication architecture (simplified)\n    ```diff\n    ++   .-----------------------------------------------------------------.   ++\n    ++   |                              GPU 0                              |   ++\n    ++   |                          LBM Domain 0                           |   ++\n    ++   '-----------------------------------------------------------------'   ++\n    ++              |                 selective                /|\\             ++\n    ++             \\|/               in-VRAM copy               |              ++\n    ++        .-------------------------------------------------------.        ++\n    ++        |               GPU 0 - Transfer Buffer 0               |        ++\n    ++        '-------------------------------------------------------'        ++\n    !!                      
      |     PCIe     /|\\                           !!\n    !!                           \\|/    copy      |                            !!\n    @@        .-------------------------.   .-------------------------.        @@\n    @@        | CPU - Transfer Buffer 0 |   | CPU - Transfer Buffer 1 |        @@\n    @@        '-------------------------'\\ /'-------------------------'        @@\n    @@                           pointer  X   swap                             @@\n    @@        .-------------------------./ \\.-------------------------.        @@\n    @@        | CPU - Transfer Buffer 1 |   | CPU - Transfer Buffer 0 |        @@\n    @@        '-------------------------'   '-------------------------'        @@\n    !!                           /|\\    PCIe      |                            !!\n    !!                            |     copy     \\|/                           !!\n    ++        .-------------------------------------------------------.        ++\n    ++        |               GPU 1 - Transfer Buffer 1               |        ++\n    ++        '-------------------------------------------------------'        ++\n    ++             /|\\                selective                 |              ++\n    ++              |                in-VRAM copy              \\|/             ++\n    ++   .-----------------------------------------------------------------.   
++\n    ++   |                              GPU 1                              |   ++\n    ++   |                          LBM Domain 1                           |   ++\n    ++   '-----------------------------------------------------------------'   ++\n    ##                                    |                                    ##\n    ##                      domain synchronization barrier                     ##\n    ##                                    |                                    ##\n    ||   -------------------------------------------------------------\u003e time   ||\n    ```\n  - domain communication architecture (detailed)\n    ```diff\n    ++   .-----------------------------------------------------------------.   ++\n    ++   |                              GPU 0                              |   ++\n    ++   |                          LBM Domain 0                           |   ++\n    ++   '-----------------------------------------------------------------'   ++\n    ++     |  selective in- /|\\  |  selective in- /|\\  |  selective in- /|\\    ++\n    ++    \\|/ VRAM copy (X)  |  \\|/ VRAM copy (Y)  |  \\|/ VRAM copy (Z)  |     ++\n    ++   .---------------------.---------------------.---------------------.   ++\n    ++   |    GPU 0 - TB 0X+   |    GPU 0 - TB 0Y+   |    GPU 0 - TB 0Z+   |   ++\n    ++   |    GPU 0 - TB 0X-   |    GPU 0 - TB 0Y-   |    GPU 0 - TB 0Z-   |   ++\n    ++   '---------------------'---------------------'---------------------'   ++\n    !!          | PCIe /|\\            | PCIe /|\\            | PCIe /|\\         !!\n    !!         \\|/ copy |            \\|/ copy |            \\|/ copy |          !!\n    @@   .---------. .---------.---------. .---------.---------. .---------.   
@@\n    @@   | CPU 0X+ | | CPU 1X- | CPU 0Y+ | | CPU 3Y- | CPU 0Z+ | | CPU 5Z- |   @@\n    @@   | CPU 0X- | | CPU 2X+ | CPU 0Y- | | CPU 4Y+ | CPU 0Z- | | CPU 6Z+ |   @@\n    @@   '---------\\ /---------'---------\\ /---------'---------\\ /---------'   @@\n    @@      pointer X swap (X)    pointer X swap (Y)    pointer X swap (Z)     @@\n    @@   .---------/ \\---------.---------/ \\---------.---------/ \\---------.   @@\n    @@   | CPU 1X- | | CPU 0X+ | CPU 3Y- | | CPU 0Y+ | CPU 5Z- | | CPU 0Z+ |   @@\n    @@   | CPU 2X+ | | CPU 0X- | CPU 4Y+ | | CPU 0Y- | CPU 6Z+ | | CPU 0Z- |   @@\n    @@   '---------' '---------'---------' '---------'---------' '---------'   @@\n    !!         /|\\ PCIe |            /|\\ PCIe |            /|\\ PCIe |          !!\n    !!          | copy \\|/            | copy \\|/            | copy \\|/         !!\n    ++   .--------------------..---------------------..--------------------.   ++\n    ++   |   GPU 1 - TB 1X-   ||    GPU 3 - TB 3Y-   ||   GPU 5 - TB 5Z-   |   ++\n    ++   :====================::=====================::====================:   ++\n    ++   |   GPU 2 - TB 2X+   ||    GPU 4 - TB 4Y+   ||   GPU 6 - TB 6Z+   |   ++\n    ++   '--------------------''---------------------''--------------------'   ++\n    ++    /|\\ selective in-  |  /|\\ selective in-  |  /|\\ selective in-  |     ++\n    ++     |  VRAM copy (X) \\|/  |  VRAM copy (Y) \\|/  |  VRAM copy (Z) \\|/    ++\n    ++   .--------------------..---------------------..--------------------.   
++\n    ++   |        GPU 1       ||        GPU 3        ||        GPU 5       |   ++\n    ++   |    LBM Domain 1    ||    LBM Domain 3     ||    LBM Domain 5    |   ++\n    ++   :====================::=====================::====================:   ++\n    ++   |        GPU 2       ||        GPU 4        ||        GPU 6       |   ++\n    ++   |    LBM Domain 2    ||    LBM Domain 4     ||    LBM Domain 6    |   ++\n    ++   '--------------------''---------------------''--------------------'   ++\n    ##              |                     |                     |              ##\n    ##              |      domain synchronization barriers      |              ##\n    ##              |                     |                     |              ##\n    ||   -------------------------------------------------------------\u003e time   ||\n    ```\n\n  \u003c/details\u003e\n- \u003cdetails\u003e\u003csummary\u003e\u003ca name=\"performance\"\u003e\u003c/a\u003epeak performance on GPUs (datacenter/gaming/professional/laptop)\u003c/summary\u003e\n\n  - [single-GPU/CPU benchmarks](#single-gpucpu-benchmarks)\n  - [multi-GPU benchmarks](#multi-gpu-benchmarks)\n\n  \u003c/details\u003e\n- \u003cdetails\u003e\u003csummary\u003e\u003ca name=\"extensions\"\u003e\u003c/a\u003epowerful model extensions\u003c/summary\u003e\n\n  - [boundary types](https://doi.org/10.15495/EPub_UBT_00005400)\n    - stationary mid-grid bounce-back boundaries (stationary solid boundaries)\n    - moving mid-grid bounce-back boundaries (moving solid boundaries)\n    - equilibrium boundaries (non-reflective inflow/outflow)\n    - temperature boundaries (fixed temperature)\n  - global force per volume (Guo forcing), can be modified on-the-fly\n  - local force per volume (force field)\n    - optional computation of forces from the fluid on solid boundaries\n  - state-of-the-art [free surface LBM](https://doi.org/10.3390/computation10060092) (FSLBM) implementation:\n    - [volume-of-fluid 
model](https://doi.org/10.15495/EPub_UBT_00005400)\n    - [fully analytic PLIC](https://doi.org/10.3390/computation10020021) for efficient curvature calculation\n    - improved mass conservation\n    - ultra-efficient implementation with only [4 kernels](https://doi.org/10.3390/computation10060092) in addition to the `stream_collide()` kernel\n  - thermal LBM to simulate thermal convection\n    - D3Q7 subgrid for thermal DDFs\n    - in-place streaming with [Esoteric-Pull](https://doi.org/10.3390/computation10060092) for thermal DDFs\n    - optional [FP16S or FP16C compression](https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats) for thermal DDFs with [DDF-shifting](https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats)\n  - Smagorinsky-Lilly subgrid turbulence LES model to keep simulations at very large Reynolds numbers stable\n    \u003cp align=\"center\"\u003e\u003ci\u003e\u0026Pi;\u003csub\u003e\u0026alpha;\u0026beta;\u003c/sub\u003e\u003c/i\u003e = \u0026Sigma;\u003csub\u003e\u003ci\u003ei\u003c/i\u003e\u003c/sub\u003e \u003ci\u003ee\u003csub\u003ei\u0026alpha;\u003c/sub\u003e\u003c/i\u003e \u003ci\u003ee\u003csub\u003ei\u0026beta;\u003c/sub\u003e\u003c/i\u003e (\u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e   - \u003ci\u003ef\u003csub\u003ei\u003c/sub\u003e\u003c/i\u003e\u003csup\u003eeq-shifted\u003c/sup\u003e)\u003cbr\u003e\u003cbr\u003eQ = \u0026Sigma;\u003csub\u003e\u003ci\u003e\u0026alpha;\u0026beta;\u003c/i\u003e\u003c/sub\u003e   
\u003ci\u003e\u0026Pi;\u003csub\u003e\u0026alpha;\u0026beta;\u003c/sub\u003e\u003c/i\u003e\u003csup\u003e2\u003c/sup\u003e\u003cbr\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;______________________\u003cbr\u003e\u0026tau; = \u0026frac12; (\u0026tau;\u003csub\u003e0\u003c/sub\u003e + \u0026radic; \u0026tau;\u003csub\u003e0\u003c/sub\u003e\u003csup\u003e2\u003c/sup\u003e + \u003csup\u003e(16\u0026radic;2)\u003c/sup\u003e\u0026#8725;\u003csub\u003e(\u003ci\u003e3\u0026pi;\u003c/i\u003e\u003csup\u003e2\u003c/sup\u003e)\u003c/sub\u003e \u003csup\u003e\u0026radic;Q\u003c/sup\u003e\u0026#8725;\u003csub\u003e\u003ci\u003e\u0026rho;\u003c/i\u003e\u003c/sub\u003e )\u003c/p\u003e\n  - particles with immersed-boundary method (either passive or 2-way-coupled, single-GPU only)\n\n  \u003c/details\u003e\n\n\n\n## Solving the Visualization Problem\n\n- FluidX3D can do simulations so large that storing the volumetric data for later rendering becomes unmanageable (like 120GB for a single frame, hundreds of TeraByte for a video)\n- instead, FluidX3D allows [rendering raw simulation data directly in VRAM](https://www.researchgate.net/publication/360501260_Combined_scientific_CFD_simulation_and_interactive_raytracing_with_OpenCL), so no large volumetric files have to be exported to the hard disk (see my [technical talk](https://youtu.be/pD8JWAZ2f8o))\n- the rendering is so fast that it works interactively in real time for both rasterization and raytracing\n- rasterization and raytracing are done in OpenCL and work on all GPUs, even the ones without RTX/DXR raytracing cores or without any rendering hardware at all (like A100, MI200, ...)\n- if no monitor is available (like on a remote Linux server), there is an [ASCII rendering mode](https://youtu.be/pD8JWAZ2f8o\u0026t=1456) to interactively 
visualize the simulation in the terminal (even in WSL and/or through SSH)\n- rendering is fully multi-GPU-parallelized via seamless domain decomposition rasterization\n- with interactive graphics mode disabled, image resolution can be as large as VRAM allows (4K/8K/16K and above)\n- (interactive) visualization modes:\n  - flag wireframe / solid surface (and force vectors on solid cells or surface pressure if the extension is used)\n  - velocity field (with slice mode)\n  - streamlines (with slice mode)\n  - velocity-colored Q-criterion isosurface\n  - rasterized free surface with [marching-cubes](http://paulbourke.net/geometry/polygonise/)\n  - [raytraced free surface](https://www.researchgate.net/publication/360501260_Combined_scientific_CFD_simulation_and_interactive_raytracing_with_OpenCL) with fast ray-grid traversal and marching-cubes, either 1-4 rays/pixel or 1-10 rays/pixel\n\n\n\n## Solving the Compatibility Problem\n\n- FluidX3D is written in OpenCL 1.2, so it runs on all hardware from all vendors (Nvidia, AMD, Intel, ...):\n  - world's fastest datacenter GPUs: MI300X, H100 (NVL), A100, MI200, MI100, V100(S), GPU Max 1100, ...\n  - gaming GPUs (desktop/laptop): Nvidia GeForce, AMD Radeon, Intel Arc\n  - professional/workstation GPUs: Nvidia Quadro, AMD Radeon Pro / FirePro, Intel Arc Pro\n  - integrated GPUs\n  - CPUs (requires [installation of Intel CPU Runtime for OpenCL](DOCUMENTATION.md#0-install-gpu-drivers-and-opencl-runtime))\n  - Intel Xeon Phi (requires [installation of Intel CPU Runtime for OpenCL](DOCUMENTATION.md#0-install-gpu-drivers-and-opencl-runtime))\n  - smartphone ARM GPUs\n- native cross-vendor multi-GPU implementation\n  - uses PCIe communication, so no SLI/Crossfire/NVLink/InfinityFabric required\n  - single-node parallelization, so no MPI installation required\n  - [GPUs don't even have to be from the same vendor](https://youtu.be/_8Ed8ET9gBU), but similar memory capacity and bandwidth are recommended\n- works on 
[Windows](DOCUMENTATION.md#windows) and [Linux](DOCUMENTATION.md#linux--macos--android) with C++17, with limited support also for [macOS](DOCUMENTATION.md#linux--macos--android) and [Android](DOCUMENTATION.md#linux--macos--android)\n- supports [importing and voxelizing triangle meshes](DOCUMENTATION.md#loading-stl-files) from binary `.stl` files, with fast GPU voxelization\n- supports [exporting volumetric data](DOCUMENTATION.md#data-export) as binary `.vtk` files\n- supports [exporting triangle meshes](DOCUMENTATION.md#data-export) as binary `.vtk` files\n- supports [exporting rendered images](DOCUMENTATION.md#video-rendering) as `.png`/`.qoi`/`.bmp` files; encoding runs in parallel on the CPU while the simulation on GPU can continue without delay\n\n\n\n## Single-GPU/CPU Benchmarks\n\nHere are [performance benchmarks](https://doi.org/10.3390/computation10060092) on various hardware in MLUPs/s, or how many million lattice cells are updated per second. The settings used for the benchmark are D3Q19 SRT with no extensions enabled (only LBM with implicit mid-grid bounce-back boundaries) and the setup consists of an empty cubic box with sufficient size (typically 256³). Without extensions, a single lattice cell requires:\n- a memory capacity of 93 (FP32/FP32) or 55 (FP32/FP16) Bytes\n- a memory bandwidth of 153 (FP32/FP32) or 77 (FP32/FP16) Bytes per time step\n- 363 (FP32/FP32) or 406 (FP32/FP16S) or 1275 (FP32/FP16C) FLOPs per time step (FP32+INT32 operations counted combined)\n\nIn consequence, the arithmetic intensity of this implementation is 2.37 (FP32/FP32) or 5.27 (FP32/FP16S) or 16.56 (FP32/FP16C) FLOPs/Byte. So performance is only limited by memory bandwidth. The table in the left 3 columns shows the hardware specs as found in the data sheets (theoretical peak FP32 compute performance, memory capacity, theoretical peak memory bandwidth). 
The right 3 columns show the measured FluidX3D performance for FP32/FP32, FP32/FP16S, FP32/FP16C floating-point precision settings, with the [roofline model](https://en.wikipedia.org/wiki/Roofline_model) efficiency in round brackets, i.e. the percentage of theoretical peak memory bandwidth that is actually used.\n\nIf your GPU/CPU is not on the list yet, you can report your benchmarks [here](https://github.com/ProjectPhysX/FluidX3D/issues/8).\n\n```mermaid\ngantt\n\ntitle FluidX3D Performance [MLUPs/s] - FP32 arithmetic, (fastest of FP32/FP16S/FP16C) memory storage\ndateFormat X\naxisFormat %s\n%%{\n\tinit: {\n\t\t\"gantt\": {\n\t\t\t'titleTopMargin': 42,\n\t\t\t'topPadding': 70,\n\t\t\t'leftPadding': 260,\n\t\t\t'rightPadding': 5,\n\t\t\t'sectionFontSize': 20,\n\t\t\t'fontSize': 20,\n\t\t\t'barHeight': 20,\n\t\t\t'barGap': 3,\n\t\t\t'numberSectionStyles': 2\n\t\t},\n\t\t'theme': 'forest',\n\t\t'themeVariables': {\n\t\t\t'sectionBkgColor': '#99999999',\n\t\t\t'altSectionBkgColor': '#00000000',\n\t\t\t'titleColor': '#AFAFAF',\n\t\t\t'textColor': '#AFAFAF',\n\t\t\t'taskTextColor': 'black',\n\t\t\t'taskBorderColor': '#487E3A'\n\t\t}\n\t}\n}%%\n\nsection MI300X\n\t41327 :crit, 0, 41327\nsection MI250 (1 GCD)\n\t9030 :crit, 0, 9030\nsection MI210\n\t9547 :crit, 0, 9547\nsection MI100\n\t8542 :crit, 0, 8542\nsection MI60\n\t5111 :crit, 0, 5111\nsection MI50 32GB\n\t8477 :crit, 0, 8477\nsection Radeon VII\n\t7778 :crit, 0, 7778\nsection GPU Max 1100\n\t6303 :done, 0, 6303\nsection GH200 94GB GPU\n\t34689 : 0, 34689\nsection H100 NVL\n\t32922 : 0, 32922\nsection H100 SXM5 80GB HBM3\n\t29561 : 0, 29561\nsection H100 PCIe 80GB HBM2e\n\t20624 : 0, 20624\nsection A100 SXM4 80GB\n\t18448 : 0, 18448\nsection A100 PCIe 80GB\n\t17896 : 0, 17896\nsection PG506-242/243\n\t15654 : 0, 15654\nsection A100 SXM4 40GB\n\t16013 : 0, 16013\nsection A100 PCIe 40GB\n\t16035 : 0, 16035\nsection CMP 170HX\n\t12392 : 0, 12392\nsection A30\n\t9721 : 0, 9721\nsection V100 SXM2 32GB\n\t8947 : 0, 
8947\nsection V100 PCIe 16GB\n\t10325 : 0, 10325\nsection GV100\n\t6641 : 0, 6641\nsection Titan V\n\t7253 : 0, 7253\nsection P100 PCIe 16GB\n\t5950 : 0, 5950\nsection P100 PCIe 12GB\n\t4141 : 0, 4141\nsection GTX TITAN\n\t2500 : 0, 2500\nsection K40m\n\t1868 : 0, 1868\nsection K80 (1 GPU)\n\t1642 : 0, 1642\nsection K20c\n\t1507 : 0, 1507\n\nsection RX 9070 XT\n\t6688 :crit, 0, 6688\nsection RX 9070\n\t6019 :crit, 0, 6019\nsection RX 7900 XTX\n\t7716 :crit, 0, 7716\nsection PRO W7900\n\t5939 :crit, 0, 5939\nsection RX 7900 XT\n\t5986 :crit, 0, 5986\nsection RX 7800 XT\n\t3105 :crit, 0, 3105\nsection PRO W7800\n\t4426 :crit, 0, 4426\nsection RX 7900 GRE\n\t4570 :crit, 0, 4570\nsection PRO W7700\n\t2943 :crit, 0, 2943\nsection RX 7600\n\t2561 :crit, 0, 2561\nsection PRO W7600\n\t2287 :crit, 0, 2287\nsection PRO W7500\n\t1682 :crit, 0, 1682\nsection RX 6900 XT\n\t4227 :crit, 0, 4227\nsection RX 6800 XT\n\t4241 :crit, 0, 4241\nsection PRO W6800\n\t3361 :crit, 0, 3361\nsection RX 6700 XT\n\t2908 :crit, 0, 2908\nsection RX 6800M\n\t3213 :crit, 0, 3213\nsection RX 6700M\n\t2429 :crit, 0, 2429\nsection RX 6600\n\t1839 :crit, 0, 1839\nsection RX 6500 XT\n\t1030 :crit, 0, 1030\nsection RX 5700 XT\n\t3253 :crit, 0, 3253\nsection RX 5700\n\t3167 :crit, 0, 3167\nsection RX 5600 XT\n\t2214 :crit, 0, 2214\nsection RX Vega 64\n\t3227 :crit, 0, 3227\nsection RX 590\n\t1688 :crit, 0, 1688\nsection RX 580 4GB\n\t1848 :crit, 0, 1848\nsection RX 580 2048SP 8GB\n\t1622 :crit, 0, 1622\nsection R9 390X\n\t2217 :crit, 0, 2217\nsection HD 7850\n\t635 :crit, 0, 635\nsection Arc B580 LE\n\t4979 :done, 0, 4979\nsection Arc A770 LE\n\t4568 :done, 0, 4568\nsection Arc A750 LE\n\t4314 :done, 0, 4314\nsection Arc A580\n\t3889 :done, 0, 3889\nsection Arc Pro A40\n\t985 :done, 0, 985\nsection Arc A380\n\t1115 :done, 0, 1115\nsection RTX 5090\n\t19141 : 0, 19141\nsection RTX 5080\n\t10304 : 0, 10304\nsection RTX 5070\n\t7238 : 0, 7238\nsection RTX 4090\n\t11496 : 0, 11496\nsection RTX 6000 
Ada\n\t10293 : 0, 10293\nsection L40S\n\t7637 : 0, 7637\nsection L40\n\t7945 : 0, 7945\nsection RTX 4080 Super\n\t8218 : 0, 8218\nsection RTX 4080\n\t7933 : 0, 7933\nsection RTX 4070 Ti Super\n\t7295 : 0, 7295\nsection RTX 4090M\n\t6901 : 0, 6901\nsection RTX 4070 Super\n\t5554 : 0, 5554\nsection RTX 4070\n\t5016 : 0, 5016\nsection RTX 4080M\n\t5114 : 0, 5114\nsection RTX 4000 Ada\n\t4221 : 0, 4221\nsection RTX 4060\n\t3124 : 0, 3124\nsection RTX 4070M\n\t3092 : 0, 3092\nsection RTX 2000 Ada\n\t2526 : 0, 2526\nsection RTX 3090 Ti\n\t10956 : 0, 10956\nsection RTX 3090\n\t10732 : 0, 10732\nsection RTX 3080 Ti\n\t9832 : 0, 9832\nsection RTX 3080 12GB\n\t9657 : 0, 9657\nsection RTX A6000\n\t8814 : 0, 8814\nsection RTX 3080 10GB\n\t8118 : 0, 8118\nsection RTX 3070 Ti\n\t6807 : 0, 6807\nsection RTX 3080M Ti\n\t5908 : 0, 5908\nsection RTX 3070\n\t5096 : 0, 5096\nsection RTX 3060 Ti\n\t5129 : 0, 5129\nsection RTX A4000\n\t4945 : 0, 4945\nsection RTX A5000M\n\t4461 : 0, 4461\nsection RTX 3060\n\t4070 : 0, 4070\nsection RTX 3060M\n\t4012 : 0, 4012\nsection A2\n\t2051 : 0, 2051\nsection RTX 3050M Ti\n\t2341 : 0, 2341\nsection RTX 3050M\n\t2339 : 0, 2339\nsection Titan RTX\n\t7554 : 0, 7554\nsection RTX 6000\n\t6879 : 0, 6879\nsection RTX 8000 Passive\n\t5607 : 0, 5607\nsection RTX 2080 Ti\n\t6853 : 0, 6853\nsection RTX 2080 Super\n\t5284 : 0, 5284\nsection RTX 5000\n\t4773 : 0, 4773\nsection RTX 2080\n\t4977 : 0, 4977\nsection RTX 2070 Super\n\t4893 : 0, 4893\nsection RTX 2070\n\t5017 : 0, 5017\nsection RTX 2060 Super\n\t5035 : 0, 5035\nsection RTX 4000\n\t4584 : 0, 4584\nsection RTX 2060 KO\n\t3376 : 0, 3376\nsection RTX 2060\n\t3604 : 0, 3604\nsection GTX 1660 Super\n\t3551 : 0, 3551\nsection T4\n\t2887 : 0, 2887\nsection GTX 1660 Ti\n\t3041 : 0, 3041\nsection GTX 1660\n\t1992 : 0, 1992\nsection GTX 1650M 896C\n\t1858 : 0, 1858\nsection GTX 1650M 1024C\n\t1400 : 0, 1400\nsection T500\n\t665 : 0, 665\nsection Titan Xp\n\t5495 : 0, 5495\nsection GTX 1080 Ti\n\t4877 : 0, 
4877\nsection GTX 1080\n\t3182 : 0, 3182\nsection GTX 1060 6GB\n\t1925 : 0, 1925\nsection GTX 1060M\n\t1882 : 0, 1882\nsection GTX 1050M Ti\n\t1224 : 0, 1224\nsection P1000\n\t839 : 0, 839\nsection GTX 980 Ti\n\t2703 : 0, 2703\nsection GTX 980\n\t1965 : 0, 1965\nsection GTX 970\n\t1721 : 0, 1721\nsection M4000\n\t1519 : 0, 1519\nsection M60 (1 GPU)\n\t1571 : 0, 1571\nsection GTX 960M\n\t872 : 0, 872\nsection GTX 770\n\t1215 : 0, 1215\nsection GTX 680 4GB\n\t1274 : 0, 1274\nsection K2000\n\t444 : 0, 444\nsection GT 630 (OEM)\n\t185 : 0, 185\nsection NVS 290\n\t9 : 0, 9\nsection Arise 1020\n\t6 :active, 0, 6\n\nsection M2 Ultra (76-CU, 192GB)\n\t8769 :active, 0, 8769\nsection M2 Max (38-CU, 32GB)\n\t4641 :active, 0, 4641\nsection M1 Ultra (64-CU, 128GB)\n\t8418 :active, 0, 8418\nsection M1 Max (24-CU, 32GB)\n\t4496 :active, 0, 4496\nsection M1 Pro (16-CU, 16GB)\n\t2329 :active, 0, 2329\nsection M1 (8-CU, 16GB)\n\t759 :active, 0, 759\nsection Radeon 8060S (Max+ 395)\n\t2563 :crit, 0, 2563\nsection Radeon 780M (Z1 Extreme)\n\t860 :crit, 0, 860\nsection Radeon Graphics (7800X3D)\n\t498 :crit, 0, 498\nsection Vega 8 (4750G)\n\t511 :crit, 0, 511\nsection Vega 8 (3500U)\n\t288 :crit, 0, 288\nsection Arc 140V GPU (16GB)\n\t1282 :done, 0, 1282\nsection Arc Graphics (Ultra 9 185H)\n\t724 :done, 0, 724\nsection Iris Xe Graphics (i7-1265U)\n\t621 :done, 0, 621\nsection UHD Xe 32EUs\n\t245 :done, 0, 245\nsection UHD 770\n\t475 :done, 0, 475\nsection UHD 630\n\t301 :done, 0, 301\nsection UHD P630\n\t288 :done, 0, 288\nsection HD 5500\n\t192 :done, 0, 192\nsection HD 4600\n\t115 :done, 0, 115\nsection Orange Pi 5 Mali-G610 MP4\n\t232 :active, 0, 232\nsection Samsung Mali-G72 MP18\n\t230 :active, 0, 230\n\nsection 2x EPYC 9754\n\t5179 :crit, 0, 5179\nsection 2x EPYC 9654\n\t1814 :crit, 0, 1814\nsection 2x EPYC 9554\n\t2552 :crit, 0, 2552\nsection 1x EPYC 9124\n\t772 :crit, 0, 772\nsection 2x EPYC 7713\n\t1418 :crit, 0, 1418\nsection 2x EPYC 7352\n\t739 :crit, 0, 739\nsection 2x 
EPYC 7313\n\t498 :crit, 0, 498\nsection 2x EPYC 7302\n\t784 :crit, 0, 784\nsection 2x 6980P\n\t7875 :done, 0, 7875\nsection 2x 6979P\n\t8135 :done, 0, 8135\nsection 2x Platinum 8592+\n\t3135 :done, 0, 3135\nsection 2x Gold 6548N\n\t1811 :done, 0, 1811\nsection 2x CPU Max 9480\n\t2037 :done, 0, 2037\nsection 2x Platinum 8480+\n\t2162 :done, 0, 2162\nsection 2x Platinum 8470\n\t2068 :done, 0, 2068\nsection 2x Gold 6438Y+\n\t1945 :done, 0, 1945\nsection 2x Platinum 8380\n\t1410 :done, 0, 1410\nsection 2x Platinum 8358\n\t1285 :done, 0, 1285\nsection 2x Platinum 8256\n\t396 :done, 0, 396\nsection 2x Platinum 8153\n\t691 :done, 0, 691\nsection 2x Gold 6248R\n\t755 :done, 0, 755\nsection 2x Gold 6128\n\t254 :done, 0, 254\nsection Phi 7210\n\t415 :done, 0, 415\nsection 4x E5-4620 v4\n\t460 :done, 0, 460\nsection 2x E5-2630 v4\n\t264 :done, 0, 264\nsection 2x E5-2623 v4\n\t125 :done, 0, 125\nsection 2x E5-2680 v3\n\t304 :done, 0, 304\nsection GH200 Neoverse-V2\n\t1323 : 0, 1323\nsection TR PRO 7995WX\n\t1715 :crit, 0, 1715\nsection TR 3970X\n\t463 :crit, 0, 463\nsection TR 1950X\n\t273 :crit, 0, 273\nsection Ryzen 7900X3D\n\t521 :crit, 0, 521\nsection Ryzen 7800X3D\n\t363 :crit, 0, 363\nsection Ryzen 5700X3D\n\t229 :crit, 0, 229\nsection FX-6100\n\t22 :crit, 0, 22\nsection Athlon X2 QL-65\n\t3 :crit, 0, 3\nsection Ultra 7 258V\n\t287 :done, 0, 287\nsection Ultra 9 185H\n\t317 :done, 0, 317\nsection i9-14900K\n\t490 :done, 0, 490\nsection i7-13700K\n\t504 :done, 0, 504\nsection i7-1265U\n\t128 :done, 0, 128\nsection i9-11900KB\n\t208 :done, 0, 208\nsection i9-10980XE\n\t286 :done, 0, 286\nsection E-2288G\n\t198 :done, 0, 198\nsection i7-9700\n\t103 :done, 0, 103\nsection i5-9600\n\t147 :done, 0, 147\nsection i7-8700K\n\t152 :done, 0, 152\nsection E-2176G\n\t201 :done, 0, 201\nsection i7-7700HQ\n\t108 :done, 0, 108\nsection E3-1240 v5\n\t141 :done, 0, 141\nsection i5-5300U\n\t37 :done, 0, 37\nsection i7-4770\n\t104 :done, 0, 104\nsection i7-4720HQ\n\t80 :done, 0, 80\nsection 
N2807\n\t7 :done, 0, 7\n```\n\n\u003cdetails\u003e\u003csummary\u003eSingle-GPU/CPU Benchmark Table\u003c/summary\u003e\n\nColors: 🔴 AMD, 🔵 Intel, 🟢 Nvidia, ⚪ Apple, 🟡 ARM, 🟤 Glenfly\n\n| Device                                           | FP32\u003cbr\u003e[TFlops/s] | Mem\u003cbr\u003e[GB] | BW\u003cbr\u003e[GB/s] | FP32/FP32\u003cbr\u003e[MLUPs/s] | FP32/FP16S\u003cbr\u003e[MLUPs/s] | FP32/FP16C\u003cbr\u003e[MLUPs/s] |\n| :----------------------------------------------- | -----------------: | ----------: | -----------: | ---------------------: | ----------------------: | ----------------------: |\n|                                                  |                    |             |              |                        |                         |                         |\n| 🔴\u0026nbsp;Instinct\u0026nbsp;MI300X                     |             163.40 |         192 |         5300 |       22867\u0026nbsp;(66%) |        41327\u0026nbsp;(60%) |        31670\u0026nbsp;(46%) |\n| 🔴\u0026nbsp;Instinct\u0026nbsp;MI250\u0026nbsp;(1\u0026nbsp;GCD)    |              45.26 |          64 |         1638 |             5638 (53%) |              9030 (42%) |              8506 (40%) |\n| 🔴\u0026nbsp;Instinct\u0026nbsp;MI210                      |              45.26 |          64 |         1638 |             6517 (61%) |              9547 (45%) |              8829 (41%) |\n| 🔴\u0026nbsp;Instinct\u0026nbsp;MI100                      |              46.14 |          32 |         1228 |             5093 (63%) |              8133 (51%) |              8542 (54%) |\n| 🔴\u0026nbsp;Instinct\u0026nbsp;MI60                       |              14.75 |          32 |         1024 |             3570 (53%) |              5047 (38%) |              5111 (38%) |\n| 🔴\u0026nbsp;Instinct\u0026nbsp;MI50\u0026nbsp;32GB             |              13.25 |          32 |         1024 |             4446 (66%) |              8477 (64%) |              4406 (33%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;VII       
                   |              13.83 |          16 |         1024 |             4898 (73%) |              7778 (58%) |              5256 (40%) |\n| 🔵\u0026nbsp;Data\u0026nbsp;Center\u0026nbsp;GPU\u0026nbsp;Max\u0026nbsp;1100 |          22.22 |          48 |         1229 |             3769 (47%) |              6303 (39%) |              3520 (22%) |\n| 🟢\u0026nbsp;GH200\u0026nbsp;94GB\u0026nbsp;GPU                 |              66.91 |          94 |         4000 |       20595\u0026nbsp;(79%) |        34689\u0026nbsp;(67%) |        19407\u0026nbsp;(37%) |\n| 🟢\u0026nbsp;H100\u0026nbsp;NVL                            |              60.32 |          94 |         3938 |       20303\u0026nbsp;(79%) |        32922\u0026nbsp;(64%) |        18424\u0026nbsp;(36%) |\n| 🟢\u0026nbsp;H100\u0026nbsp;SXM5\u0026nbsp;80GB\u0026nbsp;HBM3       |              66.91 |          80 |         3350 |       17602\u0026nbsp;(80%) |        29561\u0026nbsp;(68%) |        20227\u0026nbsp;(46%) |\n| 🟢\u0026nbsp;H100\u0026nbsp;PCIe\u0026nbsp;80GB\u0026nbsp;HBM2e      |              51.01 |          80 |         2000 |       11128\u0026nbsp;(85%) |        20624\u0026nbsp;(79%) |        13862\u0026nbsp;(53%) |\n| 🟢\u0026nbsp;A100\u0026nbsp;SXM4\u0026nbsp;80GB                 |              19.49 |          80 |         2039 |       10228\u0026nbsp;(77%) |        18448\u0026nbsp;(70%) |        11197\u0026nbsp;(42%) |\n| 🟢\u0026nbsp;A100\u0026nbsp;PCIe\u0026nbsp;80GB                 |              19.49 |          80 |         1935 |             9657 (76%) |        17896\u0026nbsp;(71%) |        10817\u0026nbsp;(43%) |\n| 🟢\u0026nbsp;PG506-243\u0026nbsp;/\u0026nbsp;PG506-242          |              22.14 |          64 |         1638 |             8195 (77%) |        15654\u0026nbsp;(74%) |        12271\u0026nbsp;(58%) |\n| 🟢\u0026nbsp;A100\u0026nbsp;SXM4\u0026nbsp;40GB                 |              19.49 |          40 |         1555 |             8522 (84%) |        16013\u0026nbsp;(79%) |        
11251\u0026nbsp;(56%) |\n| 🟢\u0026nbsp;A100\u0026nbsp;PCIe\u0026nbsp;40GB                 |              19.49 |          40 |         1555 |             8526 (84%) |        16035\u0026nbsp;(79%) |        11088\u0026nbsp;(55%) |\n| 🟢\u0026nbsp;CMP\u0026nbsp;170HX                           |               6.32 |           8 |         1493 |             7684 (79%) |        12392\u0026nbsp;(64%) |              6859 (35%) |\n| 🟢\u0026nbsp;A30                                      |              10.32 |          24 |          933 |             5004 (82%) |              9721 (80%) |              5726 (47%) |\n| 🟢\u0026nbsp;Tesla\u0026nbsp;V100\u0026nbsp;SXM2\u0026nbsp;32GB      |              15.67 |          32 |          900 |             4471 (76%) |              8947 (77%) |              7217 (62%) |\n| 🟢\u0026nbsp;Tesla\u0026nbsp;V100\u0026nbsp;PCIe\u0026nbsp;16GB      |              14.13 |          16 |          900 |             5128 (87%) |        10325\u0026nbsp;(88%) |              7683 (66%) |\n| 🟢\u0026nbsp;Quadro\u0026nbsp;GV100                        |              16.66 |          32 |          870 |             3442 (61%) |              6641 (59%) |              5863 (52%) |\n| 🟢\u0026nbsp;Titan\u0026nbsp;V                             |              14.90 |          12 |          653 |             3601 (84%) |              7253 (86%) |              6957 (82%) |\n| 🟢\u0026nbsp;Tesla\u0026nbsp;P100\u0026nbsp;16GB                |               9.52 |          16 |          732 |             3295 (69%) |              5950 (63%) |              4176 (44%) |\n| 🟢\u0026nbsp;Tesla\u0026nbsp;P100\u0026nbsp;12GB                |               9.52 |          12 |          549 |             2427 (68%) |              4141 (58%) |              3999 (56%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;TITAN              |               4.71 |           6 |          288 |             1460 (77%) |              2500 (67%) |              1113 (30%) |\n| 
🟢\u0026nbsp;Tesla\u0026nbsp;K40m                          |               4.29 |          12 |          288 |             1131 (60%) |              1868 (50%) |               912 (24%) |\n| 🟢\u0026nbsp;Tesla\u0026nbsp;K80\u0026nbsp;(1\u0026nbsp;GPU)         |               4.11 |          12 |          240 |              916 (58%) |              1642 (53%) |               943 (30%) |\n| 🟢\u0026nbsp;Tesla\u0026nbsp;K20c                          |               3.52 |           5 |          208 |              861 (63%) |              1507 (56%) |               720 (27%) |\n|                                                  |                    |             |              |                        |                         |                         |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;9070\u0026nbsp;XT         |              48.66 |          16 |          640 |             3089 (74%) |              6688 (80%) |              6090 (73%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;9070                 |              36.13 |          16 |          640 |             3007 (72%) |              5746 (69%) |              6019 (72%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;7900\u0026nbsp;XTX        |              61.44 |          24 |          960 |             3665 (58%) |              7644 (61%) |              7716 (62%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;PRO\u0026nbsp;W7900               |              61.30 |          48 |          864 |             3107 (55%) |              5939 (53%) |              5780 (52%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;7900\u0026nbsp;XT         |              51.61 |          20 |          800 |             3013 (58%) |              5856 (56%) |              5986 (58%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;7800\u0026nbsp;XT         |              37.32 |          16 |          624 |             1704 (42%) |              3105 (38%) |              3061 (38%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;PRO\u0026nbsp;W7800 
              |              45.20 |          32 |          576 |             1872 (50%) |              4426 (59%) |              4145 (55%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;7900\u0026nbsp;GRE        |              42.03 |          16 |          576 |             1996 (53%) |              4570 (61%) |              4463 (60%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;PRO\u0026nbsp;W7700               |              28.30 |          16 |          576 |             1547 (41%) |              2943 (39%) |              2899 (39%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;7600                 |              21.75 |           8 |          288 |             1250 (66%) |              2561 (68%) |              2512 (67%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;PRO\u0026nbsp;W7600               |              20.00 |           8 |          288 |             1179 (63%) |              2263 (61%) |              2287 (61%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;PRO\u0026nbsp;W7500               |              12.20 |           8 |          172 |              856 (76%) |              1630 (73%) |              1682 (75%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;6900\u0026nbsp;XT         |              23.04 |          16 |          512 |             1968 (59%) |              4227 (64%) |              4207 (63%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;6800\u0026nbsp;XT         |              20.74 |          16 |          512 |             2008 (60%) |              4241 (64%) |              4224 (64%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;PRO\u0026nbsp;W6800               |              17.83 |          32 |          512 |             1620 (48%) |              3361 (51%) |              3180 (48%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;6700\u0026nbsp;XT         |              13.21 |          12 |          384 |             1408 (56%) |              2883 (58%) |              2908 (58%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;6800M                |              
11.78 |          12 |          384 |             1439 (57%) |              3190 (64%) |              3213 (64%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;6700M                |              10.60 |          10 |          320 |             1194 (57%) |              2388 (57%) |              2429 (58%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;6600                 |               8.93 |           8 |          224 |              963 (66%) |              1817 (62%) |              1839 (63%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;6500\u0026nbsp;XT         |               5.77 |           4 |          144 |              459 (49%) |              1011 (54%) |              1030 (55%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;5700\u0026nbsp;XT         |               9.75 |           8 |          448 |             1368 (47%) |              3253 (56%) |              3049 (52%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;5700                 |               7.72 |           8 |          448 |             1521 (52%) |              3167 (54%) |              2758 (47%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;5600\u0026nbsp;XT         |               6.73 |           6 |          288 |             1136 (60%) |              2214 (59%) |              2148 (57%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;Vega\u0026nbsp;64         |              13.35 |           8 |          484 |             1875 (59%) |              2878 (46%) |              3227 (51%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;590                  |               5.53 |           8 |          256 |             1257 (75%) |              1573 (47%) |              1688 (51%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;580\u0026nbsp;4GB         |               6.50 |           4 |          256 |              946 (57%) |              1848 (56%) |              1577 (47%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;RX\u0026nbsp;580\u0026nbsp;2048SP\u0026nbsp;8GB |           4.94 |        
   8 |          224 |              868 (59%) |              1622 (56%) |              1240 (43%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;R9\u0026nbsp;390X                 |               5.91 |           8 |          384 |             1733 (69%) |              2217 (44%) |              1722 (35%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;HD\u0026nbsp;7850                 |               1.84 |           2 |          154 |              112 (11%) |               120 ( 6%) |               635 (32%) |\n| 🔵\u0026nbsp;Arc\u0026nbsp;B580\u0026nbsp;LE                    |              14.59 |          12 |          456 |             2598 (87%) |              4443 (75%) |              4979 (84%) |\n| 🔵\u0026nbsp;Arc\u0026nbsp;A770\u0026nbsp;LE                    |              19.66 |          16 |          560 |             2663 (73%) |              4568 (63%) |              4519 (62%) |\n| 🔵\u0026nbsp;Arc\u0026nbsp;A750\u0026nbsp;LE                    |              17.20 |           8 |          512 |             2555 (76%) |              4314 (65%) |              4047 (61%) |\n| 🔵\u0026nbsp;Arc\u0026nbsp;A580                            |              12.29 |           8 |          512 |             2534 (76%) |              3889 (58%) |              3488 (52%) |\n| 🔵\u0026nbsp;Arc\u0026nbsp;Pro\u0026nbsp;A40                    |               5.02 |           6 |          192 |              594 (47%) |               985 (40%) |               927 (37%) |\n| 🔵\u0026nbsp;Arc\u0026nbsp;A380                            |               4.20 |           6 |          186 |              622 (51%) |              1097 (45%) |              1115 (46%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;5090               |             104.88 |          32 |         1792 |             9522 (81%) |             18459 (79%) |             19141 (82%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;5080               |              56.34 |          16 |          960 |             5174 (82%) |             
10252 (82%) |             10304 (83%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;5070               |              30.84 |          12 |          672 |             3658 (83%) |              7238 (83%) |              7107 (81%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;4090               |              82.58 |          24 |         1008 |             5624 (85%) |             11091 (85%) |             11496 (88%) |\n| 🟢\u0026nbsp;RTX\u0026nbsp;6000\u0026nbsp;Ada                   |              91.10 |          48 |          960 |             4997 (80%) |             10249 (82%) |             10293 (83%) |\n| 🟢\u0026nbsp;L40S                                     |              91.61 |          48 |          864 |             3788 (67%) |              7637 (68%) |              7617 (68%) |\n| 🟢\u0026nbsp;L40                                      |              90.52 |          48 |          864 |             3870 (69%) |              7778 (69%) |              7945 (71%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;4080\u0026nbsp;Super    |              52.22 |          16 |          736 |             4089 (85%) |              7660 (80%) |              8218 (86%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;4080               |              55.45 |          16 |          717 |             3914 (84%) |              7626 (82%) |              7933 (85%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;4070\u0026nbsp;Ti\u0026nbsp;Super |         44.10 |          16 |          672 |             3694 (84%) |              6435 (74%) |              7295 (84%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;4090M              |              28.31 |          16 |          576 |             3367 (89%) |              6545 (87%) |              6901 (92%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;4070\u0026nbsp;Super    |              35.55 |          12 |          504 |             2751 (83%) |              5149 (79%) |              5554 (85%) |\n| 
🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;4070               |              29.15 |          12 |          504 |             2646 (80%) |              4548 (69%) |              5016 (77%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;4080M              |              33.85 |          12 |          432 |             2577 (91%) |              5086 (91%) |              5114 (91%) |\n| 🟢\u0026nbsp;RTX\u0026nbsp;4000\u0026nbsp;Ada                   |              26.73 |          20 |          360 |             2130 (91%) |              3964 (85%) |              4221 (90%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;4060               |              15.11 |           8 |          272 |             1614 (91%) |              3052 (86%) |              3124 (88%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;4070M              |              18.25 |           8 |          256 |             1553 (93%) |              2945 (89%) |              3092 (93%) |\n| 🟢\u0026nbsp;RTX\u0026nbsp;2000\u0026nbsp;Ada                   |              12.00 |          16 |          224 |             1351 (92%) |              2452 (84%) |              2526 (87%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3090\u0026nbsp;Ti       |              40.00 |          24 |         1008 |             5717 (87%) |             10956 (84%) |             10400 (79%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3090               |              39.05 |          24 |          936 |             5418 (89%) |             10732 (88%) |             10215 (84%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3080\u0026nbsp;Ti       |              37.17 |          12 |          912 |             5202 (87%) |              9832 (87%) |              9347 (79%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3080\u0026nbsp;12GB     |              32.26 |          12 |          912 |             5071 (85%) |              9657 (81%) |              8615 (73%) |\n| 🟢\u0026nbsp;RTX\u0026nbsp;A6000    
                       |              40.00 |          48 |          768 |             4421 (88%) |              8814 (88%) |              8533 (86%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3080\u0026nbsp;10GB     |              29.77 |          10 |          760 |             4230 (85%) |              8118 (82%) |              7714 (78%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3070\u0026nbsp;Ti       |              21.75 |           8 |          608 |             3490 (88%) |              6807 (86%) |              5926 (75%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3080M\u0026nbsp;Ti      |              23.61 |          16 |          512 |             2985 (89%) |              5908 (89%) |              5780 (87%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3070               |              20.31 |           8 |          448 |             2578 (88%) |              5096 (88%) |              5060 (87%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3060\u0026nbsp;Ti       |              16.49 |           8 |          448 |             2644 (90%) |              5129 (88%) |              4718 (81%) |\n| 🟢\u0026nbsp;RTX\u0026nbsp;A4000                           |              19.17 |          16 |          448 |             2500 (85%) |              4945 (85%) |              4664 (80%) |\n| 🟢\u0026nbsp;RTX\u0026nbsp;A5000M                          |              16.59 |          16 |          448 |             2228 (76%) |              4461 (77%) |              3662 (63%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3060               |              13.17 |          12 |          360 |             2108 (90%) |              4070 (87%) |              3566 (76%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3060M              |              10.94 |           6 |          336 |             2019 (92%) |              4012 (92%) |              3572 (82%) |\n| 🟢\u0026nbsp;A2                                       |               4.53 |     
     15 |          200 |             1031 (79%) |              2051 (79%) |              1199 (46%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3050M\u0026nbsp;Ti      |               7.60 |           4 |          192 |             1181 (94%) |              2341 (94%) |              2253 (90%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;3050M              |               7.13 |           4 |          192 |             1180 (94%) |              2339 (94%) |              2016 (81%) |\n| 🟢\u0026nbsp;Titan\u0026nbsp;RTX                           |              16.31 |          24 |          672 |             3471 (79%) |              7456 (85%) |              7554 (87%) |\n| 🟢\u0026nbsp;Quadro\u0026nbsp;RTX\u0026nbsp;6000                |              16.31 |          24 |          672 |             3307 (75%) |              6836 (78%) |              6879 (79%) |\n| 🟢\u0026nbsp;Quadro\u0026nbsp;RTX\u0026nbsp;8000\u0026nbsp;Passive   |              14.93 |          48 |          624 |             2591 (64%) |              5408 (67%) |              5607 (69%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;2080\u0026nbsp;Ti       |              13.45 |          11 |          616 |             3194 (79%) |              6700 (84%) |              6853 (86%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;2080\u0026nbsp;Super    |              11.34 |           8 |          496 |             2434 (75%) |              5284 (82%) |              5087 (79%) |\n| 🟢\u0026nbsp;Quadro\u0026nbsp;RTX\u0026nbsp;5000                |              11.15 |          16 |          448 |             2341 (80%) |              4766 (82%) |              4773 (82%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;2080               |              10.07 |           8 |          448 |             2318 (79%) |              4977 (86%) |              4963 (85%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;2070\u0026nbsp;Super    |               9.22 |           8 |          448 |      
       2255 (77%) |              4866 (84%) |              4893 (84%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;2070               |               7.47 |           8 |          448 |             2444 (83%) |              4387 (75%) |              5017 (86%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;2060\u0026nbsp;Super    |               7.18 |           8 |          448 |             2503 (85%) |              5035 (87%) |              4463 (77%) |\n| 🟢\u0026nbsp;Quadro\u0026nbsp;RTX\u0026nbsp;4000                |               7.12 |           8 |          416 |             2284 (84%) |              4584 (85%) |              4062 (75%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;2060\u0026nbsp;KO       |               6.74 |           6 |          336 |             1643 (75%) |              3376 (77%) |              3266 (75%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;2060               |               6.74 |           6 |          336 |             1681 (77%) |              3604 (83%) |              3571 (82%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;1660\u0026nbsp;Super    |               5.03 |           6 |          336 |             1696 (77%) |              3551 (81%) |              3040 (70%) |\n| 🟢\u0026nbsp;Tesla\u0026nbsp;T4                            |               8.14 |          15 |          300 |             1356 (69%) |              2869 (74%) |              2887 (74%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;1660\u0026nbsp;Ti       |               5.48 |           6 |          288 |             1467 (78%) |              3041 (81%) |              3019 (81%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;1660               |               5.07 |           6 |          192 |             1016 (81%) |              1924 (77%) |              1992 (80%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;1650M\u0026nbsp;896C    |               2.72 |           4 |          192 |              963 (77%) |           
   1836 (74%) |              1858 (75%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;1650M\u0026nbsp;1024C   |               3.20 |           4 |          128 |              706 (84%) |              1214 (73%) |              1400 (84%) |\n| 🟢\u0026nbsp;T500                                     |               3.04 |           4 |           80 |              339 (65%) |               578 (56%) |               665 (64%) |\n| 🟢\u0026nbsp;Titan\u0026nbsp;Xp                            |              12.15 |          12 |          548 |             2919 (82%) |              5495 (77%) |              5375 (76%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;1080\u0026nbsp;Ti       |              12.06 |          11 |          484 |             2631 (83%) |              4837 (77%) |              4877 (78%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;1080               |               9.78 |           8 |          320 |             1623 (78%) |              3100 (75%) |              3182 (77%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;1060\u0026nbsp;6GB      |               4.57 |           6 |          192 |              997 (79%) |              1925 (77%) |              1785 (72%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;1060M              |               4.44 |           6 |          192 |              983 (78%) |              1882 (75%) |              1803 (72%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;1050M\u0026nbsp;Ti      |               2.49 |           4 |          112 |              631 (86%) |              1224 (84%) |              1115 (77%) |\n| 🟢\u0026nbsp;Quadro\u0026nbsp;P1000                        |               1.89 |           4 |           82 |              426 (79%) |               839 (79%) |               778 (73%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;980\u0026nbsp;Ti        |               6.05 |           6 |          336 |             1509 (69%) |              2703 (62%) |              2381 (55%) |\n| 
🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;980                |               4.98 |           4 |          224 |             1018 (70%) |              1965 (68%) |              1872 (64%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;970                |               4.17 |           4 |          224 |              980 (67%) |              1721 (59%) |              1623 (56%) |\n| 🟢\u0026nbsp;Quadro\u0026nbsp;M4000                        |               2.57 |           8 |          192 |              899 (72%) |              1519 (61%) |              1050 (42%) |\n| 🟢\u0026nbsp;Tesla\u0026nbsp;M60\u0026nbsp;(1\u0026nbsp;GPU)         |               4.82 |           8 |          160 |              853 (82%) |              1571 (76%) |              1557 (75%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;960M               |               1.51 |           4 |           80 |              442 (84%) |               872 (84%) |               627 (60%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;770                |               3.33 |           2 |          224 |              800 (55%) |              1215 (42%) |               876 (30%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GTX\u0026nbsp;680\u0026nbsp;4GB       |               3.33 |           4 |          192 |              783 (62%) |              1274 (51%) |               814 (33%) |\n| 🟢\u0026nbsp;Quadro\u0026nbsp;K2000                        |               0.73 |           2 |           64 |              312 (75%) |               444 (53%) |               171 (21%) |\n| 🟢\u0026nbsp;GeForce\u0026nbsp;GT\u0026nbsp;630\u0026nbsp;(OEM)      |               0.46 |           2 |           29 |              151 (81%) |               185 (50%) |                78 (21%) |\n| 🟢\u0026nbsp;Quadro\u0026nbsp;NVS\u0026nbsp;290                 |               0.03 |         1/4 |            6 |                9 (22%) |                 4 ( 5%) |                 4 ( 5%) |\n| 🟤\u0026nbsp;Arise\u0026nbsp;1020             
             |               1.50 |           2 |           19 |                6 ( 5%) |                 6 ( 2%) |                 6 ( 2%) |\n|                                                  |                    |             |              |                        |                         |                         |\n| ⚪\u0026nbsp;M2\u0026nbsp;Ultra\u0026nbsp;GPU\u0026nbsp;76CU\u0026nbsp;192GB |           19.46 |         147 |          800 |             4629 (89%) |              8769 (84%) |              7972 (77%) |\n| ⚪\u0026nbsp;M2\u0026nbsp;Max\u0026nbsp;GPU\u0026nbsp;38CU\u0026nbsp;32GB |               9.73 |          22 |          400 |             2405 (92%) |              4641 (89%) |              2444 (47%) |\n| ⚪\u0026nbsp;M1\u0026nbsp;Ultra\u0026nbsp;GPU\u0026nbsp;64CU\u0026nbsp;128GB |           16.38 |          98 |          800 |             4519 (86%) |              8418 (81%) |              6915 (67%) |\n| ⚪\u0026nbsp;M1\u0026nbsp;Max\u0026nbsp;GPU\u0026nbsp;24CU\u0026nbsp;32GB |               6.14 |          22 |          400 |             2369 (91%) |              4496 (87%) |              2777 (53%) |\n| ⚪\u0026nbsp;M1\u0026nbsp;Pro\u0026nbsp;GPU\u0026nbsp;16CU\u0026nbsp;16GB |               4.10 |          11 |          200 |             1204 (92%) |              2329 (90%) |              1855 (71%) |\n| ⚪\u0026nbsp;M1\u0026nbsp;GPU\u0026nbsp;8CU\u0026nbsp;16GB           |               2.05 |          11 |           68 |              384 (86%) |               758 (85%) |               759 (86%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;8060S\u0026nbsp;Graphics\u0026nbsp;(Max+\u0026nbsp;395) | 29.70 |          15 |          256 |             1231 (74%) |              2541 (76%) |              2563 (77%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;780M\u0026nbsp;(Z1\u0026nbsp;Extreme)  |               8.29 |           8 |          102 |              443 (66%) |               860 (65%) |               820 (62%) |\n| 
🔴\u0026nbsp;Radeon\u0026nbsp;Graphics\u0026nbsp;(7800X3D)      |               0.56 |          12 |          102 |              338 (51%) |               498 (37%) |               283 (21%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;Vega\u0026nbsp;8\u0026nbsp;(4750G)     |               2.15 |          27 |           57 |              263 (71%) |               511 (70%) |               501 (68%) |\n| 🔴\u0026nbsp;Radeon\u0026nbsp;Vega\u0026nbsp;8\u0026nbsp;(3500U)     |               1.23 |           7 |           38 |              157 (63%) |               282 (57%) |               288 (58%) |\n| 🔵\u0026nbsp;Arc\u0026nbsp;140V\u0026nbsp;GPU\u0026nbsp;(16GB)       |               3.99 |          16 |          137 |              636 (71%) |              1282 (72%) |               773 (44%) |\n| 🔵\u0026nbsp;Arc\u0026nbsp;Graphics\u0026nbsp;(Ultra\u0026nbsp;9\u0026nbsp;185H) |        4.81 |          14 |           90 |              271 (46%) |               710 (61%) |               724 (62%) |\n| 🔵\u0026nbsp;Iris\u0026nbsp;Xe\u0026nbsp;Graphics\u0026nbsp;(i7-1265U) |             1.92 |          13 |           77 |              342 (68%) |               621 (62%) |               574 (58%) |\n| 🔵\u0026nbsp;UHD\u0026nbsp;Graphics\u0026nbsp;Xe\u0026nbsp;32EUs     |               0.74 |          25 |           51 |              128 (38%) |               245 (37%) |               216 (32%) |\n| 🔵\u0026nbsp;UHD\u0026nbsp;Graphics\u0026nbsp;770               |               0.82 |          30 |           90 |              342 (58%) |               475 (41%) |               278 (24%) |\n| 🔵\u0026nbsp;UHD\u0026nbsp;Graphics\u0026nbsp;630               |               0.46 |           7 |           51 |              151 (45%) |               301 (45%) |               187 (28%) |\n| 🔵\u0026nbsp;UHD\u0026nbsp;Graphics\u0026nbsp;P630              |               0.46 |          51 |           42 |              177 (65%) |               288 (53%) |               137 (25%) |\n| 
🔵\u0026nbsp;HD\u0026nbsp;Graphics\u0026nbsp;5500               |               0.35 |           3 |           26 |               75 (45%) |               192 (58%) |               108 (32%) |\n| 🔵\u0026nbsp;HD\u0026nbsp;Graphics\u0026nbsp;4600               |               0.38 |           2 |           26 |              105 (63%) |               115 (35%) |                34 (10%) |\n| 🟡\u0026nbsp;Mali-G610\u0026nbsp;MP4 (Orange\u0026nbsp;Pi\u0026nbsp;5) |             0.06 |          16 |           34 |              130 (58%) |               232 (52%) |                93 (21%) |\n| 🟡\u0026nbsp;Mali-G72\u0026nbsp;MP18 (Samsung\u0026nbsp;S9+)    |               0.24 |           4 |           29 |              110 (59%) |               230 (62%) |                21 ( 6%) |\n|                                                  |                    |             |              |                        |                         |                         |\n| 🔴\u0026nbsp;2x\u0026nbsp;EPYC\u0026nbsp;9754                   |              50.79 |        3072 |          922 |             3276 (54%) |              5077 (42%) |              5179 (43%) |\n| 🔴\u0026nbsp;2x\u0026nbsp;EPYC\u0026nbsp;9654                   |              43.62 |        1536 |          922 |             1381 (23%) |              1814 (15%) |              1801 (15%) |\n| 🔴\u0026nbsp;2x\u0026nbsp;EPYC\u0026nbsp;9554                   |              30.72 |         384 |          922 |             2552 (42%) |              2127 (18%) |              2144 (18%) |\n| 🔴\u0026nbsp;1x\u0026nbsp;EPYC\u0026nbsp;9124                   |               3.69 |         128 |          307 |              772 (38%) |               579 (15%) |               586 (15%) |\n| 🔴\u0026nbsp;2x\u0026nbsp;EPYC\u0026nbsp;7713                   |               8.19 |         512 |          410 |             1298 (48%) |               492 ( 9%) |              1418 (27%) |\n| 🔴\u0026nbsp;2x\u0026nbsp;EPYC\u0026nbsp;7352               
    |               3.53 |         512 |          410 |              739 (28%) |               106 ( 2%) |               412 ( 8%) |\n| 🔴\u0026nbsp;2x\u0026nbsp;EPYC\u0026nbsp;7313                   |               3.07 |         128 |          410 |              498 (19%) |               367 ( 7%) |               418 ( 8%) |\n| 🔴\u0026nbsp;2x\u0026nbsp;EPYC\u0026nbsp;7302                   |               3.07 |         128 |          410 |              784 (29%) |               336 ( 6%) |               411 ( 8%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;6980P                  |              98.30 |        6144 |         1690 |             7875 (71%) |              5112 (23%) |              5610 (26%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;6979P                  |              92.16 |        3072 |         1690 |             8135 (74%) |              4175 (19%) |              4622 (21%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;Platinum\u0026nbsp;8592+    |              31.13 |        1024 |          717 |             3135 (67%) |              2359 (25%) |              2466 (26%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;Gold\u0026nbsp;6548N        |              22.94 |        2048 |          666 |             1811 (42%) |              1388 (16%) |              1425 (16%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;CPU\u0026nbsp;Max\u0026nbsp;9480 |              27.24 |         256 |          614 |             2037 (51%) |              1520 (19%) |              1464 (18%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;Platinum\u0026nbsp;8480+    |              28.67 |         512 |          614 |             2162 (54%) |              1845 (23%) |              1884 (24%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;Platinum\u0026nbsp;8470     |              25.29 |        2048 |          614 |             1865 (46%) |              1909 (24%) |              2068 (26%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;Gold\u0026nbsp;6438Y+       |          
    16.38 |        1024 |          614 |             1945 (48%) |              1219 (15%) |              1257 (16%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;Platinum\u0026nbsp;8380     |              23.55 |        2048 |          410 |             1410 (53%) |              1159 (22%) |              1298 (24%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;Platinum\u0026nbsp;8358     |              21.30 |         256 |          410 |             1285 (48%) |              1007 (19%) |              1120 (21%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;Platinum\u0026nbsp;8256     |               3.89 |        1536 |          282 |              396 (22%) |               158 ( 4%) |               175 ( 5%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;Platinum\u0026nbsp;8153     |               8.19 |         384 |          256 |              691 (41%) |               290 ( 9%) |               328 (10%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;Gold\u0026nbsp;6248R        |              18.43 |         384 |          282 |              755 (41%) |               566 (15%) |               694 (19%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;Gold\u0026nbsp;6128         |               5.22 |         192 |          256 |              254 (15%) |               185 ( 6%) |               193 ( 6%) |\n| 🔵\u0026nbsp;Xeon\u0026nbsp;Phi\u0026nbsp;7210                  |               5.32 |         192 |          102 |              415 (62%) |               193 (15%) |               223 (17%) |\n| 🔵\u0026nbsp;4x\u0026nbsp;Xeon\u0026nbsp;E5-4620\u0026nbsp;v4        |               2.69 |         512 |          273 |              460 (26%) |               275 ( 8%) |               239 ( 7%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;E5-2630\u0026nbsp;v4        |               1.41 |          64 |          137 |              264 (30%) |               146 ( 8%) |               129 ( 7%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;E5-2623\u0026nbsp;v4        |               
0.67 |          64 |          137 |              125 (14%) |                66 ( 4%) |                59 ( 3%) |\n| 🔵\u0026nbsp;2x\u0026nbsp;Xeon\u0026nbsp;E5-2680\u0026nbsp;v3        |               1.92 |         128 |          137 |              304 (34%) |               234 (13%) |               291 (16%) |\n| 🟢\u0026nbsp;GH200\u0026nbsp;Neoverse-V2\u0026nbsp;CPU          |               7.88 |         480 |          384 |             1323 (53%) |               853 (17%) |               683 (14%) |\n| 🔴\u0026nbsp;Threadripper\u0026nbsp;PRO\u0026nbsp;7995WX        |              15.36 |         256 |          333 |             1134 (52%) |              1697 (39%) |              1715 (40%) |\n| 🔴\u0026nbsp;Threadripper\u0026nbsp;3970X                  |               3.79 |         128 |          102 |              376 (56%) |               103 ( 8%) |               463 (35%) |\n| 🔴\u0026nbsp;Threadripper\u0026nbsp;1950X                  |               0.87 |         128 |           85 |              273 (49%) |                43 ( 4%) |               151 (14%) |\n| 🔴\u0026nbsp;Ryzen\u0026nbsp;9\u0026nbsp;7900X3D                |               1.69 |         128 |           83 |              278 (51%) |               521 (48%) |               462 (43%) |\n| 🔴\u0026nbsp;Ryzen\u0026nbsp;7\u0026nbsp;7800X3D                |               1.08 |          32 |          102 |              296 (44%) |               361 (27%) |               363 (27%) |\n| 🔴\u0026nbsp;Ryzen\u0026nbsp;7\u0026nbsp;5700X3D                |               0.87 |          32 |           51 |              229 (68%) |               135 (20%) |               173 (26%) |\n| 🔴\u0026nbsp;FX-6100                                  |               0.16 |          16 |           26 |               11 ( 7%) |                11 ( 3%) |                22 ( 7%) |\n| 🔴\u0026nbsp;Athlon\u0026nbsp;X2\u0026nbsp;QL-65                |               0.03 |           4 |           11 |                3 ( 4%) |     
            2 ( 2%) |                 3 ( 2%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;Ultra\u0026nbsp;7\u0026nbsp;258V         |               0.56 |          32 |          137 |              287 (32%) |               123 ( 7%) |               167 ( 9%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;Ultra\u0026nbsp;9\u0026nbsp;185H         |               1.79 |          16 |           90 |              317 (54%) |               267 (23%) |               288 (25%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;i9-14900K                      |               3.74 |          32 |           96 |              443 (71%) |               453 (36%) |               490 (39%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;i7-13700K                      |               2.51 |          64 |           90 |              504 (86%) |               398 (34%) |               424 (36%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;i7-1265U                       |               1.23 |          32 |           77 |              128 (26%) |                62 ( 6%) |                58 ( 6%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;i9-11900KB                     |               0.84 |          32 |           51 |              109 (33%) |               195 (29%) |               208 (31%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;i9-10980XE                     |               3.23 |         128 |           94 |              286 (47%) |               251 (21%) |               223 (18%) |\n| 🔵\u0026nbsp;Xeon\u0026nbsp;E-2288G                        |               0.95 |          32 |           43 |              196 (70%) |               182 (33%) |               198 (36%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;i7-9700                        |               0.77 |          64 |           43 |              103 (37%) |                62 (11%) |                95 (17%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;i5-9600                        |               0.60 |          16 |           43 |              146 (52%) |               127 (23%) |               147 (27%) |\n| 
🔵\u0026nbsp;Core\u0026nbsp;i7-8700K                       |               0.71 |          16 |           51 |              152 (45%) |               134 (20%) |               116 (17%) |\n| 🔵\u0026nbsp;Xeon\u0026nbsp;E-2176G                        |               0.71 |          64 |           42 |              201 (74%) |               136 (25%) |               148 (27%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;i7-7700HQ                      |               0.36 |          12 |           38 |               81 (32%) |                82 (16%) |               108 (22%) |\n| 🔵\u0026nbsp;Xeon\u0026nbsp;E3-1240\u0026nbsp;v5                |               0.50 |          32 |           34 |              141 (63%) |                75 (17%) |                88 (20%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;i7-4770                        |               0.44 |          16 |           26 |              104 (62%) |                69 (21%) |                59 (18%) |\n| 🔵\u0026nbsp;Core\u0026nbsp;i7-4720HQ                      |               0.33 |          16 |           26 |               80 (48%) |                23 ( 7%) |                60 (18%) |\n| 🔵\u0026nbsp;Celeron\u0026nbsp;N2807                       |               0.01 |           4 |           11 |                7 (10%) |                 3 ( 2%) |                 3 ( 2%) |\n\n\u003c/details\u003e\n\n\n\n## Multi-GPU Benchmarks\n\nMulti-GPU benchmarks are run at the largest possible grid resolution with cubic domains, combining either 2x1x1, 2x2x1 or 2x2x2 of these domains. 
The percentages in round brackets are single-GPU [roofline model](https://en.wikipedia.org/wiki/Roofline_model) efficiency, and the multipliers in round brackets are scaling factors relative to the benchmarked single-GPU performance.\n\n```mermaid\ngantt\n\ntitle FluidX3D Performance [MLUPs/s] - FP32 arithmetic, (fastest of FP32/FP16S/FP16C) memory storage\ndateFormat X\naxisFormat %s\n%%{\n\tinit: {\n\t\t\"gantt\": {\n\t\t\t'titleTopMargin': 42,\n\t\t\t'topPadding': 70,\n\t\t\t'leftPadding': 260,\n\t\t\t'rightPadding': 5,\n\t\t\t'sectionFontSize': 20,\n\t\t\t'fontSize': 20,\n\t\t\t'barHeight': 20,\n\t\t\t'barGap': 3,\n\t\t\t'numberSectionStyles': 2\n\t\t},\n\t\t'theme': 'forest',\n\t\t'themeVariables': {\n\t\t\t'sectionBkgColor': '#99999999',\n\t\t\t'altSectionBkgColor': '#00000000',\n\t\t\t'titleColor': '#AFAFAF',\n\t\t\t'textColor': '#AFAFAF',\n\t\t\t'taskTextColor': 'black',\n\t\t\t'taskBorderColor': '#487E3A'\n\t\t}\n\t}\n}%%\n\n\nsection 8x Instinct MI300X\n\t204924 :crit, 0, 204924\nsection 4x Instinct MI300X\n\t109546 :crit, 0, 109546\nsection 2x Instinct MI300X\n\t61053 :crit, 0, 61053\nsection 1x Instinct MI300X\n\t41327 :crit, 0, 41327\n\nsection 4x Instinct MI250 (8 GCD)\n\t53521 :crit, 0, 53521\nsection 2x Instinct MI250 (4 GCD)\n\t29627 :crit, 0, 29627\nsection 1x Instinct MI250 (2 GCD)\n\t17338 :crit, 0, 17338\nsection 1x Instinct MI250 (1 GCD)\n\t9030 :crit, 0, 9030\n\nsection 32x Instinct MI210 GigaIO\n\t50952 :crit, 0, 50952\nsection 24x Instinct MI210 GigaIO\n\t45033 :crit, 0, 45033\nsection 16x Instinct MI210 GigaIO\n\t37922 :crit, 0, 37922\nsection 8x Instinct MI210 GigaIO\n\t27996 :crit, 0, 27996\nsection 4x Instinct MI210 GigaIO\n\t17232 :crit, 0, 17232\nsection 2x Instinct MI210 GigaIO\n\t13539 :crit, 0, 13539\nsection 1x Instinct MI210 GigaIO\n\t9105 :crit, 0, 9105\n\nsection 4x Instinct MI210\n\t31408 :crit, 0, 31408\nsection 2x Instinct MI210\n\t16156 :crit, 0, 16156\nsection 1x Instinct MI210\n\t8757 :crit, 0, 8757\n\nsection 3x MI50 
+ 1x A100 40GB\n\t22759 :active,crit, 0, 22759\nsection 3x Instinct MI50 32GB\n\t21693 :crit, 0, 21693\nsection 2x Instinct MI50 32GB\n\t14484 :crit, 0, 14484\nsection 1x Instinct MI50 32GB\n\t8477 :crit, 0, 8477\n\nsection 8x Radeon VII\n\t30826 :crit, 0, 30826\nsection 4x Radeon VII\n\t24273 :crit, 0, 24273\nsection 2x Radeon VII\n\t15591 :crit, 0, 15591\nsection 1x Radeon VII\n\t7778 :crit, 0, 7778\n\nsection 4x DC GPU Max 1100\n\t22777 :done, 0, 22777\nsection 2x DC GPU Max 1100\n\t11815 :done, 0, 11815\nsection 1x DC GPU Max 1100\n\t6209 :done, 0, 6209\n\nsection 4x H100 NVL\n\t82122 : 0, 82122\nsection 2x H100 NVL\n\t49958 : 0, 49958\nsection 1x H100 NVL\n\t32922 : 0, 32922\n\nsection 4x H100 SXM5 80GB HBM3\n\t78462 : 0, 78462\nsection 2x H100 SXM5 80GB HBM3\n\t46189 : 0, 46189\nsection 1x H100 SXM5 80GB HBM3\n\t28522 : 0, 28522\n\nsection 4x A100 PCIe 80GB\n\t52056 : 0, 52056\nsection 2x A100 PCIe 80GB\n\t27165 : 0, 27165\nsection 1x A100 PCIe 80GB\n\t17896 : 0, 17896\n\nsection 4x PG506-243/242\n\t41088 : 0, 41088\nsection 2x PG506-243/242\n\t24168 : 0, 24168\nsection 1x PG506-243/242\n\t15654 : 0, 15654\n\nsection 8x A100 SXM4 40GB\n\t72965 : 0, 72965\nsection 4x A100 SXM4 40GB\n\t42400 : 0, 42400\nsection 2x A100 SXM4 40GB\n\t23707 : 0, 23707\nsection 1x A100 SXM4 40GB\n\t15917 : 0, 15917\n\nsection 4x V100 SXM2 32GB\n\t26527 : 0, 26527\nsection 2x V100 SXM2 32GB\n\t15469 : 0, 15469\nsection 1x V100 SXM2 32GB\n\t8947 : 0, 8947\n\nsection 3x K40m + 1x Titan Xp\n\t5174 : 0, 5174\nsection 2x Tesla K40m\n\t3300 : 0, 3300\nsection 1x Tesla K40m\n\t1868 : 0, 1868\n\nsection 1x Tesla K80 (2 GPU)\n\t3448 : 0, 3448\nsection 1x Tesla K80 (1 GPU)\n\t1642 : 0, 1642\n\nsection 2x L40S\n\t13640 : 0, 13640\nsection 1x L40S\n\t7669 : 0, 7669\n\nsection 2x L40\n\t14164 : 0, 14164\nsection 1x L40\n\t7945 : 0, 7945\n\nsection 8x RTX A6000\n\t40063 : 0, 40063\nsection 4x RTX A6000\n\t27915 : 0, 27915\nsection 2x RTX A6000\n\t15026 : 0, 15026\nsection 1x RTX A6000\n\t8814 : 
0, 8814\n\nsection 2x A2\n\t3539 : 0, 3539\nsection 1x A2\n\t2051 : 0, 2051\n\nsection 2x Quadro RTX 8000 Pa.\n\t10214 : 0, 10214\nsection 1x Quadro RTX 8000 Pa.\n\t5607 : 0, 5607\n\nsection 7x 2080 Ti + 1x A100 40GB\n\t33857 : 0, 33857\nsection 4x GeForce RTX 2080 Ti\n\t18598 : 0, 18598\nsection 2x GeForce RTX 2080 Ti\n\t10922 : 0, 10922\nsection 1x GeForce RTX 2080 Ti\n\t6853 : 0, 6853\n\nsection 2x Arc A770\n\t8745 :done, 0, 8745\nsection 1x Arc A770\n\t4568 :done, 0, 4568\n\nsection 1x A100 + 1x P100 + 2x A2 + 3x MI50 + 1x A770\n\t17296 :active,done, 0, 17296\nsection 1x A770 + 1x Titan Xp\n\t8380 :active,done, 0, 8380\n```\n\n\u003cdetails\u003e\u003csummary\u003eMulti-GPU Benchmark Table\u003c/summary\u003e\n\nColors: 🔴 AMD, 🔵 Intel, 🟢 Nvidia, ⚪ Apple, 🟡 ARM, 🟤 Glenfly\n\n| Device                                                          | FP32\u003cbr\u003e[TFlops/s] | Mem\u003cbr\u003e[GB] | BW\u003cbr\u003e[GB/s] | FP32/FP32\u003cbr\u003e[MLUPs/s] | FP32/FP16S\u003cbr\u003e[MLUPs/s] | FP32/FP16C\u003cbr\u003e[MLUPs/s] |\n| :-------------------------------------------------------------- | -----------------: | ----------: | -----------: | ---------------------: | ----------------------: | ----------------------: |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🔴\u0026nbsp;8x\u0026nbsp;Instinct\u0026nbsp;MI300X                            |            1307.20 |        1536 |        42400 |     152835\u0026nbsp;(6.7x) |      192297\u0026nbsp;(4.7x) |      204924\u0026nbsp;(6.5x) |\n| 🔴\u0026nbsp;4x\u0026nbsp;Instinct\u0026nbsp;MI300X                            |             653.60 |         768 |        21200 |      83678\u0026nbsp;(3.7x) |      103200\u0026nbsp;(2.5x) |      109546\u0026nbsp;(3.5x) |\n| 🔴\u0026nbsp;2x\u0026nbsp;Instinct\u0026nbsp;MI300X                            |             326.80 |         
384 |        10600 |      46673\u0026nbsp;(2.0x) |       61053\u0026nbsp;(1.5x) |       57391\u0026nbsp;(1.8x) |\n| 🔴\u0026nbsp;1x\u0026nbsp;Instinct\u0026nbsp;MI300X                            |             163.40 |         192 |         5300 |       22867\u0026nbsp;(66%) |        41327\u0026nbsp;(60%) |        31670\u0026nbsp;(46%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🔴\u0026nbsp;4x\u0026nbsp;Instinct\u0026nbsp;MI250\u0026nbsp;(8\u0026nbsp;GCD)           |             362.08 |         512 |        13107 |      27350\u0026nbsp;(4.9x) |            52258 (5.8x) |            53521 (6.3x) |\n| 🔴\u0026nbsp;2x\u0026nbsp;Instinct\u0026nbsp;MI250\u0026nbsp;(4\u0026nbsp;GCD)           |             181.04 |         256 |         6554 |      16925\u0026nbsp;(3.0x) |            29163 (3.2x) |            29627 (3.5x) |\n| 🔴\u0026nbsp;1x\u0026nbsp;Instinct\u0026nbsp;MI250\u0026nbsp;(2\u0026nbsp;GCD)           |              90.52 |         128 |         3277 |            9460 (1.7x) |            14313 (1.6x) |            17338 (2.0x) |\n| 🔴\u0026nbsp;1x\u0026nbsp;Instinct\u0026nbsp;MI250\u0026nbsp;(1\u0026nbsp;GCD)           |              45.26 |          64 |         1638 |             5638 (53%) |              9030 (42%) |              8506 (40%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🔴\u0026nbsp;32x\u0026nbsp;Instinct\u0026nbsp;MI210\u0026nbsp;GigaIO                |            1448.32 |        2048 |        52429 |      23881\u0026nbsp;(3.8x) |            50952 (6.0x) |            48848 (5.4x) |\n| 🔴\u0026nbsp;24x\u0026nbsp;Instinct\u0026nbsp;MI210\u0026nbsp;GigaIO                |            1086.24 |        1536 |        39322 |      
22056\u0026nbsp;(3.5x) |            45033 (5.3x) |            44631 (4.9x) |\n| 🔴\u0026nbsp;16x\u0026nbsp;Instinct\u0026nbsp;MI210\u0026nbsp;GigaIO                |             724.16 |        1024 |        26214 |      18094\u0026nbsp;(2.9x) |            37360 (4.4x) |            37922 (4.2x) |\n| 🔴\u0026nbsp;\u0026nbsp;\u0026nbsp;8x\u0026nbsp;Instinct\u0026nbsp;MI210\u0026nbsp;GigaIO     |             362.08 |         512 |        13107 |      13546\u0026nbsp;(2.1x) |            27996 (3.3x) |            27820 (3.1x) |\n| 🔴\u0026nbsp;\u0026nbsp;\u0026nbsp;4x\u0026nbsp;Instinct\u0026nbsp;MI210\u0026nbsp;GigaIO     |             181.04 |         256 |         6554 |            8816 (1.4x) |            17232 (2.0x) |            16892 (1.9x) |\n| 🔴\u0026nbsp;\u0026nbsp;\u0026nbsp;2x\u0026nbsp;Instinct\u0026nbsp;MI210\u0026nbsp;GigaIO     |              90.52 |         128 |         3277 |            7245 (1.1x) |            12050 (1.4x) |            13539 (1.5x) |\n| 🔴\u0026nbsp;\u0026nbsp;\u0026nbsp;1x\u0026nbsp;Instinct\u0026nbsp;MI210\u0026nbsp;GigaIO     |              45.26 |          64 |         1638 |             6347 (59%) |              8486 (40%) |              9105 (43%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🔴\u0026nbsp;4x\u0026nbsp;Instinct\u0026nbsp;MI210                             |             181.04 |         256 |         6554 |           17075 (2.6x) |            31408 (3.6x) |            30643 (3.5x) |\n| 🔴\u0026nbsp;2x\u0026nbsp;Instinct\u0026nbsp;MI210                             |              90.52 |         128 |         3277 |            9624 (1.5x) |            15909 (1.8x) |            16156 (1.8x) |\n| 🔴\u0026nbsp;1x\u0026nbsp;Instinct\u0026nbsp;MI210                             |              45.26 |          64 |         1638 |             6454 (60%) |              8757 (41%) 
|              8751 (41%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🔴\u0026nbsp;3x\u0026nbsp;MI50\u0026nbsp;32GB + 🟢\u0026nbsp;1x\u0026nbsp;A100\u0026nbsp;40GB |              52.99 |         128 |         4096 |      13159\u0026nbsp;(3.0x) |       22759\u0026nbsp;(2.7x) |       11953\u0026nbsp;(2.7x) |\n| 🔴\u0026nbsp;3x\u0026nbsp;Instinct\u0026nbsp;MI50\u0026nbsp;32GB                    |              39.74 |          96 |         3072 |      11709\u0026nbsp;(2.6x) |       21693\u0026nbsp;(2.6x) |             9969 (2.3x) |\n| 🔴\u0026nbsp;2x\u0026nbsp;Instinct\u0026nbsp;MI50\u0026nbsp;32GB                    |              26.50 |          64 |         2048 |            7803 (1.8x) |       14484\u0026nbsp;(1.7x) |             6647 (1.5x) |\n| 🔴\u0026nbsp;1x\u0026nbsp;Instinct\u0026nbsp;MI50\u0026nbsp;32GB                    |              13.25 |          32 |         1024 |             4446 (66%) |              8477 (64%) |              4406 (33%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🔴\u0026nbsp;8x\u0026nbsp;Radeon\u0026nbsp;VII                                 |             110.64 |         128 |         8192 |      21946\u0026nbsp;(4.5x) |            30826 (4.0x) |            24572 (4.7x) |\n| 🔴\u0026nbsp;4x\u0026nbsp;Radeon\u0026nbsp;VII                                 |              55.32 |          64 |         4096 |      12911\u0026nbsp;(2.6x) |            24273 (3.1x) |            17080 (3.2x) |\n| 🔴\u0026nbsp;2x\u0026nbsp;Radeon\u0026nbsp;VII                                 |              27.66 |          32 |         2048 |            8113 (1.7x) |            15591 (2.0x) |            10352 (2.0x) |\n| 
🔴\u0026nbsp;1x\u0026nbsp;Radeon\u0026nbsp;VII                                 |              13.83 |          16 |         1024 |             4898 (73%) |              7778 (58%) |              5256 (40%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🔵\u0026nbsp;4x\u0026nbsp;DC\u0026nbsp;GPU\u0026nbsp;Max\u0026nbsp;1100                  |              88.88 |         192 |         4915 |           12162 (3.5x) |            22777 (3.7x) |            11759 (3.6x) |\n| 🔵\u0026nbsp;2x\u0026nbsp;DC\u0026nbsp;GPU\u0026nbsp;Max\u0026nbsp;1100                  |              44.44 |          96 |         2458 |            6301 (1.8x) |            11815 (1.9x) |             5970 (1.8x) |\n| 🔵\u0026nbsp;1x\u0026nbsp;DC\u0026nbsp;GPU\u0026nbsp;Max\u0026nbsp;1100                  |              22.22 |          48 |         1229 |             3487 (43%) |              6209 (39%) |              3252 (20%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;4x\u0026nbsp;H100\u0026nbsp;NVL                                   |             241.28 |         376 |        15752 |      44284\u0026nbsp;(2.2x) |            82122 (2.5x) |            53855 (2.9x) |\n| 🟢\u0026nbsp;2x\u0026nbsp;H100\u0026nbsp;NVL                                   |             120.64 |         188 |         7876 |      29050\u0026nbsp;(1.4x) |            49958 (1.5x) |            30586 (1.7x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;H100\u0026nbsp;NVL                                   |              60.32 |          94 |         3938 |       20303\u0026nbsp;(79%) |             32922 (64%) |             18424 (36%) |\n|                                                                 |                    |       
      |              |                        |                         |                         |\n| 🟢\u0026nbsp;4x\u0026nbsp;H100\u0026nbsp;SXM5\u0026nbsp;80GB\u0026nbsp;HBM3              |             267.63 |         320 |        13400 |      46442\u0026nbsp;(2.7x) |            78462 (2.8x) |            60490 (3.0x) |\n| 🟢\u0026nbsp;2x\u0026nbsp;H100\u0026nbsp;SXM5\u0026nbsp;80GB\u0026nbsp;HBM3              |             133.82 |         160 |         6700 |      26838\u0026nbsp;(1.6x) |            46189 (1.6x) |            34147 (1.7x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;H100\u0026nbsp;SXM5\u0026nbsp;80GB\u0026nbsp;HBM3              |              66.91 |          80 |         3350 |       17262\u0026nbsp;(79%) |             28522 (66%) |             20065 (46%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;4x\u0026nbsp;A100\u0026nbsp;PCIe\u0026nbsp;80GB                        |              77.96 |         320 |         7740 |      25957\u0026nbsp;(2.7x) |       52056\u0026nbsp;(2.9x) |       33283\u0026nbsp;(3.1x) |\n| 🟢\u0026nbsp;2x\u0026nbsp;A100\u0026nbsp;PCIe\u0026nbsp;80GB                        |              38.98 |         160 |         3870 |      15742\u0026nbsp;(1.6x) |       27165\u0026nbsp;(1.5x) |       17510\u0026nbsp;(1.6x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;A100\u0026nbsp;PCIe\u0026nbsp;80GB                        |              19.49 |          80 |         1935 |             9657 (76%) |        17896\u0026nbsp;(71%) |        10817\u0026nbsp;(43%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;4x\u0026nbsp;PG506-243\u0026nbsp;/\u0026nbsp;PG506-242                 |              88.57 |         256 |         6554 |      
23097\u0026nbsp;(2.8x) |       41088\u0026nbsp;(2.6x) |       36130\u0026nbsp;(2.9x) |\n| 🟢\u0026nbsp;2x\u0026nbsp;PG506-243\u0026nbsp;/\u0026nbsp;PG506-242                 |              44.28 |         128 |         3277 |      13885\u0026nbsp;(1.7x) |       24168\u0026nbsp;(1.5x) |       20906\u0026nbsp;(1.7x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;PG506-243\u0026nbsp;/\u0026nbsp;PG506-242                 |              22.14 |          64 |         1638 |             8195 (77%) |        15654\u0026nbsp;(74%) |        12271\u0026nbsp;(58%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;8x\u0026nbsp;A100\u0026nbsp;SXM4\u0026nbsp;40GB                        |             155.92 |         320 |        12440 |      37619\u0026nbsp;(4.4x) |            72965 (4.6x) |            63009 (7.2x) |\n| 🟢\u0026nbsp;4x\u0026nbsp;A100\u0026nbsp;SXM4\u0026nbsp;40GB                        |              77.96 |         160 |         6220 |      23411\u0026nbsp;(2.7x) |            42400 (2.7x) |            29017 (3.3x) |\n| 🟢\u0026nbsp;2x\u0026nbsp;A100\u0026nbsp;SXM4\u0026nbsp;40GB                        |              38.98 |          80 |         3110 |      14311\u0026nbsp;(1.7x) |            23707 (1.5x) |            15512 (1.8x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;A100\u0026nbsp;SXM4\u0026nbsp;40GB                        |              19.49 |          40 |         1555 |             8543 (84%) |        15917\u0026nbsp;(79%) |              8748 (43%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;4x\u0026nbsp;Tesla\u0026nbsp;V100\u0026nbsp;SXM2\u0026nbsp;32GB             |              62.68 |         128 |         3600 |      13135\u0026nbsp;(2.9x) |            
26527 (3.0x) |            22686 (3.1x) |\n| 🟢\u0026nbsp;2x\u0026nbsp;Tesla\u0026nbsp;V100\u0026nbsp;SXM2\u0026nbsp;32GB             |              31.34 |          64 |         1800 |            7953 (1.8x) |            15469 (1.7x) |            12932 (1.8x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;Tesla\u0026nbsp;V100\u0026nbsp;SXM2\u0026nbsp;32GB             |              15.67 |          32 |          900 |             4471 (76%) |              8947 (77%) |              7217 (62%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;3x\u0026nbsp;K40m\u0026nbsp;+\u0026nbsp;1x\u0026nbsp;Titan\u0026nbsp;Xp          |              17.16 |          48 |         1154 |            3117 (2.8x) |             5174 (2.8x) |             3127 (3.4x) |\n| 🟢\u0026nbsp;2x\u0026nbsp;Tesla\u0026nbsp;K40m                                 |               8.58 |          24 |          577 |            1971 (1.7x) |             3300 (1.8x) |             1801 (2.0x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;Tesla\u0026nbsp;K40m                                 |               4.29 |          12 |          288 |             1131 (60%) |              1868 (50%) |               912 (24%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;1x\u0026nbsp;Tesla\u0026nbsp;K80\u0026nbsp;(2\u0026nbsp;GPU)                |               8.22 |          24 |          480 |            2086 (2.3x) |             3448 (2.1x) |             2174 (2.3x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;Tesla\u0026nbsp;K80\u0026nbsp;(1\u0026nbsp;GPU)                |               4.11 |          12 |          240 |              916 (58%) |              1642 (53%) |               943 (30%) |\n|                                        
                         |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;2x\u0026nbsp;L40S                                            |             183.22 |          96 |         1728 |            6888 (1.8x) |            13099 (1.8x) |            13640 (1.8x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;L40S                                            |              91.61 |          48 |          864 |             3824 (68%) |              7463 (67%) |              7669 (68%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;2x\u0026nbsp;L40                                             |             181.04 |          96 |         1728 |            7137 (1.8x) |            13547 (1.7x) |            14164 (1.8x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;L40                                             |              90.52 |          48 |          864 |             3870 (69%) |              7778 (69%) |              7945 (71%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;8x\u0026nbsp;RTX\u0026nbsp;A6000                                  |             320.00 |         384 |         6144 |      19311\u0026nbsp;(4.4x) |            40063 (4.5x) |            39004 (4.6x) |\n| 🟢\u0026nbsp;4x\u0026nbsp;RTX\u0026nbsp;A6000                                  |             160.00 |         192 |         3072 |      14314\u0026nbsp;(3.2x) |            27915 (3.2x) |            27227 (3.2x) |\n| 🟢\u0026nbsp;2x\u0026nbsp;RTX\u0026nbsp;A6000                                  |              80.00 |          96 |         1536 |            8041 (1.8x) |            15026 (1.7x) |            14795 
(1.7x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;RTX\u0026nbsp;A6000                                  |              40.00 |          48 |          768 |             4421 (88%) |              8814 (88%) |              8533 (86%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;2x\u0026nbsp;A2                                              |               9.06 |          30 |          400 |            1927 (1.9x) |             3539 (1.7x) |             2232 (1.9x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;A2                                              |               4.53 |          15 |          200 |             1031 (79%) |              2051 (79%) |              1199 (46%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;2x\u0026nbsp;Quadro\u0026nbsp;RTX\u0026nbsp;8000\u0026nbsp;Pa.              |              29.86 |          96 |         1248 |            4767 (1.8x) |             9607 (1.8x) |            10214 (1.8x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;Quadro\u0026nbsp;RTX\u0026nbsp;8000\u0026nbsp;Pa.              
|              14.93 |          48 |          624 |             2591 (64%) |              5408 (67%) |              5607 (69%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🟢\u0026nbsp;7x\u0026nbsp;2080\u0026nbsp;Ti\u0026nbsp;+\u0026nbsp;1x\u0026nbsp;A100\u0026nbsp;40GB |             107.60 |          88 |         4928 |      16146\u0026nbsp;(5.1x) |            33732 (5.0x) |            33857 (4.9x) |\n| 🟢\u0026nbsp;4x\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;2080\u0026nbsp;Ti              |              53.80 |          44 |         2464 |            9117 (2.9x) |            18415 (2.7x) |            18598 (2.7x) |\n| 🟢\u0026nbsp;2x\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;2080\u0026nbsp;Ti              |              26.90 |          22 |         1232 |            5085 (1.6x) |            10770 (1.6x) |            10922 (1.6x) |\n| 🟢\u0026nbsp;1x\u0026nbsp;GeForce\u0026nbsp;RTX\u0026nbsp;2080\u0026nbsp;Ti              |              13.45 |          11 |          616 |             3194 (79%) |              6700 (84%) |              6853 (86%) |\n|                                                                 |                    |             |              |                        |                         |                         |\n| 🔵\u0026nbsp;2x\u0026nbsp;Arc\u0026nbsp;A770                                   |              39.32 |          32 |         1120 |            4954 (1.9x) |             8745 (1.9x) |             8329 (1.8x) |\n| 🔵\u0026nbsp;1x\u0026nbsp;Arc\u0026nbsp;A770                                   |              19.66 |          16 |          560 |             2663 (73%) |              4568 (63%) |              4519 (62%) |\n|                                                                 |                    |             |              |                        |                
         |                         |\n| 🟢\u0026nbsp;1x\u0026nbsp;A100\u0026nbsp;40GB + 🟢\u0026nbsp;1x\u0026nbsp;P100\u0026nbsp;16GB + 🟢\u0026nbsp;2x\u0026nbsp;A2 + 🔴\u0026nbsp;3x\u0026nbsp;MI50\u0026nbsp;32GB + 🔵\u0026nbsp;1x\u0026nbsp;A770\u0026nbsp;16GB | 54.36 | 180 | 2400 | 9903 (63%) | 17296 (55%) | 12041 (39%) |\n| 🔵\u0026nbsp;1x\u0026nbsp;A770\u0026nbsp;+\u0026nbsp;🟢\u0026nbsp;1x\u0026nbsp;Titan\u0026nbsp;Xp  |              24.30 |          24 |         1095 |             4717 (66%) |              8380 (59%) |              8026 (56%) |\n\n\u003c/details\u003e\n\n\n\n## FAQs\n\n### General\n\n- \u003cdetails\u003e\u003csummary\u003eHow do I learn to use FluidX3D?\u003c/summary\u003e\u003cbr\u003eFollow the \u003ca href=\"https://github.com/ProjectPhysX/FluidX3D/blob/master/DOCUMENTATION.md\"\u003eFluidX3D Documentation\u003c/a\u003e!\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eWhat physical model does FluidX3D use?\u003c/summary\u003e\u003cbr\u003eFluidX3D implements the lattice Boltzmann method, a type of direct numerical simulation (DNS), which is the most accurate type of fluid simulation but also the most computationally challenging. Optional extension models include volume force (Guo forcing), free surface (\u003ca href=\"https://doi.org/10.3390/computation10060092\"\u003evolume-of-fluid\u003c/a\u003e and \u003ca href=\"https://doi.org/10.3390/computation10020021\"\u003ePLIC\u003c/a\u003e), a temperature model and the Smagorinsky-Lilly subgrid turbulence model.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eFluidX3D only uses FP32 or even FP32/FP16, in contrast to FP64. Are simulation results physically accurate?\u003c/summary\u003e\u003cbr\u003eYes, in all but extreme edge cases. The code has been specially optimized to minimize arithmetic round-off errors and make the most out of lower precision. 
With these optimizations, accuracy in most cases is indistinguishable from FP64 double precision, even with FP32/FP16 mixed precision. Details can be found in \u003ca href=\"https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats\"\u003ethis paper\u003c/a\u003e.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eCompared to the benchmark numbers stated \u003ca href=\"https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats\"\u003ehere\u003c/a\u003e, efficiency seems much lower but performance is slightly better for most devices. How can this be?\u003c/summary\u003e\u003cbr\u003eIn that paper, the One-Step-Pull swap algorithm is implemented, using only misaligned reads and coalesced writes. On almost all GPUs, the performance penalty for misaligned writes is much larger than for misaligned reads, and sometimes there is almost no penalty for misaligned reads at all. Because of this, One-Step-Pull runs at peak bandwidth and thus peak efficiency.\u003cbr\u003eHere, a different swap algorithm is used: \u003ca href=\"https://doi.org/10.3390/computation10060092\"\u003eEsoteric-Pull\u003c/a\u003e, a type of in-place streaming. For D3Q19, this makes the LBM require much less memory (93 vs. 169 Bytes/cell with FP32/FP32, or 55 vs. 93 Bytes/cell with FP32/FP16) and also less memory bandwidth due to so-called implicit bounce-back boundaries (153 vs. 171 Bytes/cell per time step with FP32/FP32, or 77 vs. 95 Bytes/cell per time step with FP32/FP16). However, memory access is now half coalesced and half misaligned for both reads and writes, so memory access efficiency is lower. For overall performance, these two effects approximately cancel out. 
The benefit of Esoteric-Pull (being able to simulate domains nearly twice as large with the same amount of memory) clearly outweighs the cost of slightly lower memory access efficiency, especially since overall performance is not reduced.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eWhy don't you use CUDA? Wouldn't that be more efficient?\u003c/summary\u003e\u003cbr\u003eNo, that is a myth. OpenCL is exactly as efficient as CUDA on Nvidia GPUs if optimized properly. \u003ca href=\"https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats\"\u003eHere\u003c/a\u003e I did a roofline model analysis of OpenCL performance on various hardware. OpenCL efficiency on modern Nvidia GPUs can be 100% with the right memory access pattern, so CUDA can't possibly be any more efficient. Without any performance advantage, there is no reason to use proprietary CUDA over OpenCL, since OpenCL is compatible with a lot more hardware.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eWhy no multi-relaxation-time (MRT) collision operator?\u003c/summary\u003e\u003cbr\u003eThe idea of MRT is to linearly transform the density distribution functions (DDFs) into \"moment space\" by matrix multiplication and relax these moments individually, promising better stability and accuracy. In practice, in the vast majority of cases, it has zero or even negative effects on stability and accuracy, and simple SRT is much superior. Apart from the kinematic shear viscosity and conserved terms, the remaining moments are non-physical quantities and their tuning is a black box. 
Although MRT can be implemented efficiently with only a single matrix-vector multiplication in registers, giving performance identical to SRT because the code remains bandwidth-bound, storing the matrices vastly lengthens and over-complicates the code for no real benefit.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n### Hardware\n\n- \u003cdetails\u003e\u003csummary\u003eCan FluidX3D run on multiple GPUs at the same time?\u003c/summary\u003e\u003cbr\u003eYes. The simulation grid is then split into domains, one for each GPU (domain decomposition method). The GPUs essentially pool their memory, enabling much larger grid resolution and higher performance. Rendering is parallelized across multiple GPUs as well; each GPU renders its own domain with a 3D offset, then the rendered frames from all GPUs are overlaid using their z-buffers. Communication between domains is done over PCIe, so no SLI/Crossfire/NVLink/InfinityFabric is required. All GPUs must, however, be installed in the same node (PC/laptop/server). Even \u003ca href=\"https://youtu.be/_8Ed8ET9gBU\"\u003eunholy combinations of AMD+Intel+Nvidia GPUs will work\u003c/a\u003e, although it is recommended to only use GPUs with similar memory capacity and bandwidth together. Using a fast gaming GPU together with a slow integrated GPU would only decrease performance due to communication overhead.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eI'm on a budget and have only a cheap computer. Can I run FluidX3D on my toaster PC/laptop?\u003c/summary\u003e\u003cbr\u003eAbsolutely. Today even the most inexpensive hardware, like integrated GPUs or entry-level gaming GPUs, supports OpenCL. You might be a bit more limited in memory capacity and grid resolution, but you should be good to go. 
I've tested FluidX3D on very old and inexpensive hardware and even on my Samsung S9+ smartphone, and it runs just fine, although admittedly a bit slower.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eI don't have an expensive workstation GPU, only a gaming GPU. Will performance suffer?\u003c/summary\u003e\u003cbr\u003eNo. Efficiency on gaming GPUs is exactly as good as on their \"professional\"/workstation counterparts. Performance is often even better, as gaming GPUs have higher boost clocks.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eDo I need a GPU with ECC memory?\u003c/summary\u003e\u003cbr\u003eNo. Gaming GPUs work just fine. Some Nvidia GPUs automatically reduce memory clocks for compute applications to almost entirely eliminate memory errors.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eMy GPU does not support CUDA. Can I still use FluidX3D?\u003c/summary\u003e\u003cbr\u003eYes. FluidX3D uses OpenCL 1.2 and not CUDA, so it runs on any GPU from any vendor released since around 2012.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eI don't have a dedicated graphics card at all. Can I still run FluidX3D on my PC/laptop?\u003c/summary\u003e\u003cbr\u003eYes. FluidX3D also runs on all integrated GPUs released since around 2012, as well as on CPUs.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eI need more memory than my GPU can offer. Can I run FluidX3D on my CPU as well?\u003c/summary\u003e\u003cbr\u003eYes. 
You only need to install the \u003ca href=\"https://www.intel.com/content/www/us/en/developer/articles/technical/intel-cpu-runtime-for-opencl-applications-with-sycl-support.html\"\u003eIntel OpenCL CPU Runtime\u003c/a\u003e.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eIn the benchmarks you list some very expensive hardware. How did you get access to that?\u003c/summary\u003e\u003cbr\u003eAs a PhD candidate in computational physics, I used FluidX3D for my research, so I had access to the BZHPC, SuperMUC-NG and JSC JURECA-DC supercomputers.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n### Graphics\n\n- \u003cdetails\u003e\u003csummary\u003eI don't have an RTX/DXR GPU that supports raytracing. Can I still use raytracing graphics in FluidX3D?\u003c/summary\u003e\u003cbr\u003eYes, and at full performance. FluidX3D does not use a bounding volume hierarchy (BVH) to accelerate raytracing, but fast ray-grid traversal instead, implemented directly in OpenCL C. This is much faster than BVH for moving isosurfaces in the LBM grid (~N vs. ~N²+log(N) runtime; LBM itself is ~N³), and it does not require any dedicated raytracing hardware. Raytracing in FluidX3D runs on any GPU that supports OpenCL 1.2.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eI have a datacenter/mining GPU without any video output or graphics hardware. Can FluidX3D still render simulation results?\u003c/summary\u003e\u003cbr\u003eYes. FluidX3D does all rendering (rasterization and raytracing) in OpenCL C, so no display output and no graphics features like OpenGL/Vulkan/DirectX are required. Rendering is just another form of compute, after all. 
Rendered frames are passed to the CPU over PCIe, and the CPU can then either draw them on screen through dedicated/integrated graphics or write them to the hard drive.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eI'm running FluidX3D on a remote (super-)computer and only have an SSH terminal. Can I still use graphics somehow?\u003c/summary\u003e\u003cbr\u003eYes, either directly as interactive ASCII graphics in the terminal or by storing rendered frames on the hard drive and then copying them over via `scp -r user@server.url:\"~/path/to/images/folder\" .`.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n### Licensing\n\n- \u003cdetails\u003e\u003csummary\u003eI want to learn about programming/software/physics/engineering. Can I use FluidX3D for free?\u003c/summary\u003e\u003cbr\u003eYes. Anyone can use FluidX3D for free for public research, education or personal use. Use by scientists, students and hobbyists is free of charge and encouraged.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eI am a scientist/teacher with a paid position at a public institution. Can I use FluidX3D for my research/teaching?\u003c/summary\u003e\u003cbr\u003eYes, you can use FluidX3D free of charge. This is considered research/education, not commercial use. To give credit, the \u003ca href=\"https://github.com/ProjectPhysX/FluidX3D#references\"\u003ereferences\u003c/a\u003e listed below should be cited. If you publish data/results generated by altered source versions, the altered source code must be published as well.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eI work at a company in CFD/consulting/R\u0026D or related fields. Can I use FluidX3D commercially?\u003c/summary\u003e\u003cbr\u003eNo. 
Commercial use is not allowed with the current license.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eIs FluidX3D open-source?\u003c/summary\u003e\u003cbr\u003eNo. \"Open-source\" as a technical term is defined as software that is freely available without any restriction on use, but I am not comfortable with that. I have written FluidX3D in my spare time and no one should milk it for profits while I remain uncompensated, especially considering what other CFD software sells for. The technical term for the type of license I chose is \"source-available no-cost non-commercial\". The source code is freely available, and you are free to use, alter and redistribute it, as long as you do not sell it or make a profit from derived products/services, and as long as you do not use it for any military purposes (see the \u003ca href=\"https://github.com/ProjectPhysX/FluidX3D/blob/master/LICENSE.md\"\u003elicense\u003c/a\u003e for details).\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n- \u003cdetails\u003e\u003csummary\u003eWill FluidX3D at some point be available with a commercial license?\u003c/summary\u003e\u003cbr\u003eMaybe I will add the option for a second, commercial license later on. If you are interested in commercial use, let me know. 
For non-commercial use in science and education, FluidX3D is and will always be free.\u003cbr\u003e\u003cbr\u003e\u003c/details\u003e\n\n\n\n## External Code/Libraries/Images used in FluidX3D\n\n- [OpenCL-Headers](https://github.com/KhronosGroup/OpenCL-Headers) and [C++ Wrapper](https://github.com/KhronosGroup/OpenCL-CLHPP) for GPU parallelization ([Khronos Group](https://www.khronos.org/opencl/))\n- [Win32 API](https://learn.microsoft.com/en-us/windows/win32/api/winbase/) for interactive graphics in Windows ([Microsoft](https://www.microsoft.com/))\n- [X11/Xlib](https://www.x.org/releases/current/doc/libX11/libX11/libX11.html) for interactive graphics in Linux ([The Open Group](https://www.x.org/releases/current/doc/libX11/libX11/libX11.html))\n- [marching-cubes tables](http://paulbourke.net/geometry/polygonise/) for isosurface generation on GPU ([Paul Bourke](http://paulbourke.net/geometry/))\n- [`src/lodepng.cpp`](https://github.com/lvandeve/lodepng/blob/master/lodepng.cpp) and [`src/lodepng.hpp`](https://github.com/lvandeve/lodepng/blob/master/lodepng.h) for `.png` encoding and decoding ([Lode Vandevenne](https://lodev.org/))\n- [SimplexNoise](https://weber.itn.liu.se/~stegu/simplexnoise/SimplexNoise.java) class in [`src/utilities.hpp`](https://github.com/ProjectPhysX/FluidX3D/blob/master/src/utilities.hpp) for generating continuous noise in 2D/3D/4D space ([Stefan Gustavson](https://github.com/stegu))\n- [`skybox/skybox8k.png`](https://www.hdri-hub.com/hdri-skies-aviation-aerospace) for free surface raytracing ([HDRI Hub](https://www.hdri-hub.com/))\n\n\n\n## References\n\n- Lehmann, M.: [Computational study of microplastic transport at the water-air interface with a memory-optimized lattice Boltzmann method](https://doi.org/10.15495/EPub_UBT_00006977). PhD thesis, (2023)\n- Lehmann, M.: [Esoteric Pull and Esoteric Push: Two Simple In-Place Streaming Schemes for the Lattice Boltzmann Method on GPUs](https://doi.org/10.3390/computation10060092). 
Computation, 10, 92, (2022)\n- Lehmann, M., Krause, M., Amati, G., Sega, M., Harting, J. and Gekle, S.: [Accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit, and customized 16-bit number formats](https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats). Phys. Rev. E 106, 015308, (2022)\n- Lehmann, M.: [Combined scientific CFD simulation and interactive raytracing with OpenCL](https://www.researchgate.net/publication/360501260_Combined_scientific_CFD_simulation_and_interactive_raytracing_with_OpenCL). IWOCL'22: International Workshop on OpenCL, 3, 1-2, (2022)\n- Lehmann, M., Oehlschlägel, L.M., Häusl, F., Held, A. and Gekle, S.: [Ejection of marine microplastics by raindrops: a computational and experimental study](https://doi.org/10.1186/s43591-021-00018-8). Micropl.\u0026Nanopl. 1, 18, (2021)\n- Lehmann, M.: [High Performance Free Surface LBM on GPUs](https://doi.org/10.15495/EPub_UBT_00005400). Master's thesis, (2019)\n- Lehmann, M. and Gekle, S.: [Analytic Solution to the Piecewise Linear Interface Construction Problem and Its Application in Curvature Calculation for Volume-of-Fluid Simulation Codes](https://doi.org/10.3390/computation10020021). Computation, 10, 21, (2022)\n\n\n\n## Contact\n\n- FluidX3D is solo-developed and maintained by Dr. 
Moritz Lehmann.\n- For any questions, feedback or other inquiries, contact me at [dr.moritz.lehmann@gmail.com](mailto:dr.moritz.lehmann@gmail.com?subject=FluidX3D).\n- Updates are posted on Mastodon via [@ProjectPhysX](https://mast.hpc.social/@ProjectPhysX)/[#FluidX3D](https://mast.hpc.social/tags/FluidX3D) and on [YouTube](https://youtube.com/@ProjectPhysX).\n\n\n\n## Support\n\nI'm developing FluidX3D in my spare time to make computational fluid dynamics lightning fast, accessible on all hardware, and free for everyone.\n- You can support FluidX3D by reporting any bugs or things that don't work in the [issues](https://github.com/ProjectPhysX/FluidX3D/issues). Feedback is welcome!\n- If you like FluidX3D, share it with friends and colleagues. Spread the word that CFD is now lightning fast, accessible and free.\n- If you want to support FluidX3D financially, you can [sponsor me on GitHub](https://github.com/sponsors/ProjectPhysX) or [buy me a coffee](https://buymeacoffee.com/projectphysx). Thank you!","funding_links":["https://github.com/sponsors/ProjectPhysX","https://buymeacoffee.com/projectphysx"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprojectphysx%2Ffluidx3d","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprojectphysx%2Ffluidx3d","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprojectphysx%2Ffluidx3d/lists"}