Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/harrism/mini-nbody
A simple gravitational N-body simulation in less than 100 lines of C code, with CUDA optimizations.
https://github.com/harrism/mini-nbody
astrophysics benchmark cuda nbody
Last synced: 3 months ago
JSON representation
A simple gravitational N-body simulation in less than 100 lines of C code, with CUDA optimizations.
- Host: GitHub
- URL: https://github.com/harrism/mini-nbody
- Owner: harrism
- License: apache-2.0
- Created: 2014-02-20T03:37:14.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2014-02-21T02:53:44.000Z (almost 11 years ago)
- Last Synced: 2024-10-14T14:39:17.766Z (4 months ago)
- Topics: astrophysics, benchmark, cuda, nbody
- Language: C
- Size: 131 KB
- Stars: 99
- Watchers: 5
- Forks: 27
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
mini-nbody: A simple N-body Code
================================A simple gravitational N-body simulation in less than 100 lines of C code, with CUDA optimizations.
Benchmarks
----------There are 5 different benchmarks provided for CUDA and MIC platforms.
1. nbody-orig : the original, unoptimized simulation (also for CPU)
2. nbody-soa : Conversion from array of structures (AOS) data layout to structure of arrays (SOA) data layout
3. nbody-flush : Flush denormals to zero (no code changes, just a command line option)
4. nbody-block : Cache blocking
5. nbody-unroll / nbody-align : platform specific final optimizations (loop unrolling in CUDA, and data alignment on MIC)Files
-----nbody.c : simple, unoptimized OpenMP C code
timer.h : simple cross-OS timing codeEach directory below includes scripts for building and running a "shmoo" of five successive optimizations of the code over a range of data sizes from 1024 to 524,288 bodies.
cuda/ : folder containing CUDA optimized versions of the original C code (in order of performance on Tesla K20c GPU)
1. nbody-orig.cu : a straight port of the code to CUDA (shmoo-cuda-nbody-orig.sh)
2. nbody-soa.cu : conversion to structure of arrays (SOA) data layout (shmoo-cuda-nbody-soa.sh)
3. nbody-soa.cu + ftz : Enable flush denorms to zero (shmoo-cuda-nbody-ftz.sh)
4. nbody-block.cu : cache blocking in CUDA shared memory (shmoo-cuda-nbody-block.sh)
5. nbody-unroll.cu : addition of "#pragma unroll" to inner loop (shmoo-cuda-nbody-unroll.sh)mic/ : folder containing Intel Xeon Phi (MIC) optimized versions of the original C code (in order of performance on Xeon Phi 7110P)
1. ../nbody-orig.cu : original code (shmoo-mic-nbody-orig.sh)
2. nbody-soa.c : conversion to structure of arrays (SOA) data layout (shmoo-mic-nbody-soa.sh)
3. nbody-soa.cu + ftz : Enable flush denorms to zero (shmoo-mic-nbody-ftz.sh)
4. nbody-block.c : cache blocking via loop splitting (shmoo-mic-nbody-block.sh)
5. nbody-align.c : aligned memory allocation and vector access (shmoo-mic-nbody-align.sh)