https://github.com/mengke-mk/ring
RING is a ~5k-line pure C++11 runtime system for scaling irregular applications in the PGAS programming paradigm.
graph numa pgas rdma runtime
- Host: GitHub
- URL: https://github.com/mengke-mk/ring
- Owner: mengke-mk
- Created: 2017-07-01T02:59:34.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-07-01T03:04:30.000Z (almost 9 years ago)
- Last Synced: 2024-12-28T04:17:04.160Z (over 1 year ago)
- Topics: graph, numa, pgas, rdma, runtime
- Language: C
- Homepage:
- Size: 555 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Intro
RING is a **~5k-line** pure C++11 runtime system for scaling irregular applications in the PGAS programming paradigm.
## Dependencies
You must have a 64-bit Linux system with the following installed to build RING.
- Build
  - CMake >= 2.8.12
- Compiler
  - GCC > 4.8
- External
  - Infiniband Verbs
  - MPI
  - pthread
  - libnuma
- Optional
  - gperftool
## Quick Start
Like Grappa, Ring is primarily designed around a "global view" model, which means that rather than coordinating where all parallel SPMD processes are and how they divide up the data, the programmer is encouraged to think of the system **as a single large shared memory**.
### Section 1 : Hello world
Use `run()` to wrap the whole computation you want to execute. Use `all_do()` to spawn the same task on all cores, as you would in SPMD. Use `call()` to perform a remote procedure call.
`void run([]{ /*your codes*/ });`
`void all_do([]{ /*your works*/ });`
`void call(GlobalAddress gt, [](T* t){} );`
`auto f = call(GlobalAddress gt, [](T* t){} );`
`T result = f.get(); // blocks explicitly, just like a C++11 future`
`void call(int which_core, [](){} );`
`auto f = call(int which_core, [](){} );`
`T result = f.get(); // blocks explicitly, just like a C++11 future`
```c++
#include
#include
#include
#include "ring.hpp"
int x;

int main(int argc, char* argv[]){
  RING_Init(&argc, &argv);
  run([]{
    all_do([]{
      x = 0;
      if(thread_rank() == 0){
        auto gx = make_global(0, 1, &x);
        call(gx, [](int *x){
          sync_printf("hello world", (*x));
        });//call
      }
    });//all_do
  });//run
  RING_Finalize();
  return 0;
}
```
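The blocking behaviour of `f.get()` can be illustrated with the standard library alone. The sketch below does not use RING: `toy_call` and `sum_of_two_calls` are hypothetical names, and `toy_call` simply runs a lambda on another thread via `std::async` and returns a `std::future`, which is the behaviour the signatures above compare the future-returning `call()` to.

```c++
#include <future>
#include <utility>

// Hypothetical stand-in for a future-returning call(); NOT the RING API.
// It runs `fn` on a separate thread and hands back a std::future.
template <typename F>
auto toy_call(F&& fn) -> std::future<decltype(fn())> {
    return std::async(std::launch::async, std::forward<F>(fn));
}

// Issues two asynchronous calls, then blocks on both results.
inline int sum_of_two_calls() {
    auto f1 = toy_call([] { return 40; });
    auto f2 = toy_call([] { return 2; });
    // get() blocks until the corresponding task has finished.
    return f1.get() + f2.get();
}
```

Both tasks are in flight before the first `get()`, so the caller only waits once it actually needs a result. On some toolchains this needs to be compiled with `-pthread`.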
### Section 2 : Global memory
In Ring, all memory on all cores is addressable by any other core, in the spirit of the **PGAS** programming paradigm.
You can allocate a global array with `gmalloc()` and free it with `gfree()`.
`auto A = gmalloc(size_t size); // can only be called inside run()`
`gfree(A);`
```c++
run([]{
  auto x = gmalloc(1<<20);
  auto y = gmalloc(1<<20);
  auto x1 = x + 230;
  auto y1 = y + 333;
  call(x1, [](int * w){
    sync_printf("hello pgas x", id());
  });
  call(y1, [](int * w){
    sync_printf("hello pgas y", id());
  });
  /* free request will be executed after rpc is executed. */
  gfree(x);
  gfree(y);
});
```
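One way to picture a global address is as an element index that maps to an owning core and a local offset. The sketch below assumes a block-cyclic layout and uses hypothetical names (`ToyGlobalAddress`, `kBlockElems`). This README does not describe how RING encodes its `GlobalAddress` or distributes arrays, so treat this as an illustration of the idea only.

```c++
#include <cstddef>

// Assumed number of consecutive elements stored on one core per block.
const std::size_t kBlockElems = 64;

// Hypothetical illustration only; NOT RING's GlobalAddress.
struct ToyGlobalAddress {
    std::size_t index;  // element index within the global array
    std::size_t cores;  // number of cores the array is spread over

    // Core that owns this element under a block-cyclic layout.
    std::size_t owner() const { return (index / kBlockElems) % cores; }

    // Position of the element within the owner's local storage.
    std::size_t local_offset() const {
        std::size_t block = index / kBlockElems;
        return (block / cores) * kBlockElems + index % kBlockElems;
    }

    // Pointer-style arithmetic, in the manner of `x + 230` above.
    ToyGlobalAddress operator+(std::size_t n) const {
        return ToyGlobalAddress{index + n, cores};
    }
};
```

Under this assumed layout, a remote call on `x + 230` would be routed to whichever core `owner()` names, and the lambda would receive a pointer derived from `local_offset()`.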
### Section 3 : parallel for
Instead of spawning tasks individually, it is almost always better to use a parallel loop of some sort. You can use `pfor()` to spawn loop iterations recursively until the range falls below a threshold.
`void pfor(Array A, int s, int e, [](T* t){})`
`void pfor(Array A, int s, int e, [](int i, T* t){})`
```c++
run([]{
  auto A = gmalloc(1<<20);
  pfor(A, 0, 1024, [](int* t){
    auto a = id();
    call(1, [a]{
      sync_printf(id());
    });//call
  });//pfor
  gfree(std::move(A));
});//run
```
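The recursive splitting described for `pfor()` can be sketched with standard C++11 on a local `std::vector`. This is not RING's implementation: `toy_pfor` and `kThreshold` are hypothetical, the cutoff of 64 is an arbitrary assumption, and `std::async` stands in for whatever task spawning the runtime actually performs.

```c++
#include <future>
#include <vector>

// Assumed cutoff: ranges at or below this size run serially.
const int kThreshold = 64;

// Hypothetical illustration of recursive loop splitting; NOT RING's pfor().
// Calls body(i, &a[i]) for every i in [s, e).
template <typename T, typename F>
void toy_pfor(std::vector<T>& a, int s, int e, F body) {
    if (e - s <= kThreshold) {
        for (int i = s; i < e; ++i) body(i, &a[i]);  // leaf: run serially
        return;
    }
    int mid = s + (e - s) / 2;
    // Spawn the left half as a separate task; handle the right half here.
    auto left = std::async(std::launch::async,
                           [&a, s, mid, body] { toy_pfor(a, s, mid, body); });
    toy_pfor(a, mid, e, body);
    left.get();  // wait for the spawned half before returning
}
```

A call such as `toy_pfor(v, 0, 1024, [](int i, int* t){ *t = 2 * i; });` splits the range into 16 leaves of 64 iterations each. Every element is touched by exactly one task, so the body needs no locking as long as it only writes through `t`.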
## Benchmarks
- GUPS (Ring vs. Grappa)
  - 1.23e8 vs. 8.04e7 (UPS), 120 cores doing 1<<20 updates
  - 5.30e7 vs. 5.12e7 (UPS), 12 cores doing 1<<20 updates
  - 2.36e7 vs. 2.10e7 (UPS), 1 core doing 1<<20 updates
  - Baseline: 2.62e8 (UPS), 1 core doing 1<<20 updates
- Graph500 (Ring vs. Grappa)
  - 1.59e8 vs. 9.7e7 (TEPS), 120 cores doing scale=22 BFS (10 nodes)
  - 1.55e7 vs. 1.64e7 (TEPS), 12 cores doing scale=22 BFS (1 node)
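
For readers unfamiliar with GUPS, the kernel performs read-modify-write updates at random positions of a large table and reports updates per second (UPS). The sketch below is a single-threaded illustration in the style of HPCC RandomAccess, not the benchmark code behind the figures above. The name `toy_gups` is hypothetical, and the random number generator (a 64-bit LCG with Knuth's MMIX constants) is an assumption chosen for brevity.

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-core GUPS-style kernel; NOT the code used for the
// results listed above. Applies `updates` XOR updates at pseudo-random
// positions of `table`. table.size() must be a power of two.
inline void toy_gups(std::vector<std::uint64_t>& table, std::size_t updates,
                     std::uint64_t seed) {
    const std::uint64_t mask = table.size() - 1;
    std::uint64_t r = seed;
    for (std::size_t i = 0; i < updates; ++i) {
        // 64-bit linear congruential generator (Knuth's MMIX constants).
        r = r * 6364136223846793005ULL + 1442695040888963407ULL;
        // Use the high bits to pick a slot, then XOR the value into it.
        table[(r >> 32) & mask] ^= r;
    }
}
```

UPS is then the number of updates divided by the elapsed time, which can be measured with `std::chrono::steady_clock` around the call. Because XOR is its own inverse, running the kernel twice with the same seed restores the table, which gives a cheap correctness check.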