
https://github.com/jsyiek/parallang

Scalable parallel computer simulator

concurrency dsl parallang parallelism scala


Host: GitHub
URL: https://github.com/jsyiek/parallang
Owner: jsyiek
License: mit
Created: 2024-05-10T13:27:59.000Z (9 months ago)
Default Branch: main
Last Pushed: 2024-06-21T23:49:43.000Z (8 months ago)
Last Synced: 2024-10-01T15:59:23.403Z (4 months ago)
Topics: concurrency, dsl, parallang, parallelism, scala
Language: Scala
Homepage:
Size: 45.4 MB
Stars: 7
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# The Parallang Simulator

## What is this?

The *Parallang Simulator* is a scalable, platform-independent tool for simulating parallel message-passing algorithms and evaluating their relative costs.

*Key features*:

- **Parallang DSL**: The simulator is equipped with Parallang, a strongly typed, interpreted programming language with a C-like syntax. The name is a portmanteau of "parallel" and "language."

- **Messages**: Through simple `send x -> worker i, j` and `recv x <- worker i, j` instructions, parallel algorithms on the simulator can send messages between processors. Message forwarding is implicitly performed, and the costs are modeled.

- **Data cache model**: The simulator can use an arbitrary user-specified data cache hierarchy. Programmed replacement strategies include LRU, tree-PLRU, and NLU.

- **Latency and memory estimates**: The system estimates running time and tracks memory usage throughout program execution, providing the user with this information at the end.

- **Communication environments**: A modular parameter set system can define communication costs, modeling anything from a multicore system-on-chip to a platform distributed over the internet.

- **Events system**: Latency and memory are tracked through *events* emitted throughout program execution. Users can add real-time event listeners to process these and gauge metrics of interest.

- **Output visualization and graph compression**: Several Jupyter Notebooks provide *graph compression* and *evaluation graph* functionality to visualize the simulator's output.
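
To make the LRU replacement strategy named in the data cache bullet concrete, here is a minimal sketch of a single cache set with least-recently-used eviction. It is not taken from the simulator: `LruSet`, `access`, and `contents` are hypothetical names, and the simulator's own cache classes may be organised quite differently.

```scala
import scala.collection.mutable

// Hypothetical illustration only; this is not the simulator's cache code.
final class LruSet(capacity: Int) {
  require(capacity > 0, "capacity must be positive")

  // LinkedHashSet iterates in insertion order, so the head is the
  // least recently used tag and the last element the most recent.
  private val tags = mutable.LinkedHashSet.empty[Long]

  /** Touches `tag`; returns true on a hit and false on a miss. */
  def access(tag: Long): Boolean = {
    val hit = tags.remove(tag)          // drop the old position if present
    if (!hit && tags.size == capacity)  // miss on a full set: evict the LRU tag
      tags.remove(tags.head)
    tags.add(tag)                       // (re)insert as most recently used
    hit
  }

  /** Resident tags, least recently used first. */
  def contents: List[Long] = tags.toList
}

object LruDemo {
  def main(args: Array[String]): Unit = {
    val set = new LruSet(2)
    println(List(1L, 2L, 1L, 3L, 2L).map(set.access)) // List(false, false, true, false, false)
    println(set.contents)                             // List(3, 2)
  }
}
```

In the demo, the third access hits because tag 1 is still resident; the access to tag 3 then evicts tag 2, which is why the final access to tag 2 misses.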

## What is a parallel algorithm?

To answer this, let's define the *parallel computer abstraction*. Conceptually, a parallel computer is a group of *processing elements* (PEs) linked together in a *message-passing topology*. Each PE executes a sequential algorithm independently and may send *data* directly to any connected PE during execution; each such transfer is called a *message*.

A *parallel algorithm* is any algorithm that uses some coordination technique, such as message-passing, to distribute computation tasks over PEs and achieve higher aggregate performance than its sequential counterpart. 

## Tutorials

These tutorials assume you have a *Parallang program* you want to run. If you do not, write one! See the sample code below and in `src/main/parallang`.

### Defining a simple simulation: a bus topology

A simulation is defined in Scala using a straightforward API. Before any simulation, you must set the number of PEs and define the connections. PEs are assumed to be arranged in a 2D grid for ease of use.

```scala
import uk.ac.cam.crsid.Parallang.Interpreter.CommunicationModel.InterconnectionNetwork

object TutorialSimulation {
    def main(args: Array[String]): Unit = {
        val pathToProgram = ...

        // Set up a 10x1 topology: a bus!
        InterconnectionNetwork.resetLengthWidth(10, 1)
        for (i <- 0 until 9) {
            InterconnectionNetwork.addLink((i, 1), (i+1, 1))
            InterconnectionNetwork.addLink((i+1, 1), (i, 1))
        }

        InterconnectionNetwork.launchWith(pathToProgram)
    }
}
```
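
Because messages between PEs that are not directly linked are forwarded implicitly, the number of links a message crosses matters for its cost. The following is a minimal standalone sketch of how hop counts on such a topology could be computed with a breadth-first search. It does not use the simulator's API: `HopCount`, `Pe`, and `hopsFrom` are hypothetical names, and the simulator's actual routing may differ.

```scala
import scala.collection.mutable

// Hypothetical illustration only; independent of InterconnectionNetwork.
object HopCount {
  type Pe = (Int, Int) // grid coordinates of a processing element

  /** Fewest directed links from `src` to every reachable PE (breadth-first search). */
  def hopsFrom(src: Pe, links: Seq[(Pe, Pe)]): Map[Pe, Int] = {
    val neighbours = links.groupMap(_._1)(_._2) // adjacency list keyed by source PE
    val dist = mutable.LinkedHashMap(src -> 0)
    val queue = mutable.Queue(src)
    while (queue.nonEmpty) {
      val pe = queue.dequeue()
      for (next <- neighbours.getOrElse(pe, Seq.empty) if !dist.contains(next)) {
        dist(next) = dist(pe) + 1
        queue.enqueue(next)
      }
    }
    dist.toMap
  }

  def main(args: Array[String]): Unit = {
    // Same shape as the bus above: ten PEs in a row, linked in both directions.
    val bus = (0 until 9).flatMap { i =>
      Seq(((i, 1), (i + 1, 1)), ((i + 1, 1), (i, 1)))
    }
    println(hopsFrom((0, 1), bus)((9, 1))) // prints 9: end-to-end distance on the bus
  }
}
```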

### Changing the parameter set

By default, the Parallang Simulator uses the `MulticoreComputer` parameter set. Other parameter sets, such as `Datacenter` or `HighPowerInternet`, can be used. Users may also define their own by extending the `LatencyParameterSet` trait.

```scala
import uk.ac.cam.crsid.Parallang.Interpreter.CommunicationModel.{InterconnectionNetwork, HighPowerInternet}

InterconnectionNetwork.setLatencyParameterSet(HighPowerInternet)
```
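
The members of `LatencyParameterSet` are not shown in this README, so the following is only a sketch of the general idea behind a swappable cost model, not the simulator's trait. `LinkCostModel`, `OnChip`, `WideArea`, and every number below are hypothetical placeholders chosen to make the contrast visible; they are not measurements and not the simulator's defaults.

```scala
// Hypothetical stand-in for a parameter set; the real LatencyParameterSet may look different.
trait LinkCostModel {
  def perHopNanos: Long  // fixed cost paid on every link traversed
  def perByteNanos: Long // cost of pushing one byte across one link

  /** Store-and-forward estimate: every hop pays the fixed cost plus the transfer cost. */
  def messageNanos(hops: Int, bytes: Long): Long =
    hops * (perHopNanos + bytes * perByteNanos)
}

// Placeholder values, assumed for illustration only.
object OnChip extends LinkCostModel {
  val perHopNanos = 5L
  val perByteNanos = 1L
}

object WideArea extends LinkCostModel {
  val perHopNanos = 20000000L
  val perByteNanos = 10L
}

object CostDemo {
  def main(args: Array[String]): Unit = {
    println(OnChip.messageNanos(hops = 9, bytes = 1024))   // 9261
    println(WideArea.messageNanos(hops = 9, bytes = 1024)) // 180092160
  }
}
```

Keeping the cost parameters behind one small interface is what lets the same program be evaluated under very different communication environments without changing the algorithm itself.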

### Defining an event listener

TBD

### Defining a cache hierarchy

TBD

### Communicating with a running program

TBD

## Sample code in Parallang

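Here is an implementation of the Fox-Otto algorithm for solving the all-pairs shortest paths (APSP) problem. It relies on names defined outside the snippet, such as `q`, `myX`, `myY`, `scaleFactor_global`, `printIfMain`, and `matSquareWithPredecessor`.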

```
fn foxs_general(a: array[array[int]], b: array[array[int]], c: array[array[int]], maxIt: int) -> array[array[int]] {
    var southNeighbor: int = mod(myX+1, q);
    var northNeighbor: int = mod(myX+q-1, q);
    var p: array[array[int]] = array[array[int]](len(a), array[int](len(a[0]), 0));
    var pPrev: array[array[int]] = array[array[int]](len(a), array[int](len(a[0]), 0));
    for (var r: int = 0; r < len(a); r = r + 1) {
        for (var c: int = 0; c < len(a[0]); c = c + 1) {
            p[r][c] = c+scaleFactor_global*myY;
            pPrev[r][c] = c+scaleFactor_global*myY;
        }
    }
    for (var r: int = 0; r < len(a); r = r + 1) {
        for (var col: int = 0; col < len(a[0]); col = col + 1) {
            c[r][col] = a[r][col];
            if (a[r][col] != inf) {
                p[r][col] = r+scaleFactor_global*myX;
                pPrev[r][col] = r+scaleFactor_global*myX;
            }
        }
    }
    for (var it: int = 0; it < maxIt; it = it + 1) {
        printIfMain(-1*it);
        for (var k: int = 0; k < q; k = k + 1) {
            printIfMain(k);
            var bCastProc: int = mod(myX+k, q);
            if (myX == 0) {
            }
            if (bCastProc == myY) {
                send a -> broadcast_row;
                matSquareWithPredecessor(a, b, c, p, pPrev);
            } else {
                recv[array[array[int]]] tempBlock <- worker myX, bCastProc;
                matSquareWithPredecessor(tempBlock, b, c, p, pPrev);
            }
            send b -> worker northNeighbor, myY;
            send pPrev -> worker northNeighbor, myY;
            recv b <- worker southNeighbor, myY;
            recv pPrev <- worker southNeighbor, myY;
        }
        var temp: array[array[int]] = pPrev;
        pPrev = p;
        p = temp;
        for (var r: int = 0; r < len(a); r = r + 1) {
            for (var col: int = 0; col < len(a[0]); col = col + 1) {
                p[r][col] = pPrev[r][col];
                a[r][col] = c[r][col];
            }
        }
        b = a;
    }
```
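
The helper `matSquareWithPredecessor` is not shown in this README. As background, the Fox-Otto approach distributes a min-plus matrix product across PEs, and a sequential version of that product can be sketched in Scala as follows. This is an assumption-laden illustration rather than the project's code: `MinPlus`, `product`, and `allPairs` are hypothetical names, predecessor tracking is omitted, and the input is assumed to be a square weight matrix with zeros on the diagonal and non-negative weights.

```scala
// Hypothetical sequential sketch; not the simulator's or the Parallang program's code.
object MinPlus {
  val Inf: Int = Int.MaxValue / 2 // "no edge"; halved so Inf + Inf cannot overflow

  /** Min-plus product: out(i)(j) = min over k of a(i)(k) + b(k)(j), clamped to Inf. */
  def product(a: Array[Array[Int]], b: Array[Array[Int]]): Array[Array[Int]] =
    Array.tabulate(a.length, b(0).length) { (i, j) =>
      math.min(Inf, b.indices.map(k => a(i)(k) + b(k)(j)).min)
    }

  /** Repeated squaring until paths of up to n - 1 edges are covered. */
  def allPairs(weights: Array[Array[Int]]): Array[Array[Int]] = {
    var d = weights
    var span = 1 // d currently covers paths of at most `span` edges
    while (span < weights.length - 1) {
      d = product(d, d)
      span *= 2
    }
    d
  }

  def main(args: Array[String]): Unit = {
    val w = Array(
      Array(0, 3, Inf, 7),
      Array(Inf, 0, 1, Inf),
      Array(Inf, Inf, 0, 2),
      Array(Inf, Inf, Inf, 0)
    )
    println(allPairs(w)(0).toList) // List(0, 3, 4, 6)
  }
}
```

In the example, the direct edge from node 0 to node 3 costs 7, but the path through nodes 1 and 2 costs 3 + 1 + 2 = 6, which the second squaring discovers.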

## Origins

This code was produced for my undergraduate dissertation for the *Computer Science* tripos at the *University of Cambridge*: "Evaluation of parallel routing algorithms." 

I examined five parallelizations of solutions to the *all-pairs shortest paths* problem: Cannon's algorithm and the Fox-Otto algorithm for min-plus matrix exponentiation, the Floyd-Warshall algorithm, the distance vector algorithm, and the Bellman-Ford algorithm. The simulator was created to make analysis at scale possible and to remove the variability of the underlying host computer from the measurements. My results used road-network datasets and demonstrated that *superlinear speedup* could be achieved for the first three, that the Bellman-Ford algorithm was the fastest, and that the distance vector algorithm was abysmally inefficient.

My supervisor was *Dr Jagdish Modi*, who suggested the project. I achieved a high class I mark on this project.