Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/argonne-lcf/balsam-serial-mode-profiling
Contains application tools and scripts for measuring balsam performance in serial mode.
- Host: GitHub
- URL: https://github.com/argonne-lcf/balsam-serial-mode-profiling
- Owner: argonne-lcf
- License: mit
- Created: 2020-08-19T17:02:09.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2020-08-24T17:05:21.000Z (about 4 years ago)
- Last Synced: 2024-07-04T02:15:53.707Z (4 months ago)
- Language: Python
- Size: 26.4 KB
- Stars: 1
- Watchers: 8
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# balsam-serial-mode-profiling
Contains application tools and scripts for measuring balsam performance in serial mode.

# Timing Definitions
For each Job, we will measure the following timestamps:
| Timestamp | Description |
| --------- | ----------- |
| `t_0` | *Balsam worker start:* the launcher log timestamp immediately before worker Popens the job |
| `t_1` | *RUNNING database time:* when job recorded as `RUNNING` in database |
| `t_2` | *Application start:* timestamp emitted by application upon startup |
| `t_3` | *Application end:* timestamp emitted by application at the end |
| `t_4` | *Balsam worker end:* the launcher log timestamp immediately after Popen polls return 0 |
| `t_5` | *RUN_DONE database time:* the timestamp when `RUN_DONE` is recorded in database |
| `t_err` | *Balsam worker error:* launcher log timestamp when nonzero return code is polled |

`t_3 - t_2` is the *inner app runtime*: how long the app takes to run, measured purely by the application itself. This time delta is important in case applications are running intrinsically slower at scale.

`t_2 - t_0` is the *Popen start delay*: all the time between `t_0` and `t_2` is spent in Popen; a large delay here indicates that subprocess `fork()` and `exec()` are taking a long time.

`t_4 - t_3` is the *Popen end delay*: a large delay here indicates excessive lag between the application end and the moment a return code is propagated back to the Popen object.
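
The three deltas defined above can be computed from per-job timestamps with a few lines of Python. This is only a minimal sketch: the `JobTimestamps` class, the `timing_deltas` function, and the example values are hypothetical and not part of this repo, and it assumes the timestamps have already been converted to seconds on a common clock.

```python
from dataclasses import dataclass


@dataclass
class JobTimestamps:
    """Hypothetical container for one job's timestamps, in seconds."""
    t_0: float  # Balsam worker start (launcher log)
    t_2: float  # application start (emitted by the app)
    t_3: float  # application end (emitted by the app)
    t_4: float  # Balsam worker end (launcher log)


def timing_deltas(ts: JobTimestamps) -> dict:
    """Return the three deltas described in the timing definitions."""
    return {
        "popen_start_delay": ts.t_2 - ts.t_0,
        "inner_app_runtime": ts.t_3 - ts.t_2,
        "popen_end_delay": ts.t_4 - ts.t_3,
    }


# Made-up values for illustration only, not measured data.
example = JobTimestamps(t_0=100.0, t_2=102.5, t_3=162.5, t_4=163.0)
deltas = timing_deltas(example)
print(deltas)
```

Keeping the raw timestamps and deriving the deltas afterwards makes it easy to add further comparisons later (for example against `t_1` or `t_5`) without re-running the jobs.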
# First Steps
This is meant to be run on Theta. First, run the script to build a virtual env for these tests, using the latest balsam serial mode:
```bash
/balsam-installer.sh /path/to/balsam-serial-tests-venv/
```

If you get an error like `fatal: destination path 'mpi4py' already exists and is not an empty directory.`, you are probably re-running the script. It caches and builds in the `/tmp` area, so remove the build folder there:
```bash
rm -r /tmp/$(whoami)/balsam-install
```

Once it completes (it builds mpi4py, which takes a few minutes), you can activate the environment with:
```bash
source /path/to/balsam-serial-tests-venv/bin/activate
```

On future logins, you can set the virtual env and all modules needed with:
```bash
source env_setup.sh /path/to/balsam-serial-tests-venv/
```

## Create a balsam DB
The balsam database is created and initialized like this:
```bash
balsam init /path/to/balsam-serial-tests-db/
```

Once initialized, activate the database with:
```bash
source balsamactivate /path/to/balsam-serial-tests-db/
```

## Initializing applications
This repo contains several applications:
- Python-based array addition
- Singularity-based C++ simulation code
- An empty bash script that sleeps for 60 seconds

Once the database is activated, you can initialize all of them in it with the script in `applications/initialize_apps.sh`.
## Initializing workloads
We plan to scan over workloads as a function of node packing count and job size (n_nodes). To facilitate sorting these and loading jobs into the DB, several scripts are provided for creating these workflows and their jobs:
```bash
# Add the empty app with 4 nodes, 16 ranks per node:
python workflows/add_workflow_empty_app.py -n 4 -npc 16

# Add the array_add app with 32 nodes, 32 ranks per node:
python workflows/add_workflow_array_add.py -n 32 -npc 32

# Add the cosmics_gen_stage app with 256 nodes, 1 rank per node:
python workflows/add_workflow_cosmics_gen_stage.py -n 256 --node-packing-count 1
```

Each of these scripts produces, at the end, the correct balsam submit-launch command, though you may want or need to change the allocation.
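
To scan over several node counts and packing counts, the command lines can be generated rather than typed by hand. The sketch below only builds and prints the commands; it runs nothing. The `workflow_commands` helper and the grid values are hypothetical, and it assumes the `-n` and `-npc` flags behave as in the examples above.

```python
import itertools
import shlex


def workflow_commands(script, node_counts, packing_counts):
    """Build one command (as an argument list) per (n_nodes, npc) pair."""
    commands = []
    for n_nodes, npc in itertools.product(node_counts, packing_counts):
        commands.append(
            ["python", script, "-n", str(n_nodes), "-npc", str(npc)]
        )
    return commands


# Hypothetical scan grid; choose values that fit your allocation.
commands = workflow_commands(
    "workflows/add_workflow_empty_app.py",
    node_counts=[4, 32],
    packing_counts=[1, 16],
)

for cmd in commands:
    # Print a shell-safe command line for review before running it.
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to review the scan before loading any jobs into the database; each argument list could then be passed to `subprocess.run` from inside the activated environment.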