CUDA Scheduling Viewer
======================

About
-----

This project provides a tool for examining block-level scheduling behavior
and coscheduling performance on CUDA devices. The tool can run any benchmark
that is self-contained in a shared library file exporting specific functions.
Currently, this tool only runs under Linux, and is unlikely to support other
systems in the future.

To cite this work in academic use, either link to this repository or cite the
[original paper for which it was created](https://cs.unc.edu/~anderson/papers/ospert17.pdf).

```
@inproceedings{otterness2017inferring,
  title={Inferring the Scheduling Policies of an Embedded {CUDA} {GPU}},
  author={Otterness, Nathan and Yang, Ming and Amert, Tanya and Anderson, James H. and Smith, F. D.},
  booktitle={Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT)},
  year={2017}
}
```

If using SM/TPC partitioning, please cite the
[paper for which it was created](https://cs.unc.edu/~jbakita/rtas23.pdf).

```
@inproceedings{bakita2023hardware,
  title={Hardware Compute Partitioning on {NVIDIA} {GPUs}},
  author={Bakita, Joshua and Anderson, James H.},
  booktitle={Proceedings of the 29th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)},
  year={2023}
}
```

For Users of AMD GPUs
---------------------

For users of AMD GPUs, or those willing to give up some useful CUDA-specific
features, we developed a port of this project in
the [HIP](https://github.com/ROCm-Developer-Tools/HIP) language. This project
can be found at [https://github.com/yalue/hip_plugin_framework](https://github.com/yalue/hip_plugin_framework).
`hip_plugin_framework` remains nearly identical to `cuda_scheduling_examiner`,
with some cleaned-up code and more consistent naming conventions, but it
unfortunately lacks the ability to detect the SMs to which blocks are
assigned, as that feature is not portable to HIP.

Compilation
-----------

This tool can only be run on a computer with a CUDA-capable GPU and with CUDA
installed. The `nvcc` command must be available on your PATH. The tool has not
been tested with devices earlier than compute capability 5.0 or CUDA versions
earlier than 9.0. GCC version 4.9 or later is required.

Earlier versions of the tool, developed for devices with compute capability
3.0 or CUDA versions 8.0 and earlier, are available by checking out the
`older_cuda` git tag.

To build, clone the repository, `cd` into it, and run `make`.
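
For example:

```bash
git clone https://github.com/yalue/cuda_scheduling_examiner_mirror.git
cd cuda_scheduling_examiner_mirror
make
```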

In order to use SM/TPC partitioning (the `sm_mask` field documented below),
please install [libsmctrl](http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/)
and set `LIBSMCTRL_PATH` to the library's location in this project's Makefile.
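
If the Makefile reads `LIBSMCTRL_PATH` as an ordinary make variable (an
assumption; check the Makefile itself), the path could also be supplied on
the command line instead of editing the file:

```bash
# Hypothetical invocation; adjust the path to your libsmctrl checkout.
make LIBSMCTRL_PATH=/path/to/libsmctrl
```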

Usage
-----

The tool must be provided a JSON configuration file, which will contain
information about which benchmark libraries to run, how to run them, and what
parameters to provide. The file `configs/simple.json` has been provided as a
minimal example, running one instance of the `mandelbrot.so` benchmark. To run
it:

```bash
./bin/runner ./configs/simple.json
```

Additionally, the character `-` may be used in place of a config file name, in
which case the tool will attempt to read a JSON configuration object from
stdin. The file will be read completely before any benchmarks begin execution.
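
For example, the following is equivalent to passing the config file's name
directly:

```bash
cat ./configs/simple.json | ./bin/runner -
```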

Some scripts have been included to visualize results. They require python,
numpy, and matplotlib. All such scripts are located in the `scripts/`
directory. For example:

```bash
# Run all known configurations
find configs/*.json -exec ./bin/runner {} \;

# Visualize the scheduling timelines for each scenario
python scripts/view_timelines.py

# View the execution timeline of each block
python scripts/view_blocksbysm.py
```

To only plot a subset of the results, many of the aforementioned scripts
support explicitly specifying which output files to plot. For example:

```bash
# Plot all results of the memset_doesnt_block.json configuration
python scripts/view_blocksbysm.py ./results/test_blocking_memset*
```

Configuration Files
-------------------

The configuration files specify parameters passed to each benchmark along with
some global settings for the entire program. The layout of each configuration
file is as follows:
```
{
  "name": ,
  "max_iterations": ,
  "max_time": ,
  "use_processes": ,
  "cuda_device": ,
  "base_result_directory": ,
  "pin_cpus": ,
  "do_warmup": ,
  "sync_every_iteration": ,
  "benchmarks": [
    {
      "filename": ,
      "log_name": ,
      "mps_thread_percentage": ,
      "label": ,
      "thread_count": ,
      "block_count": ,
      "data_size": ,
      "sm_mask": ,
      "additional_info": ,
      "max_iterations": ,
      "max_time": ,
      "release_time": ,
      "cpu_core": ,
      "stream_priority":
    }
  ]
}
```

Additionally, benchmark configurations support the insertion of comments via
"comment" keys, which are ignored at runtime.
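
As an illustration, a minimal configuration in the spirit of
`configs/simple.json` might look like the following. The specific values and
the `.so` path are invented for this example rather than documented defaults:

```
{
  "name": "Simple mandelbrot scenario",
  "max_iterations": 100,
  "max_time": 0,
  "cuda_device": 0,
  "benchmarks": [
    {
      "comment": "This key is ignored at runtime.",
      "filename": "./bin/mandelbrot.so",
      "log_name": "simple_mandelbrot.json",
      "thread_count": 256,
      "block_count": 32,
      "data_size": 0
    }
  ]
}
```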

Automatic Benchmark Generation
------------------------------

The script located in `scripts/multikernel_generator.py` illustrates how
config generation can be scripted. To run a scenario automatically generated
by this script, run the following command (after running `make`):

```bash
python scripts/multikernel_generator.py | ./bin/runner -
```

Output File Format
------------------

Each benchmark, when run, will generate a JSON log file at the location
specified in the configuration. If the benchmark did not complete
successfully, the JSON file may be in an invalid state. Times are recorded as
floating-point numbers of seconds. The format of the log file is:

```
{
  "scenario_name": "",
  "benchmark_name": "",
  "label": "",
  "max_resident_threads": ,
  "data_size": ,
  "release_time": ,
  "PID": ,
  "TID": ,
  "times": [
    {},
    {
      "cpu_times": [ , ],
      "copy_in_times": [ , ],
      "execute_times": [ , ],
      "copy_out_times": [ , ]
    },
    {
      "kernel_name": ,
      "block_count": ,
      "thread_count": ,
      "shared_memory": ,
      "cuda_launch_times": [ , , ],
      "block_times": [ , , ...],
      "block_smids": [ , , ...],
      "cpu_core":
    },
    ...
  ]
}
```

Notice that the first entry in the "times" array will be blank and should be
ignored. The "times" array contains two types of objects: one containing CPU
times and one containing kernel times. An object containing CPU times is
identified by its `"cpu_times"` key. A single CPU times object encompasses
all kernel times following it, up until the next CPU times object.
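
For instance, a purely illustrative excerpt of a "times" array, with invented
numbers, might look like the following (note the blank first entry, and that
`block_times` holds a start/end pair for each block):

```
"times": [
  {},
  {
    "cpu_times": [0.00213, 0.00495],
    "copy_in_times": [0.00214, 0.00228],
    "execute_times": [0.00230, 0.00470],
    "copy_out_times": [0.00472, 0.00494]
  },
  {
    "kernel_name": "mandelbrot",
    "block_count": 2,
    "thread_count": 256,
    "shared_memory": 0,
    "cuda_launch_times": [0.00230, 0.00232, 0.00470],
    "block_times": [0.00233, 0.00350, 0.00234, 0.00468],
    "block_smids": [0, 1],
    "cpu_core": 0
  }
]
```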

Creating New Benchmarks
-----------------------

Each benchmark must be contained in a shared library and abide by the
interface specified in `src/library_interface.h`. In particular, the library
must export a `RegisterFunctions` function, which provides the addresses of
further functions to the calling program. Benchmarks should preferably never
use global state, and should instead use the `user_data` pointer returned by
the initialize function to track all state. Global state may work when only
one instance of each benchmark runs at a time, but none of the default
benchmarks included in this project rely on it. All benchmarks must use a
user-created CUDA stream in order to avoid unnecessarily blocking each other.

The most important piece of information that each benchmark provides is the
`TimingInformation` struct, filled in during the `copy_out` function of each
benchmark. This struct contains a list of `KernelTimes` structs, one for each
kernel invocation made during `execute`. Each `KernelTimes` struct contains
the kernel's start and end times, individual block start and end times, and a
list of the SM IDs to which blocks were assigned. The benchmark is
responsible for ensuring that the buffers provided in the `TimingInformation`
struct remain valid at least until another benchmark function is called; they
will not be freed by the caller.

In general, the comments in `library_interface.h` explain the actions that
every library-provided function is expected to carry out. The existing
libraries in `src/mandelbrot.cu` and `src/timer_spin.cu` provide examples of
working library implementations. In addition to `library_interface.h`,
`benchmark_library_functions.h/cu` define a library of utility functions that
may be shared between benchmarks.

Benchmark libraries are invoked by the master process as follows (a minimal
skeleton of such a library is sketched after this list):

1. The shared library file is loaded using the `dlopen()` function, and the
   `RegisterFunctions` function is located using `dlsym()`.

2. Depending on the configuration, either a new process or a new thread will
   be created for each benchmark.

3. In its own thread or process, the benchmark's `initialize` function will
   be called, which should allocate and initialize all of the local state
   necessary for one instance of the benchmark.

4. When the benchmark begins running, a single iteration will consist of the
   benchmark's `copy_in`, `execute`, and `copy_out` functions being called,
   in that order.

5. When enough time has elapsed or the maximum number of iterations has been
   reached, the benchmark's `cleanup` function will be called, allowing the
   benchmark to clean up and free its local state.
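
The following is a minimal sketch of what such a library might look like. It
is illustrative only: the authoritative types and signatures are defined in
`src/library_interface.h`, and the struct names, fields, and return
conventions shown here are assumptions rather than the project's actual
definitions.

```c
// Illustrative sketch; see src/library_interface.h for the real interface.
#include <stdlib.h>
#include <cuda_runtime.h>
#include "library_interface.h"

// Per-instance state, tracked via the user_data pointer so that no global
// state is needed.
typedef struct {
  // A user-created stream, so that instances don't block one another.
  cudaStream_t stream;
} BenchmarkState;  // Hypothetical name.

static void* Initialize(InitializationParameters *params) {
  // params (thread counts, data size, etc.) would configure the instance.
  BenchmarkState *state = (BenchmarkState *) calloc(1, sizeof(*state));
  if (!state) return NULL;
  if (cudaStreamCreate(&state->stream) != cudaSuccess) {
    free(state);
    return NULL;
  }
  return state;
}

// Called once per iteration, in this order. A real benchmark would copy
// inputs to the GPU, launch kernels on state->stream, and copy results back;
// this sketch simply reports success.
static int CopyIn(void *user_data) { return 1; }
static int Execute(void *user_data) { return 1; }

static int CopyOut(void *user_data, TimingInformation *times) {
  // Fill in one KernelTimes entry per kernel launched during Execute. The
  // buffers must remain valid until the next call into this library; they
  // are not freed by the caller.
  return 1;
}

static void Cleanup(void *user_data) {
  BenchmarkState *state = (BenchmarkState *) user_data;
  cudaStreamDestroy(state->stream);
  free(state);
}

// The function the master process locates with dlsym() after dlopen().
int RegisterFunctions(BenchmarkLibraryFunctions *functions) {
  functions->initialize = Initialize;
  functions->copy_in = CopyIn;
  functions->execute = Execute;
  functions->copy_out = CopyOut;
  functions->cleanup = Cleanup;
  return 1;
}
```

A library along these lines would be compiled into a `.so` file and
referenced through the `filename` field of a configuration file.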

Coding Style
------------

Even though CUDA supports C++, contributions to this project should use the C
programming language when possible. C or CUDA source code should adhere to the
parts of the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html)
that apply to the C language.

Scripts should remain in the `scripts/` directory and should be written in
python when possible. For now, there is no explicit style guide for python
scripts apart from trying to maintain a consistent style within each file.