https://github.com/hughperkins/cltorch-benchmarking
cltorch benchmarking, for evaluating where to focus optimization effort
- Host: GitHub
- URL: https://github.com/hughperkins/cltorch-benchmarking
- Owner: hughperkins
- License: bsd-2-clause
- Created: 2015-07-11T23:26:16.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-07-22T00:20:36.000Z (over 9 years ago)
- Last Synced: 2023-03-10T19:25:50.899Z (over 1 year ago)
- Language: C++
- Size: 191 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# cltorch-benchmarking
cltorch benchmarking, for evaluating where to focus optimization effort

This is cltorch-specific for now, though if someone wants to make it more general, I'm happy to change the name, eg to `torch-benchmarking` :-)
The current direction is to measure why [char-rnn](https://github.com/karpathy/char-rnn) runs really slowly on OpenCL on certain devices. Examples of things to check:
- is it because of kernel launch time?
- is it because of passing in structs?
- is it because of passing in non-const structs?
- is it because of all those dimension loops?
- is it because the various `reduceAll` calls are causing sync points?
- is it because of excessive sync points generally?

## Contents
* [test_launch](test_launch.cpp): measure kernel launch times, by adding 1 to a constant-sized array (about 100MB), and varying the number of kernel launches used
* [test_apply1](test_apply1.cpp): varies the vector size and the element type (float vs float4), and varies the operation used, eg `+` vs `-`, `exp`, etc
* [test_apply1b](test_apply1b.cpp): varies the operation, as in test_apply1, but adds an additional temporary variable `out`
* [test_applystrided](test_applystrided.cpp): (in progress) mix up the memory access a bit, and/or add an inner loop over dimensions (tbd)

## To build
*pre-requisites:*
- [EasyCL](https://github.com/hughperkins/EasyCL) installed into `~/git/EasyCL/dist`, using `make -j 4 install` (ie install EasyCL with a `CMAKE_INSTALL_PREFIX` of `[your home directory]/git/EasyCL/dist`)
- cmake and ccmake installed
- gcc, g++ etc

*method*
```
git clone https://github.com/hughperkins/cltorch-benchmarking.git
cd cltorch-benchmarking
mkdir build
cd build
cmake ..
make -j 4
```
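
## Example: launch-count sweep (sketch)

To illustrate the idea behind test_launch (a fixed amount of work, spread over a varying number of launches), here is a minimal host-only sketch. It is not taken from this repository and does not use OpenCL or EasyCL: `fakeLaunch` and `timeWithLaunches` are hypothetical names, and `fakeLaunch` is just a plain C++ loop standing in for a kernel launch. It only shows the shape of the measurement, so the timings it produces say nothing about any real device.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a single kernel launch: adds 1 to the elements in
// [offset, offset + count). A real benchmark would enqueue an OpenCL kernel here.
void fakeLaunch(std::vector<float> &data, std::size_t offset, std::size_t count) {
    for (std::size_t i = offset; i < offset + count; i++) {
        data[i] += 1.0f;
    }
}

// Adds 1 to every element of `data`, splitting the work across at most
// `numLaunches` calls, and returns the elapsed wall-clock time in seconds.
double timeWithLaunches(std::vector<float> &data, std::size_t numLaunches) {
    assert(numLaunches >= 1);
    const std::size_t total = data.size();
    // Elements per launch, rounded up so that the whole array is covered.
    const std::size_t chunk = (total + numLaunches - 1) / numLaunches;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t offset = 0; offset < total; offset += chunk) {
        // The final launch may be shorter than the others.
        fakeLaunch(data, offset, std::min(chunk, total - offset));
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing `timeWithLaunches(data, 1)` with, say, `timeWithLaunches(data, 1000)` on the same buffer gives a rough feel for the fixed cost per call, since the total work is the same in both cases. On a real device the command queue would also need to be drained (for example with `clFinish`) before stopping the timer, otherwise only the enqueue time is measured.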