
https://github.com/hughperkins/cltorch-benchmarking

cltorch benchmarking, for evaluating where to focus optimization effort

Host: GitHub
URL: https://github.com/hughperkins/cltorch-benchmarking
Owner: hughperkins
License: bsd-2-clause
Created: 2015-07-11T23:26:16.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2015-07-22T00:20:36.000Z (over 9 years ago)
Last Synced: 2023-03-10T19:25:50.899Z (over 1 year ago)
Language: C++
Size: 191 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# cltorch-benchmarking
cltorch benchmarking, for evaluating where to focus optimization effort

This is cltorch-specific for now. If someone wants to make it more general, I'm happy to change the name, e.g. to `torch-benchmarking` :-)

The current focus is to measure why [char-rnn](https://github.com/karpathy/char-rnn) runs very slowly on OpenCL on certain devices. Examples of things to check:
- is it because of kernel launch time?
- is it because of passing in structs?
- is it because of passing in non-const structs?
- is it because of all those dimension loops?
- is it because the various `reduceAll` calls are causing sync points?
- is it because of excessive sync points generally?
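
Each question above comes down to timing two variants of the same work and comparing the results. Below is a minimal C++ sketch of such a timing helper. `median_ms` is a hypothetical name and is not part of this repo or of EasyCL. It times on the host with `std::chrono`. OpenCL kernel enqueues return before the kernel finishes, so a GPU variant would need to wait for completion inside the timed call (for example with `clFinish`). Otherwise only the enqueue cost is measured.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not from this repo): run `variant` `repeats` times and
// return the median wall-clock time in milliseconds (the upper median when
// `repeats` is even). The median is less sensitive than the mean to one-off
// stalls, such as the first run also paying for kernel compilation.
double median_ms(const std::function<void()>& variant, int repeats = 5) {
    assert(repeats > 0);
    std::vector<double> times;
    times.reserve(static_cast<std::size_t>(repeats));
    for (int i = 0; i < repeats; ++i) {
        auto start = std::chrono::steady_clock::now();
        // For an OpenCL variant, this call must block until the queue has
        // drained (e.g. by calling clFinish), because enqueues are asynchronous.
        variant();
        auto end = std::chrono::steady_clock::now();
        times.push_back(std::chrono::duration<double, std::milli>(end - start).count());
    }
    std::sort(times.begin(), times.end());
    return times[times.size() / 2];
}
```

Comparing `median_ms` for two variants (for example, the same sequence of apply calls with and without an intermediate blocking read) gives a rough answer to questions like the sync-point ones above.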

## Contents

* [test_launch](test_launch.cpp): measures kernel launch times by adding 1 to a constant-sized array (about 100 MB) while varying the number of kernel launches used
* [test_apply1](test_apply1.cpp): varies vector size and `float` vs `float4`; also varies the operation used, e.g. `+` vs `-`, `exp`, etc.
* [test_apply1b](test_apply1b.cpp): varies the operation, as in test_apply1, but adds an additional temporary variable, `out`
* [test_applystrided](test_applystrided.cpp): (in progress) mixes up the memory access pattern a bit, and/or adds an inner loop over dimensions (tbd)
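
The test_launch design keeps the total work constant and varies the launch count. It can be sketched on the host as below. This is an illustration under stated assumptions, not the code in test_launch.cpp. The names `Chunk`, `split_into_launches`, `add_one_in_launches` and `fit_launch_overhead_ms` are hypothetical. Each per-chunk loop only stands in for one kernel enqueue. The last function assumes time grows linearly with the launch count, so the least-squares slope estimates the per-launch overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: one "kernel launch" worth of work, a contiguous slice.
struct Chunk {
    std::size_t offset;
    std::size_t size;
};

// Split n elements into `launches` contiguous chunks whose sizes differ by at
// most one, so total work is unchanged as the launch count varies.
std::vector<Chunk> split_into_launches(std::size_t n, std::size_t launches) {
    assert(launches > 0);
    std::vector<Chunk> chunks;
    chunks.reserve(launches);
    std::size_t base = n / launches, rem = n % launches, offset = 0;
    for (std::size_t i = 0; i < launches; ++i) {
        std::size_t size = base + (i < rem ? 1 : 0);
        chunks.push_back({offset, size});
        offset += size;
    }
    return chunks;
}

// CPU stand-in for the benchmark: each chunk's loop plays the role of one
// kernel enqueue that adds 1 to its slice of the array.
void add_one_in_launches(std::vector<float>& data, std::size_t launches) {
    for (const Chunk& c : split_into_launches(data.size(), launches)) {
        for (std::size_t i = c.offset; i < c.offset + c.size; ++i) {
            data[i] += 1.0f;
        }
    }
}

// Least-squares fit of time_ms = a + b * launches. The slope b estimates the
// per-launch overhead in ms, assuming total work is held constant.
double fit_launch_overhead_ms(const std::vector<double>& launches,
                              const std::vector<double>& time_ms) {
    assert(launches.size() == time_ms.size() && launches.size() >= 2);
    double n = static_cast<double>(launches.size());
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < launches.size(); ++i) {
        sx += launches[i];
        sy += time_ms[i];
        sxx += launches[i] * launches[i];
        sxy += launches[i] * time_ms[i];
    }
    double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // needs at least two distinct launch counts
    return (n * sxy - sx * sy) / denom;
}
```

As a made-up example, timings of 5.02 ms, 5.2 ms and 7.0 ms for 1, 10 and 100 launches would give a slope of 0.02 ms. That works out to roughly 20 µs per launch.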

## To build

*Prerequisites:*
- [EasyCL](https://github.com/hughperkins/EasyCL) installed into `~/git/EasyCL/dist` using `make -j 4 install` (i.e. build EasyCL with `CMAKE_INSTALL_PREFIX` set to `[your home directory]/git/EasyCL/dist`)
- cmake and ccmake installed
- gcc, g++, etc.

*Method:*
```
git clone https://github.com/hughperkins/cltorch-benchmarking.git
cd cltorch-benchmarking
mkdir build
cd build
cmake ..
make -j 4
```