https://github.com/feifeibear/swdnn
a highly-efficient library for deep neural networks based on Sunway TaihuLight supercomputer.
- Host: GitHub
- URL: https://github.com/feifeibear/swdnn
- Owner: feifeibear
- Created: 2018-01-22T22:38:38.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-09-03T12:19:26.000Z (almost 8 years ago)
- Last Synced: 2025-01-23T00:41:15.527Z (over 1 year ago)
- Language: Roff
- Size: 136 KB
- Stars: 15
- Watchers: 3
- Forks: 9
- Open Issues: 0
Metadata Files:
- Readme: README.md
## swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight Supercomputer
This repo contains the code for the paper [swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight](https://fangjiarui.github.io/assets/pdf/swdnn-ipdps-2017.pdf).
We report our work on swDNN, a highly efficient library for accelerating deep learning applications. We derive a performance model that guides us in identifying the most suitable approach for mapping convolutional neural networks (CNNs) onto the underlying hardware. Through a systematic optimization that explores the major factors, such as the organization of convolution loops, blocking techniques, register data communication schemes, and reordering strategies for the two instruction pipelines, we achieve a double-precision performance of over 1.6 Tflops for the convolution kernel, which is 54% of the theoretical peak. Compared with a Tesla K40m running cuDNNv5, swDNN achieves a 1.91-9.75x speedup in an evaluation with over 100 parameter configurations.
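To make "organization of convolution loops" and "blocking" concrete, the following is a minimal pure-Python sketch of a direct convolution loop nest and a variant tiled over output and input channels. It is an illustration only, not the swDNN kernel: the function names, the `[B][Ni][Ri][Ci]` and `[No][Ni][K][K]` layouts, and the tile sizes `block_no` and `block_ni` are all assumptions made for this example.

```python
def _shapes(inp, flt):
    # inp: [B][Ni][Ri][Ci], flt: [No][Ni][K][K] (assumed layouts, nested lists)
    B, Ni, Ri, Ci = len(inp), len(inp[0]), len(inp[0][0]), len(inp[0][0][0])
    No, K = len(flt), len(flt[0][0])
    return B, Ni, No, K, Ri - K + 1, Ci - K + 1  # stride 1, no padding


def conv2d_direct(inp, flt):
    """Naive direct convolution: one accumulator per output element."""
    B, Ni, No, K, Ro, Co = _shapes(inp, flt)
    out = [[[[0.0] * Co for _ in range(Ro)] for _ in range(No)] for _ in range(B)]
    for b in range(B):
        for no in range(No):
            for ro in range(Ro):
                for co in range(Co):
                    acc = 0.0
                    for ni in range(Ni):
                        for kr in range(K):
                            for kc in range(K):
                                acc += inp[b][ni][ro + kr][co + kc] * flt[no][ni][kr][kc]
                    out[b][no][ro][co] = acc
    return out


def conv2d_blocked(inp, flt, block_no=2, block_ni=2):
    """Same computation with the channel loops split into tiles.

    Each (no0, ni0) tile touches only block_no * block_ni filter slices,
    which is the kind of working set a small local memory could hold.
    """
    B, Ni, No, K, Ro, Co = _shapes(inp, flt)
    out = [[[[0.0] * Co for _ in range(Ro)] for _ in range(No)] for _ in range(B)]
    for no0 in range(0, No, block_no):
        for ni0 in range(0, Ni, block_ni):
            for b in range(B):
                for ro in range(Ro):
                    for co in range(Co):
                        for no in range(no0, min(no0 + block_no, No)):
                            # Continue the partial sum left by earlier ni tiles.
                            acc = out[b][no][ro][co]
                            for ni in range(ni0, min(ni0 + block_ni, Ni)):
                                for kr in range(K):
                                    for kc in range(K):
                                        acc += inp[b][ni][ro + kr][co + kc] * flt[no][ni][kr][kc]
                            out[b][no][ro][co] = acc
    return out
```

Both functions add the same terms in the same order for every output element, so they return identical results; only the traversal order over channels differs. Which loop order and tile sizes work best on a given machine is what a performance model like the one described above is meant to decide.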
## Directories
- swCNNv10-4cg-image-size-aware: contains code for Algorithm 1 in the paper.
- swCNNv11-4cg-opt-image-size-aware: contains code for Algorithm 1 in the paper after promoting the DMA operation to the outer loop.
- swCNNv13-4cg-batch-size-aware: contains code for Algorithm 2 in the paper.
- swCNNv12-4cg-opt-batch-size-aware: contains code for Algorithm 2 in the paper after promoting the DMA operation to the outer loop.
- swCNNv14-4cg-batch-size-aware-all-asm: contains code for Algorithm 1 with all CPE code in ASM.
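
The "opt" directories differ from their base versions by promoting a DMA operation to an outer loop. The general idea, hoisting a loop-invariant transfer so that it runs once instead of once per iteration, can be sketched as follows. This is a hedged Python illustration, not the Sunway DMA interface: `MainMemory`, `dma_get`, and both `dot_rows_*` functions are hypothetical names, and which data swDNN actually hoists is described in the paper, not here.

```python
class MainMemory:
    """Stand-in for main memory that counts simulated DMA transfers."""

    def __init__(self, data):
        self.data = list(data)
        self.transfers = 0

    def dma_get(self, start, length):
        # Hypothetical helper: copy a slice into a fresh "local" buffer.
        self.transfers += 1
        return self.data[start:start + length]


def dot_rows_inner_dma(weights_mem, rows):
    """Fetch the shared weights inside the loop: one transfer per row."""
    n = len(weights_mem.data)
    results = []
    for row in rows:
        local_w = weights_mem.dma_get(0, n)  # loop-invariant, yet repeated
        results.append(sum(a * b for a, b in zip(row, local_w)))
    return results


def dot_rows_hoisted_dma(weights_mem, rows):
    """Fetch the shared weights once, before the loop."""
    n = len(weights_mem.data)
    local_w = weights_mem.dma_get(0, n)  # promoted to the outer level
    return [sum(a * b for a, b in zip(row, local_w)) for row in rows]
```

Both functions compute the same dot products; the hoisted version issues a single simulated transfer regardless of how many rows are processed, while the inner-loop version issues one per row.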
## Citation
Fang J, Fu H, Zhao W, et al. swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight[C]//Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International. IEEE, 2017: 615-624.
## Usage
We have already integrated swDNN into swCaffe, a deep learning framework for the SW26010 that is based on Caffe and supports MPI.
Please refer to the [swutil directory in swCaffe](https://github.com/feifeibear/SWCaffe/tree/master/src/caffe/swutil/slave) for more details.
## Contact
Jiarui Fang (fang_jiarui@163.com)