Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-GPU
Awesome resources for GPUs
https://github.com/Jokeren/Awesome-GPU
-
Tools
-
Profilers
- **HPCToolkit**
- **NVIDIA Nsight Systems**
- **NVIDIA Nsight Compute**
- **SASSI**
- **NVBit**
- Monitoring Heterogeneous Applications with the OpenMP Tools Interface
- Identifying Optimization Opportunities Within Kernel Execution in GPU Codes
- **Vampir|Score-P**
- **TAU**
- **Open|SpeedShop**
- Exposing Hidden Performance Opportunities in High Performance GPU Applications
- Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures
-
Benchmarking
-
Models
- Instruction Roofline: An insightful visual performance model for GPUs
- Performance Tuning of Scientific Codes with the Roofline Model
- Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
- Fundamental Optimizations
-
Simulators
-
-
Architecture
-
Resource Management
- Dynamic Resource Management for Efficient Utilization of Multitasking GPUs
- Dynamic GPGPU Power Management Using Adaptive Model Predictive Control
- Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems
- Locality-Aware CTA Clustering for Modern GPUs
- Reducing Energy in GPGPUs through Approximate Trivial Bypassing
-
Parallelism
- Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls
- Controlled Kernel Launch for Dynamic Parallelism in GPUs
- LaPerm: Locality Aware Scheduler for Dynamic Parallelism on GPUs
- Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit
- Understanding Latency Hiding on GPUs
-
Cache
-
Memory
- Improving Inter-kernel Data Reuse With CTA-Page Coordination in GPGPU
- Umpire: Application-Focused Management and Coordination of Complex Hierarchical Memory
- Reducing GPU Offload Latency via Fine-Grained CPU-GPU Synchronization
- In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated Computing
-
White Papers
-
-
Algorithms
-
BLAS
-
Stencils
-
Scans
-
-
Applications
-
Runtime
-
Code Generation
-
Compilers
-
Programming Models
-
Profile Guided Optimization
-
Binaries
-