https://github.com/redhat-et/triton-cache-performance-comparison

amd-gpu cache cuda gpu nvidia-gpu performance rocm triton

Host: GitHub
URL: https://github.com/redhat-et/triton-cache-performance-comparison
Owner: redhat-et
License: apache-2.0
Created: 2025-03-06T09:03:11.000Z
Default Branch: main
Last Pushed: 2025-03-06T09:04:37.000Z
Last Synced: 2025-03-06T10:22:36.580Z
Topics: amd-gpu, cache, cuda, gpu, nvidia-gpu, performance, rocm, triton
Language: Python
Homepage:
Size: 338 KB
Stars: 0
Watchers: 5
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Triton Cache Performance Comparison

![GPU memory usage over time on CUDA, with and without Triton cache](gpu_memory_usage_comparison_cuda.png)
*CUDA: Triton cache significantly improves startup performance*

![GPU memory usage over time on ROCm, with and without Triton cache](gpu_memory_usage_comparison_rocm.png)
*ROCm: Triton cache significantly improves startup performance*

## Proof of Concept

This benchmark compares GPU memory usage and startup performance of Triton kernels in two scenarios:

1. **With Triton cache pre-loaded** - Cache populated by a previous run
2. **Without Triton cache** - Cache directory emptied before the run

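Key findings:
- A pre-loaded Triton cache significantly reduces startup time: Triton compiles kernels just-in-time on first use and stores the compiled artifacts in its cache directory, so a warm start loads them instead of recompiling
- GPU memory usage follows a more consistent pattern when kernels come from the cache
- Resource utilization improves during initial model loading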

## Prerequisites

### Hardware Requirements
- NVIDIA GPU (CUDA) or AMD GPU (ROCm)

## Usage

### Basic Benchmark
```bash
./benchmark.sh --arch [cuda|rocm]
```
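If you are unsure which value to pass to `--arch`, a small helper can choose it from the vendor tool that is installed. This is a hypothetical sketch, not part of the repository: `detect_arch` is a made-up name, and it assumes `nvidia-smi` is on `PATH` on CUDA hosts and `rocm-smi` on ROCm hosts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repository): choose the
# --arch value for benchmark.sh from the vendor tool found on PATH.
detect_arch() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo cuda
  elif command -v rocm-smi >/dev/null 2>&1; then
    echo rocm
  else
    echo "neither nvidia-smi nor rocm-smi found on PATH" >&2
    return 1
  fi
}

# Usage:
# arch=$(detect_arch) && ./benchmark.sh --arch "$arch"
```

The helper checks for `nvidia-smi` first, so a host with both toolchains installed is treated as CUDA.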

### Advanced Options
```bash
# Custom cache location and script
./benchmark.sh \
  --arch cuda \
  --triton-cache-dir ~/alternate_cache \
  --script ./custom_script.py
```

### Expected Output
1. `gpu_usage_log.csv` - Time-series memory data
2. `gpu_memory_usage_comparison.png` - Visualization plot
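To compare runs numerically rather than visually, the log can be reduced to a peak reading and a sample count per scenario. Because memory is sampled at 1 Hz, the sample count roughly approximates each run's duration in seconds. This is a sketch under an assumption: the column layout `timestamp,scenario,memory_used_mib` (with a header row) is hypothetical, and the real `gpu_usage_log.csv` may differ, in which case the awk field numbers need adjusting.

```bash
#!/usr/bin/env bash
# Sketch: summarize a GPU memory log per scenario.
# ASSUMPTION: header row plus columns timestamp,scenario,memory_used_mib.
# Adjust $2/$3 below if the real gpu_usage_log.csv uses another layout.
summarize_log() {
  local csv=$1
  awk -F',' '
    NR == 1 { next }                 # skip header row
    {
      s = $2; m = $3 + 0
      n[s]++                         # one sample per second at 1 Hz
      if (!(s in peak) || m > peak[s]) peak[s] = m
    }
    END {
      for (s in n) printf "%s samples=%d peak_mib=%d\n", s, n[s], peak[s]
    }
  ' "$csv" | sort
}

# Usage:
# summarize_log gpu_usage_log.csv
```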

## Technical Details

### Benchmark Process
1. **Cold Start** (no cache):
   - Purge the existing Triton cache
   - Run the script
   - Log GPU memory usage once per second (1 Hz)

2. **Warm Start** (with cache):
   - Reuse the kernels compiled and cached during the cold start
   - Run the identical script
   - Compare memory usage and startup time against the cold start
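The cold/warm procedure above can be sketched in Bash as follows. This is a hypothetical illustration, not the repository's `benchmark.sh`: `run_scenario`, `cold_warm`, and `SAMPLE_CMD` are made-up names, timing uses whole seconds (matching the 1 Hz sampling), and the default `SAMPLE_CMD` reads only the first NVIDIA GPU through `nvidia-smi`. On ROCm you would substitute a command built on `rocm-smi` that prints a single number.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the cold/warm procedure; NOT the repository's benchmark.sh.

# Command printing one memory reading (MiB) per call. Default: NVIDIA GPUs.
SAMPLE_CMD=${SAMPLE_CMD:-"nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits"}

# run_scenario LABEL LOG CMD...
# Appends "timestamp,LABEL,reading" to LOG about once per second while CMD
# runs, then prints the wall-clock duration (whole seconds) and exit status.
run_scenario() {
  local label=$1 log=$2
  shift 2
  (
    while true; do
      # head -n1 keeps only the first GPU when several are present
      echo "$(date +%s),$label,$($SAMPLE_CMD | head -n1)"
      sleep 1
    done
  ) >>"$log" &
  local sampler=$!
  local start=$SECONDS status=0
  "$@" || status=$?
  kill "$sampler" 2>/dev/null || true
  wait "$sampler" 2>/dev/null || true
  echo "$label: $((SECONDS - start)) s (exit $status)"
}

# cold_warm CACHE_DIR LOG CMD...
# Cold run on a purged cache, then a warm run reusing what the cold run cached.
cold_warm() {
  local cache=$1 log=$2
  shift 2
  [ -n "$cache" ] || return 1
  export TRITON_CACHE_DIR=$cache          # Triton reads its cache location from here
  rm -rf "$cache" && mkdir -p "$cache"    # cold start: purge existing cache
  echo "timestamp,scenario,memory_used_mib" >"$log"
  run_scenario cold "$log" "$@"
  run_scenario warm "$log" "$@"           # warm start: identical command, cache kept
}

# Usage (use a dedicated directory, since the cold step deletes it):
# cold_warm /tmp/triton-bench-cache gpu_usage_log.csv python ./custom_script.py
```

Passing a dedicated cache directory keeps the purge step from wiping the cache other Triton workloads on the machine rely on.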

### Key Configuration
```bash
export TRITON_CACHE_DIR="$HOME/.triton/cache" # Default cache location (a quoted ~ would not be expanded)
```
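Before a run, it can help to confirm which state the cache is in. The sketch below is a hypothetical helper (`cache_state` is a made-up name) that only counts files under the directory; it does not check that the cached kernels match the current Triton version or GPU. It falls back to `$HOME/.triton/cache`, Triton's default location, and uses `$HOME` rather than a quoted `~` because the shell does not expand a tilde inside double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a Triton cache directory is cold
# (missing or empty) or warm (already holds files). Counts files only.
cache_state() {
  local dir=${1:-${TRITON_CACHE_DIR:-$HOME/.triton/cache}}
  local count=0
  if [ -d "$dir" ]; then
    count=$(find "$dir" -type f | wc -l)
  fi
  count=$((count))   # strip the padding some wc implementations add
  if [ "$count" -eq 0 ]; then
    echo "cold: $dir has no cached files"
  else
    echo "warm: $dir holds $count cached file(s)"
  fi
}

# Usage:
# cache_state                      # uses TRITON_CACHE_DIR or the default
# cache_state ~/alternate_cache
```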

## License
Apache 2.0. See [LICENSE](LICENSE).
