{"id":26167393,"url":"https://github.com/redhat-et/triton-cache-performance-comparison","last_synced_at":"2026-04-12T22:43:11.426Z","repository":{"id":280971617,"uuid":"943783572","full_name":"redhat-et/Triton-Cache-Performance-Comparison","owner":"redhat-et","description":null,"archived":false,"fork":false,"pushed_at":"2025-03-06T09:04:37.000Z","size":346,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-06T10:22:36.580Z","etag":null,"topics":["amd-gpu","cache","cuda","gpu","nvidia-gpu","performance","rocm","triton"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/redhat-et.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-06T09:03:11.000Z","updated_at":"2025-03-06T09:17:26.000Z","dependencies_parsed_at":"2025-03-06T10:22:47.407Z","dependency_job_id":"5eba51ab-b96d-4406-942f-d8d01f6713f3","html_url":"https://github.com/redhat-et/Triton-Cache-Performance-Comparison","commit_stats":null,"previous_names":["redhat-et/triton-cache-performance-comparison"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/redhat-et/Triton-Cache-Performance-Comparison","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/redhat-et%2FTriton-Cache-Performance-Comparison","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/redhat-et%2FTriton-Cache-Performance-Comparison/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/redhat-et%2FTriton-Cache-Performance-Comparison/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/redhat-et%2FTriton-Cache-Performance-Comparison/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/redhat-et","download_url":"https://codeload.github.com/redhat-et/Triton-Cache-Performance-Comparison/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/redhat-et%2FTriton-Cache-Performance-Comparison/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268689545,"owners_count":24291077,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-04T02:00:09.867Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amd-gpu","cache","cuda","gpu","nvidia-gpu","performance","rocm","triton"],"created_at":"2025-03-11T17:35:36.989Z","updated_at":"2026-04-12T22:43:11.362Z","avatar_url":"https://github.com/redhat-et.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Triton Cache Performance Comparison\n\n![Performance Plot](gpu_memory_usage_comparison_cuda.png)  \n*CUDA: Triton cache significantly improves startup performance*\n\n![Performance Plot](gpu_memory_usage_comparison_rocm.png)  \n*ROCm: Triton cache significantly improves startup performance*\n\n## Proof of Concept\n\nThis benchmark compares GPU memory 
usage and startup performance of Triton kernels in two scenarios:\n\n1. **With Triton cache pre-loaded** - The cache exists from a previous run\n2. **Without Triton cache** - Clean cache state\n\nKey findings:\n- Triton cache significantly reduces startup time\n- More consistent memory usage patterns with cached kernels\n- Improved resource utilization during initial model loading\n\n## Prerequisites\n\n### Hardware Requirements\n- NVIDIA GPU (CUDA) or AMD GPU (ROCm)\n\n## Usage\n\n### Basic Benchmark\n```bash\n# Pass either cuda or rocm to --arch\n./benchmark.sh --arch cuda\n```\n\n### Advanced Options\n```bash\n# Custom cache location and script\n./benchmark.sh \\\n  --arch cuda \\\n  --triton-cache-dir ~/alternate_cache \\\n  --script ./custom_script.py\n```\n\n### Expected Output\n1. `gpu_usage_log.csv` - Time-series memory data\n2. `gpu_memory_usage_comparison.png` - Visualization plot\n\n## Technical Details\n\n### Benchmark Process\n1. **Cold Start** (no cache):\n   - Purge the existing Triton cache\n   - Run the script\n   - Log GPU memory usage at 1 Hz\n\n2. **Warm Start** (with cache):\n   - Reuse the kernels generated during the cold start\n   - Run the identical script\n   - Compare memory and time metrics\n\n### Key Configuration\n```bash\n# Use $HOME rather than ~ here: the shell does not expand ~ inside double quotes\nexport TRITON_CACHE_DIR=\"$HOME/.triton/cache\"  # Default cache location\n```\n\n## License\nApache 2.0. See [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fredhat-et%2Ftriton-cache-performance-comparison","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fredhat-et%2Ftriton-cache-performance-comparison","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fredhat-et%2Ftriton-cache-performance-comparison/lists"}