# 🧠 **gpu-agent-opt**

**Unified AI Agent Framework for GPU Kernel Profiling, Scientific Computing, and CUDA Exploration**

`gpu-agent-opt` is a Python package designed to orchestrate **agentic workflows** for **Triton, CUDA, CuPy, cuDF**, and advanced GPU programming patterns, combining **kernel discovery**, **profiling**, and **analysis** in a knowledge-driven loop:

👉 **Sense → Think → Act → Learn → Reflect**

The current focus is to build a **one-stop GPU research & profiling layer** that integrates:
- Deep learning graph compilers (PyTorch Inductor / XLA)
- Scientific computing (CuPy / cuDF)
- Low-level CUDA primitives (e.g., coalesced memory access, warp shuffles, Tensor Cores)

---

## ✨ **Core Capabilities**

### 🧠 Agentic Kernel Profiler
- Discovers the GPU kernels launched during script execution using **Nsight Systems**.
- Selects the top kernels for detailed **Nsight Compute** profiling.
- Generates structured summary reports (JSON) with SM and DRAM efficiency metrics.

### 🧪 Multi-Backend Context
- ✅ **Triton kernels** (via PyTorch Inductor or custom)
- ✅ **Raw CUDA kernels** (NVRTC / PyCUDA / C++ extensions)
- ✅ **CuPy & cuDF** scientific kernels
- 🚧 **Planned:** CUDA Graphs, Cooperative Groups, Tensor Cores, async copies, MIG.

### 🔬 Profiler Integration
- Nsight Systems → kernel discovery
- Nsight Compute → per-kernel profiling (SM & DRAM metrics)
- Exports both per-kernel CSV files and an aggregated `summary.json`.

### 📚 Knowledge Base / Reflection
- `reflect_history.json` stores efficiency trends across runs.
- Helps identify kernels that perform poorly consistently over time.

---

## 🛰 **Target Use Cases**
- Geospatial AI auto-annotation pipelines (DINOv2, SAM2, YOLO, NDWI/LBP preprocessing)
- Deep learning inference/training profiling through PyTorch + Nsight
- Scientific/HPC workloads (FFT, FDTD3D, conjugate gradient, Monte Carlo, etc.)
- CUDA educational benchmarking (transpose, reduction, memory hierarchy, etc.)
- Embedded GPU pipelines (Jetson Orin / RB5)

---

## 📊 **Agentic Profiling Snapshot**

The framework executes a **five-stage loop** to profile real GPU workloads:

| Stage   | Description                    |
|---------|--------------------------------|
| Sense   | Discover kernels               |
| Think   | Select top kernels             |
| Act     | Profile with Nsight Compute    |
| Learn   | Analyze & classify bottlenecks |
| Reflect | Track efficiency trends        |

### 📸 Example output from profiling a geospatial annotation pipeline

Below is a snapshot from a real profiling run on DINOv2 + SAM2:

![Profiling Snapshot](assets/snapshot2.png)

The results are stored in:

- `runs/profile_logs/.../summary.json` → per-run aggregated metrics
- `reflect_history.json` → longitudinal trend tracking

These form the basis for future **agentic actions**, such as:
- Replacing inefficient PyTorch kernels with custom CUDA/Triton implementations
- Adjusting launch configurations or fusing operators
- Triggering code-generation agents

---

## 🔥 **CUDA Samples Integration**

The agent provides a Pythonic layer over classic CUDA patterns drawn from the official CUDA Samples:

- **Memory & Data Movement**  
  `bandwidthTest`, `transpose`, `globalToShmemAsyncCopy`, `UnifiedMemoryStreams`

- **Computation Kernels**  
  `reduction`, `scan`, Tensor Core GEMM examples

- **Advanced Features**  
  CUDA Graphs, Cooperative Groups, Async API

- **Linear Algebra & Solvers**  
  cuBLAS, cuSOLVER

- **Signal & Image Processing**  
  cuFFT, DCT, NPP routines

- **Miscellaneous / Educational**  
  `deviceQuery`, `inlinePTX`, `cudaOpenMP`, NVRTC runtime compilation

---

## 🧪 **Scientific + DL Interoperability**

- CuPy / cuDF kernels can be profiled alongside Triton / CUDA kernels.
- PyTorch Inductor graphs can be analyzed to identify subgraphs for replacement.
- Goal: combine **high-level DL graphs** with **low-level profiling data**.

---

## 📦 **Installation**

**TestPyPI**:  
👉 [https://test.pypi.org/project/gpu-agent-opt/](https://test.pypi.org/project/gpu-agent-opt/)

```bash
# Install from TestPyPI; --extra-index-url lets pip resolve dependencies from PyPI
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ gpu-agent-opt
```
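The Sense and Think stages of the five-stage loop (discover kernels with Nsight Systems, then pick the heaviest ones for Nsight Compute) can be sketched with the two profilers' command-line tools. This is a minimal sketch, not the package's own implementation: the helper names `nsys_commands`, `select_top_kernels` and `ncu_command` are hypothetical, and the parser assumes the `cuda_gpu_kern_sum` CSV report (with `Total Time (ns)` and `Name` columns) produced by recent Nsight Systems releases. Older releases call the report `gpukernsum` and write `.qdrep` rather than `.nsys-rep` files.

```python
import csv
import io

# Assumed column names of the `cuda_gpu_kern_sum` report (recent nsys releases).
NSYS_TIME_COL = "Total Time (ns)"
NSYS_NAME_COL = "Name"

# Speed-of-Light throughput metrics, as percent of peak, collected by Nsight Compute.
SOL_METRICS = [
    "sm__throughput.avg.pct_of_peak_sustained_elapsed",
    "dram__throughput.avg.pct_of_peak_sustained_elapsed",
]


def nsys_commands(script: str, report: str = "discovery") -> list[list[str]]:
    """Sense (hypothetical helper): trace CUDA activity, then export a per-kernel CSV summary."""
    return [
        ["nsys", "profile", "--trace=cuda", "--force-overwrite=true",
         "-o", report, "python", script],
        ["nsys", "stats", "--report", "cuda_gpu_kern_sum", "--format", "csv",
         f"{report}.nsys-rep"],
    ]


def select_top_kernels(stats_csv: str, k: int = 3) -> list[tuple[str, int]]:
    """Think (hypothetical helper): return the k kernels with the largest total GPU time."""
    lines = stats_csv.splitlines()
    # Skip any banner lines printed before the CSV header row.
    start = next(i for i, line in enumerate(lines) if NSYS_TIME_COL in line)
    rows = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    ranked = sorted(
        ((row[NSYS_NAME_COL], int(float(row[NSYS_TIME_COL]))) for row in rows),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:k]


def ncu_command(script: str, kernel_regex: str, launches: int = 1) -> list[str]:
    """Act (hypothetical helper): profile only launches of kernels matching the regex."""
    return [
        "ncu", "--kernel-name", f"regex:{kernel_regex}",
        "--launch-count", str(launches),
        "--metrics", ",".join(SOL_METRICS),
        "--csv", "python", script,
    ]
```

In practice you would run each command with `subprocess.run(cmd, check=True, capture_output=True, text=True)`, pass the stdout of the `nsys stats` call to `select_top_kernels`, and give `ncu_command` a short, `re.escape`d fragment of each selected name, because the Nsight Systems summary lists full demangled signatures. Building argument lists instead of shell strings avoids quoting problems with those signatures.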
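For the Learn stage, one common way to classify a bottleneck from SM and DRAM efficiency metrics is a Speed-of-Light comparison: whichever unit runs closest to its peak throughput is the likely limiter, and if neither is busy the kernel is probably latency- or launch-bound. The sketch below assumes the `sm__throughput` / `dram__throughput` percent-of-peak metrics and an `ncu --csv` layout with `Kernel Name`, `Metric Name` and `Metric Value` columns. The 60% threshold, the `KernelMetrics` class and the function names are illustrative assumptions, not values or APIs taken from `gpu-agent-opt`.

```python
import csv
import io
from dataclasses import dataclass

SM_METRIC = "sm__throughput.avg.pct_of_peak_sustained_elapsed"
DRAM_METRIC = "dram__throughput.avg.pct_of_peak_sustained_elapsed"


@dataclass
class KernelMetrics:
    name: str
    sm_pct: float    # SM throughput, % of peak
    dram_pct: float  # DRAM throughput, % of peak


def parse_ncu_csv(text: str) -> list[KernelMetrics]:
    """Collect SM and DRAM throughput per kernel from `ncu --csv` text (assumed layout)."""
    lines = text.splitlines()
    # Skip "==PROF==" banners and application output printed before the header row.
    start = next(i for i, line in enumerate(lines) if "Metric Name" in line)
    values: dict[str, dict[str, float]] = {}
    for row in csv.DictReader(io.StringIO("\n".join(lines[start:]))):
        # Trailing non-CSV lines parse as rows whose "Metric Name" is None and are skipped.
        if row.get("Metric Name") in (SM_METRIC, DRAM_METRIC):
            value = float(row["Metric Value"].replace(",", ""))
            values.setdefault(row["Kernel Name"], {})[row["Metric Name"]] = value
    return [
        KernelMetrics(name, m.get(SM_METRIC, 0.0), m.get(DRAM_METRIC, 0.0))
        for name, m in values.items()
    ]


def classify(m: KernelMetrics, busy_pct: float = 60.0) -> str:
    """Heuristic Speed-of-Light classification; the threshold is an assumption."""
    if m.sm_pct < busy_pct and m.dram_pct < busy_pct:
        return "latency-bound"  # neither unit is near its peak throughput
    if m.dram_pct >= m.sm_pct:
        return "memory-bound"
    return "compute-bound"
```

A single fixed threshold is deliberately crude; a fuller Learn stage would also look at occupancy, cache hit rates, or the roofline position reported by Nsight Compute before recommending an action.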
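The Reflect stage keeps a longitudinal record in `reflect_history.json` so that consistently low-performing kernels stand out across runs. The file's actual schema is not documented in this README, so this sketch assumes a hypothetical layout (a JSON list with one `{"run": ..., "sm_pct": {kernel: percent}}` entry per run) and hypothetical helpers `record_run` and `persistently_low`. A kernel is flagged only if its SM throughput stays below the threshold in every one of the last few runs, which filters out one-off noisy measurements.

```python
import json
from pathlib import Path

# Hypothetical schema: [{"run": "<id>", "sm_pct": {"<kernel>": <percent>}}, ...]
# The real reflect_history.json layout used by gpu-agent-opt may differ.


def record_run(history_path: Path, run_id: str, sm_pct: dict[str, float]) -> list[dict]:
    """Append one run's per-kernel SM throughput to the history file and return it."""
    history = json.loads(history_path.read_text()) if history_path.exists() else []
    history.append({"run": run_id, "sm_pct": sm_pct})
    history_path.write_text(json.dumps(history, indent=2))
    return history


def persistently_low(history: list[dict], threshold: float = 40.0, window: int = 3) -> list[str]:
    """Kernels below `threshold` % SM throughput in each of the last `window` runs."""
    recent = history[-window:]
    if len(recent) < window:
        return []  # not enough runs yet to call anything a trend
    # Only consider kernels that appear in every recent run.
    common = set.intersection(*(set(run["sm_pct"]) for run in recent))
    return sorted(k for k in common if all(run["sm_pct"][k] < threshold for run in recent))
```

Requiring the kernel to appear in every run of the window keeps kernels that were only launched once (for example, during warm-up) from being reported as a trend.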