An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with cuda-kernels

A curated list of projects in awesome lists tagged with cuda-kernels.

https://github.com/nvidia/cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

cuda cuda-driver-api cuda-kernels cuda-opengl

Last synced: 12 May 2025

https://github.com/NVIDIA/cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

cuda cuda-driver-api cuda-kernels cuda-opengl

Last synced: 18 Mar 2025

https://github.com/internlm/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

codellama cuda-kernels deepspeed fastertransformer internlm llama llama2 llama3 llm llm-inference turbomind

Last synced: 24 Dec 2025

https://github.com/InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

codellama cuda-kernels deepspeed fastertransformer internlm llama llama2 llama3 llm llm-inference turbomind

Last synced: 20 Mar 2025

https://github.com/deftruth/cuda-learn-notes

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

cuda cuda-12 cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-toolkit flash-attention hgemm learn-cuda leet-cuda

Last synced: 14 May 2025

https://github.com/rust-gpu/rust-cuda

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

cuda cuda-kernels cuda-programming gpgpu gpu gpu-programming rust rust-lang

Last synced: 14 May 2025

https://github.com/Rust-GPU/Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

cuda cuda-kernels cuda-programming gpgpu gpu gpu-programming rust rust-lang

Last synced: 27 Mar 2025

https://github.com/xlite-dev/cuda-learn-notes

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm

Last synced: 15 Apr 2025

https://github.com/xlite-dev/CUDA-Learn-Notes

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm

Last synced: 26 Mar 2025

https://github.com/DefTruth/CUDA-Learn-Notes

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm

Last synced: 20 Mar 2025

https://github.com/NVIDIA/nvbench

CUDA Kernel Benchmarking Library

benchmark cuda cuda-kernels gpu kernel-benchmark nvidia performance

Last synced: 16 May 2025

https://github.com/nvidia/nvbench

CUDA Kernel Benchmarking Library

benchmark cuda cuda-kernels gpu kernel-benchmark nvidia performance

Last synced: 14 Apr 2025

https://github.com/laugh12321/tensorrt-yolo

🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.

cuda cuda-graph cuda-kernels cuda-programming detection onnx ppyoloe tensorrt yolov10 yolov3 yolov5 yolov6 yolov7 yolov8 yolov9

Last synced: 14 May 2025

https://github.com/laugh12321/TensorRT-YOLO

🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.

cuda cuda-graph cuda-kernels cuda-programming detection onnx ppyoloe tensorrt yolov10 yolov3 yolov5 yolov6 yolov7 yolov8 yolov9

Last synced: 18 Mar 2025

https://github.com/harrism/hemi

Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.

c-plus-plus cuda cuda-device cuda-kernels gpu hemi

Last synced: 06 Apr 2025

https://github.com/HMUNACHI/henry-vjp

From zero to hero CUDA for accelerating maths and machine learning on GPU.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 05 Apr 2025

https://github.com/hmunachi/henry-vjp

From zero to hero CUDA for accelerating maths and machine learning on GPU.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 08 Apr 2025

https://github.com/HMUNACHI/CUDATutorials

Zero to Hero GPU and CUDA for Maths & ML tutorials with examples.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 24 Apr 2025

https://github.com/HMUNACHI/cuda-tutorials

CUDA tutorials for Maths & ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 13 May 2025

https://github.com/deepakkumar1984/amplifier.net

Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your function in .NET and Amplifier will take care of running it on your favorite hardware.

compiler cuda-kernels gpgpu gpgpu-computing gpgpu-sim opencl opencl-kernels simd

Last synced: 15 Dec 2025

https://github.com/deepakkumar1984/Amplifier.NET

Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your function in .NET and Amplifier will take care of running it on your favorite hardware.

compiler cuda-kernels gpgpu gpgpu-computing gpgpu-sim opencl opencl-kernels simd

Last synced: 14 Mar 2025

https://github.com/patwie/cuda-design-patterns

Some CUDA design patterns and a bit of template magic for CUDA

bazel cpp11 cuda cuda-development cuda-device cuda-kernels cuda-utils gpu template-metaprogramming

Last synced: 14 Apr 2025

https://github.com/microsoft/tilefusion

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

cpp cuda-kernels

Last synced: 10 Apr 2025

https://github.com/yalue/cuda_scheduling_examiner_mirror

A tool for examining GPU scheduling behavior.

benchmark cuda cuda-kernels gpu gpu-scheduling mandelbrot

Last synced: 28 Oct 2025

https://github.com/emptysoal/cuda-image-preprocess

Speed up image preprocessing with CUDA when handling images or running TensorRT inference

cnn cuda cuda-demo cuda-kernels cuda-programming deep-learning image-processing tensorrt

Last synced: 01 Aug 2025

https://github.com/stellar-group/octotiger

Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees

astrophysics cuda cuda-kernels hpx kokkos simd stellar-mergers sycl

Last synced: 04 Jul 2025

https://github.com/STEllAR-GROUP/octotiger

Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees

astrophysics cuda cuda-kernels hpx kokkos simd stellar-mergers sycl

Last synced: 04 Apr 2025

https://github.com/l4nos/php-cuda

An extension for PHP allowing it to access GPU operations on CUDA graphics cards (NVIDIA)

cuda cuda-kernels cuda-php php php-dll php-ext php-extension

Last synced: 26 Aug 2025

https://github.com/sashakolpakov/dire-rapids

DiRe accelerated by PyTorch, PyKeOps and cuVS

cuda cuda-kernels dimensionality-reduction pykeops pytorch rapidsai t-sne umap

Last synced: 02 Oct 2025

https://github.com/ashvardanian/scaling-democracy

GPU-accelerated Schulze voting method in Python, Numba, and CUDA, using ideas from Algebraic Graph Theory

cuda cuda-kernels dynamic-programming gpgpu graph-algorithms graph-theory pybind11 python voting

Last synced: 12 Apr 2025

https://github.com/lawmurray/gpu-gemm

CUDA kernel for matrix-matrix multiplication on NVIDIA GPUs, using a Hilbert curve to improve L2 cache utilization.

cplusplus cuda cuda-kernels cuda-programming gpu gpu-computing gpu-programming matrix-multiplication numerical-methods scientific-computing

Last synced: 14 Apr 2025

https://github.com/matrix97317/oneneuralnetwork

This is a cross-chip platform collection of operators and a unified neural network library.

ai-compiler compiler cuda-kernels deep-neural-networks deeplearning-framework nerual-networks pytorch

Last synced: 01 Sep 2025

https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-c-cpp

Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.

cpp cuda cuda-kernels cuda-programming nsight nvidia profilling

Last synced: 10 Apr 2025

https://github.com/kartavyaantani/cuda_image_processing

A CUDA-accelerated image processing project featuring multiple GPU-based filters and enhancement techniques. Implements convolution, edge detection, Non-Local Means (NLM) denoising, K-Nearest Neighbors (KNN), and pixelization. Each operation is optimized using CUDA kernels for real-time performance on large images. The project supports command-line

cuda cuda-kernels cuda-programming cuda-toolkit gpu-programming high-performance-computing image-manipulation image-processing nvidia-cuda nvidia-gpu

Last synced: 19 Apr 2025

https://github.com/nrmancuso/big-bang

CUDA and OpenMP N-body simulation based on data from the Milky Way and Andromeda galaxies

c cuda-kernels cuda-programming nbody-simulation openmp-parallelization parallel-computing space

Last synced: 13 Jun 2025

https://github.com/gravitytwog/electromagneticfield

Electromagnetic field simulation made with CUDA

c cuda cuda-kernels cuda-programming

Last synced: 14 Apr 2025

https://github.com/hariprashad-ravikumar/accelerated-computing-in-cuda-c

This repo contains my code for the problem sets in NVIDIA's Getting Started with Accelerated Computing in CUDA C/C++

c cuda cuda-kernels cuda-toolkit

Last synced: 01 Jul 2025

https://github.com/0xhilsa/variable

variable + CUDA

cuda-kernels cuda-toolkit python3

Last synced: 09 Apr 2025

https://github.com/sergeipapina/color2graycuda

Color-to-grayscale image conversion implemented as an NVIDIA CUDA kernel, compiled and linked with make or CMake

cmake cuda cuda-kernels cuda-programming link makefile nvidia

Last synced: 06 Apr 2025

https://github.com/giog97/histogram_equalization_cuda

Performance comparison of sequential and parallel CUDA Histogram Equalization for image contrast enhancement.

cuda cuda-kernels cuda-programming histogram-equalization image-processing parallel-computing parallel-programming

Last synced: 14 Apr 2025

https://github.com/flosmume/cpp-cuda-deepvision-rtx-starter

CUDA C++ practice project for RTX 4070 SUPER: explore GPU concurrency, pinned memory, and Nsight profiling. Includes SAXPY and 2D blur kernels for practicing optimization, stream overlap, and timing analysis as part of an NVIDIA Developer Technology Engineering skill set.

cpp cuda cuda-kernels cuda-streams deep-learning-inference gpu gpu-optimization gpu-profiling high-performance-computing nsight nvidia parrallel-computing pinned-memory

Last synced: 31 Oct 2025

https://github.com/jakubfr4czek/concurrent-gauss-elimination

Concurrent Gaussian elimination algorithm implemented using trace theory. Parallelism is achieved with CUDA cores.

agh agh-ust agh-wi conda cuda cuda-kernels cuda-toolkit diekert-graph graphviz java python python3 traces-theory

Last synced: 20 Feb 2025

https://github.com/awaldis/cuda-experiments

A place to explore the capabilities and limits of CUDA parallel processing.

cuda cuda-kernels cuda-programming

Last synced: 27 Aug 2025

https://github.com/zahi1/concurrent-programming

Repository featuring code examples and implementations in C#, C++, Go, and CUDA, showcasing threading, synchronization, parallel processing, and asynchronous programming concepts for multi-core and GPU architectures.

concurrent-programming cuda-kernels go openmp

Last synced: 17 Mar 2025

https://github.com/dhakalnirajan/baghchal-rl

C/CUDA implementation of Baagh Chaal Game with Neural Network

bagh-chal baghchal c clang cuda cuda-kernels neural-network reinforcement-learning

Last synced: 17 Oct 2025

https://github.com/bjornmelin/cuda-core-projects

🎯 Essential CUDA programming patterns and optimizations. Showcasing parallel computing expertise through matrix operations, memory management, and advanced kernel implementations. 💻

cpp cuda cuda-kernels gpu-computing high-performance-computing nvidia optimization parallel-computing

Last synced: 20 Jul 2025

https://github.com/chrisdalvit/gpu-matrix-transpose

Implementation and benchmarking of different matrix transpose approaches with CUDA

c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu

Last synced: 07 Apr 2025

https://github.com/shendrew/cuda-renderer

A simple real-time 3D renderer built from scratch and accelerated with CUDA

3d-graphics cuda-kernels

Last synced: 07 Oct 2025

https://github.com/dino65-dev/cuda_ml_library

A CUDA-based ML library that lets anyone use GPU-powered ML with ease in Python.

aritificial-intelligence cuda-kernels cuda-programming gpu-computing gpu-programming machine-learning

Last synced: 09 Oct 2025

https://github.com/0xhilsa/vector-cuda

vector calculation with GPU acceleration using CUDA

c cpp11 cuda cuda-kernels cuda-programming nvcc

Last synced: 02 Apr 2025

https://github.com/dino65-dev/cuda-ml-library

A CUDA-based ML library that lets anyone use GPU-powered ML with ease in Python.

aritificial-intelligence cuda-kernels cuda-programming gpu-computing gpu-programming machine-learning

Last synced: 23 Aug 2025

https://github.com/sahil-rajwar-2004/vector-cuda

vector calculation with GPU acceleration using CUDA

c cpp11 cuda cuda-kernels cuda-programming nvcc

Last synced: 15 May 2025

https://github.com/adityamotale/y3

Yeet your typos into the shadow realm before they make it to production!

cuda-kernels rust spellcheck

Last synced: 15 Apr 2025

https://github.com/nvaranki/cmmx

CUDA matrix multiplication (official guide, modified)

cuda cuda-kernels

Last synced: 08 Aug 2025

https://github.com/alexkranias/triton_vs_cuda

Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.

cuda cuda-kernels gpu gpu-programming parallel-programming python triton

Last synced: 30 Mar 2025

https://github.com/tomtolleson/cuda-kernel-benchmarking-tool

A benchmarking tool in C++ that creates CUDA kernels and compares overall system performance between CPU and GPU

cuda cuda-kernels cuda-support cuda-toolkit nvidia nvidia-cuda nvidia-gpu

Last synced: 30 Mar 2025

https://github.com/0x778/gaussian_filter_using_cuda

Implementation of a Gaussian filter using CUDA

cuda cuda-kernels cuda-programming image-processing

Last synced: 09 Apr 2025