Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with cuda-kernels
A curated list of projects in awesome lists tagged with cuda-kernels.
https://github.com/nvidia/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
cuda cuda-driver-api cuda-kernels cuda-opengl
Last synced: 17 Dec 2024
https://github.com/NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
cuda cuda-driver-api cuda-kernels cuda-opengl
Last synced: 27 Oct 2024
https://github.com/internlm/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
codellama cuda-kernels deepspeed fastertransformer internlm llama llama2 llama3 llm llm-inference turbomind
Last synced: 22 Dec 2024
https://github.com/InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
codellama cuda-kernels deepspeed fastertransformer internlm llama llama2 llama3 llm llm-inference turbomind
Last synced: 28 Oct 2024
https://github.com/coreylowman/dfdx
Deep learning in Rust, with shape checked tensors and neural networks
autodiff autodifferentiation autograd backpropagation cuda cuda-kernels cuda-support cuda-toolkit cudnn deep-learning deep-neural-networks gpu gpu-acceleration gpu-computing machine-learning neural-network rust rust-lang tensor
Last synced: 17 Dec 2024
https://github.com/nvidia/cccl
CUDA Core Compute Libraries
accelerated-computing cpp cpp-programming cuda cuda-cpp cuda-kernels cuda-library cuda-programming gpu gpu-acceleration gpu-computing gpu-programming hpc modern-cpp nvidia nvidia-gpu parallel-algorithm parallel-computing parallel-programming
Last synced: 19 Dec 2024
https://github.com/NVIDIA/cccl
CUDA Core Compute Libraries
accelerated-computing cpp cpp-programming cuda cuda-cpp cuda-kernels cuda-library cuda-programming gpu gpu-acceleration gpu-computing gpu-programming hpc modern-cpp nvidia nvidia-gpu parallel-algorithm parallel-computing parallel-programming
Last synced: 19 Nov 2024
https://github.com/coreylowman/cudarc
Safe rust wrapper around CUDA toolkit
cublas cuda cuda-kernels cuda-programming cuda-toolkit cudnn curand gpu gpu-acceleration nccl nvrtc rust
Last synced: 19 Dec 2024
https://github.com/laugh12321/TensorRT-YOLO
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.
cuda cuda-graph cuda-kernels cuda-programming detection onnx ppyoloe tensorrt yolov10 yolov3 yolov5 yolov6 yolov7 yolov8 yolov9
Last synced: 27 Oct 2024
https://github.com/nvidia/nvbench
CUDA Kernel Benchmarking Library
benchmark cuda cuda-kernels gpu kernel-benchmark nvidia performance
Last synced: 21 Dec 2024
https://github.com/NVIDIA/nvbench
CUDA Kernel Benchmarking Library
benchmark cuda cuda-kernels gpu kernel-benchmark nvidia performance
Last synced: 19 Nov 2024
https://github.com/harrism/hemi
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
c-plus-plus cuda cuda-device cuda-kernels gpu hemi
Last synced: 16 Dec 2024
https://github.com/kerneltuner/kernel_tuner
Kernel Tuner
auto-tuning autotuning c cplusplus cuda cuda-kernels gpu gpu-computing kernel-tuner machine-learning opencl opencl-kernels optimization python software-development testing
Last synced: 20 Dec 2024
https://github.com/hmunachi/cuda-repo
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 21 Dec 2024
https://github.com/HMUNACHI/cuda-repo
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 12 Nov 2024
https://github.com/deepakkumar1984/Amplifier.NET
Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your function in .NET and Amplifier will take care of running it on your favorite hardware.
compiler cuda-kernels gpgpu gpgpu-computing gpgpu-sim opencl opencl-kernels simd
Last synced: 26 Oct 2024
https://github.com/patwie/cuda-design-patterns
Some CUDA design patterns and a bit of template magic for CUDA
bazel cpp11 cuda cuda-development cuda-device cuda-kernels cuda-utils gpu template-metaprogramming
Last synced: 01 Nov 2024
https://github.com/yalue/cuda_scheduling_examiner_mirror
A tool for examining GPU scheduling behavior.
benchmark cuda cuda-kernels gpu gpu-scheduling mandelbrot
Last synced: 16 Dec 2024
https://github.com/bgin/radar-electrooptical-simulation
(REOS) Radar and Electro-Optical Simulation Framework written in C++.
amd-gpu atmosphere-model avx avx2 avx512 control-theory cuda-kernels fortran90 gpu-acceleration high-performance-computing infrared-sensors modelling radar radar-signal-processing radiative-transfer simd-instructions simulation vectorization
Last synced: 18 Dec 2024
https://github.com/emptysoal/cuda-image-preprocess
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
cnn cuda cuda-demo cuda-kernels cuda-programming deep-learning image-processing tensorrt
Last synced: 06 Dec 2024
https://github.com/stellar-group/octotiger
Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees
astrophysics cuda cuda-kernels hpx kokkos simd stellar-mergers sycl
Last synced: 12 Nov 2024
https://github.com/STEllAR-GROUP/octotiger
Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees
astrophysics cuda cuda-kernels hpx kokkos simd stellar-mergers sycl
Last synced: 05 Nov 2024
https://github.com/bgin/radar_electrooptical_simulation
(REOS) Radar and ElectroOptical Simulation Framework written in Fortran.
amdgpu avx avx-512 avx2 c99 control-systems cuda-kernels fortran90 gpu-acceleration high-performance-computing infrared-sensors modeling openmp radar radiative-transfer simd simulation vectorization
Last synced: 12 Oct 2024
https://github.com/huangcongqing/cuda-learning
An introduction to learning CUDA programming
cuda cuda-kernels cuda-programming
Last synced: 01 Nov 2024
https://github.com/conradsnicta/bandicoot-code
Bandicoot: C++ library for GPU linear algebra & scientific computing - https://coot.sourceforge.io
armadillo c-plus-plus clblas cublas cuda cuda-kernels cusolver gpu gpu-accelerated-library gpu-acceleration gpu-computing linear-algebra linear-algebra-library machine-learning matrix-functions matrix-library opencl opencl-kernels scientific-computing
Last synced: 02 Nov 2024
https://github.com/koushikphy/intro-to-cuda-fortran
A complete beginner's introduction to programming with CUDA Fortran
cuda cuda-fortran cuda-kernels cuda-programming fortran fortran90 gpgpu gpu gpu-computing high-performance-computing hpc nvidia nvidia-cuda parallel-computing parallel-programming
Last synced: 11 Oct 2024
https://github.com/imsanjoykb/cuda-bootcamp
CUDA Programming Practices
computer-vision crypto-mining crypto-mining-program cuda cuda-api cuda-development cuda-device cuda-driver cuda-kernels cuda-library cuda-opengl cuda-programming cuda-resource cuda-support cuda-toolkit jetson jetson-inference jetson-xavier nvidia-cuda nvidia-jetson-nano
Last synced: 12 Oct 2024
https://github.com/alessandrobessi/cuda-lab
Playing with CUDA and GPUs in Google Colab
cuda cuda-kernels gpu gpu-acceleration gpu-programming parallel-algorithm parallel-computing
Last synced: 16 Oct 2024
https://github.com/ashvardanian/scaling-democracy
GPU-accelerated Schulze voting method in Python, Numba, and CUDA, using ideas from Algebraic Graph Theory
cuda cuda-kernels dynamic-programming gpgpu graph-algorithms graph-theory pybind11 python voting
Last synced: 07 Nov 2024
https://github.com/lawmurray/gpu-gemm
CUDA kernel for matrix-matrix multiplication on Nvidia GPUs, using a Hilbert curve to improve L2 cache utilization.
cplusplus cuda cuda-kernels cuda-programming gpu gpu-computing gpu-programming matrix-multiplication numerical-methods scientific-computing
Last synced: 01 Nov 2024
https://github.com/liberxue/parallel_computing
CUDA Algorithm && Hacker's Delight
algorithms cuda cuda-kernels cuda-programming hacker-s-delight nvidia
Last synced: 08 Nov 2024
https://github.com/l4nos/php-cuda
An extension for PHP allowing it to access GPU operations on CUDA graphics cards (NVIDIA)
cuda cuda-kernels cuda-php php php-dll php-ext php-extension
Last synced: 18 Dec 2024
https://github.com/giorgiogamba/parallel_programming
Experimenting with parallel programming
cuda cuda-kernels cuda-programming cuda-toolkit parallel parallel-computing parallel-processing parallel-programming visual-studio
Last synced: 08 Nov 2024
https://github.com/nrmancuso/big-bang
CUDA and OpenMP N-body simulation based on data from the Milky Way and Andromeda galaxies
c cuda-kernels cuda-programming nbody-simulation openmp-parallelization parallel-computing space
Last synced: 03 Dec 2024
https://github.com/gogolb/ee147
Intro to GPU Computing
c cuda cuda-kernels cuda-toolkit gpu-computing gpu-programming university-course
Last synced: 01 Dec 2024
https://github.com/denyskryvytskyi/capgemini-cuda
CUDA implementation of vector addition, matrix multiplication, reduction, and sorting
bitonic-sort cpp cuda cuda-kernels gpgpu matrix matrix-multiplication matrix-multiplication-parallel matrix-transpose nvidia nvidia-cuda nvidia-gpu reduction-dimension sort sorting-algorithms-implemented vector vector-addition vectorization
Last synced: 18 Dec 2024
https://github.com/sergeipapina/color2graycuda
Color-to-gray image conversion implemented as an NVIDIA CUDA kernel, compiled and linked with make or CMake
cmake cuda cuda-kernels cuda-programming link makefile nvidia
Last synced: 20 Dec 2024
https://github.com/aaditya29/parallel-computing-and-cuda
Learning about Parallel Computing and GPU programming using CUDA.
c cpp cuda cuda-kernels cuda-programming nvidia-cuda openmp openmpi parallel-computing parallel-programming
Last synced: 14 Dec 2024
https://github.com/chrisdalvit/gpu-matrix-transpose
Implementation and benchmarking of different matrix transpose with CUDA
c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu
Last synced: 20 Dec 2024
https://github.com/binarybrainiacs/nexly
Deep Tech R&D Research
artificial-intelligence cpp20 cuda cuda-kernels cuda-programming deep-learning deep-neural-networks experimental hpc-systems infrastructure java maven natural-language-processing reccomendation-system research-and-development visual-studio
Last synced: 05 Nov 2024
https://github.com/shikha-code36/cuda-programming-beginner-guide
A beginner's guide to CUDA programming
cuda cuda-basic cuda-basics cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-programming cuda-support cuda-toolkit
Last synced: 14 Nov 2024
https://github.com/marcoplaitano/counting-sort-cuda
Parallelized version of Counting Sort using CUDA
counting-sort cuda cuda-kernels cuda-programming gpu gpu-programming sort sorting sorting-algorithms
Last synced: 06 Nov 2024
https://github.com/marnovo/cuda-projects
cuda cuda-kernels gpu gpu-programming nvidia-cuda parallel-computing
Last synced: 07 Nov 2024
https://github.com/sahil-rajwar-2004/variable
variable + CUDA
cuda-kernels cuda-toolkit python3
Last synced: 10 Nov 2024
https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-c-cpp
Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.
cpp cuda cuda-kernels cuda-programming nsight nvidia profilling
Last synced: 09 Nov 2024
https://github.com/tgautam03/tgemm
General Matrix Multiplication using NVIDIA Tensor Cores
cuda-kernels cuda-programming gpu-computing gpu-programming matrix-multiplication nvidia-cuda nvidia-gpu nvidia-tensor-cores sgemm tensor-cores
Last synced: 10 Nov 2024
https://github.com/sahil-rajwar-2004/vector-cuda
vector calculation with GPU acceleration using CUDA
c cpp11 cuda cuda-kernels cuda-programming nvcc
Last synced: 19 Nov 2024
https://github.com/0xhilsa/vector-cuda
vector calculation with GPU acceleration using CUDA
c cpp11 cuda cuda-kernels cuda-programming nvcc
Last synced: 15 Dec 2024
https://github.com/nvaranki/cmmx
CUDA matrix multiplication (official guide, modified)
Last synced: 10 Dec 2024
https://github.com/alexkranias/triton_vs_cuda
Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.
cuda cuda-kernels gpu gpu-programming parallel-programming python triton
Last synced: 10 Dec 2024
https://github.com/tomtolleson/cuda-kernel-benchmarking-tool
A benchmarking tool in C++ that creates CUDA kernels and compares overall system performance between CPU and GPU
cuda cuda-kernels cuda-support cuda-toolkit nvidia nvidia-cuda nvidia-gpu
Last synced: 11 Dec 2024