Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with cuda-programming
A curated list of projects in awesome lists tagged with cuda-programming.
https://github.com/cpp-taskflow/cpp-taskflow
A General-purpose Task-parallel Programming System using Modern C++
concurrent-programming cuda-programming gpu-programming heterogeneous-parallel-programming high-performance-computing multi-threading multicore-programming multithreading parallel parallel-computing parallel-programming taskflow taskparallelism threadpool work-stealing
Last synced: 08 Aug 2024
https://github.com/taskflow/taskflow
A General-purpose Task-parallel Programming System using Modern C++
concurrent-programming cuda-programming gpu-programming heterogeneous-parallel-programming high-performance-computing multi-threading multicore-programming multithreading parallel parallel-computing parallel-programming taskflow taskparallelism threadpool work-stealing
Last synced: 30 Sep 2024
https://github.com/brucefan1983/CUDA-Programming
Sample codes for my CUDA programming book
cuda-programming gpu-programming molecular-dynamics-simulation
Last synced: 04 Aug 2024
https://github.com/DefTruth/CUDA-Learn-Notes
🎉 CUDA/C++ notes / hand-written CUDA kernels for large models / technical blog, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
block-reduce cuda cuda-kernels cuda-programming elementwise flash-attention flash-attention-2 gemm gemv layernorm rmsnorm softmax warp-reduce
Last synced: 31 Jul 2024
https://github.com/eyalroz/cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
api-wrapper cuda cuda-api-wrappers cuda-device cuda-driver cuda-driver-api cuda-programming cuda-runtime-api cuda-toolkit gpgpu gpgpu-computing gpu gpu-computing gpu-memory modern-cpp
Last synced: 02 Aug 2024
https://github.com/nvidia/cccl
CUDA C++ Core Libraries
accelerated-computing cpp cpp-programming cuda cuda-cpp cuda-kernels cuda-library cuda-programming gpu gpu-acceleration gpu-computing gpu-programming hpc modern-cpp nvidia nvidia-gpu parallel-algorithm parallel-computing parallel-programming
Last synced: 29 Sep 2024
https://github.com/NVIDIA/cccl
CUDA C++ Core Libraries
accelerated-computing cpp cpp-programming cuda cuda-cpp cuda-kernels cuda-library cuda-programming gpu gpu-acceleration gpu-computing gpu-programming hpc modern-cpp nvidia nvidia-gpu parallel-algorithm parallel-computing parallel-programming
Last synced: 04 Aug 2024
https://github.com/sail-sg/Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
adan artificial-intelligence bert-model convnext cuda-programming deep-learning diffusion dreamfusion fairseq gpt2 llm-training llms mae moe optimizer pytorch resnet timm transformer-xl vit
Last synced: 01 Aug 2024
https://github.com/mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
arm c cpp cuda-programming deep-learning edge-computing large-language-models on-device-ai quantization x86-64
Last synced: 03 Aug 2024
https://github.com/coreylowman/cudarc
Safe Rust wrapper around the CUDA toolkit
cublas cuda cuda-kernels cuda-programming cuda-toolkit cudnn curand gpu gpu-acceleration nccl nvrtc rust
Last synced: 04 Aug 2024
https://github.com/laugh12321/TensorRT-YOLO
🚀 TensorRT-YOLO: Supports YOLOv3, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, and PP-YOLOE using TensorRT acceleration with EfficientNMS, CUDA Kernels and CUDA Graphs!
cuda cuda-graph cuda-kernels cuda-programming detection onnx ppyoloe tensorrt yolov10 yolov3 yolov5 yolov6 yolov7 yolov8 yolov9
Last synced: 31 Jul 2024
https://github.com/nosferalatu/SimpleGPUHashTable
A simple GPU hash table implemented in CUDA using lock-free techniques
cuda cuda-programming data-structures gpu gpu-cuda-programs
Last synced: 02 Aug 2024
https://github.com/HMUNACHI/cuda-repo
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 02 Aug 2024
https://github.com/MuGdxy/muda
μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.
cuda cuda-cpp cuda-programming
Last synced: 04 Aug 2024
https://github.com/PaddleJitLab/CUDATutorial
A self-learning tutorial for CUDA High Performance Programming.
cuda-programming deep-learning
Last synced: 04 Aug 2024
https://github.com/LinhanDai/yolov9-tensorrt
YOLOv9 TensorRT deployment acceleration, providing two implementation methods: C++ and Python 🔥🔥🔥
cpp cuda-programming python tensorrt yolov9
Last synced: 31 Jul 2024
https://github.com/Lin-Mao/DrGPUM
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
cuda-programming gpu-memory gpu-memory-profiler gpu-profiler memory-management
Last synced: 09 Aug 2024
https://github.com/codingonion/cuda-beginner-course-cpp-version
Companion code for the bilibili video course "Introduction to CUDA 12.1 Parallel Programming (C++ Edition)"
cpp cublas cuda cuda-programming cudnn gpu gpu-programming nvcc nvidia parallel-programming python rust
Last synced: 04 Aug 2024
https://github.com/littlebearsama/xxCu3Dlibrary
CUDA-accelerated 3D point cloud algorithm library, continuously updated (includes cudaicp, GLFW point cloud visualization, etc.)
cuda-programming glfw3 pointcloud
Last synced: 31 Jul 2024
https://github.com/codingonion/cuda-beginner-course-python-version
Companion code for the bilibili video course "Introduction to CUDA 12.1 Parallel Programming (Python Edition)"
cpp cublas cuda cuda-programming cudnn cupy gpu gpu-programming nvcc nvidia parallel-programming python rust
Last synced: 04 Aug 2024
https://github.com/codingonion/cuda-beginner-course-rust-version
Companion code for the bilibili video course "Introduction to CUDA 12.1 Parallel Programming (Rust Edition)"
candle cpp cublas cuda cuda-programming cudarc cudnn gpu gpu-programming nvcc nvidia parellel-programming python rust
Last synced: 04 Aug 2024
https://github.com/pastekaztekastor/crowd-simulation
The project is a crowd simulation on a grid, with parallelized versions that run on the graphics card. The goal is to model the movement of individuals in an environment using parameters such as the grid dimensions and the number of individuals; the result of each frame is exported to a binary file for analysis.
c cmake cpp crowdsimulation cuda-programming graphicscard grid-layout ipynb make nvidia-gpu parallelization
Last synced: 27 Sep 2024
https://github.com/GCaptainNemo/Cuda-Image-Processing
Using CUDA GPU Programming to speed up image processing.
cuda-programming image-processing
Last synced: 31 Jul 2024
https://github.com/evanmcclure/hello_gpu
Hello world example for Rust on GPU
apple apple-silicon cuda cuda-programming example-project gpu gpu-programming gpu-support metal rust rust-lang
Last synced: 28 Sep 2024
https://github.com/dpetrosy/fractal
This project is a Fractal Visualizer developed in C++ with SFML and CUDA.
burning-ship cmake cmakelists cpp cpp-programming cpp-project cuda cuda-opengl cuda-programming fractal fractal-generation fractal-visualization julia mandelbox mandelbrot opengl opengl-project sfml sfml-library tricorn
Last synced: 29 Sep 2024
https://github.com/pavulurig/matrix-mul-pytorch-cuda-cpu-analysis
Compares the performance of matrix multiplication on CPU and GPU using PyTorch CUDA programming.
cuda-programming matrix-multiplication python3 pytorch
Last synced: 01 Oct 2024
https://github.com/sahil-rajwar-2004/vector-cuda
Vector calculation with GPU acceleration using CUDA
c cpp11 cuda cuda-kernels cuda-programming nvcc
Last synced: 29 Sep 2024
https://github.com/pzaino/cpp-hpc
A collection of stuff for HPC in C++
coding cpp cpp17 cuda-programming hpc library opencl openmp
Last synced: 28 Sep 2024
https://github.com/bardiparsi/threadpoolmanager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 28 Sep 2024