CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-30 00:07:24 UTC
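The programming model described in the overview can be illustrated with a minimal sketch. It is not taken from any repository listed here: a function marked `__global__` (a kernel) runs once per GPU thread, and each thread derives the array element it owns from `blockIdx`, `blockDim` and `threadIdx`. The names `vecAdd` and `addOnGpu` are hypothetical, and error handling is deliberately kept minimal.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: executed once per thread; each thread adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // the last block may be partially idle
        c[i] = a[i] + b[i];
    }
}

// Host helper (hypothetical name): copies inputs to the GPU, launches the
// kernel, and copies the result back. Returns false on any CUDA error.
bool addOnGpu(const std::vector<float>& a, const std::vector<float>& b,
              std::vector<float>& c) {
    if (a.size() != b.size()) return false;
    const int n = static_cast<int>(a.size());
    c.assign(n, 0.0f);
    if (n == 0) return true;  // a launch with zero blocks would be an error
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    vecAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);

    // cudaGetLastError returns the last error recorded by the runtime calls
    // above (if any), including an invalid launch configuration.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Blocking copy on the default stream: it waits for the kernel to finish.
        err = cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err == cudaSuccess;
}
```

Call `addOnGpu` from host code compiled with `nvcc`. The block size of 256 threads is an arbitrary but common choice; the bounds check in the kernel is what makes rounding the block count up safe.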
https://github.com/thesupercd/cuda_sort
A simple project that implements a massively parallel algorithm (radix sort) on an NVIDIA GPU and measures its runtime performance.
benchmarking c cpp cuda cuda-programming gpu-acceleration gpu-programming multithreading parallel-processing radix-sort sorting-algorithms
Last synced: 10 May 2026
https://github.com/grindelfp/cuda-texture-memory
Exercise on using texture memory in CUDA.
Last synced: 30 Mar 2025
https://github.com/thesoenke/deeplearning-docker
Setup for deep learning experiments in Docker with CUDA
Last synced: 11 May 2026
https://github.com/h4ck3r-04/fpassword
Fpassword merges Hashcat's hash-cracking precision with Hydra's parallelized network login, offering penetration testers a powerful tool for swift hash deciphering and simultaneous login attempts across diverse protocols.
brute-force brute-force-attacks c cracking cuda gpgpu hashcat hashes hydra network-security opencl password penetration-testing
Last synced: 16 Jan 2026
https://github.com/dragonscypher/prompty
Tool for generating smart and secure prompts for language models!
autotokenizer bert-model cuda google-t5 llm python3 tensorflow threading
Last synced: 02 Jan 2026
https://github.com/lucatedeschini/feedforwardnn
This project is my submission for the exam "Project Work in Architecture and Platform for Artificial Intelligence"
c cuda neural-networks openmp scratch-implementation
Last synced: 20 Apr 2026
https://github.com/akshaysinhaaa/emova
A deep learning framework designed for emotion and sentiment recognition using text, audio, and video modalities. This project leverages the MELD (Multimodal EmotionLines Dataset) to train a robust and flexible model that reflects human communication more accurately than unimodal models.
bert cnn cuda deep-learning multimodal python pytorch resnet-18 tensorboard transformers
Last synced: 05 May 2026
https://github.com/maltsev-andrey/julia_set_cuda
High-performance Julia set fractal computation in pure CUDA C, achieving 2.78 billion pixels/second on Tesla P100. Demonstrates GPU kernel programming, memory optimization, and massive parallelization (16M+ threads).
cuda fractals gpu-programming high-performance-computing nvidia parallel-computing science visualization
Last synced: 03 Nov 2025
https://github.com/rbuj-uoc/m1.209
PAC 1, PAC 2, PAC 3 and PAC 4 for the High-Performance Computing course (Computació d'altes prestacions) of the MUEI
Last synced: 21 May 2026
https://github.com/cmazakas/cuda-stuff
A CUDA-based playground
cmake cuda delaunay-triangulation vscode
Last synced: 24 Mar 2025
https://github.com/nmicic/k-tuplet-search
k-tuplet-search
computational-number-theory cuda experimental-mathematics gmp gpu-computing high-performance-computing hpc k-tuplets number-theory primality-testing prime-numbers prime-tuples sieve
Last synced: 21 May 2026
https://github.com/maxenceleguery/jare
3D Render engine accelerated with CUDA
Last synced: 21 May 2026
https://github.com/sbstndb/nbody_k
A simple, naïve 3D N-body simulation using Kokkos, with a CUDA or OpenMP backend
cuda kokkos nbody openmp simulation
Last synced: 21 May 2026
https://github.com/Parxd/cuda-optim
Various CUDA kernels optimized for specific ML algorithms
Last synced: 02 Sep 2025
https://github.com/mattjesc/federated-learning-simulation-1gpu-mi-is
Federated Learning Simulation on a Single GPU with Model Interpretability and Interactive Visualization
ai cuda deep-learning distributed-systems federated-learning gpu hpc keras machine-learning ml model-interpretability python pytorch simulation streamlit tensorflow
Last synced: 05 Jan 2026
https://github.com/shermanlo77/poisson_icing
Gibbs sampling on the Poisson-Ising model. The Poisson-Ising model is a 2D image of Poisson-distributed random variables, each of which depends on its four neighbours. This dependency causes each Poisson random variable to be similar (or dissimilar) to its neighbours.
cuda cupy gibbs-sampling gpu ising-model mcmc monte-carlo poisson poisson-ising
Last synced: 21 May 2026
https://github.com/bjornmelin/ml-algorithm-playground
🧪 Core ML algorithm implementations with GPU acceleration. Featuring optimized implementations across various libraries with comprehensive analysis. 📈
algorithms cuda gpu-computing lightgbm machine-learning python scikit-learn xgboost
Last synced: 13 May 2026
https://github.com/minseoc03/cuda-100-days
A 100-day journey to master CUDA programming, inspired by the CUDA-120-DAYS--CHALLENGE project. This repo contains daily CUDA exercises and code folders, with learning notes hosted on Notion. Practicing on leetgpu.com due to the lack of a local NVIDIA GPU.
100daysofcode cuda deeplearning gpgpu gpu hpc nvidia parallel-computing
Last synced: 19 Apr 2025
https://github.com/moesio-f/cla
C Linear Algebra (CLA) library. A simple toy library for basic vector/matrix operations with CUDA support and Python bindings.
Last synced: 09 May 2026
https://github.com/ndgigliotti/torch-ipca
GPU-accelerated Incremental PCA for PyTorch
cuda dimensionality-reduction gpu incremental-pca machine-learning pca pytorch
Last synced: 26 Jan 2026
https://github.com/marnovo/cuda-projects
cuda cuda-kernels gpu gpu-programming nvidia-cuda parallel-computing
Last synced: 10 Jun 2025
https://github.com/ionmich/cs149-local-dev
Provides `conda` installation instructions for Stanford's CS149 (Parallel Computing) programming assignments
conda cs149 cuda ispc parallel-computing
Last synced: 31 Mar 2025
https://github.com/dasbd72/nthu-ipc-2022
National Tsing Hua University - Introduction to Parallel Computing - 2022
cuda cuda-programming hpc mpi openmp pthreads
Last synced: 30 Mar 2025
https://github.com/daelsepara/hipnewton
GPU Implementation of Newton Fractal Generator with Benchmarking
amd cuda fractal gpu gpu-compute gpu-computing hip newton parallel-computing rocm sdk
Last synced: 03 May 2026
https://github.com/anne-andresen/autoencoder_3d_c_cuda
3D Autoencoder training in raw C/CUDA
Last synced: 28 Apr 2026
https://github.com/fedesky25/hpc-project-2024
Project for the 2024 HPC course: a streamplot generator for complex-valued functions
Last synced: 30 Mar 2025
https://github.com/cs550-epfl/review
Review of the paper A Formal Analysis of the NVIDIA PTX Memory Consistency Model
cuda formal-verification gpu memory-consistency ptx simt
Last synced: 30 Mar 2025
https://github.com/td99/ai-sandbox
A collection of AI tools and prototypes.
ai cuda docker image-generation-ai nvidia python
Last synced: 08 Apr 2026
https://github.com/belrbez/ship-graphic-qt-qml-cuda-c
Client-server application for driving a rocket, with graphics in QML
c client-server cpp cuda qml qt5 rocket
Last synced: 08 Apr 2026
https://github.com/cuda8/brainwords2
GPU brainflayer for sale $250
brain brainflayer brainwords cuda gpu key pass passphrase private
Last synced: 10 Mar 2025
https://github.com/shtrophic/wicuvanity
Generate wireguard vanity keys on your Nvidia GPU
cuda gpu vanity-address vanity-addresses vanitygen wireguard
Last synced: 10 Mar 2025
https://github.com/monajemi-arman/sparkling
Easy-to-use Spark cluster management panel with GPU support
apache-spark csharp cuda distributed-computing distributed-learning docker gpu javascript nextjs torch typescript
Last synced: 12 Apr 2026
https://github.com/Neuro-Mechatronics-Interfaces/python-intan
Tools and demos for working with EMG data from Intan using Python
circuitpython cuda emg pico python realtime tensorflow
Last synced: 13 Jan 2026
https://github.com/uefi-code/bachelorgraduationdesign
I developed a PyTorch_For_PoorGuys framework and used it to train an LLM on an NVIDIA GeForce 2080 Ti GPU as my bachelor's graduation design project
chatbot cuda gpu hacking large-language-models pytorch
Last synced: 03 May 2026
https://github.com/sergeipapina/color2graycuda
Color-to-grayscale image conversion implemented as an NVIDIA CUDA kernel, compiled and linked with make or CMake
cmake cuda cuda-kernels cuda-programming link makefile nvidia
Last synced: 06 Apr 2025
https://github.com/kataglyphis/machinelearningalgorithms
Basic Machine Learning Algorithms
cuda machine-learning python tensorflow
Last synced: 31 Mar 2025
https://github.com/stephanmg/cuda-playground
CUDA playground
cpu cuda gp100 gpu gv100 openmp parallel-computing parallel-programming
Last synced: 30 Mar 2025
https://github.com/tdavidcl/cu_intercept
cuda cuda-memory cuda-programming hook massif memory-tracking preload
Last synced: 03 May 2026
https://github.com/codename-detective/cuda_gpgpus_shared_memory_systems_pdp
CUDA GPGPUs Shared Memory Systems Parallel & Distributed Programming
cuda cuda-programming numa parallel-programming
Last synced: 30 Mar 2025
https://github.com/voltr0x/raytracing-cuda
Raytracing in a weekend using CUDA
Last synced: 01 Apr 2026
https://github.com/AndreasKaratzas/orin
Setting up the NVIDIA Jetson Orin Nano Developer Kit
cuda cudnn jetpack6 nvidia-jetson nvidia-sdkmanager orin-nano
Last synced: 25 Feb 2025
https://github.com/adesoji1/youtubesummaryai
Python script that summarizes a YouTube video given its URL. It is intended to work for long videos and in different languages.
cuda googleapi python3 speech-recognition transformers youtube-api-v3 youtube-dl
Last synced: 04 Apr 2025
https://github.com/alkaifaftab000/autonomous-maze-solver
Building an Autonomous Maze Solver using reinforcement learning to train agents for decision-making in dynamic grid-based environments
agent criticism cuda gymnasium-environment maze-solving-bot pytorch reinforcement-learning reward-functions
Last synced: 12 Apr 2026
https://github.com/larygwil/cuda-samples-old
Old NVIDIA CUDA samples (versions 5.0 to 7.5)
Last synced: 03 May 2026
https://github.com/tylerfaulkner/n-body_simulation
CUDA N-body gravitational simulation with rendering in Python using Matplotlib
Last synced: 20 May 2026
https://github.com/kronbii/thermal-super-resolution
State-of-the-art thermal super-resolution system (IMDN) with RGB→thermal adaptation, custom multi-component loss, 29.6 dB PSNR, 0.713 SSIM, 250+ FPS, production-ready PyTorch + CUDA implementation.
computer-vision cuda deep-learning image-enhancement imdn model-optimization production-machine-learning pytorch real-time real-time-processing research super-resolution thermal-imaging
Last synced: 18 Apr 2026
https://github.com/asadiahmad/100_sports_image_classification
A deep learning project for sport image classification using a custom VGG19-based architecture with integrated Grad-CAM heatmap visualization for model interpretability.
computer-vision cuda data-augmentation deep-learning explainable-ai gpu-acceleration grad-cam heatmap-visualization image-classification mixed-precision-training pytorch pytorch-grad-cam sports-analytics sports-classification transfer-learning vgg19
Last synced: 11 Jun 2025
https://github.com/ysl1016/cudadigitfilter
CUDA-based parallel image filtering system for MNIST dataset
computer-vision cuda deep-learning gpu-acceleration image-processing mnist parallel-computing
Last synced: 28 Mar 2025
https://github.com/ojaswithag/opencv-doc
A comprehensive Turkish-language guide to image and video processing, machine learning, and project applications with OpenCV. 🐙 Learn step by step with code examples and build your own projects.
arm-architecture cuda cuda-support deployment django docker-image docker-images heroku image-processing javascript nodejs nvidia opencv-contrib opencv3 production python scanner tutorial
Last synced: 08 Apr 2026
https://github.com/yangfengzzz/tardis
Travel space and time by using autodiff and codegen
Last synced: 03 May 2026
https://github.com/airvzxf/c-plus-plus-understanding-cuda
Understanding CUDA with C++
cuda hacktoberfest hacktoberfest-accepted
Last synced: 22 Mar 2025
https://github.com/ergus/cuda-ts-mode
An Emacs CUDA mode powered by tree-sitter
Last synced: 20 May 2026
https://github.com/branebb/nn-framework
Framework for creating neural networks using C++ and the CUDA platform. This project is part of my final university assignment for my bachelor's degree.
cmake cpp cuda cuda-programming
Last synced: 20 Jan 2026
https://github.com/voduchuy/cudafsp
CUDA-based implementation of the Finite State Projection (FSP) algorithm.
chemical-master-equation cuda stochastic-reaction-networks sundials
Last synced: 20 Jan 2026
https://github.com/maltsev-andrey/cuda-nn-inference
GPU-accelerated neural network inference using custom CUDA kernels. Achieves 97.82% accuracy on MNIST.
cuda deep-learning gpu-programming neural-networks numba nvidia parallel-computing parallel-programming performance-optimization python3 pytorch rhel9 tesla-p100
Last synced: 07 Mar 2026
https://github.com/andreasholt/cuda-matmul-benchmarking
Implementing and benchmarking various matmul implementations in CUDA
Last synced: 01 Nov 2025
https://github.com/nxoti1/points-reader-ocr
🖥️ Extract text from images easily with POINTS-Reader OCR, a high-accuracy application for seamless document conversion and processing.
cuda gradio huggingface-transformers ocr open-source points-reader reportlab spaces tencent vision-language-model vlm
Last synced: 20 May 2026
https://github.com/ludekcizinsky/fast-cg-solver
Implementation of Conjugate Gradient (CG) algorithm for solving sparse linear systems using MPI and CUDA.
Last synced: 17 May 2026
https://github.com/tomtolleson/cuda-kernel-benchmarking-tool
A C++ benchmarking tool that creates CUDA kernels and compares overall system performance between the CPU and the GPU
cuda cuda-kernels cuda-support cuda-toolkit nvidia nvidia-cuda nvidia-gpu
Last synced: 30 Mar 2025
https://github.com/myselfaryan/attention-mechanism
Accelerating Scaled Dot-Product Attention using OpenMP and CUDA
Last synced: 27 Apr 2026
https://github.com/juliankarrer/reyn
CUDA-based Implementation of Smoothed Particle Hydrodynamics for Fluid Simulation
cuda fluid lagrangian simulation sph
Last synced: 31 Oct 2025
https://github.com/lu-m-dev/cuda-molecular-simulation
CUDA accelerated molecular simulation of materials
cuda materials-science molecular-dynamics molecular-simulation monte-carlo
Last synced: 25 Jun 2026
https://github.com/nabilshadman/cuda-4-dummies
Lecture slides and exercise files of the CUDA 4 Dummies course (2025)
cuda gpu-computing high-performance-computing nsight-systems nvidia-gpu parallel-computing
Last synced: 31 Oct 2025
https://github.com/flosmume/cpp-cuda-streams-and-pinned-mem
A CUDA C++ demo showing how to overlap data transfer and kernel execution using multiple streams and pinned (page-locked) host memory. This project illustrates asynchronous memcpy, event timing, and performance benefits of concurrent GPU execution — essential for building high-throughput pipelines.
asynchronous-execution cuda cuda-streams gpu parallel-programming performance-optimization pinned-memory
Last synced: 13 May 2026
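The overlap technique described in the entry above (multiple streams plus pinned host memory) can be sketched as follows. This is a hypothetical example, not code from that repository: `scaleWithStreams` splits the buffer into chunks and issues copy-in, kernel, and copy-out for each chunk on its own stream, so the transfer for one chunk can overlap the kernel for another. The overlap assumes the host buffer is page-locked (allocated with `cudaMallocHost`) and that the GPU has copy engines that support concurrent copy and execution; with ordinary pageable memory the result is still correct, just without the overlap.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Hypothetical kernel: multiplies each element by a constant factor.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Scales hostData in place. hostData should be pinned (cudaMallocHost) so that
// cudaMemcpyAsync can run truly asynchronously. Returns false on any CUDA error.
bool scaleWithStreams(float* hostData, int n, float factor, int numStreams) {
    if (n <= 0 || numStreams <= 0) return false;
    float* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return false;

    std::vector<cudaStream_t> streams(numStreams);
    for (auto& s : streams) cudaStreamCreate(&s);

    const int chunk = (n + numStreams - 1) / numStreams;  // elements per stream
    const int threadsPerBlock = 256;
    for (int k = 0; k < numStreams; ++k) {
        const int offset = k * chunk;
        const int count = std::min(chunk, n - offset);
        if (count <= 0) break;  // fewer chunks than streams
        const size_t bytes = count * sizeof(float);
        // All three operations are queued on stream k and run in order there,
        // but may overlap with work queued on the other streams.
        cudaMemcpyAsync(dev + offset, hostData + offset, bytes,
                        cudaMemcpyHostToDevice, streams[k]);
        scale<<<(count + threadsPerBlock - 1) / threadsPerBlock,
                threadsPerBlock, 0, streams[k]>>>(dev + offset, count, factor);
        cudaMemcpyAsync(hostData + offset, dev + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[k]);
    }

    cudaError_t err = cudaDeviceSynchronize();        // wait for every stream
    if (err == cudaSuccess) err = cudaGetLastError();  // catch launch errors
    for (auto& s : streams) cudaStreamDestroy(s);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

Each stream works on a disjoint slice of the device buffer, so no extra synchronization between streams is needed; timing with `cudaEvent_t` (as the repository description mentions) would be the natural way to measure the gain over a single-stream version.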
https://github.com/sephiroth7712/k-nearest-neigbours
Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.
cuda cuda-programming hadoop-mapreduce java mpi multiprocessing multithreading openmp pthreads scala spark
Last synced: 12 Apr 2026
https://github.com/uva-trasgo/controllers
Read-only mirror of the official repository: https://gitlab.com/trasgo-group-valladolid/controllers. Controllers is a library written in C11 that provides a simplified way to program applications that can exploit heterogeneous computational platforms including accelerators and/or multi-core CPUs.
cuda heterogeneous-computing heterogeneous-parallel-programming hip opencl openmp
Last synced: 12 May 2026
https://github.com/mahdi-hasan-shuvo/ml-opensource-project
An open-source repository focused on providing practical and educational machine learning resources. The project aims to make learning and applying machine learning more accessible through well-documented code, tutorials, and real-world examples.
cuda machine-learning machine-learning-algorithms ml-projects open-source python
Last synced: 19 May 2026
https://github.com/sneha-at-hub/bruteforce_passwordcracking_in-milliseconds
Last synced: 28 Apr 2026
https://github.com/eastonman/tensorrt-pytorch-wrapper
A wrapper that lets a TensorRT engine accept PyTorch CUDA tensors.
Last synced: 06 May 2026
https://github.com/TeamBipartite/bipartite-gemm
High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor Cores
Last synced: 14 Jan 2026
https://github.com/drilonaliu/parallel-mandelbrot-set
GPU-accelerated Mandelbrot Set generation with CUDA and OpenGL interoperability.
cuda fractals gpu mandelbrot-fractal parallel-programming
Last synced: 12 Apr 2026
https://github.com/aurelienperez/gpu-heston-monte-carlo
GPU-accelerated Monte Carlo simulation for option pricing under the Heston model using CUDA.
Last synced: 01 Apr 2025
https://github.com/nikhilrout/thetensorcoreproject
Microarchitecture implementation of Nvidia's Tensor Cores
cuda floating-point gpgpu hybrid-precision-training tensorcore
Last synced: 01 Apr 2025
https://github.com/ramyacp14/document-based-question-and-answers
Developed a document question answering system that utilizes Llama and LangChain for contextual and accurate answers. The system supports .txt documents, intelligent text splitting, and context-aware querying through an easy-to-use Streamlit interface.
chroma cuda hugging-face langchain llama python recursivecharactertextsplitter streamlit
Last synced: 07 Mar 2026
https://github.com/storterald/neural-network
Simple neural network implementation in C++ and CUDA
asm asmx86 c-plus-plus cmake cpp cuda machine-learning neural-network
Last synced: 28 Mar 2025
https://github.com/yutakseo/docker_ubuntu-cuda_environment
🐳 A ready-to-use Docker environment for deep learning development with Ubuntu 22.04 and CUDA 11.8.
container cuda docker environment ubuntu
Last synced: 12 Apr 2026
https://github.com/amypad/miutil
Basic functionality needed for AMYPAD
cuda matlab medical-imaging python
Last synced: 13 May 2025
https://github.com/ivanfioravanti/tflops_mps
TFLOPs testing on MPS and CUDA
Last synced: 19 May 2026
https://github.com/isquicha/cuda-parallel-studies
Learning CUDA programming here =D
cuda cuda-programming cuda-toolkit
Last synced: 03 Jul 2025
https://github.com/grindelfp/cuda-n-body-simulation
Simulation of N-Body movement using CUDA.
Last synced: 06 Apr 2025
https://github.com/drilonaliu/parallel-fractal-tree
GPU-accelerated fractal tree generation with CUDA and OpenGL interoperability.
cuda fractal-tree fractals gpu
Last synced: 19 May 2026
https://github.com/fikri-rouzan/cuda-c-program-part-1
CUDA C program from NVIDIA course.
Last synced: 12 Apr 2026
https://github.com/crazyguitar/libefaxx
aws benchmark cpp20-coroutine cuda efa gpu gpu-benchmarks hpc large-language-models llm rdma rdma-benchmarks
Last synced: 16 Jan 2026
https://github.com/patriciobcs/mini-aevol
Parallel implementation of a reduced version of the Aevol simulator
Last synced: 19 May 2026
https://github.com/alpinebuster/meshlib
Mesh processing library with extra `C/C#/JS/TS/PYTHON` bindings.
cuda dicom electron emscripten mesh mesh-modelling pybind11 stl stomatology threejs wasm
Last synced: 03 Jul 2025
https://github.com/muneeb706/cuda
Sample programs implemented using CUDA (GPU)
cplusplus cuda gpu-programming
Last synced: 19 May 2026
https://github.com/hnthap/vietnamese-word-segment
Vietnamese word segmentation package.
cuda torch transformers vietnamese vietnamese-nlp vietnamese-tokenizer word-segmentation
Last synced: 19 May 2026
https://github.com/pipecruz/cuda-flocking-sim
CPU and GPU (CUDA) implementations of naive/optimized flocking algorithms
Last synced: 07 May 2026
https://github.com/chiragajain/gpu-optimization-roadmap
This repository is part of a structured curriculum designed to master GPU optimization, Triton, Deep Learning, and LLMs. This section focuses on GPU fundamentals, CUDA programming, and PyTorch optimizations.
cuda deeplearning gpu-acceleration learning python pytorch triton
Last synced: 18 Feb 2026
https://github.com/kar-dim/CAS-2D
Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA, for sharpening static images.
cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen
Last synced: 01 Nov 2025
https://github.com/mxm-tr/docker-darknet-opencv
Accelerated object detection on streams and files using a Docker darknet YOLO container
cuda docker docker-compose object-recognition opencv-python python3 yolo
Last synced: 10 Apr 2026