CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-20 00:07:16 UTC
https://github.com/PINTO0309/Open3D-build
Provide Docker build sequences of Open3D for various environments.
cuda docker jetson jetson-nano open3d open3d-python pytorch tensorflow
Last synced: 20 Mar 2025
https://github.com/Bruce-Lee-LY/matrix_multiply
Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.
coppersmith-winograd cpp11 cpu cublas cuda kahan matrix-multiply naive nvidia reordering shared-memory strassen tiling
Last synced: 14 May 2025
https://github.com/dark-art108/gpu-docker-deployment-text-summarization
Text Summarization using Transformer on GPU Docker Deployment
cuda docker fastapi gpu-acceleration huggingface
Last synced: 02 May 2025
https://github.com/lona-cn/vision-simple
A lightweight, cross-platform C++ vision inference library that supports YOLOv10, YOLOv11, PaddleOCR, and EasyOCR, using ONNXRuntime/TVM with multiple execution providers.
cuda directml easyocr ocr onnxruntime paddleocr tensorrt-inference tvm yolo
Last synced: 16 Oct 2025
https://github.com/lnstadrum/fastaugment
A handy data augmentation toolkit for image classification put in a single efficient TensorFlow/PyTorch op.
augmentation-transformations brightness-correction cuda cutout data-augmentation gamma-correction gpu mixup perspective-distortions tensorflow-op
Last synced: 23 Mar 2025
https://github.com/ivanrs297/cuda-spmv-csr
Parallel SpMV using CSR representation, built in CUDA
csr cuda parallel-computing spmv
Last synced: 20 Jun 2025
https://github.com/flyingfathead/dvr-yolov8-detection
Python+YOLOv8-based human/animal/object detection DVR framework with GUI, webUI and Telegram alerts
automation cctv cuda dvr dvr-tool human-activity-recognition human-detection object-detection opencv opencv-python opencv2 opencv2-python python rtmp security video-processing yolo yolo-detection-framework yolov11 yolov8
Last synced: 01 May 2025
https://github.com/krassowski/gsea-api
Pandas API for multiple Gene Set Enrichment Analysis implementations in Python (GSEApy, cudaGSEA, GSEA)
bioinformatics cuda enrichment gene-set-enrichment gene-sets gsea pandas pathway-analysis python3 transcriptomics
Last synced: 13 Apr 2025
https://github.com/owensgroup/mvgpubtree
GPU B-Tree with support for versioning (snapshots).
b-tree concurrent cuda gpu snapshot versioning
Last synced: 15 Jun 2025
https://github.com/shivaraj-bh/ollama-flake
Run ollama natively - powered by Nix
cuda flakes nix ollama open-webui rocm services
Last synced: 01 May 2025
https://github.com/shadyboukhary/gpu-research-fft-openacc-cuda
Case studies constitute a modern interdisciplinary and valuable teaching practice which plays a critical and fundamental role in the development of new skills and the formation of new knowledge. This research studies the behavior and performance of two interdisciplinary and widely adopted scientific kernels, a Fast Fourier Transform and Matrix Multiplication. Both routines are implemented in the two currently most popular many-core programming models, CUDA and OpenACC. A Fast Fourier Transform (FFT) samples a signal over a period of time and divides it into its frequency components, computing the Discrete Fourier Transform (DFT) of a sequence. Unlike the traditional approach to computing a DFT, FFT algorithms reduce the complexity of the problem from O(n^2) to O(n log_2 n). Matrix multiplication is a cornerstone routine in Mathematics, Artificial Intelligence and Machine Learning. This research also shows that the nature of the problem plays a crucial role in determining which many-core model will provide the highest benefit in performance.
acceleration cuda fast-fourier-transform fft gpu-acceleration gpu-computing gpu-programming nvcc openacc parallel-computing pgi pgi-compiler radix-2
Last synced: 07 Aug 2025
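The complexity reduction described in the entry above can be illustrated with a small CPU-only sketch. This is not code from the repository, which targets CUDA and OpenACC. It is a minimal Python illustration, assuming a recursive radix-2 Cooley-Tukey scheme (the entry is tagged radix-2). The function names naive_dft, fft_radix2, and max_abs_diff are hypothetical.

```python
import cmath


def naive_dft(x):
    """Direct DFT: n output bins, each a sum over n inputs, so O(n^2) work."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 FFT, O(n log2 n); len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    # Split into even- and odd-indexed halves and transform each recursively.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 0, -1, -2, -3]
    # Both routines compute the same transform, up to rounding error.
    print(max_abs_diff(naive_dft(signal), fft_radix2(signal)) < 1e-9)
```

Both functions return the same spectrum. The difference is that the recursive version halves the problem at each level, which is where the log_2 n factor comes from.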
https://github.com/jishanshaikh4/cuda-programs
CUDA Programs for Hadoop/CUDA Lab at MANIT, Bhopal
Last synced: 25 Apr 2025
https://github.com/romnn/microgpusim
Cycle-level, trace-driven, parallel GPU simulator for NVIDIA Pascal.
cuda cycle-level design-space-exploration gpgpu gpu nvbit nvidia performance-engineering rust simulation trace-driven
Last synced: 27 Jul 2025
https://github.com/cea-hpc/HARP
Small tool for profiling the performance of hardware-accelerated Rust code using OpenCL and CUDA
cuda gpgpu-computing hpc opencl rust
Last synced: 14 May 2025
https://github.com/thecomputekid/premake5-cuda
Premake5 module that enables CUDA development in Visual Studio using the native CUDA Toolkit integration.
cuda premake-module premake5 visual-studio
Last synced: 14 Apr 2025
https://github.com/pinto0309/jetson-tensorflow-pytorch-build
Provides an environment for compiling TensorFlow or PyTorch with CUDA for aarch64 on an x86 machine. This is for Jetson. If you build using an EC2 m6g.16xlarge (aarch64) instance, TensorFlow can be fully built in about 30 minutes. It can be used as a cross-compilation environment not only for TensorFlow and PyTorch, but also for various other packages and libraries.
cross-compile cuda docker jetson jetson-nano l4t pytorch tensorflow
Last synced: 07 May 2025
https://github.com/krk/cuda-webcam
Webcam Image Processing with CUDA using OpenCV
Last synced: 04 Apr 2025
https://github.com/mach3-software/mach3
The official repository for MaCh3
bayesian cuda data-analysis markov-chain-monte-carlo mathematics neutrino neutrino-oscillations physics statistics
Last synced: 14 May 2026
https://github.com/miladfa7/install-tensorflow-gpu-2.1.0-on-linux-ubuntu-18.04
Easily Install Tensorflow-GPU 2.1.0 on Linux Ubuntu 18.04 -Cuda 10 & Cudnn 7.6.5 | Download package dependencies with direct link
cuda cudnn install-tensorflow linux python tensoflow tensorflow-gpu ubuntu1804
Last synced: 15 Aug 2025
https://github.com/marshallward/optiflop
Optiflop measures the optimally achievable FLOPs for mathematical operations on various platforms.
avx avx2 avx512 cuda roofline vectorization x86
Last synced: 02 Apr 2026
https://github.com/nolmoonen/cuda-lbvh
CUDA implementation of a linear bounding volume hierarchy (LBVH).
Last synced: 29 Jul 2025
https://github.com/AnyDSL/traversal
AnyDSL traversal code
amdgpu bvh cuda gpu nvvm raytracing traversal
Last synced: 29 Jul 2025
https://github.com/ogrecave/ogre-gpgpu
GPGPU compute with Ogre using CUDA or OpenCL
cuda gpgpu-computing ogre3d opencl
Last synced: 25 Aug 2025
https://github.com/skurbee/ytarchiver
Download + Compress + Transcribe + Organize + Browse
4kdownloader batch-processing cuda faster-whisper ffmpeg matplotlib transcription vlc-media-player whisper whisper-ai youtube yt-dlp yt-dlp-gui yt-dlp-wrapper
Last synced: 06 Jun 2026
https://github.com/egecetin/libkaleidoscope
A library to create a kaleidoscope effect on images with CUDA. You can build on all platforms using CMake.
c cpp cuda image-filter image-filtering image-manipulation image-processing kaleidoscope python real-time real-time-processing video-filter video-filtering video-processing
Last synced: 14 Apr 2025
https://github.com/superlinear-ai/python-gpu
🐳 Python GPU adds a minimal install of CUDA and cuDNN on top of the official python:3.x-slim base image
cuda cudnn docker docker-image python
Last synced: 27 Apr 2025
https://github.com/sandialabs/p3a
Portably Performant Physical Algebra
amd-gpu avx512 cmake cpp cpp17 cpp17-library cuda gpgpu hip hpc hpc-tools nvidia-cuda sandia-national-laboratories scr-2619 simd snl-science-libs vector
Last synced: 02 May 2025
https://github.com/gianlucapaolocci/background-subtraction-on-gpu-with-cuda-and-opencv
This code provides a simple, efficient, and fast method to calculate motion and background dynamically using the power of NVIDIA GPUs.
background-subtraction cuda image-processing nvidia opencv parallel-computing
Last synced: 07 May 2025
https://github.com/cms-patatrack/cluestering
Density-based clustering algorithm developed at CERN
alpaka cern clustering cpp cuda pybind11 python tbb
Last synced: 10 Apr 2025
https://github.com/l4nos/php-cuda
An extension for PHP allowing it to access GPU operations on CUDA graphics cards (NVIDIA)
cuda cuda-kernels cuda-php php php-dll php-ext php-extension
Last synced: 26 Aug 2025
https://github.com/theochem/cugbasis
High performance CUDA/Python library for computing quantum chemistry density-based descriptors for larger systems using GPUs.
atoms-in-molecules computational-chemistry conceptual-dft cuda electron-density gpu python qtaim quantum quantum-chemistry theoretical-chemistry
Last synced: 17 Jan 2026
https://github.com/brownbiomechanics/autoscoper
Autoscoper is a 2D-3D image registration software package.
autoscoper biomechanics cuda hpc-server medical-imaging radiography tracking
Last synced: 01 Apr 2025
https://github.com/bigsk1/podcast-ai
AI podcast summary from a youtube video using Anthropic or XAI and Elevenlabs voices
ai-podcast anthropic-claude claude-ai claude-api cuda cudnn elevenlabs elevenlabs-api faster-whisper ffpmeg podcast review-tools xai xai-api youtube yt-dlp
Last synced: 18 Sep 2025
https://github.com/3p3r/pf-localization
Localization using a Particle Filter (and random walk model)
cuda localization matlab particle-filter slam
Last synced: 02 Apr 2025
https://github.com/z3lx/waifu2x-tensorrt
TensorRT implementation of the waifu2x super-resolution model for faster image and video upscaling.
anime cpp cuda cudnn image-upscaling machine-learning neural-network nvidia super-resolution tensorrt upscaling video-upscaling waifu2x
Last synced: 17 Jan 2026
https://github.com/mvisat/mcc-cuda
Implementation of Minutia Cylinder-Code with CUDA for Fingerprint Matching
Last synced: 19 Apr 2025
https://github.com/aperim/docker-nvidia-cuda-ffmpeg
A docker container, with ffmpeg that supports scale_cuda among other things
cuda ffmpeg gpu hacktoberfest nvidia
Last synced: 25 Dec 2025
https://github.com/mortvest/hastl
HaSTL: A fast GPU implementation of STL decomposition with missing values and support for both CUDA and OpenCL
cuda forecasting gpu opencl time-series time-series-analysis
Last synced: 05 Mar 2025
https://github.com/acfr/gpu-ray-surface-intersection-in-cuda
A GPU-based ray-surface intersection test implemented in CUDA
Last synced: 18 Feb 2026
https://github.com/yalue/cudabrot
A CUDA renderer for the Buddhabrot fractal
amd buddhabrot buddhabrot-fractal cuda gpu hip mandelbrot mandelbrot-fractal rocm
Last synced: 07 May 2025
https://github.com/bruce-lee-ly/cuda_back2back_hgemm
Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction.
back2back-gemm back2back-hgemm cublas cuda fused-gemm fused-hgemm gemm gpu hgemm matrix-multiply nvidia tensor-core
Last synced: 13 Apr 2025
https://github.com/minnukota381/cuda-parallel-c-programming
This repository contains various CUDA C programs demonstrating parallel computing techniques using NVIDIA's CUDA platform.
cuda cuda-programming hpc nvcc nvidia
Last synced: 30 Jun 2025
https://github.com/selenecodes/GPU-jupyterhub
A basic jupyterhub with Nvidia GPU accessibility.
cuda cuda-toolkit docker-compose docker-container docker-deployment docker-volumes gpu-computing jupyter jupyter-notebook jupyterhub scipy tensorflow
Last synced: 04 May 2025
https://github.com/boyan-soubachov/excelerator
A Microsoft Excel calculation speed-up add in.
calculation-speed cuda excel formulae gpgpu microsoft
Last synced: 11 Apr 2025
https://github.com/shapelets/khiva-csharp
C# binding for Khiva library.
clustering csharp cuda data-series discords distances gpu khiva kshape matrix-profile multicore opencl shapelets snippets time-series timeseries
Last synced: 06 May 2025
https://github.com/tomaszrewak/rotatingvoxels
In this project I use C#, Alea GPU and OpenGL.Net to create a simple, hardware-accelerated, 3d animation of rotating cubes.
alea-gpu-library csharp cuda gpu opengl voxel
Last synced: 21 Apr 2025
https://github.com/feifeibear/pstensor
PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
cuda deeplearning machinelearning pytorch tensorflow2
Last synced: 08 May 2026
https://github.com/balos1/shi_tomasi_feature_detection
CUDA, OpenMP, and regular serial C implementations of Shi Tomasi feature detection
cuda image-processing openmp shi-tomasi-detection
Last synced: 30 Aug 2025
https://github.com/yyaadet/aigc
A web UI with intelligent prompts for AIGC. Includes SDXL and AudioCraft.
audiocraft bootstrap5 cuda django django-project image-generation jquery llm m1-mac python stable-diffusion stable-diffusion-webui text2image webapp webui
Last synced: 12 Aug 2025
https://github.com/qureshizawar/cuda-quartic-solver
A general cubic equation solver and quartic equation minimisation solver written for CPU and Nvidia GPUs. For more details and results, see https://arxiv.org/abs/1903.10041. The library is available for C++/CUDA as well as Python using Pybind11.
cmake cubic-equations cuda cuda-quartic-solver gpu minimisation numpy nvidia-gpus openmp optimization pip pybind11 python quartic quartic-equations quartic-functions quartic-minimisation solver
Last synced: 18 Jul 2025
https://github.com/design4additive/gpucadforam
Design Software for Additive Manufacturing
3dcad 3dprinting additive-manufacturing computer-graphics cuda dear-imgui design-for-additive-manufacturing geometry gpu-computing implicit-modelling isosurface lattice-structures mesh simulation spatially-varying-lattice structural-analysis thermal-analysis topology-optimization vulkan
Last synced: 02 Oct 2025
https://github.com/yuehaowang/volume_renderer
CUDA-based interactive volume visualizer.
computer-graphics cuda imgui interactive-visualizations scientific-visualization volume-rendering
Last synced: 20 Oct 2025
https://github.com/okerew/neural-web
This repository shows an alternative neural network structure to modern ones, inspired by the brain and its creativity and workings.
alternative architecture biology c cpu cuda gpu innovative kernel machine-learning markdown metal neural neural-network neuron objc shader structure
Last synced: 27 Jul 2025
https://github.com/fluiddyn/fluidfft
Common API (C++ and Python) for Fast Fourier Transform HPC libraries (publish-only mirror)
cuda cython fft fftw3-binding mpi pythran spectral-methods
Last synced: 27 Aug 2025
https://github.com/denzp/rust-ptx-support
Experiments with achieving better ergonomics in Rust CUDA workflow
Last synced: 25 Oct 2025
https://github.com/nikhilmukraj/spiking-neural-networks
Implementations of various simulations for integrate and fire models, as well as conductance based models with synaptic neurotransmission
biological-neural-networks biological-neurons computational-biology cuda hodgkin-huxley-neuron izhikevich-neurons neuroscience python rust
Last synced: 15 Jun 2025
https://github.com/wang-xinyu/cudcnv2
A full CUDA implementation of the DCNv2 (deformable convolution) forward pass, with no dependency on cuTorch (THC).
Last synced: 25 Mar 2025
https://github.com/mgepahmge/neuzephyr
A simple C++ deep learning framework
ai backpropagation cpp cuda deep-learning deep-learning-framework framework machine-learning optimization
Last synced: 11 Jul 2025
https://github.com/pvgupta24/graph-betweenness-centrality
Parallelizing Graph Betweenness Centrality with CUDA
betweenness-centrality cuda graphs
Last synced: 12 Apr 2025
https://github.com/abelcarreras/cuda_functions
Python functions to calculate the FFT and autocorrelation function using GPU (Cuda)
autocorrelation-functions complex cuda cuda-functions fft gpu power-spectrum pypi python-api
Last synced: 12 Apr 2025
https://github.com/stellar-group/blaze
Fork of the Blaze library for compatibility with Blaze CUDA · https://bitbucket.org/blaze-lib/blaze · https://github.com/STEllAR-GROUP/blaze_cuda
cpp cpp14 cuda hpc linear-algebra metaprogramming
Last synced: 30 Apr 2025
https://github.com/daschr/cuda_firewall
Implementing a Firewall using dpdk and CUDA
Last synced: 10 Apr 2025
https://github.com/noahgift/nuclear_powered_command_line_tools
Nuclear Powered Command-Line Tools
cuda jit machine-learning numba python
Last synced: 28 Oct 2025
https://github.com/enp1s0/cumpsgemm
Fast SGEMM emulation on Tensor Cores
cuda fp32 gemm gpu half-precision mixed-precision tensorcore tensorcores
Last synced: 09 Apr 2025
https://github.com/dusanerdeljan/tensor-math-library
Header only lazy evaluation tensor math library with multi-backend parallel eager execution support (TBB, OpenMP, Parallel STL and in the future CUDA and OpenCL)
cuda eager-execution lazy-evaluation matrix-library opencl openmp parallel-computing tbb tensor-library
Last synced: 28 Oct 2025
https://github.com/nezihesozen/bscproject
cellular-automata cuda gpu opengl simulation traffic-simulation
Last synced: 30 Apr 2025
https://github.com/sukunis/cunfft
Nonequispaced FFTs on GPUs (based on NFFT: http://www.nfft.org)
Last synced: 21 Aug 2025
https://github.com/kekeblom/mpm
A simple CUDA accelerated material point method simulation.
computer-graphics cpp cuda docker mpm opengl physically-based-simulation physics-simulation simulations
Last synced: 12 Apr 2025
https://github.com/shapelets/shapelets-compute
Shapelets Compute is an accelerated platform for time series analysis
cuda matrixprofile opencl time-series
Last synced: 06 May 2025
https://github.com/tgautam03/xfilters
GPU (CUDA) accelerated filters using 2D convolution for high resolution images.
2d-convolution c cpp cuda cuda-programming gpu-acceleration gpu-computing gpu-programming image-filters image-processing
Last synced: 10 Oct 2025
https://github.com/dkobylianskii/torch-lap-cuda
A fast CUDA implementation of the Linear Assignment Problem (LAP) solver for PyTorch.
Last synced: 05 May 2026
https://github.com/finmath/finmath-lib-cuda-extensions
Classes enabling finmath-lib to run its Monte-Carlo models on Cuda GPUs
Last synced: 05 May 2025
https://github.com/marklysze/llamaindex-rag-linux-cuda
Examples of RAG using Llamaindex with local LLMs in Linux - Gemma, Mixtral 8x7B, Llama 2, Mistral 7B, Orca 2, Phi-2, Neural 7B
cuda gemma gemma-2b gemma-7b linux llama-2 llamaindex microsoft-phi-2 mistral-7b mixtral mixtral-8x7b neural-7b neural-chat-7b orca-2 phi-2 retrieval-augemented-generation ubuntu yi-34b
Last synced: 23 Jun 2025
https://github.com/phrb/gpu-autotuning
Autotuning NVCC Compiler Parameters, published @ CCPE Journal
autotuning cuda nvcc opentuner
Last synced: 06 Jul 2025
https://github.com/alessandrobessi/cuda-lab
Playing with CUDA and GPUs in Google Colab
cuda cuda-kernels gpu gpu-acceleration gpu-programming parallel-algorithm parallel-computing
Last synced: 15 Apr 2025
https://github.com/pennylaneai/lightning-on-hpc
"Hybrid quantum programming with PennyLane Lightning on HPC platforms" accompanying data and workloads
cpp20 cuda gpu hpc mpi openmp python quantum quantum-computing rocm supercomputing
Last synced: 10 Jun 2025
https://github.com/veriblock/nodecore-pow-cuda-miner
VeriBlock CUDA PoW Miner
Last synced: 03 Mar 2026
https://github.com/ashishpatel26/tensorflow-installation-on-windows10-cuda-and-cudnn
TensorFlow installation on Windows 10 with CUDA and cuDNN
cuda cudatoolkit cudnn installation nvidia tensorflow tensorflow2 windows windows10
Last synced: 14 May 2025
https://github.com/professorcode1/event-analysis
Library for Event Synchronization and Event Coincidence Analysis
cuda cuda-kernels cuda-library cuda-programming event-analysis event-coincidence event-coincidence-analysis event-series event-series-analysis event-synchronization time-series-analysis
Last synced: 24 Oct 2025
https://github.com/openvoiceos/status
Open Voice OS Status Page
alerting cuda fasterwhisper mimic3 monitoring nvidia openvoiceos ovos piper sam speech-to-text stats status stt text-to-speech translator tts upptime uptime
Last synced: 17 Oct 2025
https://github.com/yhmtsai/ci_windows_cuda
This repo creates the Dockerfiles for using CUDA in Windows Docker and provides the GitLab/GitHub Windows shared VM runner config.
continuous-integration cuda docker github-actions gitlab windows
Last synced: 14 Apr 2025
https://github.com/tank3-tk3/procesamiento-imagenes-cuda-opencv
Image processing with CUDA and OpenCV
Last synced: 11 Jul 2025
https://github.com/dexter2206/ising
Ising: a Python package for exactly solving arbitrary Ising model instances using exhaustive search.
Last synced: 02 Jul 2025
https://github.com/nglsg/uniapi
The Universal LLM Gateway - Integrate ANY AI Model with One Consistent API
ai ai-tools api-client api-integration api-wrapper chatbot cpp cross-platform cuda gpu-accelerated high-performance http-server inference-server language-model llm llm-integration openai-compatible rest-api universal-api
Last synced: 17 Jun 2025
https://github.com/erkaman/parle-cuda
A reference implementation of RLE in CUDA
c-plus-plus compression cuda data-compression demo gpgpu gpu parle rle run-length-encoding
Last synced: 17 Mar 2026
https://github.com/nssharmaofficial/kmeans-in-cuda
K-Means algorithm parallelized in CUDA
cpp cuda cuda-programming high-performance high-performance-computing k-means k-means-algorithm k-means-clustering parallel parallel-computing
Last synced: 27 Apr 2025