Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-01-31 00:06:47 UTC
https://github.com/datarhei/ffmpeg
FFmpeg base image for datarhei/core.
alpine cuda docker ffmpeg mmal raspberry-pi vaapi
Last synced: 10 Nov 2024
https://github.com/jtschwar/tomo_tv
C++ library for Regularized 2D and 3D Tomography Reconstructions.
3d-reconstruction cuda inverse-problems regularization tomography
Last synced: 10 Nov 2024
https://github.com/itzmeanjan/blake3
SYCL accelerated BLAKE3 Hash Implementation
avx2 avx512 binary-merklization blake3 cpu cryptographic-hash-functions cuda dpcpp gpu gpu-computing merkle-tree sycl
Last synced: 09 Nov 2024
https://github.com/yashassamaga/convolutionbuildingblocks
GEMM and Winograd based convolutions using CUTLASS
convolution cuda cutlass deep-learning
Last synced: 03 Dec 2024
https://github.com/abhishekyana/cyclegans-pytorch
CycleGANs-PyTorch applied on Young to Old image converter.
cuda cyclegan faceapp gan python pytorch resnet tutorial-code young2old
Last synced: 18 Nov 2024
https://github.com/PINTO0309/Open3D-build
Provide Docker build sequences of Open3D for various environments.
cuda docker jetson jetson-nano open3d open3d-python pytorch tensorflow
Last synced: 27 Oct 2024
https://github.com/Bruce-Lee-LY/matrix_multiply
Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.
coppersmith-winograd cpp11 cpu cublas cuda kahan matrix-multiply naive nvidia reordering shared-memory strassen tiling
Last synced: 19 Nov 2024
https://github.com/sparselinearalgebra/spbla
Sparse Boolean linear algebra for Nvidia Cuda, OpenCL and CPU computations
boolean-algebra cplusplus cuda graph-algorithms graphblas opencl python sparse-matrix suitesparse
Last synced: 12 Oct 2024
https://github.com/cartersusi/pacman_cuda
[AUR][Pacman] Current Cuda compatibility with Tensorflow and Torch on Arch Linux
arch arch-linux archlinux aur compatibility cuda guide installer linux pacman script tensorflow torch
Last synced: 10 Nov 2024
https://github.com/lnstadrum/fastaugment
A handy data augmentation toolkit for image classification put in a single efficient TensorFlow/PyTorch op.
augmentation-transformations brightness-correction cuda cutout data-augmentation gamma-correction gpu mixup perspective-distortions tensorflow-op
Last synced: 28 Oct 2024
https://github.com/fahimfba/cuda-wsl2-ubuntu
Install CUDA on Windows 11 using WSL2
cuda cuda-programming cuda-support cuda-toolkit cuda-wsl deep-learning deep-reinforcement-learning deeplearning deeplearning-ai machine-learning machinelearning machinelearning-python wsl wsl-environment wsl-ubuntu wsl2
Last synced: 10 Dec 2024
https://github.com/dark-art108/gpu-docker-deployment-text-summarization
Text Summarization using Transformer on GPU Docker Deployment
cuda docker fastapi gpu-acceleration huggingface
Last synced: 12 Nov 2024
https://github.com/bruce-lee-ly/decoding_attention
Decoding Attention is specially optimized for multi head attention (MHA) using CUDA core for the decoding stage of LLM inference.
cuda cuda-core decoding-attention flash-attention flashinfer gpu inference large-language-model llm mha multi-head-attention nvidia
Last synced: 23 Oct 2024
https://github.com/cggos/hpc
High-Performance Computing: CPU Instructions, GPU OpenCL & CUDA, etc. :sunny:
cuda heterogeneous-parallel-programming multi-threading neon opencl openmp simd sse
Last synced: 28 Oct 2024
https://github.com/bfrg/vim-cuda-syntax
CUDA syntax highlighting for Vim
cuda highlighting syntax vim vim-syntax
Last synced: 30 Oct 2024
https://github.com/jishanshaikh4/cuda-programs
CUDA Programs for Hadoop/CUDA Lab at MANIT, Bhopal
Last synced: 10 Nov 2024
https://github.com/mberr/torch-max-mem
Decorators for maximizing memory utilization with PyTorch & CUDA
Last synced: 27 Oct 2024
https://github.com/ivangabriele/docker-cuda-desktop
Ubuntu PyTorch CUDA Docker image with KDE Plasma Desktop & VNC. Ideal for LLM & Deep Learning remote work.
cuda d-bus dbus deep-learning desktop docker gpu large-language-models llm nvidia python pytorch remote-desktop server ubuntu ubuntu-desktop vnc vnc-server x11
Last synced: 23 Oct 2024
https://github.com/pinto0309/jetson-tensorflow-pytorch-build
Provides an environment for compiling TensorFlow or PyTorch with CUDA for aarch64 on an x86 machine. This is for Jetson. If you build using an EC2 m6g.16xlarge (aarch64) instance, TensorFlow can be fully built in about 30 minutes. It can be used as a cross-compilation environment not only for TensorFlow and PyTorch, but also for various other packages and libraries.
cross-compile cuda docker jetson jetson-nano l4t pytorch tensorflow
Last synced: 23 Oct 2024
https://github.com/koushikphy/intro-to-cuda-fortran
A Complete beginner's introduction to programming with CUDA Fortran
cuda cuda-fortran cuda-kernels cuda-programming fortran fortran90 gpgpu gpu gpu-computing high-performance-computing hpc nvidia nvidia-cuda parallel-computing parallel-programming
Last synced: 11 Oct 2024
https://github.com/shivaraj-bh/ollama-flake
Run ollama natively - powered by Nix
cuda flakes nix ollama open-webui rocm services
Last synced: 12 Nov 2024
https://github.com/krk/cuda-webcam
Webcam Image Processing with CUDA using OpenCV
Last synced: 05 Nov 2024
https://github.com/nvidia/numbast
Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
Last synced: 29 Oct 2024
https://github.com/pinto0309/pytorch-build
Provide Docker build sequences of PyTorch for various environments.
Last synced: 23 Oct 2024
https://github.com/kerneltuner/kernel_launcher
Using C++ magic to launch/capture CUDA kernels and tune them with Kernel Tuner
Last synced: 15 Nov 2024
https://github.com/tinybiggames/infero
An easy-to-use, high-performance, CUDA-powered LLM inference library.
cuda llamacpp llm-inference win64 windows-10 windows-11
Last synced: 10 Oct 2024
https://github.com/sandialabs/p3a
Portably Performant Physical Algebra
amd-gpu avx512 cmake cpp cpp17 cpp17-library cuda gpgpu hip hpc hpc-tools nvidia-cuda sandia-national-laboratories scr-2619 simd snl-science-libs vector
Last synced: 12 Nov 2024
https://github.com/gianlucapaolocci/background-subtraction-on-gpu-with-cuda-and-opencv
This code provides a simple, efficient, and fast method to calculate motion and background dynamically using the power of NVIDIA GPUs.
background-subtraction cuda image-processing nvidia opencv parallel-computing
Last synced: 23 Oct 2024
https://github.com/thecomputekid/premake5-cuda
Premake5 module that enables CUDA development in Visual Studio using the native CUDA Toolkit integration.
cuda premake-module premake5 visual-studio
Last synced: 07 Jan 2025
https://github.com/egecetin/libkaleidoscope
A library to create kaleidoscope effect on images with CUDA. You can build on all platforms using CMake
c cpp cuda image-filter image-filtering image-manipulation image-processing kaleidoscope python real-time real-time-processing video-filter video-filtering video-processing
Last synced: 15 Oct 2024
https://github.com/r-barnes/barnes2019-landscape
Landscape evolution models and graph processing on the GPU
Last synced: 28 Nov 2024
https://github.com/boyan-soubachov/excelerator
A Microsoft Excel calculation speed-up add in.
calculation-speed cuda excel formulae gpgpu microsoft
Last synced: 13 Oct 2024
https://github.com/selenecodes/GPU-jupyterhub
A basic jupyterhub with Nvidia GPU accessibility.
cuda cuda-toolkit docker-compose docker-container docker-deployment docker-volumes gpu-computing jupyter jupyter-notebook jupyterhub scipy tensorflow
Last synced: 13 Nov 2024
https://github.com/nezihesozen/bscproject
cellular-automata cuda gpu opengl simulation traffic-simulation
Last synced: 12 Nov 2024
https://github.com/aperim/docker-nvidia-cuda-ffmpeg
A docker container, with ffmpeg that supports scale_cuda among other things
cuda ffmpeg gpu hacktoberfest nvidia
Last synced: 08 Nov 2024
https://github.com/shapelets/khiva-csharp
C# binding for Khiva library.
clustering csharp cuda data-series discords distances gpu khiva kshape matrix-profile multicore opencl shapelets snippets time-series timeseries
Last synced: 13 Nov 2024
https://github.com/shapelets/shapelets-compute
Shapelets Compute is an accelerated platform for time series analysis
cuda matrixprofile opencl time-series
Last synced: 13 Nov 2024
https://github.com/idsia/automated-cl
Official repository for the paper "Automating Continual Learning"
continual-learning cuda fast-weight-programmers fast-weights few-shot-learning linear-transformers meta-learning pytorch self-referential-learning self-referential-weight-matrix transformers
Last synced: 11 Nov 2024
https://github.com/dusanerdeljan/tensor-math-library
Header only lazy evaluation tensor math library with multi-backend parallel eager execution support (TBB, OpenMP, Parallel STL and in the future CUDA and OpenCL)
cuda eager-execution lazy-evaluation matrix-library opencl openmp parallel-computing tbb tensor-library
Last synced: 11 Oct 2024
https://github.com/fluiddyn/fluidfft
:chart_with_upwards_trend: Common API (C++ and Python) for Fast Fourier Transform HPC libraries (publish-only mirror)
cuda cython fft fftw3-binding mpi pythran spectral-methods
Last synced: 01 Dec 2024
https://github.com/noahgift/nuclear_powered_command_line_tools
Nuclear Powered Command-Line Tools
cuda jit machine-learning numba python
Last synced: 11 Oct 2024
https://github.com/nolmoonen/cuda-lbvh
CUDA implementation of a linear bounding volume hierarchy (LBVH).
Last synced: 10 Dec 2024
https://github.com/denzp/rust-ptx-support
Experiments with achieving better ergonomics in Rust CUDA workflow
Last synced: 10 Oct 2024
https://github.com/stellar-group/blaze
Fork of the Blaze library for compatibility with Blaze CUDA · https://bitbucket.org/blaze-lib/blaze · https://github.com/STEllAR-GROUP/blaze_cuda
cpp cpp14 cuda hpc linear-algebra metaprogramming
Last synced: 12 Nov 2024
https://github.com/abelcarreras/cuda_functions
Python functions to calculate the FFT and autocorrelation function using GPU (Cuda)
autocorrelation-functions complex cuda cuda-functions fft gpu power-spectrum pypi python-api
Last synced: 07 Nov 2024
https://github.com/veriblock/nodecore-pow-cuda-miner
VeriBlock CUDA PoW Miner
Last synced: 23 Jan 2025
https://github.com/feifeibear/pstensor
PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
cuda deeplearning machinelearning pytorch tensorflow2
Last synced: 23 Jan 2025
https://github.com/tomaszrewak/rotatingvoxels
In this project I use C#, Alea GPU and OpenGL.Net to create a simple, hardware-accelerated, 3d animation of rotating cubes.
alea-gpu-library csharp cuda gpu opengl voxel
Last synced: 09 Nov 2024
https://github.com/minnukota381/cuda-parallel-c-programming
This repository contains various CUDA C programs demonstrating parallel computing techniques using NVIDIA's CUDA platform.
cuda cuda-programming hpc nvcc nvidia
Last synced: 21 Nov 2024
https://github.com/3p3r/pf-localization
Localization using a Particle Filter (and random walk model)
cuda localization matlab particle-filter slam
Last synced: 03 Nov 2024
https://github.com/shadyboukhary/gpu-research-fft-openacc-cuda
Case studies constitute a modern interdisciplinary and valuable teaching practice which plays a critical and fundamental role in the development of new skills and the formation of new knowledge. This research studies the behavior and performance of two interdisciplinary and widely adopted scientific kernels, a Fast Fourier Transform and Matrix Multiplication. Both routines are implemented in the two currently most popular many-core programming models, CUDA and OpenACC. A Fast Fourier Transform (FFT) samples a signal over a period of time and divides it into its frequency components, computing the Discrete Fourier Transform (DFT) of a sequence. Unlike the traditional approach to computing a DFT, FFT algorithms reduce the complexity of the problem from O(n^2) to O(n log2 n). Matrix multiplication is a cornerstone routine in Mathematics, Artificial Intelligence and Machine Learning. This research also shows that the nature of the problem plays a crucial role in determining which many-core model will provide the highest benefit in performance.
acceleration cuda fast-fourier-transform fft gpu-acceleration gpu-computing gpu-programming nvcc openacc parallel-computing pgi pgi-compiler radix-2
Last synced: 09 Nov 2024
https://github.com/mgepahmge/neuzephyr
A simple C++ deep learning framework
ai backpropagation cpp cuda deep-learning deep-learning-framework framework machine-learning optimization
Last synced: 21 Nov 2024
https://github.com/openvoiceos/status
Open Voice OS Status Page
alerting cuda fasterwhisper mimic3 monitoring nvidia openvoiceos ovos piper sam speech-to-text stats status stt text-to-speech translator tts upptime uptime
Last synced: 19 Nov 2024
https://github.com/yyaadet/aigc
A Web UI with intelligent prompts for AIGC. For example, Stable Diffusion with Core ML on Apple Silicon M1/M2, CUDA, and CPU.
bootstrap5 cuda django django-project image-generation jquery llm m1-mac python stable-diffusion stable-diffusion-webui text2image webapp webui
Last synced: 12 Jan 2025
https://github.com/pvgupta24/graph-betweenness-centrality
Parallelizing Graph Betweenness Centrality with CUDA
betweenness-centrality cuda graphs
Last synced: 06 Jan 2025
https://github.com/cea-hpc/HARP
Small tool for profiling the performance of hardware-accelerated Rust code using OpenCL and CUDA
cuda gpgpu-computing hpc opencl rust
Last synced: 19 Nov 2024
https://github.com/imsanjoykb/cuda-bootcamp
CUDA Programming Practices
computer-vision crypto-mining crypto-mining-program cuda cuda-api cuda-development cuda-device cuda-driver cuda-kernels cuda-library cuda-opengl cuda-programming cuda-resource cuda-support cuda-toolkit jetson jetson-inference jetson-xavier nvidia-cuda nvidia-jetson-nano
Last synced: 12 Oct 2024
https://github.com/ogrecave/ogre-gpgpu
GPGPU compute with Ogre using CUDA or OpenCL
cuda gpgpu-computing ogre3d opencl
Last synced: 05 Nov 2024
https://github.com/alessandrobessi/cuda-lab
Playing with CUDA and GPUs in Google Colab
cuda cuda-kernels gpu gpu-acceleration gpu-programming parallel-algorithm parallel-computing
Last synced: 16 Oct 2024
https://github.com/qureshizawar/cuda-quartic-solver
A general cubic equation solver and quartic equation minimisation solver written for CPU and Nvidia GPUs. For more details and results, see: https://arxiv.org/abs/1903.10041. The library is available for C++/CUDA as well as Python using Pybind11.
cmake cubic-equations cuda cuda-quartic-solver gpu minimisation numpy nvidia-gpus openmp optimization pip pybind11 python quartic quartic-equations quartic-functions quartic-minimisation solver
Last synced: 11 Oct 2024
https://github.com/ashishpatel26/tensorflow-installation-on-windows10-cuda-and-cudnn
TensorFlow installation on windows10 CUDA and cudnn
cuda cudatoolkit cudnn installation nvidia tensorflow tensorflow2 windows windows10
Last synced: 19 Nov 2024
https://github.com/sukunis/cunfft
Nonequispaced FFTs on GPUs (based on NFFT: http://www.nfft.org)
Last synced: 03 Dec 2024
https://github.com/phrb/gpu-autotuning
Autotuning NVCC Compiler Parameters, published @ CCPE Journal
autotuning cuda nvcc opentuner
Last synced: 19 Oct 2024
https://github.com/tgymnich/shallowwater.jl
🌊 Simple Finite Volumes models that solve the shallow water equations
cuda hpc julia shallow-water-equations simulation tsunami
Last synced: 25 Oct 2024
https://github.com/insightsoftwareconsortium/itkvkfftbackend
VkFFT backends for ITK FFT classes.
cpp cuda fft hip insight-toolkit itk itk-module opencl python vulkan
Last synced: 27 Nov 2024
https://github.com/yottaawesome/cuda-by-example
Source code contained in CUDA By Example: An Introduction to General Purpose GPU Programming
Last synced: 13 Nov 2024
https://github.com/pkestene/incremental-fluids-kokkos
Simple, single-file fluid solvers for learning purposes revisited with parallel programming (Kokkos: OpenMP / Cuda)
cfd cuda kokkos openmp parallel-programming
Last synced: 18 Dec 2024
https://github.com/yuhui-zh15/gpu-smart
Interactive Automatic GPU Manager
cuda cudnn deep-learning gpu manager neural-network nvidia pytorch tensorflow
Last synced: 08 Nov 2024
https://github.com/prg-titech/kani-cuda
A program synthesizer for a CUDA-like GPGPU language
Last synced: 18 Nov 2024
https://github.com/finmath/finmath-lib-cuda-extensions
Classes enabling finmath-lib to run its Monte-Carlo models on Cuda GPUs
Last synced: 23 Oct 2024
https://github.com/ema2159/equirectangular-cubemaptransform
OpenCV with CUDA and OpenMP implementations for transforming equirectangular images to cube maps and vice versa
cubemap-to-equirectangular cuda equirectangular-to-cubemap opencv openmp
Last synced: 16 Nov 2024
https://github.com/evilfreelancer/docker-whisper-server
whisper.cpp HTTP transcription server with OpenAI-like API in Docker
api api-server asr cuda docker docker-compose dockerfile nvidia openai openai-api whisper whisper-cpp
Last synced: 09 Oct 2024
https://github.com/dexter2206/ising
Ising: a Python package for exactly solving arbitrary Ising model instances using exhaustive search.
Last synced: 23 Oct 2024
https://github.com/previsionio/damavand
Damavand is a quantum circuit simulator. It can run on laptops or High Performance Computing architectures, such as CPU distributed architectures or multi-GPU distributed architectures.
cuda distributed-computing hpc multi-gpu multithreading quantum-computing rust simulator
Last synced: 28 Nov 2024
https://github.com/balos1/shi_tomasi_feature_detection
CUDA, OpenMP, and regular serial C implementations of Shi Tomasi feature detection
cuda image-processing openmp shi-tomasi-detection
Last synced: 29 Oct 2024
https://github.com/thelolagemann/docker-gminer
a docker container for running gminer
cryptocurrency cryptocurrency-mining cryptomining cuckoocycle cuda docker dual-mining equihash ethash ethereum ethereum-miner ethereum-mining gminer kawpow miner mining nvidia progpow ton
Last synced: 23 Oct 2024
https://github.com/bobbui/tensorflow-serving-cuda-docker
Docker image for tensorflow serving with Nvidia CUDA, CuDNN
cuda cudnn docker docker-image tensorflow tensorflow-serving ubuntu1604
Last synced: 25 Jan 2025
https://github.com/NAGAGroup/Scalix
Scalix is a data parallel compute library that automatically scales to the available compute resources.
Last synced: 02 Nov 2024
https://github.com/ktaletsk/gpu_dsm
🔗 Accessible quantitative polymer rheology predictions with slip-links on GPU
c-plus-plus cuda gpu polymer rheology
Last synced: 31 Dec 2024
https://github.com/yalue/cudabrot
A CUDA renderer for the Buddhabrot fractal
amd buddhabrot buddhabrot-fractal cuda gpu hip mandelbrot mandelbrot-fractal rocm
Last synced: 23 Oct 2024
https://github.com/joaomlneto/cpds-heat
Heat Equation using different solvers (Jacobi, Red-Black, Gaussian) in C using different paradigms (sequential, OpenMP, MPI, CUDA) - Assignments for the Concurrent, Parallel and Distributed Systems course @ UPC 2013
cuda cuda-support gauss-seidel gaussian heat-equation jacobi mpi mpi-applications openmp openmp-applications openmp-parallelization openmp-support openmpi paradigms performance red-black solvers
Last synced: 09 Nov 2024
https://github.com/yhmtsai/ci_windows_cuda
This repo creates the Dockerfiles for using CUDA in Windows Docker and provides the GitLab/GitHub Windows shared VM runner config.
continuous-integration cuda docker github-actions gitlab windows
Last synced: 27 Nov 2024
https://github.com/enp1s0/shgemm
Fast multiplication of single-precision and half-precision matrices on Tensor Cores
Last synced: 26 Dec 2024
https://github.com/pkestene/cuda-proj-tmpl
A minimal CMake-based project skeleton for developing a CUDA application
cea cmake cuda gpu gpu-computing parallel-computing parallel-programming template
Last synced: 18 Dec 2024
https://github.com/qengineering/tensorflow-addons-jetson-nano
TensorFlow Addons installation wheels for Jetson Nano
aarch64 cuda cudnn installation-wheel jetson-nano linux python3 tensorflow-addons wheel
Last synced: 27 Nov 2024
https://github.com/ghost---shadow/near-duplicate-image-detector
CUDA implementation of some perceptual hashing algorithms
Last synced: 11 Oct 2024
https://github.com/101001000/tfg-pathtracer
CUDA Path tracing render engine, with MIS and the Disney BRDF
cuda pathtracing raytracing renderer
Last synced: 14 Nov 2024