Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
![](https://explore-feed.github.com/topics/cuda/cuda.png)
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-02-13 00:07:16 UTC
https://github.com/llm-db/understanding-gpu-architecture-implications-on-llm-serving-workloads
Understanding GPU Architecture Implications on LLM Serving Workloads (Master's Thesis, ETH Zürich, 2024)
cuda inference pytorch rocm transformer
Last synced: 14 Dec 2024
https://github.com/cerit-sc/scipion-docker
Scipion, a cryo-EM image processing framework (https://scipion.i2pc.es/), adapted to run in Kubernetes.
cryo-em cryoem cuda desktop kubernetes scipion vnc
Last synced: 06 Dec 2024
https://github.com/brainlesslabs/jalebi
C++ String algorithms for maximum performance
c-plus-plus cplusplus cpp cpp-library cpu cuda library parallel performance simd sse string string-matching vectorization
Last synced: 26 Jan 2025
https://github.com/bhavinpatel4199/image-processing-with-opencv-and-cuda-on-google-colab
This repository demonstrates image processing using OpenCV with CUDA for GPU acceleration on Google Colab. It includes basics like displaying and manipulating images, alongside advanced techniques using CUDA to enhance performance. Ideal for learning GPU-accelerated image processing in Python.
computer-vision cuda google-colab gpu-acceleration high-performance-computing image-processing opencv pixel-manupulation
Last synced: 12 Feb 2025
https://github.com/azdavis/parallel-portrait-mode
Parallel Portrait Mode
cuda image-processing ispc openmp
Last synced: 28 Jan 2025
https://github.com/kenwuqianghao/c4ai-cuda-birds
Homework assignments for C4AI Beginners in Research-Driven Studies
Last synced: 27 Dec 2024
https://github.com/wpjunior/cuda-numba-playground
Some uses of CUDA with the Numba framework
Last synced: 13 Jan 2025
https://github.com/kentakoong/mtnlog
A simple multinode performance logger for Python
cuda lanta nvitop python slurm-cluster
Last synced: 22 Jan 2025
https://github.com/cscfi/csc-env-julia
Julia language environment including MPI.jl, CUDA.jl and AMDGPU.jl preferences for HPC clusters at CSC.
amdgpu ansible cuda hpc julia julia-language mpi
Last synced: 22 Jan 2025
https://github.com/bardiparsi/threadpoolmanager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 19 Nov 2024
https://github.com/vladd12/libexecstd
Modern C++ library for using an execution context of computer devices
cpp cpp17 cuda gpu-acceleration gpu-computing
Last synced: 28 Jan 2025
https://github.com/ydkn/htw-progko-cuda
Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.
cuda image-transformations opencv
Last synced: 11 Jan 2025
https://github.com/usman619/pdc
Parallel and Distributed Computing
cuda distributed-computing distributed-systems nextcloud
Last synced: 13 Jan 2025
https://github.com/juntyr/necsim-rust-analysis
Analysis of the spatially explicit biodiversity simulation `necsim-rust`
analysis biodiversity cuda mpi necsim rust simulation
Last synced: 25 Jan 2025
https://github.com/sangioai/sph
CUDA and OpenMP versions of the serial SPH (Smoothed Particle Hydrodynamics) algorithm.
Last synced: 12 Feb 2025
https://github.com/sahil-rajwar-2004/vector-cuda
Vector calculation with GPU acceleration using CUDA
c cpp11 cuda cuda-kernels cuda-programming nvcc
Last synced: 19 Nov 2024
https://github.com/dirmeier/cuda-etudes
🎶 A collection of CUDA recipes
Last synced: 17 Jan 2025
https://github.com/mathiasotnes/gemm
General Matrix Multiplication (GEMM) optimization in CUDA.
Last synced: 31 Jan 2025
https://github.com/parxd/fasterdl
cuBLAS/CUDA tensor library with auto-diff support
cublas cuda cudnn deep-learning machine-learning
Last synced: 06 Jan 2025
https://github.com/lordofhyphens/gpu-path-delay-coverage
CUDA-based Path Delay Fault Coverage
Last synced: 28 Jan 2025
https://github.com/fabulani/360ip-with-cuda
360° Image Processing with CUDA and OpenCV.
360-image 360-video cpp cuda image-processing opencv
Last synced: 08 Feb 2025
https://github.com/imanghd/parallelprocessing
CE Algorithms Lab @ SUT
cuda openmp parallel-algorithm parallel-processing systolic
Last synced: 02 Feb 2025
https://github.com/lfrati/subpair
Fast pairwise cosine distance calculation and Numba-accelerated evolutionary matrix subset extraction 🍐🚀
Last synced: 16 Jan 2025
https://github.com/jpuigcerver/prob-phoc
Probabilistic relevance scores from PHOC embeddings
cuda keyword-spotting kws phoc pytorch
Last synced: 16 Jan 2025
https://github.com/m-torhan/advent-of-code
🎄 Solutions for the Advent of Code
advent-of-code advent-of-code-2024 cuda
Last synced: 20 Dec 2024
https://github.com/chrisdalvit/gpu-matrix-transpose
Implementation and benchmarking of different matrix transpose with CUDA
c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu
Last synced: 20 Dec 2024
https://github.com/airvzxf/c-plus-plus-understanding-cuda
Understanding CUDA with C++
cuda hacktoberfest hacktoberfest-accepted
Last synced: 27 Jan 2025
https://github.com/darshanakgr/meanfiltergpu
A GPU implementation of a mean filter in CUDA
Last synced: 28 Jan 2025
https://github.com/rog0d/gpuss_watchers
"The GPU Watchers swore upon their shared memory hierarchy, from L1 to global memory, which also served as their mandate as lords of parallel computation."
cuda gpu-acceleration gpu-monitoring gpu-profiling
Last synced: 20 Dec 2024
https://github.com/programmergnome/cuda-codes
Snippet repository for learning parallel GPU programming with CUDA.
c cpp-programming cuda cuda-kernel gpu-programming learning-materials parallel-programming parallelization
Last synced: 22 Jan 2025
https://github.com/snandasena/courseera_gpu_specilization
Example of CUDA streaming
Last synced: 14 Jan 2025
https://github.com/flavienbwk/tensorflow2-cuda-10.2-docker
TensorFlow 2.3 and CUDA 10.2 in a Docker-compatible image
cuda docker python3 tensorflow ubuntu1804
Last synced: 28 Jan 2025
https://github.com/jaidevd/ipec-fdp
cuda hpc keras mapreduce numba spark tensorflow
Last synced: 01 Feb 2025
https://github.com/flavienbwk/nvidia-cuda-mirror-docker
An all-in-one mirror for installing NVIDIA Docker.
cuda docker linux-mirror mirror nvidia nvidia-docker nvidia-docker2 offline offline-capable
Last synced: 28 Jan 2025
https://github.com/boostibot/bachelors
My bachelor's thesis at CTU in Prague, Faculty of Nuclear Sciences and Physical Engineering, supervised by Ing. Pavel Strachota, Ph.D.
crystal-growth cuda finite-volume-method parallel-programming phase-field-method
Last synced: 18 Jan 2025
https://github.com/sustia-llc/gpu_logger_poc
GPU execution verification system with immutable Kafka logging. Monitors CUDA operations, validates GPU performance, and maintains auditable operation history. Built with Rust and Candle for reliable ML model execution tracking.
candle-core cuda docker gpu gpu-computing kafka logging machine-learning mlops monitoring nvidia performance-testing rust
Last synced: 12 Feb 2025
https://github.com/grindelfp/cuda-n-body-simulation
Simulation of N-Body movement using CUDA.
Last synced: 12 Feb 2025
https://github.com/baonguyen6742/uv-install-torch
Tutorial on installing torch/PyTorch with CUDA using uv
cuda install installation package python pytorch resolver torch torchaudio torchvision tutorial uv
Last synced: 12 Feb 2025
https://github.com/jonyandunh/stanforddogsresnet
A classifier for the 120 dog breeds in the Stanford Dogs Dataset, built with the PyTorch framework and a custom ResNet
cuda deep-learning python pytorch resnet resnet-18 standford-dog stanford
Last synced: 14 Jan 2025
https://github.com/sydney-informatics-hub/computer-vision-fine-tuning
Fine-tune a computer vision model to solve your task locally, on HPC, in a container, or in the cloud!
computer-vision cuda deep-learning python
Last synced: 22 Jan 2025
https://github.com/kanchishimono/python-images
Ubuntu-based Python container images, including CUDA images
container-image cuda docker dockerfile machine-learning python python3
Last synced: 26 Jan 2025
https://github.com/akhuntsaria/image-filters
Image filters implemented in CUDA C/C++
Last synced: 07 Jan 2025
https://github.com/mattjesc/federated-learning-simulation-1gpu-mi-is
Federated Learning Simulation on a Single GPU with Model Interpretability and Interactive Visualization
ai cuda deep-learning distributed-systems federated-learning gpu hpc keras machine-learning ml model-interpretability python pytorch simulation streamlit tensorflow
Last synced: 12 Oct 2024
https://github.com/dragonscypher/prompty
Tool for generating smart and secure prompts for language models!
autotokenizer bert-model cuda google-t5 llm python3 tensorflow threading
Last synced: 22 Jan 2025
https://github.com/raiszo/cs334
Journey through Intro to Parallel Programming
Last synced: 25 Jan 2025
https://github.com/sarah627/horus_eye_fcih_graduation_project
An AI-powered tourism website using YOLOv7 for real-time landmark detection in images. Built with Flask, PyTorch, and Roboflow for seamless tourist interaction.
computer-vision cuda flask jupyter-notebook kaggle matplotlib object-detection opencv python pytorch roboflow
Last synced: 21 Jan 2025
https://github.com/awikramanayake/optimized-matrix-mult
Optimizing matrix multiplication using parallelism and SIMD (AVX2, CUDA)
avx2 cuda matrix-multiplication
Last synced: 21 Jan 2025
https://github.com/bd2720/accesspatterns
Comparing chunked vs. striped memory access patterns for CPU and GPU code using the CUDA toolkit in C.
c cache cuda cuda-toolkit performance-analysis performance-testing profiling
Last synced: 31 Jan 2025
https://github.com/branebb/nn-framework
Framework for creating neural networks using C++ and the CUDA platform. This project is part of my final university assignment for my bachelor's degree.
cmake cpp cuda cuda-programming
Last synced: 19 Nov 2024
https://github.com/mmz33/practice-cuda
c cpp cuda cuda-programming gpu-programming parallel-programming
Last synced: 22 Jan 2025
https://github.com/parlaynu/inference-tvm
Export ONNX to Apache TVM and run inference in containerized environments.
apache-tvm cuda docker jetson-nano onnx raspberrypi4 x86-64
Last synced: 28 Jan 2025
https://github.com/fikri-rouzan/cuda-c-program-part-3
CUDA C program from NVIDIA course.
Last synced: 05 Feb 2025
https://github.com/fikri-rouzan/cuda-c-program-part-1
CUDA C program from NVIDIA course.
Last synced: 05 Feb 2025
https://github.com/fikri-rouzan/cuda-c-program-part-2
CUDA C program from NVIDIA course.
Last synced: 05 Feb 2025
https://github.com/thomasvonwu/interview-note
Share Interview Questions and Summarize Answers
Last synced: 05 Feb 2025
https://github.com/kts-o7/n-body-parallel-implementation
A simple study comparing the speed-up obtained with different parallelization approaches (MPI, OpenMP, and CUDA) for an FFT implementation of an n-body simulation
cuda mpi openmp parallel-computing pthreads
Last synced: 05 Feb 2025
https://github.com/f14-bertolotti/torchess
CUDA Torch extension for a chess engine
Last synced: 05 Feb 2025
https://github.com/pintamonas4575/rlgan-project-maadm-upm
Neuroevolution to learn the Lunar Lander from Gymnasium and a GAN to learn to color images. Subject from the ML and BD master's degree of UPM.
cuda deep-learning gan genetic-algorithm lunar-lander machine-learning mlp python3 pytorch reinforcement-learning tensorflow
Last synced: 05 Feb 2025
https://github.com/rushirg/cuda-matrix-multiplication
Matrix Multiplication on GPGPU in CUDA
cpu cuda gpu parallel-processing
Last synced: 21 Jan 2025
https://github.com/ivanbgd/cuda_quad_c
Calculates a definite integral by using three different rules. Compares sequential to parallel implementations.
cuda integrals parallel-implementations
Last synced: 03 Feb 2025
https://github.com/karusb/2dca-cuda
2 Dimensional Cellular Automata Visualisation (Game of Life)
algorithm-flowchart cellular-automata cuda game game-of-life glut visual-studio
Last synced: 08 Jan 2025
https://github.com/rurumimic/candle
Hugging Face Candle
cuda gpu huggingface nvidia transformer
Last synced: 27 Jan 2025
https://github.com/emilienmendes/gpgpu
Parallelization and optimization of point recognition in an image
cuda gpgpu parallel-programming
Last synced: 27 Jan 2025
https://github.com/strigidie/cudar
A custom graphics pipeline based on NVIDIA CUDA ⚙️
Last synced: 27 Jan 2025
https://github.com/ribin-baby/cuda_cudnn_installation_on_ubuntu20.04
Installation of CUDA 11.8 with cuDNN 8.7 on an Ubuntu 20.04 server with an A30 GPU, plus an ONNX GPU installation guide
cuda gpu linux onnxruntime server
Last synced: 16 Jan 2025
https://github.com/gladap/heterogeneous_computing_project
Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters
cuda heterogeneous-parallel-programming
Last synced: 05 Feb 2025
https://github.com/sferez/sspp_sparse_matrix_cuda
Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA
cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix
Last synced: 13 Jan 2025
https://github.com/isquicha/cuda-parallel-studies
Learning CUDA programming here =D
cuda cuda-programming cuda-toolkit
Last synced: 22 Jan 2025
https://github.com/toshikinakamura0412/dotfiles_for_docker
My dotfiles for Docker containers of several Linux distributions
cuda docker docker-compose dotfiles git neovim ros-noetic tmux zsh
Last synced: 20 Nov 2024
https://github.com/vectorworksreal/sd-forge-docker
SD Forge WebUI Docker image.
ai-art artificial-intelligence containerization cuda docker docker-image forge image-to-image machine-learning sd-forge stable-diffusion stable-diffusion-webui text-to-image ubuntu webui
Last synced: 10 Feb 2025
https://github.com/versi379/optimized-matrix-multiplication
This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.
cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming
Last synced: 21 Jan 2025
https://github.com/neel-dandiwala/cuda-programs
Miscellaneous programs for grasping the concepts of parallel computing
cuda gpu-programming parallel-programming
Last synced: 26 Dec 2024
https://github.com/xza85hrf/flux_pipeline
FluxPipeline is a prototype experimental project that provides a framework for working with the FLUX.1-schnell image generation model. This project is intended for educational and experimental purposes only.
ai cuda docker educational experimental flux1 flux1-schnell flux1ai gradio image-generation model non-commercial python pytorch research transformer-model
Last synced: 22 Dec 2024
https://github.com/juntyr/necsim-rust-docs
Documentation of the spatially explicit biodiversity simulation necsim-rust
biodiversity cuda docs mpi necsim rust simulation
Last synced: 03 Feb 2025
https://github.com/cs550-epfl/review
Review of the paper "A Formal Analysis of the NVIDIA PTX Memory Consistency Model"
cuda formal-verification gpu memory-consistency ptx simt
Last synced: 05 Feb 2025
https://github.com/skyguy126/cuda-learnings
Collection of personal CUDA learnings.
Last synced: 05 Feb 2025
https://github.com/tdavidcl/cu_intercept
cuda cuda-memory cuda-programming hook massif memory-tracking preload
Last synced: 05 Feb 2025
https://github.com/spatialgraphics/tardis
Travel space and time by using autodiff and codegen
Last synced: 05 Feb 2025
https://github.com/sbstndb/neural_k
A simple neural network library using Kokkos, with a CUDA or OpenMP backend
ai cuda kokkos library neural-network openmp
Last synced: 05 Feb 2025
https://github.com/phrutis/brainwords2
GPU brainflayer for sale $250
brain brainflayer brainwords cuda gpu key pass passphrase private
Last synced: 05 Feb 2025
https://github.com/alexkranias/triton_vs_cuda
Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.
cuda cuda-kernels gpu gpu-programming parallel-programming python triton
Last synced: 05 Feb 2025