Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
![](https://explore-feed.github.com/topics/cuda/cuda.png)
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-02-13 00:07:16 UTC
https://github.com/sangioai/torchpace
PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.
Last synced: 02 Feb 2025
https://github.com/jmuwrobotics/libbicos
GPU-Accelerated Binary Correspondence Search for Multishot Stereo Vision
computer-vision cuda depth-map stereo-camera stereo-matching stereo-vision
Last synced: 30 Dec 2024
https://github.com/bjornmelin/edge-ai-engineering
📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖
cuda edge-computing embedded-systems gpu-optimization iot mobile-ml model-optimization python tflite
Last synced: 02 Feb 2025
https://github.com/iglee/jax-cuda-eicl-exp-docker
Docker for getting jax to work with cuda, for reproducing ml experiments like eicl. Sure, let's NOT make a compatibility matrix and let people fight for their lives on cuda
cuda docker jax jaxline ml-engineering ml-experiments tensorflow
Last synced: 05 Feb 2025
https://github.com/alexkranias/triton_vs_cuda
Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.
cuda cuda-kernels gpu gpu-programming parallel-programming python triton
Last synced: 05 Feb 2025
https://github.com/ne0nwinds/gpupuzzles
My solutions to srush/GPU-Puzzles using CUDA
Last synced: 02 Feb 2025
https://github.com/atelierarith/julia_gpu_playground
For those who want to use Julia with a GPU
cuda docker docker-compose julia
Last synced: 06 Feb 2025
https://github.com/ysl1016/cudadigitfilter
CUDA-based parallel image filtering system for MNIST dataset
computer-vision cuda deep-learning gpu-acceleration image-processing mnist parallel-computing
Last synced: 02 Feb 2025
https://github.com/bjornmelin/ai-system-design
🎨 Large-scale AI system architectures and implementations. Features distributed training systems, multi-GPU pipelines, and efficient resource management. 🏗️
architecture cuda distributed-systems engineering gpu-computing production scalability system-design
Last synced: 02 Feb 2025
https://github.com/sephiroth7712/k-nearest-neigbours
Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.
cuda cuda-programming hadoop-mapreduce java mpi multiprocessing multithreading openmp pthreads scala spark
Last synced: 06 Feb 2025
https://github.com/phrutis/brainwords2
GPU brainflayer for sale $250
brain brainflayer brainwords cuda gpu key pass passphrase private
Last synced: 05 Feb 2025
https://github.com/rzxmha/linear_algebra
Linear Algebra project from TripleTen
blas computational-science cuda data-science data-visualization eigenvectors gram-schmidt linear-transformations matrix-calculations numpy nvidia python symmetric-matrices typescript
Last synced: 02 Feb 2025
https://github.com/sbstndb/neural_k
A simple Neural Network library using Kokkos enabling CUDA or OpenMP backend
ai cuda kokkos library neural-network openmp
Last synced: 05 Feb 2025
https://github.com/spatialgraphics/tardis
Travel space and time by using autodiff and codegen
Last synced: 05 Feb 2025
https://github.com/belrbez/ship-graphic-qt-qml-cuda-c
Client-Server application for Rocket driving in QML graphics
c client-server cpp cuda qml qt5 rocket
Last synced: 06 Feb 2025
https://github.com/jiriklepl/bits-knn-jpdc2024
Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search
bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k
Last synced: 26 Jan 2025
https://github.com/wiktor2718/matrix_flow
Matrix Flow is a simple machine learning library written in Rust and CUDA. It was created as a portfolio project to deepen my understanding of machine learning, GPU programming, and Rust. It provides an API for matrix manipulation and includes specially optimized neural networks.
adam-optimizer benchmarking cuda deep-learning gpu-computing machine-learning matrix-operations neural-networks portfolio-project rust
Last synced: 26 Jan 2025
https://github.com/scar17off/ai-2048
A Python implementation of 2048 with a self-learning AI agent powered by TensorFlow. Features reinforcement learning, GPU acceleration, and real-time gameplay visualization.
2048 2048-ai 2048-game artificial-intelligence cuda deep-learning game-ai gpu-computing machine-learning neural-networks pygame python reinforcement-learning self-learning tensorflow
Last synced: 30 Dec 2024
https://github.com/tdavidcl/cu_intercept
cuda cuda-memory cuda-programming hook massif memory-tracking preload
Last synced: 05 Feb 2025
https://github.com/djenriquez/ccminer
Dockerized ccminer
cuda docker ethereum mining nvidia nvidia-docker
Last synced: 01 Feb 2025
https://github.com/jxtngx/cuda-lab
simple CUDA kernels and Python bindings
artificial-intelligence cpp cuda deep-learning machine-learning neural-networks python
Last synced: 26 Jan 2025
https://github.com/skyguy126/cuda-learnings
Collection of personal CUDA learnings.
Last synced: 05 Feb 2025
https://github.com/cs550-epfl/review
Review of the paper A Formal Analysis of the NVIDIA PTX Memory Consistency Model
cuda formal-verification gpu memory-consistency ptx simt
Last synced: 05 Feb 2025
https://github.com/amitkumarj441/deep-learning-on-your-finger
A rich collection of Dockerfiles for installing deep learning dependencies on your way :rocket:
Last synced: 26 Jan 2025
https://github.com/xza85hrf/flux_pipeline
FluxPipeline is a prototype experimental project that provides a framework for working with the FLUX.1-schnell image generation model. This project is intended for educational and experimental purposes only.
ai cuda docker educational experimental flux1 flux1-schnell flux1ai gradio image-generation model non-commercial python pytorch research transformer-model
Last synced: 22 Dec 2024
https://github.com/macaycz/nn
A lightweight, GPU-accelerated machine learning library built with CUDA.
cuda deep-learning gpu machine-learning neural-network
Last synced: 13 Feb 2025
https://github.com/jeremywildsmith/shadowhash
Elixir distributed Shadow File password cracker with GPU accelerated cracking for md5crypt hashing algorithm.
cracking-hashes cuda distributed-systems elixir hashing nx security
Last synced: 13 Feb 2025
https://github.com/h1me01/cuda_neural_network
CUDA version of my previous AVX-512 based neural network.
chess cuda cuda-programming neural-network
Last synced: 07 Jan 2025
https://github.com/danieljvickers/fluid_simulation
An educational example for learning the Navier-Stokes equations. Also included is a C++ and CUDA shared object library, buildable with CMake, for use in your personal projects.
cpp cuda differential-equations navier-stokes numpy physics python simulation
Last synced: 30 Dec 2024
https://github.com/chibby0ne/cuda_by_example
Old notes (and new ones) on the CUDA by Example book
cuda cuda-programming gpgpu gpu-computing gpu-programming
Last synced: 31 Dec 2024
https://github.com/trentonom0r3/raft-analysis
Simple analysis script 'demotest.py' using RAFT optical flow to get flow vectors, occlusion masks, and information on keyframes with significant motion changes
cuda flow-maps occlusion-masks opticalflow python pytorch raft
Last synced: 08 Feb 2025
https://github.com/versi379/optimized-matrix-multiplication
This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.
cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming
Last synced: 21 Jan 2025
https://github.com/popke523/rybki
A 3D shoal of fish animation using the boids algorithm, OpenGL for rendering and CUDA for parallel processing.
Last synced: 08 Feb 2025
https://github.com/zelosleone/audiobook-generator
A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.
ai-audio audiobook cuda gpu-acceleration machine-learning pdf-converter python pytorch speech-synthesis text-processing text-to-speech
Last synced: 03 Feb 2025
https://github.com/iebeid/cuda-particles
A simple visualization of particles calculated using CUDA
Last synced: 12 Jan 2025
https://github.com/toshikinakamura0412/dotfiles_for_docker
My dotfiles for Docker containers of some Linux distributions
cuda docker docker-compose dotfiles git neovim ros-noetic tmux zsh
Last synced: 20 Nov 2024
https://github.com/isquicha/cuda-parallel-studies
Learning CUDA programming here =D
cuda cuda-programming cuda-toolkit
Last synced: 22 Jan 2025
https://github.com/gladap/heterogeneous_computing_project
Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters
cuda heterogeneous-parallel-programming
Last synced: 05 Feb 2025
https://github.com/ribin-baby/cuda_cudnn_installation_on_ubuntu20.04
Installation of CUDA 11.8 with cuDNN 8.7 on an Ubuntu 20.04 server with an A30 GPU, plus an ONNX GPU installation guide
cuda gpu linux onnxruntime server
Last synced: 16 Jan 2025
https://github.com/sedflix/cuda_pattern_matching
Computing word frequencies using pattern-matching concepts in CUDA
Last synced: 31 Dec 2024
https://github.com/vectorworksreal/sd-forge-docker
SD Forge WebUI Docker image.
ai-art artificial-intelligence containerization cuda docker docker-image forge image-to-image machine-learning sd-forge stable-diffusion stable-diffusion-webui text-to-image ubuntu webui
Last synced: 10 Feb 2025
https://github.com/patriciobcs/mini-aevol
Parallel implementation of a reduced version of the Aevol simulator
Last synced: 20 Jan 2025
https://github.com/pauloruszel/yolo11_face_detection
cuda nvcc nvidia-gpu pip python3 pytorch widerface-dataset yolo11
Last synced: 09 Feb 2025
https://github.com/sferez/sspp_sparse_matrix_cuda
Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA
cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix
Last synced: 13 Jan 2025
https://github.com/pintamonas4575/rlgan-project-maadm-upm
Neuroevolution to learn the Lunar Lander from Gymnasium and a GAN to learn to color images. Coursework from the ML and BD master's degree at UPM.
cuda deep-learning gan genetic-algorithm lunar-lander machine-learning mlp python3 pytorch reinforcement-learning tensorflow
Last synced: 05 Feb 2025
https://github.com/k-hengzhou/hphoto
An AI-based smart photo management tool supporting face recognition, automatic clustering of similar faces, and NSFW detection
cuda insightface nsfw nsfw-detection nudenet photos
Last synced: 09 Jan 2025
https://github.com/f14-bertolotti/torchess
CUDA torch extension for a chess engine
Last synced: 05 Feb 2025
https://github.com/kts-o7/n-body-parallel-implementation
A simple study comparing the speed-up obtained with different parallelization approaches such as MPI, OpenMP, and CUDA for an FFT implementation of n-body simulation
cuda mpi openmp parallel-computing pthreads
Last synced: 05 Feb 2025
https://github.com/thomasvonwu/interview-note
Share Interview Questions and Summarize Answers
Last synced: 05 Feb 2025
https://github.com/fikri-rouzan/cuda-c-program-part-2
CUDA C program from NVIDIA course.
Last synced: 05 Feb 2025
https://github.com/fikri-rouzan/cuda-c-program-part-1
CUDA C program from NVIDIA course.
Last synced: 05 Feb 2025
https://github.com/f-koehler/itesol
WIP: Iterative eigensolvers for C++20, Python and CUDA
cpp20 cuda eigenvalues linear-algebra python
Last synced: 28 Dec 2024
https://github.com/roryclear/cuda-ml
Simple CUDA-optimized MNIST classifier
colab-notebook cuda mnist-classification pycuda
Last synced: 21 Jan 2025
https://github.com/fikri-rouzan/cuda-c-program-part-3
CUDA C program from NVIDIA course.
Last synced: 05 Feb 2025
https://github.com/parlaynu/inference-tvm
Export ONNX to ApacheTVM and run inference in containerized environments.
apache-tvm cuda docker jetson-nano onnx raspberrypi4 x86-64
Last synced: 28 Jan 2025
https://github.com/lruizap/testcuda
Guide to installing and using CUDA for programming
Last synced: 02 Feb 2025
https://github.com/ionmich/cs149-local-dev
Provides `conda` installation instructions for Stanford's CS149 (Parallel Computing) programming assignments
conda cs149 cuda ispc parallel-computing
Last synced: 06 Feb 2025
https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-python
Explore how to use Numba—the just-in-time, type-specializing Python function compiler—to create and launch CUDA kernels to accelerate Python programs on massively parallel NVIDIA GPUs.
accelerated-computing cuda cuda-programming jit numba nvidia python
Last synced: 06 Feb 2025
https://github.com/mmz33/practice-cuda
c cpp cuda cuda-programming gpu-programming parallel-programming
Last synced: 22 Jan 2025
https://github.com/branebb/nn-framework
Framework for creating neural networks using C++ and the CUDA platform. This project is part of my final university assignment for my bachelor's degree.
cmake cpp cuda cuda-programming
Last synced: 19 Nov 2024
https://github.com/bd2720/accesspatterns
Comparing chunked vs. striped memory access patterns for CPU and GPU code using the CUDA toolkit in C.
c cache cuda cuda-toolkit performance-analysis performance-testing profiling
Last synced: 31 Jan 2025
https://github.com/mattjesc/federated-learning-simulation-1gpu-mi-is
Federated Learning Simulation on a Single GPU with Model Interpretability and Interactive Visualization
ai cuda deep-learning distributed-systems federated-learning gpu hpc keras machine-learning ml model-interpretability python pytorch simulation streamlit tensorflow
Last synced: 12 Oct 2024
https://github.com/raiszo/cs334
Journey through Intro to Parallel Programming
Last synced: 25 Jan 2025
https://github.com/dragonscypher/prompty
Tool for generating smart and secure prompts for language models!
autotokenizer bert-model cuda google-t5 llm python3 tensorflow threading
Last synced: 22 Jan 2025
https://github.com/phantom7knight/cuda-fusion
This project is for learning CUDA to better understand how the GPU works.
cuda cuda-programming gpgpu gpu
Last synced: 08 Feb 2025
https://github.com/prateekshukla1108/thunderkittens-docs
Documentation for ThunderKittens framework
Last synced: 24 Jan 2025
https://github.com/kanchishimono/python-images
Ubuntu based Python container images, including CUDA images
container-image cuda docker dockerfile machine-learning python python3
Last synced: 26 Jan 2025
https://github.com/vwkyc/detectron2-api
Detectron2 server API
api cpu-inference-api cuda detectron2 flask gunicorn self-hosted
Last synced: 05 Feb 2025
https://github.com/nvaranki/cmmx
CUDA matrix multiplication (official guide, modified)
Last synced: 10 Dec 2024
https://github.com/demetriantitus/machine-vision---yolov8
This project provides a comprehensive guide to object detection in cluttered environments using YOLOv8. It demonstrates how to identify and classify objects in both still images and video streams
computer-vision cuda dataset image-classification machine-learning nvidia-gpu object-detection surveillance traffic-monitoring video-analysis yolov8
Last synced: 05 Feb 2025
https://github.com/rkarahul/person-detector-faceverifier
Person-Detector-FaceVerifier is a sophisticated system for detecting and verifying faces in images. Ideal for applications like passport control and security, it combines advanced face detection with precise verification techniques.
bootstrap5 css3 cuda django html5 javascipt opencv-python os python pytorch yolov8
Last synced: 05 Feb 2025
https://github.com/dasbd72/nthu-ipc-2022
National Tsing Hua University - Introduction to Parallel Computing - 2022
cuda cuda-programming hpc mpi openmp pthreads
Last synced: 05 Feb 2025
https://github.com/sebp/vscode-sycl-dpcpp-cuda
Sample project to use the VS Code Remote - Containers extension to develop SYCL applications for NVIDIA GPUs using the oneAPI DPC++ compiler.
cuda dpcpp fedora gpu-computing podman sycl vscode
Last synced: 08 Feb 2025
https://github.com/sydney-informatics-hub/computer-vision-fine-tuning
Fine-tune a computer vision model to solve your task locally, on HPC, in a container, or in the cloud!
computer-vision cuda deep-learning python
Last synced: 22 Jan 2025
https://github.com/thalesmg/haskell-accelerate-parconc
Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell
accelerate cuda gpu-computing haskell parallel-computing
Last synced: 08 Feb 2025
https://github.com/thanduriel/cuda_hip_comparison
performance study of atomics on GPUs
Last synced: 05 Feb 2025
https://github.com/apostolis1/parallel-processing-systems
Project of the undergrad course "Parallel Processing Systems" - NTUA
benchmark c cuda mpi openmp parallel-computing
Last synced: 05 Feb 2025
https://github.com/jonyandunh/stanforddogsresnet
A classifier for the 120 dog breeds in the Stanford Dogs Dataset, built with the PyTorch framework and a custom ResNet
cuda deep-learning python pytorch resnet resnet-18 standford-dog stanford
Last synced: 14 Jan 2025
https://github.com/anne-andresen/autoencoder_3d_c_cuda
3D Autoencoder training in raw C/CUDA
Last synced: 05 Feb 2025
https://github.com/shineiarakawa/cuda-cmake-minimal-template
A minimal CUDA C++ project template with CMake
cmake cuda dear-imgui opengl project-template stb-image
Last synced: 21 Jan 2025
https://github.com/baonguyen6742/uv-install-torch
Tutorial for installing torch/PyTorch with CUDA using uv
cuda install installation package python pytorch resolver torch torchaudio torchvision tutorial uv
Last synced: 12 Feb 2025
https://github.com/grindelfp/cuda-n-body-simulation
Simulation of N-Body movement using CUDA.
Last synced: 12 Feb 2025
https://github.com/sustia-llc/gpu_logger_poc
GPU execution verification system with immutable Kafka logging. Monitors CUDA operations, validates GPU performance, and maintains auditable operation history. Built with Rust and Candle for reliable ML model execution tracking.
candle-core cuda docker gpu gpu-computing kafka logging machine-learning mlops monitoring nvidia performance-testing rust
Last synced: 12 Feb 2025
https://github.com/brocbyte/cuball
CUDA-based implementation of "Real-Time Rigid Body Simulation on GPUs" [from GPU Gems 3]
Last synced: 05 Jan 2025
https://github.com/boostibot/bachelors
My bachelor's thesis at CTU in Prague, Faculty of Nuclear Sciences and Physical Engineering, supervised by Ing. Pavel Strachota, Ph.D.
crystal-growth cuda finite-volume-method parallel-programming phase-field-method
Last synced: 18 Jan 2025
https://github.com/sid911/neuralnetworkcpp
A small experiment to learn about neural networks and their runtimes in C++
cpp cuda machine-learning neural-network
Last synced: 14 Jan 2025