Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-02-12 00:07:06 UTC
https://github.com/kobinarth-panchalingam/parallel-and-concurrent-programming
Semester - 7 | CS4533 - Parallel and Concurrent Programming | Labs
c concurrent-programming cuda java openmp pthreads
Last synced: 08 Jan 2025
https://github.com/starlitdreams/lunar-landing
This project implements a DQN agent using PyTorch to solve the LunarLander-v2 environment from OpenAI Gym. The agent learns to control the lunar lander using experience replay and a target network, aiming to maximize rewards by landing smoothly. Uses CUDA for computation.
artificial-intelligence cuda deep-learning gymnasium neural-network neural-networks numpy nvidia-gpu python python3 torch
Last synced: 05 Feb 2025
https://github.com/seieric/pytorch-mpi-singularity
Singularity Container including PyTorch with CUDA and mpi backend for DistributedDataParallel
cuda hpc nvidia openmpi pytorch singularity utokyo
Last synced: 05 Feb 2025
https://github.com/cooliron2311/cumd5bf
CUDA based md5 password bruteforcer
Last synced: 05 Feb 2025
https://github.com/iglee/jax-cuda-eicl-exp-docker
Docker for getting jax to work with cuda, for reproducing ml experiments like eicl. Sure, let's NOT make a compatibility matrix and let people fight for their lives on cuda
cuda docker jax jaxline ml-engineering ml-experiments tensorflow
Last synced: 05 Feb 2025
https://github.com/cuda8/brainwords2
GPU brainflayer for sale $250
brain brainflayer brainwords cuda gpu key pass passphrase private
Last synced: 23 Oct 2024
https://github.com/alexkranias/triton_vs_cuda
Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.
cuda cuda-kernels gpu gpu-programming parallel-programming python triton
Last synced: 05 Feb 2025
https://github.com/rajshrestha86/kmeans-clusterize-cuda
Implementation of K-Means algorithm from scratch using CUDA.
Last synced: 07 Feb 2025
https://github.com/deepschneider/tinygrad-universal
Universal version of Tinygrad with CUDA and OpenCL support
autograd automatic-differentiation cuda pycuda pyopencl tinygrad tinygrad-cuda
Last synced: 16 Jan 2025
https://github.com/jamesnulliu/learning-programming-massively-parallel-processors
Learning notes for Programming Massively Parallel Processors, 4th edition.
Last synced: 02 Feb 2025
https://github.com/sangioai/torchpace
PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.
Last synced: 02 Feb 2025
https://github.com/bhavinpatel4199/image-processing-with-opencv-and-cuda-on-google-colab
This repository demonstrates image processing using OpenCV with CUDA for GPU acceleration on Google Colab. It includes basics like displaying and manipulating images, alongside advanced techniques using CUDA to enhance performance. Ideal for learning GPU-accelerated image processing in Python.
computer-vision cuda google-colab gpu-acceleration high-performance-computing image-processing opencv pixel-manupulation
Last synced: 12 Feb 2025
https://github.com/bjornmelin/edge-ai-engineering
📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖
cuda edge-computing embedded-systems gpu-optimization iot mobile-ml model-optimization python tflite
Last synced: 02 Feb 2025
https://github.com/phrutis/brainwords2
GPU brainflayer for sale $250
brain brainflayer brainwords cuda gpu key pass passphrase private
Last synced: 05 Feb 2025
https://github.com/ne0nwinds/gpupuzzles
My solutions to srush/GPU-Puzzles using CUDA
Last synced: 02 Feb 2025
https://github.com/atelierarith/julia_gpu_playground
For those who want to use Julia with a GPU
cuda docker docker-compose julia
Last synced: 06 Feb 2025
https://github.com/ysl1016/cudadigitfilter
CUDA-based parallel image filtering system for MNIST dataset
computer-vision cuda deep-learning gpu-acceleration image-processing mnist parallel-computing
Last synced: 02 Feb 2025
https://github.com/sephiroth7712/k-nearest-neigbours
Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.
cuda cuda-programming hadoop-mapreduce java mpi multiprocessing multithreading openmp pthreads scala spark
Last synced: 06 Feb 2025
https://github.com/ypatel2022/gpu-accelerated-game-of-life
Accelerating Game of Life Compute with CUDA.
Last synced: 28 Dec 2024
https://github.com/bjornmelin/ai-system-design
🎨 Large-scale AI system architectures and implementations. Features distributed training systems, multi-GPU pipelines, and efficient resource management. 🏗️
architecture cuda distributed-systems engineering gpu-computing production scalability system-design
Last synced: 02 Feb 2025
https://github.com/belrbez/ship-graphic-qt-qml-cuda-c
Client-Server application for Rocket driving in QML graphics
c client-server cpp cuda qml qt5 rocket
Last synced: 06 Feb 2025
https://github.com/rzxmha/linear_algebra
Linear Algebra project from TripleTen
blas computational-science cuda data-science data-visualization eigenvectors gram-schmidt linear-transformations matrix-calculations numpy nvidia python symmetric-matrices typescript
Last synced: 02 Feb 2025
https://github.com/srivanijayanthi/pytorch-onnx-tensorrt-conversion
This repository provides a step-by-step guide to converting a PyTorch model to the ONNX format and subsequently to TensorRT for optimized inference.
Last synced: 24 Jan 2025
https://github.com/jiriklepl/bits-knn-jpdc2024
Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search
bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k
Last synced: 26 Jan 2025
https://github.com/bjornmelin/tensorflow-evolution
🧠 Progressive journey through TensorFlow, from basics to advanced architectures. Featuring custom training pipelines, optimized GPU implementations, and production-ready models. Includes CUDA optimizations for large-scale training. 🚀
cuda deep-learning gpu-optimization machine-learning ml-engineering neural-networks python tensorflow
Last synced: 24 Jan 2025
https://github.com/wiktor2718/matrix_flow
Matrix Flow is a simple machine learning library written in Rust and CUDA. It was created as a portfolio project to deepen my understanding of machine learning, GPU programming, and Rust. It provides an API for matrix manipulation and includes specially optimized neural networks.
adam-optimizer benchmarking cuda deep-learning gpu-computing machine-learning matrix-operations neural-networks portfolio-project rust
Last synced: 26 Jan 2025
https://github.com/sbstndb/neural_k
A simple Neural Network library using Kokkos enabling CUDA or OpenMP backend
ai cuda kokkos library neural-network openmp
Last synced: 05 Feb 2025
https://github.com/timdev-r/cv-ground-truth-extraction
(Dump) Helper for ground truth extraction, movement analytics and silhouette visual demonstration
computer-vision cuda ground-truth intel-realsense pandas python
Last synced: 21 Jan 2025
https://github.com/bjornmelin/cuda-core-projects
🎯 Essential CUDA programming patterns and optimizations. Showcasing parallel computing expertise through matrix operations, memory management, and advanced kernel implementations. 💻
cpp cuda cuda-kernels gpu-computing high-performance-computing nvidia optimization parallel-computing
Last synced: 24 Jan 2025
https://github.com/abhiram-kandiyana/cuda-blast-2024
Reimplementation of NCBI BLAST with CUDA backend for faster retrieval
blast cuda gpu-acceleration parallel-processing
Last synced: 21 Jan 2025
https://github.com/spatialgraphics/tardis
Travel space and time by using autodiff and codegen
Last synced: 05 Feb 2025
https://github.com/obitech/tuc-ki-gpu-docker
cuda docker machine-learning nvidia-docker nvidia-gpu tensorflow tuc
Last synced: 30 Dec 2024
https://github.com/jxtngx/cuda-lab
simple CUDA kernels and Python bindings
artificial-intelligence cpp cuda deep-learning machine-learning neural-networks python
Last synced: 26 Jan 2025
https://github.com/chibby0ne/cuda_by_example
Old notes (and new ones) of the Cuda by Example book
cuda cuda-programming gpgpu gpu-computing gpu-programming
Last synced: 31 Dec 2024
https://github.com/zelosleone/audiobook-generator
A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.
ai-audio audiobook cuda gpu-acceleration machine-learning pdf-converter python pytorch speech-synthesis text-processing text-to-speech
Last synced: 03 Feb 2025
https://github.com/nourmorsy/convolution-neural-network-cuda
Code for optimization to CNN using CUDA
Last synced: 13 Jan 2025
https://github.com/shreya888/learning-cuda-with-cpp-and-pytorch
My notes, code, & insights will be recorded here while learning CUDA with C++ and PyTorch
Last synced: 30 Dec 2024
https://github.com/h1me01/cuda_neural_network
CUDA version of my previous AVX-512 based neural network.
chess cuda cuda-programming neural-network
Last synced: 07 Jan 2025
https://github.com/tdavidcl/cu_intercept
cuda cuda-memory cuda-programming hook massif memory-tracking preload
Last synced: 05 Feb 2025
https://github.com/sedflix/cuda_pattern_matching
Getting words frequency using the concepts of pattern matching in CUDA
Last synced: 31 Dec 2024
https://github.com/lord-turmoil/cudacmakedemo
A demo for building CUDA program with CMake
Last synced: 23 Jan 2025
https://github.com/michaelfranzl/image_fah-client
Dockerfile for Folding@home client with AMD and Nvidia GPGPU support
container cuda debian docker foldingathome gpu-computing opencl
Last synced: 21 Jan 2025
https://github.com/k-hengzhou/hphoto
An AI-based intelligent photo management tool that supports face recognition, automatic clustering of similar faces, and NSFW detection
cuda insightface nsfw nsfw-detection nudenet photos
Last synced: 09 Jan 2025
https://github.com/jmuwrobotics/libbicos
GPU-Accelerated Binary Correspondence Search for Multishot Stereo Vision
computer-vision cuda depth-map stereo-camera stereo-matching stereo-vision
Last synced: 30 Dec 2024
https://github.com/scar17off/ai-2048
A Python implementation of 2048 with a self-learning AI agent powered by TensorFlow. Features reinforcement learning, GPU acceleration, and real-time gameplay visualization.
2048 2048-ai 2048-game artificial-intelligence cuda deep-learning game-ai gpu-computing machine-learning neural-networks pygame python reinforcement-learning self-learning tensorflow
Last synced: 30 Dec 2024
https://github.com/bardifarsi/threadpoolmanager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 29 Dec 2024
https://github.com/danieljvickers/fluid_simulation
An educational example for learning the Navier-Stokes equations. Also included is a C++ and CUDA shared object library, buildable with CMake, for use in your personal projects.
cpp cuda differential-equations navier-stokes numpy physics python simulation
Last synced: 30 Dec 2024
https://github.com/skyguy126/cuda-learnings
Collection of personal CUDA learnings.
Last synced: 05 Feb 2025
https://github.com/occisor2/fluidsimulation
Second project of my parallel algorithms course
cuda high-performance-computing
Last synced: 11 Jan 2025
https://github.com/f-koehler/itesol
WIP: Iterative eigensolvers for C++20, Python and CUDA
cpp20 cuda eigenvalues linear-algebra python
Last synced: 28 Dec 2024
https://github.com/cs550-epfl/review
Review of the paper A Formal Analysis of the NVIDIA PTX Memory Consistency Model
cuda formal-verification gpu memory-consistency ptx simt
Last synced: 05 Feb 2025
https://github.com/ionmich/cs149-local-dev
Provides `conda` installation instructions for Stanford's CS149 (Parallel Computing) programming assignments
conda cs149 cuda ispc parallel-computing
Last synced: 06 Feb 2025
https://github.com/leo27945875/pybind11_cuda_matmul
cpp cuda matrix-multiplication pybind11 python3
Last synced: 23 Jan 2025
https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-python
Explore how to use Numba—the just-in-time, type-specializing Python function compiler—to create and launch CUDA kernels to accelerate Python programs on massively parallel NVIDIA GPUs.
accelerated-computing cuda cuda-programming jit numba nvidia python
Last synced: 06 Feb 2025
https://github.com/lruizap/testcuda
Guide to installing and using CUDA for programming
Last synced: 02 Feb 2025
https://github.com/sid911/neuralnetworkcpp
A small experiment to learn about neural networks and their runtimes in cpp
cpp cuda machine-learning neural-network
Last synced: 14 Jan 2025
https://github.com/djenriquez/ccminer
Dockerized ccminer
cuda docker ethereum mining nvidia nvidia-docker
Last synced: 01 Feb 2025
https://github.com/roryclear/cuda-ml
simple cuda optimized mnist classifier
colab-notebook cuda mnist-classification pycuda
Last synced: 21 Jan 2025
https://github.com/airvzxf/c-plus-plus-understanding-cuda
Understanding CUDA with C++
cuda hacktoberfest hacktoberfest-accepted
Last synced: 27 Jan 2025
https://github.com/shineiarakawa/particle-stabilizer
A C++ and CUDA-based program for simulating the motion of particles.
Last synced: 13 Jan 2025
https://github.com/mathiasotnes/gemm
General Matrix Multiplication (GEMM) optimization in CUDA.
Last synced: 31 Jan 2025
https://github.com/hnthap/vietnamese-word-segment
Vietnamese word segmentation package.
cuda torch transformers vietnamese vietnamese-nlp vietnamese-tokenizer word-segmentation
Last synced: 21 Jan 2025
https://github.com/trentonom0r3/raft-analysis
Simple analysis script 'demotest.py' using RAFT optical flow to get flow vectors, occlusion masks, and information on keyframes with significant motion changes
cuda flow-maps occlusion-masks opticalflow python pytorch raft
Last synced: 08 Feb 2025
https://github.com/xza85hrf/flux_pipeline
FluxPipeline is a prototype experimental project that provides a framework for working with the FLUX.1-schnell image generation model. This project is intended for educational and experimental purposes only.
ai cuda docker educational experimental flux1 flux1-schnell flux1ai gradio image-generation model non-commercial python pytorch research transformer-model
Last synced: 22 Dec 2024
https://github.com/vwkyc/detectron2-api
Detectron2 server API
api cpu-inference-api cuda detectron2 flask gunicorn self-hosted
Last synced: 05 Feb 2025
https://github.com/nvaranki/cmmx
CUDA matrix multiplication (official guide, modified)
Last synced: 10 Dec 2024
https://github.com/demetriantitus/machine-vision---yolov8
This project provides a comprehensive guide to object detection in cluttered environments using YOLOv8. It demonstrates how to identify and classify objects in both still images and video streams
computer-vision cuda dataset image-classification machine-learning nvidia-gpu object-detection surveillance traffic-monitoring video-analysis yolov8
Last synced: 05 Feb 2025
https://github.com/rkarahul/person-detector-faceverifier
Person-Detector-FaceVerifier is a sophisticated system for detecting and verifying faces in images. Ideal for applications like passport control and security, it combines advanced face detection with precise verification techniques.
bootstrap5 css3 cuda django html5 javascipt opencv-python os python pytorch yolov8
Last synced: 05 Feb 2025
https://github.com/dasbd72/nthu-ipc-2022
National Tsing Hua University - Introduction to Parallel Computing - 2022
cuda cuda-programming hpc mpi openmp pthreads
Last synced: 05 Feb 2025
https://github.com/popke523/rybki
A 3D shoal of fish animation using the boids algorithm, OpenGL for rendering and CUDA for parallel processing.
Last synced: 08 Feb 2025
https://github.com/juntyr/necsim-rust-analysis
Analysis of the spatially explicit biodiversity simulation `necsim-rust`
analysis biodiversity cuda mpi necsim rust simulation
Last synced: 25 Jan 2025
https://github.com/sid911/scions_old
A small, fast and easy to use Machine Learning framework for edge
cpp cuda library machine-learning
Last synced: 14 Jan 2025
https://github.com/thanduriel/cuda_hip_comparison
performance study of atomics on GPUs
Last synced: 05 Feb 2025
https://github.com/apostolis1/parallel-processing-systems
Project of the undergrad course "Parallel Processing Systems" - NTUA
benchmark c cuda mpi openmp parallel-computing
Last synced: 05 Feb 2025
https://github.com/anne-andresen/autoencoder_3d_c_cuda
3D Autoencoder training in raw C/CUDA
Last synced: 05 Feb 2025
https://github.com/iebeid/cuda-particles
A simple visualization of particles calculated using CUDA
Last synced: 12 Jan 2025
https://github.com/prateekshukla1108/thunderkittens-docs
Documentation for ThunderKittens framework
Last synced: 24 Jan 2025
https://github.com/shineiarakawa/cuda-cmake-minimal-template
A minimal CUDA C++ project template with CMake
cmake cuda dear-imgui opengl project-template stb-image
Last synced: 21 Jan 2025
https://github.com/patriciobcs/mini-aevol
Parallel implementation of a reduced version of the Aevol simulator
Last synced: 20 Jan 2025
https://github.com/versi379/optimized-matrix-multiplication
This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.
cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming
Last synced: 21 Jan 2025
https://github.com/usman619/pdc
Parallel and Distributed Computing
cuda distributed-computing distributed-systems nextcloud
Last synced: 13 Jan 2025