CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/nabilshadman/cuda-4-dummies

Lecture slides and exercise files of the CUDA 4 Dummies course (2025)

cuda gpu-computing high-performance-computing nsight-systems nvidia-gpu parallel-computing

Last synced: 31 Oct 2025

https://github.com/mohamedsamirx/yolov12-tensorrt-cpp

YOLOv12 inference using C++, TensorRT, and CUDA

cpp cuda tensorrt tensorrt-inference yolo yolov12

Last synced: 15 Apr 2026

https://github.com/manu-sh/cuda-mandelbrot

How to use CUDA acceleration to compute the Mandelbrot set

cuda mandelbrot ppm-image

Last synced: 15 Apr 2026

https://github.com/ahmed5827/image_generation

This application provides a graphical user interface (GUI) for generating images using the Stable Diffusion model. The GUI allows users to input a text prompt, and the application generates an image based on the prompt.

ai cuda generative-ai image-generation

Last synced: 15 Apr 2026

https://github.com/lyynn777/cuda-bitonic-sort

Simple CUDA project to implement Bitonic Sort and compare it with normal CPU sorting.

bitonic-sort cuda gpu-computing gpu-vs-cpu parallel-computing performance-testing pycuda python

Last synced: 15 Apr 2026

https://github.com/flosmume/cpp-cuda-streams-and-pinned-mem

A CUDA C++ demo showing how to overlap data transfer and kernel execution using multiple streams and pinned (page-locked) host memory. This project illustrates asynchronous memcpy, event timing, and performance benefits of concurrent GPU execution — essential for building high-throughput pipelines.

asynchronous-execution cuda cuda-streams gpu parallel-programming performance-optimization pinned-memory

Last synced: 13 May 2026

https://github.com/uva-trasgo/controllers

Read-only mirror of the official repository: https://gitlab.com/trasgo-group-valladolid/controllers. Controllers is a library written in C11 that provides a simplified way to program applications that can exploit heterogeneous computational platforms including accelerators and/or multi-core CPUs.

cuda heterogeneous-computing heterogeneous-parallel-programming hip opencl openmp

Last synced: 12 May 2026

https://github.com/mahdi-hasan-shuvo/ml-opensource-project

An open source repository focused on providing practical and educational machine learning resources. The project aims to make learning and applying machine learning more accessible through well-documented code, tutorials, and real-world examples.

cuda machine-learning machine-learning-algorithms ml-projects open-source python

Last synced: 19 May 2026

https://github.com/tkemmer/cunessie.jl

CUDA-accelerated Nonlocal Electrostatics in Structured Solvents

bioinformatics boundary-element-method cuda electrostatics gpu-computing julia proteins

Last synced: 31 Jan 2026

https://github.com/snandasena/courseera_gpu_specilization

Example of CUDA streaming

c cpp cuda

Last synced: 15 Apr 2026

https://github.com/eastonman/tensorrt-pytorch-wrapper

A wrapper that lets a TensorRT engine accept PyTorch CUDA tensors.

cuda pytorch tensorrt

Last synced: 06 May 2026

https://github.com/naetherm/derelictcublas

Dynamic bindings to the cuBLAS library for the D programming language.

cublas cuda d derelict dlang

Last synced: 31 Oct 2025

https://github.com/starlitdreams/pacman-convolutional-q-learning

This project implements a Deep Q-Network (DQN) using PyTorch to train an agent to play Atari's Ms. Pac-Man. It utilizes reinforcement learning with a convolutional neural network (CNN) for image processing. Features include experience replay, frame preprocessing, and CUDA support, with trained model saving and video rendering of gameplay.

artificial-intelligence artificial-neural-networks atari cuda deep-learning deep-learning-algorithms deep-q-learning deeplearning gymnasium gymnasium-environment python pytorch

Last synced: 15 Apr 2026

https://github.com/ramyacp14/document-based-question-and-answers

Developed a document question answering system that utilizes Llama and LangChain for contextual and accurate answers. The system supports .txt documents, intelligent text splitting, and context-aware querying through an easy-to-use Streamlit interface.

chroma cuda hugging-face langchain llama python recursivecharactertextsplitter streamlit

Last synced: 07 Mar 2026

https://github.com/materight/pyav-cuda

Extension of PyAV with hardware encoding and decoding support. Compatible with PyTorch and Nvidia codecs.

cuda cuvid ffmpeg libav pytorch

Last synced: 01 Feb 2026

https://github.com/cscfi/csc-env-julia

Julia language environment including MPI.jl, CUDA.jl and AMDGPU.jl preferences for HPC clusters at CSC.

amdgpu ansible cuda hpc julia julia-language mpi

Last synced: 01 Feb 2026

https://github.com/storterald/neural-network

Simple neural network implementation in C++ and CUDA

asm asmx86 c-plus-plus cmake cpp cuda machine-learning neural-network

Last synced: 28 Mar 2025

https://github.com/amypad/miutil

Basic functionality needed for AMYPAD

cuda matlab medical-imaging python

Last synced: 13 May 2025

https://github.com/teambipartite/bipartite-gemm

High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor Cores

cuda data-parallelism gemm

Last synced: 17 Apr 2026

https://github.com/m-torhan/cuda-fractals

CUDA C++ implementation of fractal visualization

cuda

Last synced: 25 Feb 2026

https://github.com/actepukc/uv-app-starter-pack

Bootstrap PySide6 GUI apps quickly using uv, with built-in PyTorch/CUDA handling.

astral-uv cross-platform cuda gui pyside6 python pytorch qt6 starter-kit template

Last synced: 30 Apr 2026

https://github.com/ivanfioravanti/tflops_mps

TFLOPs testing on MPS and CUDA

cuda mps tflops

Last synced: 19 May 2026

https://github.com/grindelfp/cuda-n-body-simulation

Simulation of N-Body movement using CUDA.

cuda n-body-simulation

Last synced: 06 Apr 2025

https://github.com/drilonaliu/parallel-fractal-tree

GPU-accelerated fractal tree generation with CUDA and OpenGL interoperability.

cuda fractal-tree fractals gpu

Last synced: 19 May 2026

https://github.com/xza85hrf/flag_prediction_project

This application predicts the name of a country (or countries) based on an input flag image. It uses advanced image processing techniques and deep learning models built with PyTorch to classify flags accurately.

cross-validation cuda data-augmentation docker efficientnetb0 flag-recognition image-classification machine-learning mixed-precision-training mobilenetv2 python pytorch resnet resnet-50 transfer-learning

Last synced: 15 Apr 2026

https://github.com/patriciobcs/mini-aevol

Parallel implementation of a reduced version of the Aevol simulator

aevol cuda simulation

Last synced: 19 May 2026

https://github.com/muneeb706/cuda

Sample programs implemented using CUDA (GPU)

cplusplus cuda gpu-programming

Last synced: 19 May 2026

https://github.com/fieldcure/fieldcure-whisper-runtimes

Pre-built Whisper.net native runtime binaries (CPU/CUDA/Vulkan) for the FieldCure software ecosystem.

cuda dotnet native-binaries nuget redistributable vulkan whisper whisper-net

Last synced: 01 Jun 2026

https://github.com/chiragajain/gpu-optimization-roadmap

This repository is part of a structured curriculum designed to master GPU optimization, Triton, Deep Learning, and LLMs. This section focuses on GPU fundamentals, CUDA programming, and PyTorch optimizations.

cuda deeplearning gpu-acceleration learning python pytorch triton

Last synced: 18 Feb 2026

https://github.com/kar-dim/CAS-2D

Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA, for sharpening static images.

cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen

Last synced: 01 Nov 2025

https://github.com/baremetalrt/baremetalrt

BareMetalRT — edge GPU compute mesh

cuda distributed-computing gpu inference llm nvidia tensorrt windows

Last synced: 18 Apr 2026

https://github.com/mxm-tr/docker-darknet-opencv

Accelerated object detection on streams and files, using a Docker Darknet YOLO container

cuda docker docker-compose object-recognition opencv-python python3 yolo

Last synced: 10 Apr 2026

https://github.com/joe-mruz/hgvisualizer

An interactive simulation and visualization tool for evolving hypergraphs, inspired by the Wolfram Physics Project.

cpp cuda hypergraph physics simulator wolfram

Last synced: 02 May 2026

https://github.com/kirubhakaranm/vision-pipeline-cuda

High-performance camera processing pipeline with CUDA GPU acceleration, CPU multithreading, and real-time TCP/IP telemetry monitoring (1,200+ FPS, <1ms latency)

computer-vision cpp17 cuda edge-detection gpu-acceleration image-processing multithreading networking opencv performance-optimization real-time robotics tcp-ip telemetry

Last synced: 12 Apr 2026

https://github.com/sangioai/sph

CUDA and OpenMP versions of the serial SPH (Smoothed Particle Hydrodynamics) algorithm.

cuda openmp

Last synced: 27 Apr 2026

https://github.com/lruizap/testcuda

Guide to installing and using CUDA for programming

cuda cudnn nvidia pytorch

Last synced: 12 May 2026

https://github.com/amitkumarj441/deep-learning-on-your-finger

A rich collection of Dockerfiles for installing deep learning dependencies your way

cuda cudnn gcp

Last synced: 18 Apr 2026

https://github.com/debanjan06/spatial-streamio

An optimized, out-of-core asynchronous data streaming pipeline for high-throughput 3D point cloud training loops. Features low-level numpy.memmap zero-copy reads and multi-threaded ring prefetching to eliminate I/O bottlenecks, delivering a 33.33% throughput efficiency gain on PyTorch CUDA workloads.

asynchronous-programming cuda data-engineering deep-learning-pipelines io-optimization memory-mapping point-cloud pytorch

Last synced: 11 Jun 2026

https://github.com/matteopolak/stock-predict

Stock prediction with LSTM using TensorFlow and TypeScript.

ai artificial-intelligence cuda lstm machine-learning stock tensorflow typescript

Last synced: 09 May 2026

https://github.com/muppetsg2/cudaraytracer

A custom ray tracer originally developed during university studies to run on CPU, now ported to GPU using CUDA. This project was created to explore GPU rendering techniques and to gain hands-on experience with CUDA programming.

cuda mit-license nvidia-cuda nvidia-gpu raytracing sfml stb-image student-project study-project

Last synced: 16 Apr 2026

https://github.com/xstupi00/N-Body-CUDA

PCG - Parallel Computations on GPU - Project - N-Body-CUDA

cuda gpu-acceleration gpu-computing nbody-simulation optimization parallel-computing pcg vut vut-fit

Last synced: 11 Mar 2025

https://github.com/farukalamai/cpp-for-cuda

A structured C++ learning path designed specifically for developers preparing to learn CUDA programming.

cpp cuda gpu nvidia

Last synced: 09 Jun 2026

https://github.com/brendanm12345/simple_renderer_cs149

Simple CUDA renderer implementation. 19th most efficient out of 150+ submissions

cpp cuda

Last synced: 18 May 2026

https://github.com/rajshrestha86/kmeans-clusterize-cuda

Implementation of K-Means algorithm from scratch using CUDA.

c cuda kmeans-clustering

Last synced: 18 May 2026

https://github.com/yashpotdar-py/flood-vision

Flood Vision - A deep learning–based computer vision system for flood mapping and damage assessment using aerial imagery.

cuda deep-learning flood-detection iot python

Last synced: 16 Apr 2026

https://github.com/kentakoong/mtnlog

A simple multinode performance logger for Python

cuda lanta nvitop python slurm-cluster

Last synced: 11 Jan 2026

https://github.com/amruthapatil/nyu-cudaconvolution

Implementing convolution operations on an image using CUDA, exploiting different methodologies - basic, tiled, and cuDNN

cuda high-performance

Last synced: 13 Mar 2025

https://github.com/jiriklepl/bits-knn-jpdc2024

Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search

bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k

Last synced: 21 Mar 2025

https://github.com/sferez/sspp_sparse_matrix_cuda

Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA

cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix

Last synced: 30 Apr 2026

https://github.com/equiel-1703/cuhip

Wrapper tool to convert CUDA source code to HIP code and compile it with HIPCC. Useful for learning CUDA programming on AMD devices.

cuda hip

Last synced: 14 May 2026

https://github.com/edcalderin/huggingface_ragflow

This project implements a classic Retrieval-Augmented Generation (RAG) system using HuggingFace models with quantization techniques. The system processes PDF documents, extracts their content, and enables interactive question-answering through a Streamlit web application.

bitsandbytes cuda huggingface huggingface-embeddings langchain langchain-community large-language-models llm nf4 python qdrant quantization rag retrieval-augmented-generation ruff streamlit text-generation

Last synced: 15 Jul 2025

https://github.com/aayes89/pyllm

Train your own LLM from scratch

cpu cuda llm llm-training pip python3

Last synced: 18 May 2026

https://github.com/edisonslightbulbs/viewer

Exploring real-time 3D point cloud rendering using CUDA and OpenGL

cuda cxx11 opengl pangolin submodule

Last synced: 02 May 2026

https://github.com/aaaastark/nvidia-cuda-google-colab

Deployment of NVIDIA CUDA on Google Colab, with example code (vector addition and matrix multiplication).

c cpp cuda googlecolab googlecolaboratory matrix-multiplication nvidia python vector-addition

Last synced: 16 Apr 2026

https://github.com/alexjmercer/cuda-npp-assignment

Learning about CUDA and NVIDIA Performance Primitives. Part of a Coursera assignment.

cuda gpu-programming npp nppi

Last synced: 13 Feb 2026

https://github.com/ivanbuccella/sf2bio

Deep reinforcement learning for de novo drug design: a ReLeaSe method execution on a Docker Environment

cuda deep-learning deep-reinforcement-learning docker docker-compose machine-learning nvidia-cuda nvidia-docker reinforcement-learning release release-method

Last synced: 01 May 2026

https://github.com/tlabaltoh/tlab-sharescreen-server-win

Software frame encoder that uses CUDA and casts encoded frames over UDP. An attempt to implement a custom streaming protocol and a shader-based frame encoder/decoder for screencasting.

cuda desktop-capture screensharing unity unity3d windows-graphics-capture

Last synced: 14 Feb 2026

https://github.com/avarga1/vllm-hb

vLLM-compatible inference runtime in pure Rust. Zero Python. Zero libtorch. CUDA via candle.

candle cuda inference llm openai-api rust tokio vllm

Last synced: 07 Apr 2026

https://github.com/mrtejas/cv-sandbox

A collection of computer vision mini-projects tuned for a number of tasks, including face detection, object detection, image segmentation, and CLIP. Trained on popular datasets, with a comparative study of the methods. Done as part of the S24 course Computer Vision at IIIT Hyd.

computer-vision cuda ml opencv pytorch yolo

Last synced: 01 May 2026

https://github.com/loveboyme/yolov5-tensorrt-accelerator

High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization

cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5

Last synced: 29 Mar 2025

https://github.com/ankhoa1212/cuda-program

This is a GPU program built with CUDA using parallel reduction

cpp cuda curand gpu-programming parallel-reduction

Last synced: 14 Feb 2026

https://github.com/srmlcn/spirals

The purpose of the Spirals script is to create a computer-generated image. The image maps to GPUs with CUDA support.

cgi cuda gpu numba nvidia python

Last synced: 28 Feb 2026

https://github.com/nagharjun17/mlir-to-ptx-cuda

Creating an MLIR dialect that fuses Addition + ReLU, lowers to NVVM and LLVM IR, and generates PTX to run the kernel on a CUDA GPU

cpp cuda deep-learning llvm mlir ptx

Last synced: 18 Apr 2026

https://github.com/akira4o4/cuda-yolo-processing

CUDA YOLO Processing

cuda yolo

Last synced: 12 Jul 2025

https://github.com/wiktor2718/matrix_flow

Matrix Flow is a simple machine learning library written in Rust and CUDA. It was created as a portfolio project to deepen my understanding of machine learning, GPU programming, and Rust. It provides an API for matrix manipulation and includes specially optimized neural networks.

adam-optimizer benchmarking cuda deep-learning gpu-computing machine-learning matrix-operations neural-networks portfolio-project rust

Last synced: 18 May 2026

https://github.com/fikri-rouzan/cuda-c-program-part-3

CUDA C program from NVIDIA course.

c cuda

Last synced: 01 May 2026

https://github.com/cppshizoids/cuda

My basic CUDA lessons

cuda cuda-demo cuda-programming

Last synced: 15 Jul 2025

https://github.com/mattjesc/gpu-accelerated-fap

GPU-Accelerated Frequency Analysis Prototype using CUDA, Unit Testing, and User-Defined Settings

c cmake cpp cuda cufft googletest gpu gpu-acceleration gpu-computing gpu-programming nvidia signal-processing test test-automation testing unit-testing

Last synced: 16 Apr 2026

https://github.com/tfogal/gemm-db

For creating a cacheable GEMM cost model.

cuda rust

Last synced: 18 May 2026

https://github.com/smoke-y/athena

Deep learning library

cuda deep-learning deep-learning-library

Last synced: 01 Mar 2026

https://github.com/demetriantitus/machine-vision---yolov8

This project provides a comprehensive guide to object detection in cluttered environments using YOLOv8. It demonstrates how to identify and classify objects in both still images and video streams

computer-vision cuda dataset image-classification machine-learning nvidia-gpu object-detection surveillance traffic-monitoring video-analysis yolov8

Last synced: 18 May 2026

https://github.com/obj-wtf/gan-architecture

App for training GAN models on architecture plans

architecture building cuda gan pix2pix-tensorflow plan

Last synced: 18 May 2026

https://github.com/aarid/cuda_operations

This project compares CPU and GPU performance on CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.

conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication

Last synced: 02 Mar 2026

https://github.com/anselm67/cuda_mnist

A CUDA implementation of MNIST - for CUDA beginners.

cuda gpu gpu-computing gpu-programming mnist mnist-classification

Last synced: 02 Mar 2026

https://github.com/moshiba/fmindex

Ultra-fast parallel FM-index generation for DNA reads

cpp cuda fmindex parallel

Last synced: 18 May 2026

https://github.com/atticuszeller/pytorch-lightning-uv

📦 Zero-config Deep Learning template with PyTorch Lightning, UV package manager, W&B tracking, and modern Python tooling 🚀

classification cuda deep-learning machine-learning mnist-classification python pytorch pytorch-lightning typer uv

Last synced: 16 Apr 2026

https://github.com/ivanbgd/cuda_quad_c

Calculates a definite integral by using three different rules. Compares sequential to parallel implementations.

cuda integrals parallel-implementations

Last synced: 28 Mar 2025

https://github.com/rushirg/cuda-matrix-multiplication

Matrix Multiplication on GPGPU in CUDA

cpu cuda gpu parallel-processing

Last synced: 17 May 2026

https://github.com/tomosatop/docker-lammps

I wanted an easy way to use LAMMPS, so I built a service.

cuda lammps wsl-ubuntu

Last synced: 28 Mar 2025

https://github.com/puzzlef/vector-max-cuda

Performance of sequential vs CUDA-based vector element max.

basics cuda element experiment max vector

Last synced: 17 May 2026

https://github.com/darshanakgr/meanfiltergpu

A GPU implementation of a mean filter in CUDA

c cuda image-processing

Last synced: 01 May 2026

https://github.com/reuben-sun/pybind-cuda-demo

An example of calling CUDA C++ interfaces from Python using pybind11

cpp cuda pybind11 python pytorch

Last synced: 07 Apr 2026

https://github.com/miferreiro/cdap-cuda

CUDA exercises for the subject "Computación Distribuída e de Altas Prestacións" in the Master's Degree in Computer Engineering at the University of Vigo, 2020

c cuda scan

Last synced: 17 May 2026

https://github.com/eagleeee2/ethminer

EthMiner is a powerful Ethereum mining software optimized for GPU performance using OpenCL and CUDA technologies. It provides easy setup, detailed performance metrics, and robust compatibility with major mining pools, ensuring maximum efficiency and profitability for both novice and experienced miners.

cryptocurrency cuda eth ethash ethereum ethereum-mining gpu-mining mining-pool mining-software open-source

Last synced: 16 Apr 2026

https://github.com/harmeshgv/gpu-powered-bert-finetuning

Efficient fine-tuning of BERT models using CUDA-powered GPUs, optimized for laptops and devices with NVIDIA RTX 3000/4000 series or CUDA-compatible GPUs. Ideal for fast NLP model training with PyTorch and Hugging Face Transformers.

bert-model cuda finetuning-llms pytorch

Last synced: 16 Apr 2026

https://github.com/tianzonglin/cloud-control-gui

A tool to compute, visualize, analyse and drag points (high-dimensional data)

cuda interaction-design visualization

Last synced: 25 Apr 2026

https://github.com/versi379/optimized-matrix-multiplication

This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.

cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming

Last synced: 17 May 2026

https://github.com/santiagoenriquega/gpu_projects

Various Python GPU accelerated computations and simulations.

cuda cupy numba opencl pyopencl python

Last synced: 17 May 2026

https://github.com/ergus/algorithms

Set of multiple algorithms implemented in multiple paradigms

algorithms cmake concurrency cpp cuda gpgpu inter-language metaprogramming multithreading pthreads stl testing

Last synced: 17 May 2026