An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/materight/pyav-cuda

Extension of PyAV with hardware encoding and decoding support. Compatible with PyTorch and Nvidia codecs.

cuda cuvid ffmpeg libav pytorch

Last synced: 01 Feb 2026

https://github.com/cscfi/csc-env-julia

Julia language environment including MPI.jl, CUDA.jl and AMDGPU.jl preferences for HPC clusters at CSC.

amdgpu ansible cuda hpc julia julia-language mpi

Last synced: 01 Feb 2026

https://github.com/teambipartite/bipartite-gemm

High throughput data-parallel GEMM implementations in Cuda using Cuda cores and Tensor cores

cuda data-parallelism gemm

Last synced: 17 Apr 2026

https://github.com/m-torhan/cuda-fractals

CUDA C++ implementation of Fractals visualization

cuda

Last synced: 25 Feb 2026

https://github.com/xza85hrf/flag_prediction_project

This application predicts the name of a country (or countries) based on an input flag image. It uses advanced image processing techniques and deep learning models built with PyTorch to classify flags accurately.

cross-validation cuda data-augmentation docker efficientnetb0 flag-recognition image-classification machine-learning mixed-precision-training mobilenetv2 python pytorch resnet resnet-50 transfer-learning

Last synced: 15 Apr 2026

https://github.com/fieldcure/fieldcure-whisper-runtimes

Pre-built Whisper.net native runtime binaries (CPU/CUDA/Vulkan) for the FieldCure software ecosystem.

cuda dotnet native-binaries nuget redistributable vulkan whisper whisper-net

Last synced: 01 Jun 2026

https://github.com/baremetalrt/baremetalrt

BareMetalRT — edge GPU compute mesh

cuda distributed-computing gpu inference llm nvidia tensorrt windows

Last synced: 18 Apr 2026

https://github.com/muppetsg2/cudaraytracer

A custom ray tracer originally developed during university studies to run on CPU, now ported to GPU using CUDA. This project was created to explore GPU rendering techniques and to gain hands-on experience with CUDA programming.

cuda mit-license nvidia-cuda nvidia-gpu raytracing sfml stb-image student-project study-project

Last synced: 16 Apr 2026

https://github.com/yashpotdar-py/flood-vision

Flood Vision - A deep learning–based computer vision system for flood mapping and damage assessment using aerial imagery.

cuda deep-learning flood-detection iot python

Last synced: 16 Apr 2026

https://github.com/sferez/sspp_sparse_matrix_cuda

Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA

cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix

Last synced: 30 Apr 2026

https://github.com/aaaastark/nvidia-cuda-google-colab

Deployment of NVIDIA-CUDA on Google Colab. With in examples codes (Vector Addition and Matrix Multiplication).

c cpp cuda googlecolab googlecolaboratory matrix-multiplication nvidia python vector-addition

Last synced: 16 Apr 2026

https://github.com/alexjmercer/cuda-npp-assignment

Learning about CUDA and NVIDIA Performance Primitives. Part of Coursera Assignment.

cuda gpu-programming npp nppi

Last synced: 13 Feb 2026

https://github.com/tlabaltoh/tlab-sharescreen-server-win

Software frame encoder using CUDA and cast encoded frames over UDP. Trying to implement a custom streaming protocol and shader based frame encoder/decoder for screencast.

cuda desktop-capture screensharing unity unity3d windows-graphics-capture

Last synced: 14 Feb 2026

https://github.com/ankhoa1212/cuda-program

This is a GPU program built with CUDA using parallel reduction

cpp cuda curand gpu-programming parallel-reduction

Last synced: 14 Feb 2026

https://github.com/srmlcn/spirals

The purpose of the Spirals script is to create a computer-generated image. The image maps to GPUs with CUDA support.

cgi cuda gpu numba nvidia python

Last synced: 28 Feb 2026

https://github.com/nagharjun17/mlir-to-ptx-cuda

Creating an MLIR dialect that fuses Addition + ReLU, lowers to NVVM and LLVM IR and generates PTX to run the kernel on CUDA GPU

cpp cuda deep-learning llvm mlir ptx

Last synced: 18 Apr 2026

https://github.com/mattjesc/gpu-accelerated-fap

GPU-Accelerated Frequency Analysis Prototype using CUDA, Unit Testing, and User-Defined Settings

c cmake cpp cuda cufft googletest gpu gpu-acceleration gpu-computing gpu-programming nvidia signal-processing test test-automation testing unit-testing

Last synced: 16 Apr 2026

https://github.com/smoke-y/athena

Deep learning library

cuda deep-learning deep-learning-library

Last synced: 01 Mar 2026

https://github.com/aarid/cuda_operations

This project compares performance between CPU and GPU with CUDA operations. Two simples cases are used: matrix multiplication and 2d convolution.

conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication

Last synced: 02 Mar 2026

https://github.com/anselm67/cuda_mnist

A CUDA implementation of MNIST - for CUDA beginners.

cuda gpu gpu-computing gpu-programming mnist mnist-classification

Last synced: 02 Mar 2026

https://github.com/atticuszeller/pytorch-lightning-uv

📦 Zero-config Deep Learning template with PyTorch Lightning, UV package manager, W&B tracking, and modern Python tooling 🚀

classification cuda deep-learning machine-learning mnist-classification python pytorch pytorch-lightning typer uv

Last synced: 16 Apr 2026

https://github.com/eagleeee2/ethminer

EthMiner is a powerful Ethereum mining software optimized for GPU performance using OpenCL and CUDA technologies. It provides easy setup, detailed performance metrics, and robust compatibility with major mining pools, ensuring maximum efficiency and profitability for both novice and experienced miners.

cryptocurrency cuda eth ethash ethereum ethereum-mining gpu-mining mining-pool mining-software open-source

Last synced: 16 Apr 2026

https://github.com/harmeshgv/gpu-powered-bert-finetuning

Efficient fine-tuning of BERT models using CUDA-powered GPUs, optimized for laptops and devices with NVIDIA RTX 3000/4000 series or CUDA-compatible GPUs. Ideal for fast NLP model training with PyTorch and Hugging Face Transformers.

bert-model cuda finetuning-llms pytorch

Last synced: 16 Apr 2026

https://github.com/mathiasotnes/gemm

General Matrix Multiplication (GEMM) optimization in Cuda.

cuda gpu

Last synced: 26 Mar 2025

https://github.com/iebeid/cuda-particles

A simple visualization of particles calcualted using CUDA

cuda opengl

Last synced: 17 Apr 2026

https://github.com/lbaf23/gpuinfo

cuda gpu

Last synced: 17 Apr 2026

https://github.com/jonmarty/pycuda-kmeans

A parallelized PyCuda implementation of the KMeans clustering algorithm.

cuda kmeans pycuda

Last synced: 25 Apr 2026

https://github.com/jdibenes/game_of_life_cuda

OpenGL / CUDA implementation of Conway's Game of Life.

cpp cuda opengl qt6 simulation

Last synced: 02 Apr 2026

https://github.com/chrisdalvit/gpu-matrix-transpose

Implementation and benchmarking of different matrix transpose with CUDA

c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu

Last synced: 17 Apr 2026

https://github.com/loreloc/triturus

A bunch of triton kernels with increasing complexity for learning and exploring triton and GPU programming

cuda pytorch triton

Last synced: 17 Apr 2026

https://github.com/stckvrflw/pem-spgemm

pemSpGEMM - An Improved SpGEMM Algorithm

cpp cuda

Last synced: 17 Apr 2026

https://github.com/void4main/bifurcation-diagram

These little python scripts plot a bifurcation diagram into a png file (work fine on a raspberry pi and accelerated on a NVIDIA Jetson Nano) - but still a lot of room for improvements ...

bifurcation cuda feigenbaum gpu jetson logistic map nano numba sequence vectorize

Last synced: 17 Apr 2026

https://github.com/bjornmelin/ml-production-engineering

⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯

cuda deployment docker fastapi gpu-computing kubernetes mlops production

Last synced: 17 Apr 2026

https://github.com/bjornmelin/nlp-engineering-hub

📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤

cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers

Last synced: 17 Apr 2026

https://github.com/vibesmiths/mcp-rvc

GPU service for voice cloning via Retrieval-based Voice Conversion (CUDA + ROCm)

cuda docker gpu rocm rvc tts voice-cloning

Last synced: 17 Apr 2026

https://github.com/vibesmiths/mcp-musicgen

GPU service for text-to-music generation via Meta AudioCraft (CUDA + ROCm)

audiocraft cuda docker gpu musicgen python rocm text-to-music

Last synced: 17 Apr 2026

https://github.com/briiqn/obj2schem

A CUDA enabled .obj model to schematic (Sponge V3) converter

cuda minecraft schematics wavefront-obj worldedit

Last synced: 17 Apr 2026

https://github.com/cs550-epfl/report

EPFL CS-550 project report

cuda formal-verification gpu memory-consistency ptx simt

Last synced: 03 Jun 2026

https://github.com/qompassai/qudaz

Qompass AI Cuda library for Zig

cuda zig

Last synced: 17 Apr 2026

https://github.com/qompassai/cuda

Qompass AI on CUDA

cuda nvidia

Last synced: 17 Apr 2026

https://github.com/synapticore-io/torch-cuda

PyTorch CUDA 12.8 project template - GPU-accelerated ML with modern Python packaging (uv), checkpointing, early stopping, and reproducibility utilities.

cuda gpu project-template python pytorch

Last synced: 04 Apr 2026

https://github.com/seieric/pytorch-mpi-singularity

Singularity Container including PyTorch with CUDA and mpi backend for DistributedDataParallel

cuda hpc nvidia openmpi pytorch singularity utokyo

Last synced: 18 Apr 2026

https://github.com/thalesmg/haskell-accelerate-parconc

Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell

accelerate cuda gpu-computing haskell parallel-computing

Last synced: 18 Apr 2026

https://github.com/qanastek/concurency-tetravex

This software is an fast and reliable tetravex solver based on C++ and CUDA.

c-plus-plus cuda parrallel-computing tetravex

Last synced: 18 Apr 2026

https://github.com/abdelrahman-amen/active_learning_in_nlp

I applied active learning to the IMDB dataset for sentiment analysis. Starting with a small labeled subset, I trained a model and used uncertainty sampling to select and label challenging reviews. This iterative process improved performance while reducing labeling effort.

activelearning cuda entropy imdb-dataset margin nlp python sklearnex torch uncertainty

Last synced: 18 Apr 2026

https://github.com/betarixm/csed490c

POSTECH: Heterogeneous Parallel Computing (Fall 2023)

cuda gpu parallel-computing postech

Last synced: 19 Apr 2026

https://github.com/evstigneevnm/slurm_gpu_mpi_docker

This is a repository that contains a sample of how to make a Dockerfile and compile your program that uses MPI into slurm with enroot and pyxis from NVIDIA.

cuda docker enroot mpi nvidia pyxis slurm

Last synced: 18 Apr 2026

https://github.com/cooliron2311/cumd5bf

CUDA based md5 password bruteforcer

cuda md5 python

Last synced: 18 Apr 2026

https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker

NVidea CUDA base image on Ubuntu Linux, used to run Machine Learning

ai cuda docker docker-compose machine-learning ml nvidia-docker ubuntu

Last synced: 18 Apr 2026

https://github.com/dmmutua/cuda_projects

An Implementation of a variety of Algorithms & Technical Papers Mostly Related to Machine Learning & Deep Learning in CUDA C

c cuda cuda-programming deep-learning machine-learning machine-learning-algorithms

Last synced: 18 Apr 2026

https://github.com/genpat-it/ohe-rs

Ultra-fast one-hot encoding for bioinformatics and ML, powered by Rust + CUDA. Built for cgMLST allele profiles and large-scale categorical data.

bioinformatics cuda machine-learning one-hot-encoding performance pyo3 python rust

Last synced: 04 Jun 2026

https://github.com/liebemama/repo-fastapi

GPU-ready FastAPI AI inference server with plugin system, supporting CUDA, ROCm, CPU, and macOS MPS.

ai-server cuda fastapi gpu inference mps plugins pytorch rocm

Last synced: 05 Apr 2026

https://github.com/ex539/docker-dev-env

A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.

big-data cpp cuda docker docker-image docker-php docker-setup environment hadoop jenkins kubernetes qtcreator reproducibility x11

Last synced: 05 Apr 2026

https://github.com/sagar-brahaman/imagefilterpy

Example of custom image filter for MRTech IFF Python SDK

camera cuda dng genicam gpu h264 h265 image-processing jetson json mipi rest-api rtsp tiff

Last synced: 18 Apr 2026

https://github.com/aditiisaxena/cuda-accelerated-box-filter-for-texture-image-enhancement

Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.

cpp cuda gpu-programming linux nvidia opencv

Last synced: 18 Apr 2026

https://github.com/dougeeai/llama-cpp-python-wheels

Pre-built wheels for llama-cpp-python across platforms and CUDA versions

ampere cuda cuda13 gguf llama-cpp-python llm machine-learning prebuilt python313 rtx3060 rtx3070 rtx3080 rtx3090 wheels windows

Last synced: 18 Apr 2026

https://github.com/intelav/gpu-agent-opt

AI Agent Framework for GPU Kernel Autotuning & Optimization. Automate CUDA kernel exploration, profiling, and tuning with AI-driven agents for deep learning, geospatial AI, and HPC workloads.

ai-agents autotuning cuda deep-l edge-ai geospatial gpu hpc nvidia optimization performance pytorch

Last synced: 19 Apr 2026

https://github.com/vicen-te/tiny-nn

A tiny neural network framework for fully-connected layers with CPU and CUDA support

backpropagation cplusplus-20 cpu cuda cuda-12-8 kernel multi-threaded neural-network nn

Last synced: 19 Apr 2026

https://github.com/timanema/msc-thesis-public

Repository containing a GPU-accelerated compressor based on FSST

compression cpp cuda gpu thesis

Last synced: 19 Apr 2026

https://github.com/zjeffer/docker-arch-cuda

Arch Linux base image with the latest CUDA, CUDNN and LibTorch preinstalled.

archlinux cuda docker libtorch pytorch

Last synced: 19 Apr 2026

https://github.com/fatlipp/toyslam

SLAM implementation from scratch w/o external graph optimization libs

cuda gpu lidar-slam mapping odometry robotics slam

Last synced: 20 Apr 2026

https://github.com/ydkn/htw-progko-cuda

Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.

cuda image-transformations opencv

Last synced: 20 Apr 2026

https://github.com/tameronline/repo-fastapi

GPU-Ready FastAPI AI Inference Server with plugin system (CUDA/CPU/MPS/ROCm)

ai-server cuda deep-learning fastapi inference mps nlp plugins pytorch rocm

Last synced: 20 Apr 2026

https://github.com/rtfirst/voice-to-text

Cross-platform Push-to-Talk speech-to-text — local Whisper transcription (CUDA/MPS) with optional Anthropic API correction and live VU meter overlay. Windows 11 + macOS.

cuda macos push-to-talk python speech-to-text voice-input whisper windows

Last synced: 04 Jun 2026

https://github.com/amirbroker/cupydtw

Use Cuda for Dynamic Time Warping

cuda dtw dynamic-time-warping python

Last synced: 20 Apr 2026

https://github.com/alexkranias/triton_vs_cuda

Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.

cuda cuda-kernels gpu gpu-programming parallel-programming python triton

Last synced: 20 Apr 2026

https://github.com/jusqua/dip-benchmark

Departmental undergraduate research project at UFS. Digital image processing benchmark using multiple tools to learn new ways to develop image processors.

benchmark cuda image-processing matlab opencv sycl visiongl

Last synced: 20 Apr 2026

https://github.com/bonevbs/cuknn

Cuda implementation of k-nearest neighbor search

cuda knn-search

Last synced: 20 Apr 2026

https://github.com/py-sandy/llama.cpp-windows-builder

Automated, reproducible build scripts for llama.cpp on Windows 10/11. Installs prerequisites, configures CMake and builds with CUDA.

ai build-scripts build-tool builder cuda llamacpp script scripts windows windows-10 windows-11

Last synced: 20 Apr 2026

https://github.com/mrkct/cuda-raytracer

Simple CUDA-Accelerated raytracer

cuda gpu raytracing raytracing-one-weekend

Last synced: 21 Apr 2026

https://github.com/rai-project/dlperf

Déjà vu: Modeling DNN Performance by Recalling History

benchmark cuda deep-learning modeling onnx performance tensorflow

Last synced: 21 Apr 2026

https://github.com/musaibbashir/object-detection

Pytorch+CUDA implementation of several image classification and object detection models like YOLO , Fast-CNN, RF-DETR

cnn computer-vision cuda image-classification object-detection pytorch yolo

Last synced: 21 Apr 2026

https://github.com/jiaau/kernels

This repository showcases common optimization techniques for kernels.

cpp cuda cute cutlass hpc kernel

Last synced: 21 Apr 2026

https://github.com/fxzxmicah/fedora-llama-cpp

llama.cpp tools with OpenMP, CUDA, and OpenVINO support

cuda fedora llama-cpp openmp openvino rpm

Last synced: 05 Jun 2026

https://github.com/dimitrijkrstev/pp-cuda-fft

A parallelised CUDA implementation of the FFT Radix-2 algorithm and its execution time comparison to the DFT and non-parallelised Radix-2

cuda fft parallel-computing

Last synced: 22 Apr 2026

https://github.com/mdnpascual/judgebarmashvp

Error bar for the game called Mash VP

cuda emgucv screencapturer tesseract-ocr

Last synced: 22 Apr 2026

https://github.com/maxenceleguery/jat

Tensor library

computation cuda tensor

Last synced: 24 Apr 2026

https://github.com/bikemazzell/tuonella-sift

A high-performance, memory-efficient CSV deduplication tool

csv cuda deduplication logger osint rust

Last synced: 24 Apr 2026

https://github.com/bardifarsi/threadpoolmanager

ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.

cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe

Last synced: 24 Apr 2026

https://github.com/juntyr/necsim-rust-analysis

Analysis of the spatially explicit biodiversity simulation `necsim-rust`

analysis biodiversity cuda mpi necsim rust simulation

Last synced: 24 Apr 2026

https://github.com/0xsooki/extending-jax

JAX Custom Operations with C++ and CUDA (using Pybind11)

cuda jax pybind11 xla

Last synced: 25 Apr 2026

https://github.com/sangioai/torchpace

PyTorch CUDA/C++ extension of PACE: Transformer non-linearlity accelerator engine.

cuda pytorch transformer

Last synced: 25 Apr 2026

https://github.com/daviddavo/19gpu

Short exercises for GPU at Complutense University of Madrid. Mirror from GitLab

accelerator cuda gpu-programming

Last synced: 26 Apr 2026

https://github.com/shashshukla/ee-210-signals-and-systems

Code for the assignments for EE-210, Signals and Systems, at IIT Bombay 2016.

cuda image-processing signal-processing

Last synced: 26 Apr 2026

https://github.com/countzero/windows_exllama

This is a playground to explore the ExLlama project in a Windows environment.

conda cuda exllama python torch

Last synced: 26 Apr 2026

https://github.com/alexyzha/cuda-bioinformatics

A CUDA-Accelerated Bioinformatics Toolchain

bioinformatics bioinformatics-tool cplusplus cuda

Last synced: 26 Apr 2026

https://github.com/mateuszk098/parallel-programming-examples

Simple parallel programming examples with CUDA, MPI and OpenMP.

cpp cuda mpi openmp parallel-programming

Last synced: 27 Apr 2026

https://github.com/kbredies/tgv_pycuda

Algorithms, examples and tests for denoising, deblurring, zooming, dequantization and compressive imaging with total variation (TV) and second-order total generalized variation (TGV) regularization. GPU-accelerated code using PyCUDA.

compressive-imaging cuda image-deblurring image-denoising image-dequantization image-zooming python3 total-generalized-variation total-variation

Last synced: 27 Apr 2026

https://github.com/notkartikye/cuda-image-box-filters

🖼️ CUDA-powered tool for applying box filters to a large amount of images

cuda cuda-library cuda-programming npp

Last synced: 27 Apr 2026

https://github.com/gladap/heterogeneous_computing_project

Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters

cuda heterogeneous-parallel-programming

Last synced: 27 Apr 2026

https://github.com/perhuepenbecker/cudyn

CUDA library for irregular tasks using a dynamic block-internal balancing mechanism

cpp cuda cuda-library cuda-programming gpu-computing gpu-programming irregular

Last synced: 28 Apr 2026

https://github.com/ncorgan/arrayfire-config-info

A small command-line utility that outputs all available ArrayFire devices

arrayfire cuda gpu opencl

Last synced: 28 Apr 2026

https://github.com/obsidianplusplus/yolov5-tensorrt-accelerator

基于TensorRT加速的YOLOv5高性能推理框架 | High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization

cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5

Last synced: 28 Apr 2026