An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/sebftw/interp2gpu

GPU-accelerated 2D spline interpolation, à la interp2(..., "spline"), in MATLAB.

cuda gpu gpu-acceleration matlab spline spline-interpolation

Last synced: 10 May 2026

https://github.com/cashcon57/open-supersampling

OpenSuperSampling (OSS) — vendor-agnostic open-source RT denoising, upscaling, and frame extrapolation

cuda deep-learning dlss frame-generation fsr game-engine gaussian-splatting open-source real-time-rendering super-resolution upscaling

Last synced: 10 Jun 2026

https://github.com/dlr-amr/t8gpu

Header-only finite volume library targeting GPUs, using t8code as the meshing backend.

adaptive-mesh-refinement cuda finite-volume gpgpu-computing hpc mesh mpi parallel-computing simulation

Last synced: 10 May 2026

https://github.com/jeremywildsmith/shadowhash-distributed

Distributed shadow-file password cracker written in Elixir, with GPU-accelerated cracking for the md5crypt hashing algorithm.

cracking-hash cracking-hashes cracking-password cuda distributed-systems elixir erlang hashing nx security

Last synced: 11 May 2026

https://github.com/fabulani/360ip-with-cuda

360° Image Processing with CUDA and OpenCV.

360-image 360-video cpp cuda image-processing opencv

Last synced: 11 May 2026

https://github.com/apws25/accelmoe

This repository contains a CUDA kernel re-implementation of a CPU-based MoE model.

cpp cuda mixture-of-experts

Last synced: 11 May 2026

https://github.com/xza85hrf/flag_prediction_project

This application predicts the name of a country (or countries) based on an input flag image. It uses advanced image processing techniques and deep learning models built with PyTorch to classify flags accurately.

cross-validation cuda data-augmentation docker efficientnetb0 flag-recognition image-classification machine-learning mixed-precision-training mobilenetv2 python pytorch resnet resnet-50 transfer-learning

Last synced: 15 Apr 2026

https://github.com/sergiomarquezdev/yt-transcriber

🛠️ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.

ai cli cuda gemini python transcription whisper youtube

Last synced: 15 May 2026

https://github.com/fieldcure/fieldcure-whisper-runtimes

Pre-built Whisper.net native runtime binaries (CPU/CUDA/Vulkan) for the FieldCure software ecosystem.

cuda dotnet native-binaries nuget redistributable vulkan whisper whisper-net

Last synced: 01 Jun 2026

https://github.com/separatrixxx/pgp_labs_7_sem

👓 Laboratory work for the 7th semester of MAI on PGP and PDP

cpp cuda nvidia

Last synced: 15 May 2026

https://github.com/abhiksark/gluon-by-example

Learn Triton's Gluon by example — the same GPU kernels written in Triton and Gluon, benchmarked

cuda deep-learning gluon gpu gpu-kernels triton tutorial

Last synced: 01 Jul 2026

https://github.com/baremetalrt/baremetalrt

BareMetalRT — edge GPU compute mesh

cuda distributed-computing gpu inference llm nvidia tensorrt windows

Last synced: 18 Apr 2026

https://github.com/tornikeo/minimal-vscode-cuda-meson

Minimal sample of using VSCode and Meson to build CUDA applications

cuda meson template vscode

Last synced: 08 Sep 2025

https://github.com/lablup/backend.ai-accelerator-cuda

The Backend.AI CUDA Accelerator Plugin

backendai cuda

Last synced: 16 May 2026

https://github.com/elcruzo/cuda-conv

Lightweight CUDA kernel for 2D image convolution achieving 20x+ speedup. Built with CuPy for the NVIDIA Hackathon.

computer-vision convolution cuda cupy gpu-computing hackathon high-performance-computing image-processing nvidia python

Last synced: 15 May 2026

https://github.com/muppetsg2/cudaraytracer

A custom ray tracer originally developed during university studies to run on CPU, now ported to GPU using CUDA. This project was created to explore GPU rendering techniques and to gain hands-on experience with CUDA programming.

cuda mit-license nvidia-cuda nvidia-gpu raytracing sfml stb-image student-project study-project

Last synced: 16 Apr 2026

https://github.com/farukalamai/cpp-for-cuda

A structured C++ learning path designed specifically for developers preparing to learn CUDA programming.

cpp cuda gpu nvidia

Last synced: 09 Jun 2026

https://github.com/bolu-atx/cuda-dojo

Level up your CUDA skills - RPG style. Do or do not, there is no try.

cuda examples learning tutorial

Last synced: 01 Jul 2026

https://github.com/yashpotdar-py/flood-vision

Flood Vision - A deep learning–based computer vision system for flood mapping and damage assessment using aerial imagery.

cuda deep-learning flood-detection iot python

Last synced: 16 Apr 2026

https://github.com/sferez/sspp_sparse_matrix_cuda

Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA

cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix

Last synced: 30 Apr 2026

https://github.com/antoniakras/semantic-video-search

GPU-optimized semantic search on video transcripts, with benchmarking of FAISS, Pinecone, and PostgreSQL vector databases. Deployed via Docker on FORTH’s GPU infrastructure.

bert-embeddings bert-fine-tuning cuda dokcer embedding-models embeddings-word2vec faiss-vector-database gpu-computing huggingface-transformers nlp-machine-learning pgvector pineconedb postgresql python pytorch retrieval-augmented-generation similarity-search vector-database whisper-ai

Last synced: 03 May 2026

https://github.com/lehoangan2906/cuda_basics

A simple implementation of operations on vectors and matrices, optimized to run on NVIDIA GPUs with CUDA

cpp cuda cuda-programming

Last synced: 16 Jun 2025

https://github.com/aaaastark/nvidia-cuda-google-colab

Deployment of NVIDIA CUDA on Google Colab, with example code (vector addition and matrix multiplication).

c cpp cuda googlecolab googlecolaboratory matrix-multiplication nvidia python vector-addition

Last synced: 16 Apr 2026

https://github.com/alexjmercer/cuda-npp-assignment

Learning about CUDA and NVIDIA Performance Primitives. Part of a Coursera assignment.

cuda gpu-programming npp nppi

Last synced: 13 Feb 2026

https://github.com/tlabaltoh/tlab-sharescreen-server-win

Software frame encoder that uses CUDA and casts encoded frames over UDP. An attempt to implement a custom streaming protocol and a shader-based frame encoder/decoder for screencasting.

cuda desktop-capture screensharing unity unity3d windows-graphics-capture

Last synced: 14 Feb 2026

https://github.com/ankhoa1212/cuda-program

This is a GPU program built with CUDA using parallel reduction

cpp cuda curand gpu-programming parallel-reduction

Last synced: 14 Feb 2026

https://github.com/srmlcn/spirals

The purpose of the Spirals script is to create a computer-generated image. The image maps to GPUs with CUDA support.

cgi cuda gpu numba nvidia python

Last synced: 28 Feb 2026

https://github.com/nagharjun17/mlir-to-ptx-cuda

Creating an MLIR dialect that fuses Addition + ReLU, lowers to NVVM and LLVM IR, and generates PTX to run the kernel on a CUDA GPU

cpp cuda deep-learning llvm mlir ptx

Last synced: 18 Apr 2026

https://github.com/mcp-tool-shop-org/gpu-container

Model-aware inference memory-placement planner for single-GPU rigs: profile hardware + model, generate explicit VRAM/RAM/NVMe placement plans across runtimes (llama.cpp/vLLM/...), and prove them with a measured receipt. Not VRAM overflow - declared placement.

cuda gpu inference llama-cpp llm moe offload vram wsl2

Last synced: 09 Jun 2026

https://github.com/mattjesc/gpu-accelerated-fap

GPU-Accelerated Frequency Analysis Prototype using CUDA, Unit Testing, and User-Defined Settings

c cmake cpp cuda cufft googletest gpu gpu-acceleration gpu-computing gpu-programming nvidia signal-processing test test-automation testing unit-testing

Last synced: 16 Apr 2026

https://github.com/illagrenan/cuda-90-cudnn7-runtime-1604-py36

Dockerfile for Ubuntu 16.04 with Python 3.6 and CUDA 9

cuda dockerfile python ubuntu

Last synced: 03 May 2026

https://github.com/smoke-y/athena

Deep learning library

cuda deep-learning deep-learning-library

Last synced: 01 Mar 2026

https://github.com/aarid/cuda_operations

This project compares CPU and GPU performance for CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.

conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication

Last synced: 02 Mar 2026

https://github.com/anselm67/cuda_mnist

A CUDA implementation of MNIST - for CUDA beginners.

cuda gpu gpu-computing gpu-programming mnist mnist-classification

Last synced: 02 Mar 2026

https://github.com/deltatecs/voses

Volatile Secret Searcher - massively parallel, brute force memory dump analysis for (D)TLS secret extraction

cuda memory-hacking reverse-engineering tls

Last synced: 15 Jun 2025

https://github.com/atticuszeller/pytorch-lightning-uv

📦 Zero-config Deep Learning template with PyTorch Lightning, UV package manager, W&B tracking, and modern Python tooling 🚀

classification cuda deep-learning machine-learning mnist-classification python pytorch pytorch-lightning typer uv

Last synced: 16 Apr 2026

https://github.com/psteinb/gtc2017

Slides for my presentation at GTC 2017 from May 8-11 in Silicon Valley

compression cuda ffmpeg gpu gpu-computing h264 h265 microscopes spim

Last synced: 03 May 2026

https://github.com/viktor-akusoff/chernabogpy

ChernabogPy is a Python package for visualizing gravitational distortions caused by black holes using nonlinear ray tracing.

cuda gpu physics-simulation python3 relativity-of-space-and-time torch

Last synced: 15 May 2026

https://github.com/1180779/spheresraycasting

Raycasting of spheres

cuda opengl

Last synced: 02 Mar 2025

https://github.com/zury7/parallel-programming

A collection of performance optimizations and comparisons between multiprocessing and multithreading using pthreads, OpenMP, and CUDA. The experiments analyze execution speed, resource usage, and parallelization efficiency across different computational models. (CS 4553: Scientific Computing)

cuda openmp pthreads

Last synced: 08 May 2026

https://github.com/eagleeee2/ethminer

EthMiner is a powerful Ethereum mining software optimized for GPU performance using OpenCL and CUDA technologies. It provides easy setup, detailed performance metrics, and robust compatibility with major mining pools, ensuring maximum efficiency and profitability for both novice and experienced miners.

cryptocurrency cuda eth ethash ethereum ethereum-mining gpu-mining mining-pool mining-software open-source

Last synced: 16 Apr 2026

https://github.com/harmeshgv/gpu-powered-bert-finetuning

Efficient fine-tuning of BERT models using CUDA-powered GPUs, optimized for laptops and devices with NVIDIA RTX 3000/4000 series or CUDA-compatible GPUs. Ideal for fast NLP model training with PyTorch and Hugging Face Transformers.

bert-model cuda finetuning-llms pytorch

Last synced: 16 Apr 2026

https://github.com/grizzz13/minimal-cuda

Minimal configuration to set up CUDA C++ with CMake.

cmake cpp cuda

Last synced: 18 Apr 2026

https://github.com/dstrigl/cnnplus

Master thesis 2010: Fast Convolutional Neural Network Training and Classification on CUDA GPUs

cnn convolutional-neural-networks cpp cuda gpu neural-networks speedup thesis

Last synced: 30 Jun 2026

https://github.com/AMYPAD/miutil

Basic functionality needed for AMYPAD

cuda matlab medical-imaging python

Last synced: 10 Apr 2025

https://github.com/phrutis/bip39scan

brute bip39 mnemonic GPU - $250

bip39 brute brute-force bruteforce cuda gpu mnemonic phrases seed

Last synced: 10 Apr 2025

https://github.com/joe-mruz/hgvisualizer

An interactive simulation and visualization tool for evolving hypergraphs, inspired by the Wolfram Physics Project.

cpp cuda hypergraph physics simulator wolfram

Last synced: 02 May 2026

https://github.com/iebeid/cuda-particles

A simple visualization of particles calculated using CUDA

cuda opengl

Last synced: 17 Apr 2026

https://github.com/lbaf23/gpuinfo

cuda gpu

Last synced: 17 Apr 2026

https://github.com/jonmarty/pycuda-kmeans

A parallelized PyCuda implementation of the KMeans clustering algorithm.

cuda kmeans pycuda

Last synced: 25 Apr 2026

https://github.com/jdibenes/game_of_life_cuda

OpenGL / CUDA implementation of Conway's Game of Life.

cpp cuda opengl qt6 simulation

Last synced: 02 Apr 2026

https://github.com/chrisdalvit/gpu-matrix-transpose

Implementation and benchmarking of different matrix transpose kernels with CUDA

c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu

Last synced: 17 Apr 2026

https://github.com/manishklach/gb300-rl-runtime

Close-to-metal C/CUDA lab for RL inference fast paths: persistent GPU workers, hugepage KV arenas, cacheline-aware command rings, and async reward handoff. Goal: remove page faults, malloc/free, scheduler wakeups, CPU round-trips, and KV migration from the per-token path.

ai-infrastructure close-to-metal cuda gb300 gpu-inference hpc lock-free nvlink reinforcement-learning spsc-queue

Last synced: 09 Jun 2026

https://github.com/loreloc/triturus

A bunch of Triton kernels of increasing complexity for learning and exploring Triton and GPU programming

cuda pytorch triton

Last synced: 17 Apr 2026

https://github.com/stckvrflw/pem-spgemm

pemSpGEMM - An Improved SpGEMM Algorithm

cpp cuda

Last synced: 17 Apr 2026

https://github.com/void4main/bifurcation-diagram

These little Python scripts plot a bifurcation diagram to a PNG file (they work fine on a Raspberry Pi and are accelerated on an NVIDIA Jetson Nano), but there is still a lot of room for improvement ...

bifurcation cuda feigenbaum gpu jetson logistic map nano numba sequence vectorize

Last synced: 17 Apr 2026

https://github.com/bjornmelin/ml-production-engineering

⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯

cuda deployment docker fastapi gpu-computing kubernetes mlops production

Last synced: 17 Apr 2026

https://github.com/bjornmelin/nlp-engineering-hub

📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤

cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers

Last synced: 17 Apr 2026

https://github.com/vibesmiths/mcp-rvc

GPU service for voice cloning via Retrieval-based Voice Conversion (CUDA + ROCm)

cuda docker gpu rocm rvc tts voice-cloning

Last synced: 17 Apr 2026

https://github.com/vibesmiths/mcp-musicgen

GPU service for text-to-music generation via Meta AudioCraft (CUDA + ROCm)

audiocraft cuda docker gpu musicgen python rocm text-to-music

Last synced: 17 Apr 2026

https://github.com/briiqn/obj2schem

A CUDA enabled .obj model to schematic (Sponge V3) converter

cuda minecraft schematics wavefront-obj worldedit

Last synced: 17 Apr 2026

https://github.com/cs550-epfl/report

EPFL CS-550 project report

cuda formal-verification gpu memory-consistency ptx simt

Last synced: 03 Jun 2026

https://github.com/qompassai/qudaz

Qompass AI CUDA library for Zig

cuda zig

Last synced: 17 Apr 2026

https://github.com/kentakoong/mtnlog

A simple multinode performance logger for Python

cuda lanta nvitop python slurm-cluster

Last synced: 11 Jan 2026

https://github.com/qompassai/cuda

Qompass AI on CUDA

cuda nvidia

Last synced: 17 Apr 2026

https://github.com/synapticore-io/torch-cuda

PyTorch CUDA 12.8 project template - GPU-accelerated ML with modern Python packaging (uv), checkpointing, early stopping, and reproducibility utilities.

cuda gpu project-template python pytorch

Last synced: 04 Apr 2026

https://github.com/seieric/pytorch-mpi-singularity

Singularity container including PyTorch with CUDA and the MPI backend for DistributedDataParallel

cuda hpc nvidia openmpi pytorch singularity utokyo

Last synced: 18 Apr 2026

https://github.com/thalesmg/haskell-accelerate-parconc

Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell

accelerate cuda gpu-computing haskell parallel-computing

Last synced: 18 Apr 2026

https://github.com/qanastek/concurency-tetravex

This software is a fast and reliable Tetravex solver based on C++ and CUDA.

c-plus-plus cuda parrallel-computing tetravex

Last synced: 18 Apr 2026

https://github.com/abdelrahman-amen/active_learning_in_nlp

I applied active learning to the IMDB dataset for sentiment analysis. Starting with a small labeled subset, I trained a model and used uncertainty sampling to select and label challenging reviews. This iterative process improved performance while reducing labeling effort.

activelearning cuda entropy imdb-dataset margin nlp python sklearnex torch uncertainty

Last synced: 18 Apr 2026

https://github.com/betarixm/csed490c

POSTECH: Heterogeneous Parallel Computing (Fall 2023)

cuda gpu parallel-computing postech

Last synced: 19 Apr 2026

https://github.com/evstigneevnm/slurm_gpu_mpi_docker

This repository contains a sample showing how to write a Dockerfile and compile an MPI program to run under Slurm with enroot and pyxis from NVIDIA.

cuda docker enroot mpi nvidia pyxis slurm

Last synced: 18 Apr 2026

https://github.com/cooliron2311/cumd5bf

CUDA-based MD5 password bruteforcer

cuda md5 python

Last synced: 18 Apr 2026

https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker

NVIDIA CUDA base image on Ubuntu Linux, used to run machine learning workloads

ai cuda docker docker-compose machine-learning ml nvidia-docker ubuntu

Last synced: 18 Apr 2026

https://github.com/dmmutua/cuda_projects

Implementations of a variety of algorithms and technical papers, mostly related to machine learning and deep learning, in CUDA C

c cuda cuda-programming deep-learning machine-learning machine-learning-algorithms

Last synced: 18 Apr 2026

https://github.com/genpat-it/ohe-rs

Ultra-fast one-hot encoding for bioinformatics and ML, powered by Rust + CUDA. Built for cgMLST allele profiles and large-scale categorical data.

bioinformatics cuda machine-learning one-hot-encoding performance pyo3 python rust

Last synced: 04 Jun 2026

https://github.com/liebemama/repo-fastapi

GPU-ready FastAPI AI inference server with plugin system, supporting CUDA, ROCm, CPU, and macOS MPS.

ai-server cuda fastapi gpu inference mps plugins pytorch rocm

Last synced: 05 Apr 2026

https://github.com/ex539/docker-dev-env

A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.

big-data cpp cuda docker docker-image docker-php docker-setup environment hadoop jenkins kubernetes qtcreator reproducibility x11

Last synced: 05 Apr 2026

https://github.com/sagar-brahaman/imagefilterpy

Example of custom image filter for MRTech IFF Python SDK

camera cuda dng genicam gpu h264 h265 image-processing jetson json mipi rest-api rtsp tiff

Last synced: 18 Apr 2026

https://github.com/aditiisaxena/cuda-accelerated-box-filter-for-texture-image-enhancement

Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.

cpp cuda gpu-programming linux nvidia opencv

Last synced: 18 Apr 2026

https://github.com/equiel-1703/cuhip

Wrapper tool to convert CUDA source code to HIP code and compile it with HIPCC. Useful for learning CUDA programming on AMD devices.

cuda hip

Last synced: 14 May 2026

https://github.com/dougeeai/llama-cpp-python-wheels

Pre-built wheels for llama-cpp-python across platforms and CUDA versions

ampere cuda cuda13 gguf llama-cpp-python llm machine-learning prebuilt python313 rtx3060 rtx3070 rtx3080 rtx3090 wheels windows

Last synced: 18 Apr 2026

https://github.com/intelav/gpu-agent-opt

AI Agent Framework for GPU Kernel Autotuning & Optimization. Automate CUDA kernel exploration, profiling, and tuning with AI-driven agents for deep learning, geospatial AI, and HPC workloads.

ai-agents autotuning cuda deep-l edge-ai geospatial gpu hpc nvidia optimization performance pytorch

Last synced: 19 Apr 2026

https://github.com/vicen-te/tiny-nn

A tiny neural network framework for fully-connected layers with CPU and CUDA support

backpropagation cplusplus-20 cpu cuda cuda-12-8 kernel multi-threaded neural-network nn

Last synced: 19 Apr 2026

https://github.com/timanema/msc-thesis-public

Repository containing a GPU-accelerated compressor based on FSST

compression cpp cuda gpu thesis

Last synced: 19 Apr 2026

https://github.com/zjeffer/docker-arch-cuda

Arch Linux base image with the latest CUDA, cuDNN, and LibTorch preinstalled.

archlinux cuda docker libtorch pytorch

Last synced: 19 Apr 2026

https://github.com/aledinola/ifp_cuda_mex

Solve the income fluctuation problem on the GPU

cuda gpu-computing matlab mex

Last synced: 14 May 2026

https://github.com/fatlipp/toyslam

SLAM implementation from scratch w/o external graph optimization libs

cuda gpu lidar-slam mapping odometry robotics slam

Last synced: 20 Apr 2026

https://github.com/ydkn/htw-progko-cuda

Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.

cuda image-transformations opencv

Last synced: 20 Apr 2026

https://github.com/tameronline/repo-fastapi

GPU-Ready FastAPI AI Inference Server with plugin system (CUDA/CPU/MPS/ROCm)

ai-server cuda deep-learning fastapi inference mps nlp plugins pytorch rocm

Last synced: 20 Apr 2026

https://github.com/rtfirst/voice-to-text

Cross-platform Push-to-Talk speech-to-text — local Whisper transcription (CUDA/MPS) with optional Anthropic API correction and live VU meter overlay. Windows 11 + macOS.

cuda macos push-to-talk python speech-to-text voice-input whisper windows

Last synced: 04 Jun 2026

https://github.com/amirbroker/cupydtw

Use CUDA for dynamic time warping

cuda dtw dynamic-time-warping python

Last synced: 20 Apr 2026