An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/ubermorgott/morgottalk

Cross-platform desktop push-to-talk voice transcription. Single binary. GPU accelerated (CUDA/Vulkan/Metal/ROCm/OpenCL). Powered by whisper.cpp.

cuda desktop go gpu speech-to-text svelte transcription voice wails whisper

Last synced: 07 Apr 2026

https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-python

Explore how to use Numba—the just-in-time, type-specializing Python function compiler—to create and launch CUDA kernels to accelerate Python programs on massively parallel NVIDIA GPUs.

accelerated-computing cuda cuda-programming jit numba nvidia python

Last synced: 01 May 2026

https://github.com/programmergnome/kutyai

A Python graphical application for dog breed recognition, covering 420 breeds and 42,000 images.

cuda deep-learning image-classification python3 qt5-gui tensorflow transfer-learning

Last synced: 11 May 2026

https://github.com/drilonaliu/bachelor-thesis

Parallel Programming Fractals

cuda fractals gpu parallel-programming

Last synced: 15 May 2026

https://github.com/phantom7knight/cuda-fusion

A project for learning CUDA to better understand how the GPU works.

cuda cuda-programming gpgpu gpu

Last synced: 17 May 2026

https://github.com/chensongpoixs/cmedia_transcode

GPU (CUDA) transcoding build of a media server, supporting H264 and H265 transcoding

cuda gpu h264 h265 media transcode-media

Last synced: 19 May 2026

https://github.com/iebeid/cuda-particles

A simple visualization of particles calculated using CUDA

cuda opengl

Last synced: 17 Apr 2026

https://github.com/lbaf23/gpuinfo

cuda gpu

Last synced: 17 Apr 2026

https://github.com/kratugautam99/logiclink-project

LogicLink is a conversational AI chatbot developed by Kratu Gautam (AIML Engineer). Powered by the TinyLlama-1.1B-Chat-v1.0 model, it provides an interactive interface for engaging conversations, query resolution, and task assistance. Version 5 features streaming responses, conversation management, and a sleek GUI.

antd-design chatbot-application conversational-ai cuda gradio graphical-user-interface huggingface-spaces huggingface-transformers jupyter-notebooks keras large-language-models mlops model-service-controller modelscope-studio natural-language-generation natural-language-processing pytorch reasoning-agent tensorflow

Last synced: 07 Apr 2026

https://github.com/andresvalle/ocr-extraction

Text extraction from images using EasyOCR and parallelization with PyTorch

cuda ocr pytorch

Last synced: 01 May 2026

https://github.com/zyn10/cuda_code

CUDA practice

cuda cuda-programming

Last synced: 22 Jun 2025

https://github.com/antoniakras/semantic-video-search

GPU-optimized semantic search on video transcripts, with benchmarking of FAISS, Pinecone, and PostgreSQL vector databases. Deployed via Docker on FORTH’s GPU infrastructure.

bert-embeddings bert-fine-tuning cuda dokcer embedding-models embeddings-word2vec faiss-vector-database gpu-computing huggingface-transformers nlp-machine-learning pgvector pineconedb postgresql python pytorch retrieval-augmented-generation similarity-search vector-database whisper-ai

Last synced: 03 May 2026

https://github.com/jonmarty/pycuda-kmeans

A parallelized PyCuda implementation of the KMeans clustering algorithm.

cuda kmeans pycuda

Last synced: 25 Apr 2026

https://github.com/mcp-tool-shop-org/gpu-container

Model-aware inference memory-placement planner for single-GPU rigs: profile hardware + model, generate explicit VRAM/RAM/NVMe placement plans across runtimes (llama.cpp/vLLM/...), and prove them with a measured receipt. Not VRAM overflow - declared placement.

cuda gpu inference llama-cpp llm moe offload vram wsl2

Last synced: 09 Jun 2026

https://github.com/jdibenes/game_of_life_cuda

OpenGL / CUDA implementation of Conway's Game of Life.

cpp cuda opengl qt6 simulation

Last synced: 02 Apr 2026

https://github.com/kar-dim/cas-2d

Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA/OpenCL, for sharpening static images.

cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen

Last synced: 22 Jun 2025

https://github.com/chrisdalvit/gpu-matrix-transpose

Implementation and benchmarking of different matrix transpose methods with CUDA

c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu

Last synced: 17 Apr 2026

https://github.com/marius311/cudadistributedtools.jl

A set of utility tools for multi-GPU + multi-process workflows

cuda distributed julia

Last synced: 01 May 2026

https://github.com/kanchishimono/python-images

Ubuntu based Python container images, including CUDA images

container-image cuda docker dockerfile machine-learning python python3

Last synced: 30 Apr 2026

https://github.com/loreloc/triturus

A bunch of Triton kernels with increasing complexity for learning and exploring Triton and GPU programming

cuda pytorch triton

Last synced: 17 Apr 2026

https://github.com/stckvrflw/pem-spgemm

pemSpGEMM - An Improved SpGEMM Algorithm

cpp cuda

Last synced: 17 Apr 2026

https://github.com/void4main/bifurcation-diagram

These small Python scripts plot a bifurcation diagram to a PNG file. They work fine on a Raspberry Pi and run accelerated on an NVIDIA Jetson Nano, but there is still a lot of room for improvement.

bifurcation cuda feigenbaum gpu jetson logistic map nano numba sequence vectorize

Last synced: 17 Apr 2026

https://github.com/bjornmelin/ml-production-engineering

⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯

cuda deployment docker fastapi gpu-computing kubernetes mlops production

Last synced: 17 Apr 2026

https://github.com/bjornmelin/nlp-engineering-hub

📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤

cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers

Last synced: 17 Apr 2026

https://github.com/rkarahul/person-detector-faceverifier

Person-Detector-FaceVerifier is a sophisticated system for detecting and verifying faces in images. Ideal for applications like passport control and security, it combines advanced face detection with precise verification techniques.

bootstrap5 css3 cuda django html5 javascipt opencv-python os python pytorch yolov8

Last synced: 07 Apr 2026

https://github.com/f14-bertolotti/torchess

CUDA Torch extension for a chess engine

chess cuda torch

Last synced: 01 May 2026

https://github.com/ribin-baby/cuda_cudnn_installation_on_ubuntu20.04

Installation of CUDA 11.8 with cuDNN 8.7 on an Ubuntu 20.04 server with an A30 GPU, plus an ONNX GPU installation guide

cuda gpu linux onnxruntime server

Last synced: 16 May 2026

https://github.com/vibesmiths/mcp-rvc

GPU service for voice cloning via Retrieval-based Voice Conversion (CUDA + ROCm)

cuda docker gpu rocm rvc tts voice-cloning

Last synced: 17 Apr 2026

https://github.com/vibesmiths/mcp-musicgen

GPU service for text-to-music generation via Meta AudioCraft (CUDA + ROCm)

audiocraft cuda docker gpu musicgen python rocm text-to-music

Last synced: 17 Apr 2026

https://github.com/briiqn/obj2schem

A CUDA enabled .obj model to schematic (Sponge V3) converter

cuda minecraft schematics wavefront-obj worldedit

Last synced: 17 Apr 2026

https://github.com/hshindo/libcuda.jl

CUDA GPU array for Julia

cuda gpu julia

Last synced: 16 May 2026

https://github.com/cs550-epfl/report

EPFL CS-550 project report

cuda formal-verification gpu memory-consistency ptx simt

Last synced: 03 Jun 2026

https://github.com/flosmume/cpp-cuda-deepvision-rtx-starter

CUDA C++ practice project for RTX 4070 SUPER — explore GPU concurrency, pinned memory, and Nsight profiling. Includes SAXPY and 2D blur kernels to train optimization, stream overlap, and timing analysis for NVIDIA Developer Technology Engineering skillset.

cpp cuda cuda-kernels cuda-streams deep-learning-inference gpu gpu-optimization gpu-profiling high-performance-computing nsight nvidia parrallel-computing pinned-memory

Last synced: 16 May 2026

https://github.com/ahmadrafidev/learn-cuda

A place where I learn about CUDA

cuda cuda-programming gpu os parallel-programming

Last synced: 13 Apr 2025

https://github.com/qompassai/qudaz

Qompass AI Cuda library for Zig

cuda zig

Last synced: 17 Apr 2026

https://github.com/drbh/quemer

GPU accelerated k-mer counter

biology cuda gpu

Last synced: 07 May 2025

https://github.com/aeyage/intraday_prices

GPU-accelerated portfolio optimisation

cuda cupy nvidia-gpu

Last synced: 05 Apr 2025

https://github.com/illagrenan/cuda-90-cudnn7-runtime-1604-py36

Dockerfile for Ubuntu 16.04 with Python 3.6 and CUDA 9

cuda dockerfile python ubuntu

Last synced: 03 May 2026

https://github.com/morristai/kvik-rs

KvikIO Rust implementation

cuda cufile gds kvikio nvidia rust

Last synced: 02 Apr 2026

https://github.com/qompassai/cuda

Qompass AI on CUDA

cuda nvidia

Last synced: 17 Apr 2026

https://github.com/synapticore-io/torch-cuda

PyTorch CUDA 12.8 project template - GPU-accelerated ML with modern Python packaging (uv), checkpointing, early stopping, and reproducibility utilities.

cuda gpu project-template python pytorch

Last synced: 04 Apr 2026

https://github.com/seieric/pytorch-mpi-singularity

Singularity container including PyTorch with CUDA and the MPI backend for DistributedDataParallel

cuda hpc nvidia openmpi pytorch singularity utokyo

Last synced: 18 Apr 2026

https://github.com/psteinb/gtc2017

Slides for my presentation at GTC 2017 from May 8-11 in Silicon Valley

compression cuda ffmpeg gpu gpu-computing h264 h265 microscopes spim

Last synced: 03 May 2026

https://github.com/thalesmg/haskell-accelerate-parconc

Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell

accelerate cuda gpu-computing haskell parallel-computing

Last synced: 18 Apr 2026

https://github.com/qanastek/concurency-tetravex

This software is a fast and reliable Tetravex solver based on C++ and CUDA.

c-plus-plus cuda parrallel-computing tetravex

Last synced: 18 Apr 2026

https://github.com/abdelrahman-amen/active_learning_in_nlp

I applied active learning to the IMDB dataset for sentiment analysis. Starting with a small labeled subset, I trained a model and used uncertainty sampling to select and label challenging reviews. This iterative process improved performance while reducing labeling effort.

activelearning cuda entropy imdb-dataset margin nlp python sklearnex torch uncertainty

Last synced: 18 Apr 2026

https://github.com/betarixm/csed490c

POSTECH: Heterogeneous Parallel Computing (Fall 2023)

cuda gpu parallel-computing postech

Last synced: 19 Apr 2026

https://github.com/ousscher/esi_2cs_hpc_tp

A collection of High-Performance Computing (HPC) codes showcasing parallel computing techniques. This repository includes implementations in CUDA, MPI, OpenMP, and threading ...

c cuda mpi openmp pthreads

Last synced: 18 Mar 2025

https://github.com/aledinola/ifp_cuda_mex

Solve the income fluctuation problem on the GPU

cuda gpu-computing matlab mex

Last synced: 14 May 2026

https://github.com/evstigneevnm/slurm_gpu_mpi_docker

A repository containing a sample of how to write a Dockerfile and compile an MPI program for use with Slurm, using enroot and pyxis from NVIDIA.

cuda docker enroot mpi nvidia pyxis slurm

Last synced: 18 Apr 2026

https://github.com/cooliron2311/cumd5bf

CUDA-based MD5 password brute-forcer

cuda md5 python

Last synced: 18 Apr 2026

https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker

NVIDIA CUDA base image on Ubuntu Linux, used to run machine learning workloads

ai cuda docker docker-compose machine-learning ml nvidia-docker ubuntu

Last synced: 18 Apr 2026

https://github.com/dmmutua/cuda_projects

Implementations of a variety of algorithms and technical papers, mostly related to machine learning and deep learning, in CUDA C

c cuda cuda-programming deep-learning machine-learning machine-learning-algorithms

Last synced: 18 Apr 2026

https://github.com/ne0nwinds/gpupuzzles

My solutions to srush/GPU-Puzzles using CUDA

cpp cuda gpgpu

Last synced: 16 May 2026

https://github.com/genpat-it/ohe-rs

Ultra-fast one-hot encoding for bioinformatics and ML, powered by Rust + CUDA. Built for cgMLST allele profiles and large-scale categorical data.

bioinformatics cuda machine-learning one-hot-encoding performance pyo3 python rust

Last synced: 04 Jun 2026

https://github.com/ronaldsg20/compu-paralela

Example code for parallel and distributed computing

cuda opencv openmp posix-threads

Last synced: 14 May 2026

https://github.com/liebemama/repo-fastapi

GPU-ready FastAPI AI inference server with plugin system, supporting CUDA, ROCm, CPU, and macOS MPS.

ai-server cuda fastapi gpu inference mps plugins pytorch rocm

Last synced: 05 Apr 2026

https://github.com/ex539/docker-dev-env

A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.

big-data cpp cuda docker docker-image docker-php docker-setup environment hadoop jenkins kubernetes qtcreator reproducibility x11

Last synced: 05 Apr 2026

https://github.com/lionpsiuc/postgraduate

A collection of assignments and projects completed during my M.Sc. in High-Performance Computing at Trinity College Dublin.

c cpp cuda

Last synced: 01 May 2026

https://github.com/manishklach/gb300-rl-runtime

Close-to-metal C/CUDA lab for RL inference fast paths: persistent GPU workers, hugepage KV arenas, cacheline-aware command rings, and async reward handoff. Goal: remove page faults, malloc/free, scheduler wakeups, CPU round-trips, and KV migration from the per-token path.

ai-infrastructure close-to-metal cuda gb300 gpu-inference hpc lock-free nvlink reinforcement-learning spsc-queue

Last synced: 09 Jun 2026

https://github.com/sagar-brahaman/imagefilterpy

Example of custom image filter for MRTech IFF Python SDK

camera cuda dng genicam gpu h264 h265 image-processing jetson json mipi rest-api rtsp tiff

Last synced: 18 Apr 2026

https://github.com/aditiisaxena/cuda-accelerated-box-filter-for-texture-image-enhancement

Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.

cpp cuda gpu-programming linux nvidia opencv

Last synced: 18 Apr 2026

https://github.com/naetherm/derelictcurand

Dynamic bindings to the CuRAND library for the D Programming Language.

cuda curand d derelict dlang

Last synced: 27 Mar 2025

https://github.com/dougeeai/llama-cpp-python-wheels

Pre-built wheels for llama-cpp-python across platforms and CUDA versions

ampere cuda cuda13 gguf llama-cpp-python llm machine-learning prebuilt python313 rtx3060 rtx3070 rtx3080 rtx3090 wheels windows

Last synced: 18 Apr 2026

https://github.com/intelav/gpu-agent-opt

AI Agent Framework for GPU Kernel Autotuning & Optimization. Automate CUDA kernel exploration, profiling, and tuning with AI-driven agents for deep learning, geospatial AI, and HPC workloads.

ai-agents autotuning cuda deep-l edge-ai geospatial gpu hpc nvidia optimization performance pytorch

Last synced: 19 Apr 2026

https://github.com/vicen-te/tiny-nn

A tiny neural network framework for fully-connected layers with CPU and CUDA support

backpropagation cplusplus-20 cpu cuda cuda-12-8 kernel multi-threaded neural-network nn

Last synced: 19 Apr 2026

https://github.com/bd2720/accesspatterns

Comparing chunked vs. striped memory access patterns for CPU and GPU code using the CUDA toolkit in C.

c cache cuda cuda-toolkit performance-analysis performance-testing profiling

Last synced: 16 May 2026

https://github.com/timanema/msc-thesis-public

Repository containing a GPU-accelerated compressor based on FSST

compression cpp cuda gpu thesis

Last synced: 19 Apr 2026

https://github.com/zjeffer/docker-arch-cuda

Arch Linux base image with the latest CUDA, CUDNN and LibTorch preinstalled.

archlinux cuda docker libtorch pytorch

Last synced: 19 Apr 2026

https://github.com/zepedroresende/matrixmultiplication

Matrix multiplication optimizations on Intel and CUDA

c cpp cuda hpc matrix-multiplication omp optimization

Last synced: 01 May 2026

https://github.com/yash-1335/qwen600

🚀 Build a fast inference engine for the QWEN3-0.6B model using CUDA, optimizing performance with minimal dependencies for efficient learning and practice.

cuda cuda-programming gpu llamacpp llm llm-inference qwen qwen3 transformer

Last synced: 16 May 2026

https://github.com/fatlipp/toyslam

SLAM implementation from scratch w/o external graph optimization libs

cuda gpu lidar-slam mapping odometry robotics slam

Last synced: 20 Apr 2026

https://github.com/ydkn/htw-progko-cuda

Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.

cuda image-transformations opencv

Last synced: 20 Apr 2026

https://github.com/tameronline/repo-fastapi

GPU-Ready FastAPI AI Inference Server with plugin system (CUDA/CPU/MPS/ROCm)

ai-server cuda deep-learning fastapi inference mps nlp plugins pytorch rocm

Last synced: 20 Apr 2026

https://github.com/rtfirst/voice-to-text

Cross-platform Push-to-Talk speech-to-text — local Whisper transcription (CUDA/MPS) with optional Anthropic API correction and live VU meter overlay. Windows 11 + macOS.

cuda macos push-to-talk python speech-to-text voice-input whisper windows

Last synced: 04 Jun 2026

https://github.com/amirbroker/cupydtw

Use CUDA for dynamic time warping

cuda dtw dynamic-time-warping python

Last synced: 20 Apr 2026

https://github.com/lanceberge/cuda-newton-fractals

Parallelize and visualize the Newton Iteration

cpp cuda mathematical-modelling visualization

Last synced: 16 May 2026

https://github.com/alexkranias/triton_vs_cuda

Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.

cuda cuda-kernels gpu gpu-programming parallel-programming python triton

Last synced: 20 Apr 2026

https://github.com/voschezang/holographic-projector-simulations

Optimizations of Simulations of Holographic Projectors using CUDA

cuda gpu holography parallel-computing photonics

Last synced: 16 May 2026

https://github.com/jusqua/dip-benchmark

Departmental undergraduate research project at UFS. Digital image processing benchmark using multiple tools to learn new ways to develop image processors.

benchmark cuda image-processing matlab opencv sycl visiongl

Last synced: 20 Apr 2026

https://github.com/bonevbs/cuknn

CUDA implementation of k-nearest neighbor search

cuda knn-search

Last synced: 20 Apr 2026

https://github.com/py-sandy/llama.cpp-windows-builder

Automated, reproducible build scripts for llama.cpp on Windows 10/11. Installs prerequisites, configures CMake and builds with CUDA.

ai build-scripts build-tool builder cuda llamacpp script scripts windows windows-10 windows-11

Last synced: 20 Apr 2026

https://github.com/mrkct/cuda-raytracer

Simple CUDA-Accelerated raytracer

cuda gpu raytracing raytracing-one-weekend

Last synced: 21 Apr 2026

https://github.com/rai-project/dlperf

Déjà vu: Modeling DNN Performance by Recalling History

benchmark cuda deep-learning modeling onnx performance tensorflow

Last synced: 21 Apr 2026

https://github.com/musaibbashir/object-detection

PyTorch + CUDA implementation of several image classification and object detection models such as YOLO, Fast-CNN, and RF-DETR

cnn computer-vision cuda image-classification object-detection pytorch yolo

Last synced: 21 Apr 2026

https://github.com/arya2004/parallel-computing

Parallel Computing Uni Course

cuda

Last synced: 18 May 2026