An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/david-palma/cuda-programming

Educational CUDA C/C++ programming repository with commented examples on GPU parallel computing, matrix operations, and performance profiling. Requires a CUDA-enabled NVIDIA GPU.

c-cpp cpp cuda cuda-toolkit education gpu gpu-programming kernel matrix-operations nvcc nvidia parallel-computing parallel-programming practice profiling threads

Last synced: 25 Apr 2026

https://github.com/crcrpar/dev-chainer

Dockerfile for Chainer Development in VSCode

chainer cuda docker nvidia-docker vscode

Last synced: 26 Apr 2026

https://github.com/haleelrah/Vision-pro-MAX

A Raspberry Pi-based object detection system for assisting visually impaired individuals. This project utilizes YOLO object detection and a Hailo 8L TPU to identify obstacles like manholes, potholes, and bumps, providing real-time audio feedback to aid navigation.

bash computer-vision cuda fine-tuning jupyter-notebook object-detection opencv python pytorch raspberry-pi rpi-camera ssh text-to-speech ultralytics yolo yolov8

Last synced: 30 Dec 2025

https://github.com/gravitytwog/electromagneticfield

Electromagnetic field simulation made with CUDA

c cuda cuda-kernels cuda-programming

Last synced: 26 Apr 2026

https://github.com/linux-alex/geep

GEEP (Genetic Evolutionary Engineering Platform) - a C++/Qt framework for genetic programming, optimized with CUDA acceleration. GEEP enables large-scale population-based optimization, ideal for solving high-dimensional problems using evolutionary algorithms and GPU computing.

cpp cuda framework genetic-programming

Last synced: 18 May 2026

https://github.com/oaslananka/cv_cuda_cpp_sample

This is a sample project demonstrating how to use OpenCV and CUDA in C++ for detecting people in drone footage with YOLO. The project aims to be simple and understandable for those who want to learn how to use OpenCV and CUDA in C++.

computervision cpp cuda opencv

Last synced: 01 May 2026

https://github.com/baremetalrt/baremetalrt

BareMetalRT — edge GPU compute mesh

cuda distributed-computing gpu inference llm nvidia tensorrt windows

Last synced: 18 Apr 2026

https://github.com/sergiomarquezdev/yt-transcriber

🛠️ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.

ai cli cuda gemini python transcription whisper youtube

Last synced: 15 May 2026

https://github.com/separatrixxx/pgp_labs_7_sem

👓 Laboratory work for the 7th semester of MAI on PGP and PDP

cpp cuda nvidia

Last synced: 15 May 2026

https://github.com/mohammadshabazuddin/text_to_speech_generation_with_llm_with_hugging_face

Build a text-to-speech generation system using LLMs and Hugging Face to convert text into natural audio speech.

cuda huggingface-transformers llms nlp

Last synced: 03 May 2026

https://github.com/muppetsg2/cudaraytracer

A custom ray tracer originally developed during university studies to run on CPU, now ported to GPU using CUDA. This project was created to explore GPU rendering techniques and to gain hands-on experience with CUDA programming.

cuda mit-license nvidia-cuda nvidia-gpu raytracing sfml stb-image student-project study-project

Last synced: 16 Apr 2026

https://github.com/9prady9/archdock

Arch linux docker image for app development

arch-linux arrayfire cuda docker-image forge opencl

Last synced: 03 May 2026

https://github.com/tornikeo/minimal-vscode-cuda-meson

Minimal sample of using VSCode and Meson to build CUDA applications

cuda meson template vscode

Last synced: 08 Sep 2025

https://github.com/lablup/backend.ai-accelerator-cuda

The Backend.AI CUDA Accelerator Plugin

backendai cuda

Last synced: 16 May 2026

https://github.com/elcruzo/cuda-conv

Lightweight CUDA kernel for 2D image convolution achieving 20x+ speedup. Built with CuPy for the NVIDIA Hackathon.

computer-vision convolution cuda cupy gpu-computing hackathon high-performance-computing image-processing nvidia python

Last synced: 15 May 2026

https://github.com/yashpotdar-py/flood-vision

Flood Vision - A deep learning–based computer vision system for flood mapping and damage assessment using aerial imagery.

cuda deep-learning flood-detection iot python

Last synced: 16 Apr 2026

https://github.com/sferez/sspp_sparse_matrix_cuda

Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA

cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix

Last synced: 30 Apr 2026

https://github.com/aaaastark/nvidia-cuda-google-colab

Deployment of NVIDIA CUDA on Google Colab, with example code (vector addition and matrix multiplication).

c cpp cuda googlecolab googlecolaboratory matrix-multiplication nvidia python vector-addition

Last synced: 16 Apr 2026

https://github.com/alexjmercer/cuda-npp-assignment

Learning about CUDA and NVIDIA Performance Primitives. Part of Coursera Assignment.

cuda gpu-programming npp nppi

Last synced: 13 Feb 2026

https://github.com/tlabaltoh/tlab-sharescreen-server-win

Software frame encoder that uses CUDA and casts encoded frames over UDP. An attempt to implement a custom streaming protocol and a shader-based frame encoder/decoder for screencasting.

cuda desktop-capture screensharing unity unity3d windows-graphics-capture

Last synced: 14 Feb 2026

https://github.com/lehoangan2906/cuda_basics

A simple implementation of operations on vectors and matrices, optimized for running on NVIDIA GPUs with CUDA

cpp cuda cuda-programming

Last synced: 16 Jun 2025

https://github.com/ankhoa1212/cuda-program

This is a GPU program built with CUDA using parallel reduction

cpp cuda curand gpu-programming parallel-reduction

Last synced: 14 Feb 2026

https://github.com/srmlcn/spirals

The purpose of the Spirals script is to create a computer-generated image. The image maps to GPUs with CUDA support.

cgi cuda gpu numba nvidia python

Last synced: 28 Feb 2026

https://github.com/nagharjun17/mlir-to-ptx-cuda

Creating an MLIR dialect that fuses Addition + ReLU, lowers to NVVM and LLVM IR and generates PTX to run the kernel on CUDA GPU

cpp cuda deep-learning llvm mlir ptx

Last synced: 18 Apr 2026

https://github.com/mattjesc/gpu-accelerated-fap

GPU-Accelerated Frequency Analysis Prototype using CUDA, Unit Testing, and User-Defined Settings

c cmake cpp cuda cufft googletest gpu gpu-acceleration gpu-computing gpu-programming nvidia signal-processing test test-automation testing unit-testing

Last synced: 16 Apr 2026

https://github.com/smoke-y/athena

Deep learning library

cuda deep-learning deep-learning-library

Last synced: 01 Mar 2026

https://github.com/aarid/cuda_operations

This project compares performance between CPU and GPU with CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.

conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication

Last synced: 02 Mar 2026

https://github.com/anselm67/cuda_mnist

A CUDA implementation of MNIST - for CUDA beginners.

cuda gpu gpu-computing gpu-programming mnist mnist-classification

Last synced: 02 Mar 2026

https://github.com/deltatecs/voses

Volatile Secret Searcher - massively parallel, brute force memory dump analysis for (D)TLS secret extraction

cuda memory-hacking reverse-engineering tls

Last synced: 15 Jun 2025

https://github.com/atticuszeller/pytorch-lightning-uv

📦 Zero-config Deep Learning template with PyTorch Lightning, UV package manager, W&B tracking, and modern Python tooling 🚀

classification cuda deep-learning machine-learning mnist-classification python pytorch pytorch-lightning typer uv

Last synced: 16 Apr 2026

https://github.com/deep-1704/coa_lab_repo_grp01

COA Lab assignments

cuda gpgpu-sim

Last synced: 24 Dec 2025

https://github.com/viktor-akusoff/chernabogpy

ChernabogPy is a Python package for visualizing gravitational distortions caused by black holes using nonlinear ray tracing.

cuda gpu physics-simulation python3 relativity-of-space-and-time torch

Last synced: 15 May 2026

https://github.com/eagleeee2/ethminer

EthMiner is a powerful Ethereum mining software optimized for GPU performance using OpenCL and CUDA technologies. It provides easy setup, detailed performance metrics, and robust compatibility with major mining pools, ensuring maximum efficiency and profitability for both novice and experienced miners.

cryptocurrency cuda eth ethash ethereum ethereum-mining gpu-mining mining-pool mining-software open-source

Last synced: 16 Apr 2026

https://github.com/harmeshgv/gpu-powered-bert-finetuning

Efficient fine-tuning of BERT models using CUDA-powered GPUs, optimized for laptops and devices with NVIDIA RTX 3000/4000 series or CUDA-compatible GPUs. Ideal for fast NLP model training with PyTorch and Hugging Face Transformers.

bert-model cuda finetuning-llms pytorch

Last synced: 16 Apr 2026

https://github.com/1180779/spheresraycasting

Raycasting of spheres

cuda opengl

Last synced: 02 Mar 2025

https://github.com/zury7/parallel-programming

A collection of performance optimizations and comparisons between multiprocessing and multithreading using pthreads, OpenMP, and CUDA. The experiments analyze execution speed, resource usage, and parallelization efficiency across different computational models. (CS 4553: Scientific Computing)

cuda openmp pthreads

Last synced: 08 May 2026

https://github.com/grizzz13/minimal-cuda

Minimal configuration to set up CUDA C++ in CMake.

cmake cpp cuda

Last synced: 18 Apr 2026

https://github.com/AMYPAD/miutil

Basic functionality needed for AMYPAD

cuda matlab medical-imaging python

Last synced: 10 Apr 2025

https://github.com/phrutis/bip39scan

brute bip39 mnemonic GPU - $250

bip39 brute brute-force bruteforce cuda gpu mnemonic phrases seed

Last synced: 10 Apr 2025

https://github.com/joe-mruz/hgvisualizer

An interactive simulation and visualization tool for evolving hypergraphs, inspired by the Wolfram Physics Project.

cpp cuda hypergraph physics simulator wolfram

Last synced: 02 May 2026

https://github.com/iebeid/cuda-particles

A simple visualization of particles calculated using CUDA

cuda opengl

Last synced: 17 Apr 2026

https://github.com/lbaf23/gpuinfo

cuda gpu

Last synced: 17 Apr 2026

https://github.com/farukalamai/cpp-for-cuda

A structured C++ learning path designed specifically for developers preparing to learn CUDA programming.

cpp cuda gpu nvidia

Last synced: 09 Jun 2026

https://github.com/jonmarty/pycuda-kmeans

A parallelized PyCuda implementation of the KMeans clustering algorithm.

cuda kmeans pycuda

Last synced: 25 Apr 2026

https://github.com/jdibenes/game_of_life_cuda

OpenGL / CUDA implementation of Conway's Game of Life.

cpp cuda opengl qt6 simulation

Last synced: 02 Apr 2026

https://github.com/chrisdalvit/gpu-matrix-transpose

Implementation and benchmarking of different matrix transpose approaches with CUDA

c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu

Last synced: 17 Apr 2026

https://github.com/antoniakras/semantic-video-search

GPU-optimized semantic search on video transcripts, with benchmarking of FAISS, Pinecone, and PostgreSQL vector databases. Deployed via Docker on FORTH’s GPU infrastructure.

bert-embeddings bert-fine-tuning cuda dokcer embedding-models embeddings-word2vec faiss-vector-database gpu-computing huggingface-transformers nlp-machine-learning pgvector pineconedb postgresql python pytorch retrieval-augmented-generation similarity-search vector-database whisper-ai

Last synced: 03 May 2026

https://github.com/loreloc/triturus

A bunch of Triton kernels with increasing complexity for learning and exploring Triton and GPU programming

cuda pytorch triton

Last synced: 17 Apr 2026

https://github.com/stckvrflw/pem-spgemm

pemSpGEMM - An Improved SpGEMM Algorithm

cpp cuda

Last synced: 17 Apr 2026

https://github.com/void4main/bifurcation-diagram

These little Python scripts plot a bifurcation diagram into a PNG file (they work fine on a Raspberry Pi and are accelerated on an NVIDIA Jetson Nano), but there is still a lot of room for improvement ...

bifurcation cuda feigenbaum gpu jetson logistic map nano numba sequence vectorize

Last synced: 17 Apr 2026

https://github.com/bjornmelin/ml-production-engineering

⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯

cuda deployment docker fastapi gpu-computing kubernetes mlops production

Last synced: 17 Apr 2026

https://github.com/bjornmelin/nlp-engineering-hub

📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤

cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers

Last synced: 17 Apr 2026

https://github.com/vibesmiths/mcp-rvc

GPU service for voice cloning via Retrieval-based Voice Conversion (CUDA + ROCm)

cuda docker gpu rocm rvc tts voice-cloning

Last synced: 17 Apr 2026

https://github.com/vibesmiths/mcp-musicgen

GPU service for text-to-music generation via Meta AudioCraft (CUDA + ROCm)

audiocraft cuda docker gpu musicgen python rocm text-to-music

Last synced: 17 Apr 2026

https://github.com/briiqn/obj2schem

A CUDA-enabled .obj model to schematic (Sponge V3) converter

cuda minecraft schematics wavefront-obj worldedit

Last synced: 17 Apr 2026

https://github.com/cs550-epfl/report

EPFL CS-550 project report

cuda formal-verification gpu memory-consistency ptx simt

Last synced: 03 Jun 2026

https://github.com/kentakoong/mtnlog

A simple multinode performance logger for Python

cuda lanta nvitop python slurm-cluster

Last synced: 11 Jan 2026

https://github.com/qompassai/qudaz

Qompass AI CUDA library for Zig

cuda zig

Last synced: 17 Apr 2026

https://github.com/qompassai/cuda

Qompass AI on CUDA

cuda nvidia

Last synced: 17 Apr 2026

https://github.com/synapticore-io/torch-cuda

PyTorch CUDA 12.8 project template - GPU-accelerated ML with modern Python packaging (uv), checkpointing, early stopping, and reproducibility utilities.

cuda gpu project-template python pytorch

Last synced: 04 Apr 2026

https://github.com/seieric/pytorch-mpi-singularity

Singularity container including PyTorch with CUDA and an MPI backend for DistributedDataParallel

cuda hpc nvidia openmpi pytorch singularity utokyo

Last synced: 18 Apr 2026

https://github.com/thalesmg/haskell-accelerate-parconc

Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell

accelerate cuda gpu-computing haskell parallel-computing

Last synced: 18 Apr 2026

https://github.com/qanastek/concurency-tetravex

This software is a fast and reliable Tetravex solver based on C++ and CUDA.

c-plus-plus cuda parrallel-computing tetravex

Last synced: 18 Apr 2026

https://github.com/abdelrahman-amen/active_learning_in_nlp

I applied active learning to the IMDB dataset for sentiment analysis. Starting with a small labeled subset, I trained a model and used uncertainty sampling to select and label challenging reviews. This iterative process improved performance while reducing labeling effort.

activelearning cuda entropy imdb-dataset margin nlp python sklearnex torch uncertainty

Last synced: 18 Apr 2026

https://github.com/betarixm/csed490c

POSTECH: Heterogeneous Parallel Computing (Fall 2023)

cuda gpu parallel-computing postech

Last synced: 19 Apr 2026

https://github.com/evstigneevnm/slurm_gpu_mpi_docker

A sample repository showing how to write a Dockerfile and compile an MPI program to run under Slurm with enroot and pyxis from NVIDIA.

cuda docker enroot mpi nvidia pyxis slurm

Last synced: 18 Apr 2026

https://github.com/cooliron2311/cumd5bf

CUDA-based MD5 password brute-forcer

cuda md5 python

Last synced: 18 Apr 2026

https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker

NVIDIA CUDA base image on Ubuntu Linux, used for running machine learning

ai cuda docker docker-compose machine-learning ml nvidia-docker ubuntu

Last synced: 18 Apr 2026

https://github.com/equiel-1703/cuhip

Wrapper tool to convert CUDA source code to HIP code and compile it with HIPCC. Useful for learning CUDA programming on AMD devices.

cuda hip

Last synced: 14 May 2026

https://github.com/dmmutua/cuda_projects

Implementations of a variety of algorithms and technical papers, mostly related to machine learning and deep learning, in CUDA C

c cuda cuda-programming deep-learning machine-learning machine-learning-algorithms

Last synced: 18 Apr 2026

https://github.com/genpat-it/ohe-rs

Ultra-fast one-hot encoding for bioinformatics and ML, powered by Rust + CUDA. Built for cgMLST allele profiles and large-scale categorical data.

bioinformatics cuda machine-learning one-hot-encoding performance pyo3 python rust

Last synced: 04 Jun 2026

https://github.com/liebemama/repo-fastapi

GPU-ready FastAPI AI inference server with plugin system, supporting CUDA, ROCm, CPU, and macOS MPS.

ai-server cuda fastapi gpu inference mps plugins pytorch rocm

Last synced: 05 Apr 2026

https://github.com/ex539/docker-dev-env

A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.

big-data cpp cuda docker docker-image docker-php docker-setup environment hadoop jenkins kubernetes qtcreator reproducibility x11

Last synced: 05 Apr 2026

https://github.com/sagar-brahaman/imagefilterpy

Example of custom image filter for MRTech IFF Python SDK

camera cuda dng genicam gpu h264 h265 image-processing jetson json mipi rest-api rtsp tiff

Last synced: 18 Apr 2026

https://github.com/aditiisaxena/cuda-accelerated-box-filter-for-texture-image-enhancement

Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.

cpp cuda gpu-programming linux nvidia opencv

Last synced: 18 Apr 2026

https://github.com/aledinola/ifp_cuda_mex

Solve the income fluctuation problem on the GPU

cuda gpu-computing matlab mex

Last synced: 14 May 2026

https://github.com/dougeeai/llama-cpp-python-wheels

Pre-built wheels for llama-cpp-python across platforms and CUDA versions

ampere cuda cuda13 gguf llama-cpp-python llm machine-learning prebuilt python313 rtx3060 rtx3070 rtx3080 rtx3090 wheels windows

Last synced: 18 Apr 2026

https://github.com/intelav/gpu-agent-opt

AI Agent Framework for GPU Kernel Autotuning & Optimization. Automate CUDA kernel exploration, profiling, and tuning with AI-driven agents for deep learning, geospatial AI, and HPC workloads.

ai-agents autotuning cuda deep-l edge-ai geospatial gpu hpc nvidia optimization performance pytorch

Last synced: 19 Apr 2026

https://github.com/vicen-te/tiny-nn

A tiny neural network framework for fully-connected layers with CPU and CUDA support

backpropagation cplusplus-20 cpu cuda cuda-12-8 kernel multi-threaded neural-network nn

Last synced: 19 Apr 2026

https://github.com/timanema/msc-thesis-public

Repository containing a GPU-accelerated compressor based on FSST

compression cpp cuda gpu thesis

Last synced: 19 Apr 2026

https://github.com/zjeffer/docker-arch-cuda

Arch Linux base image with the latest CUDA, CUDNN and LibTorch preinstalled.

archlinux cuda docker libtorch pytorch

Last synced: 19 Apr 2026

https://github.com/ronaldsg20/compu-paralela

Example code for parallel and distributed computing

cuda opencv openmp posix-threads

Last synced: 14 May 2026

https://github.com/fatlipp/toyslam

SLAM implementation from scratch w/o external graph optimization libs

cuda gpu lidar-slam mapping odometry robotics slam

Last synced: 20 Apr 2026

https://github.com/ydkn/htw-progko-cuda

Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.

cuda image-transformations opencv

Last synced: 20 Apr 2026

https://github.com/tameronline/repo-fastapi

GPU-Ready FastAPI AI Inference Server with plugin system (CUDA/CPU/MPS/ROCm)

ai-server cuda deep-learning fastapi inference mps nlp plugins pytorch rocm

Last synced: 20 Apr 2026

https://github.com/rtfirst/voice-to-text

Cross-platform Push-to-Talk speech-to-text — local Whisper transcription (CUDA/MPS) with optional Anthropic API correction and live VU meter overlay. Windows 11 + macOS.

cuda macos push-to-talk python speech-to-text voice-input whisper windows

Last synced: 04 Jun 2026

https://github.com/amirbroker/cupydtw

Use CUDA for dynamic time warping

cuda dtw dynamic-time-warping python

Last synced: 20 Apr 2026

https://github.com/arya2004/parallel-computing

Parallel Computing Uni Course

cuda

Last synced: 18 May 2026

https://github.com/alexkranias/triton_vs_cuda

Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.

cuda cuda-kernels gpu gpu-programming parallel-programming python triton

Last synced: 20 Apr 2026

https://github.com/juntyr/necsim-rust-docs

Documentation of the spatially explicit biodiversity simulation necsim-rust

biodiversity cuda docs mpi necsim rust simulation

Last synced: 14 May 2026

https://github.com/jusqua/dip-benchmark

Departmental undergraduate research project at UFS. Digital image processing benchmark using multiple tools to learn new ways to develop image processors.

benchmark cuda image-processing matlab opencv sycl visiongl

Last synced: 20 Apr 2026

https://github.com/bonevbs/cuknn

CUDA implementation of k-nearest neighbor search

cuda knn-search

Last synced: 20 Apr 2026

https://github.com/py-sandy/llama.cpp-windows-builder

Automated, reproducible build scripts for llama.cpp on Windows 10/11. Installs prerequisites, configures CMake and builds with CUDA.

ai build-scripts build-tool builder cuda llamacpp script scripts windows windows-10 windows-11

Last synced: 20 Apr 2026