An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/aaaastark/nvidia-cuda-google-colab

Deployment of NVIDIA CUDA on Google Colab, with example code (vector addition and matrix multiplication).

c cpp cuda googlecolab googlecolaboratory matrix-multiplication nvidia python vector-addition

Last synced: 16 Apr 2026

https://github.com/alexjmercer/cuda-npp-assignment

Learning about CUDA and NVIDIA Performance Primitives. Part of a Coursera assignment.

cuda gpu-programming npp nppi

Last synced: 13 Feb 2026

https://github.com/tlabaltoh/tlab-sharescreen-server-win

Software frame encoder that uses CUDA and casts encoded frames over UDP. An attempt to implement a custom streaming protocol and a shader-based frame encoder/decoder for screencasting.

cuda desktop-capture screensharing unity unity3d windows-graphics-capture

Last synced: 14 Feb 2026

https://github.com/AndreasKaratzas/orin

Setting up the NVIDIA Jetson Orin Nano Developer Kit

cuda cudnn jetpack6 nvidia-jetson nvidia-sdkmanager orin-nano

Last synced: 25 Feb 2025

https://github.com/ankhoa1212/cuda-program

This is a GPU program built with CUDA using parallel reduction

cpp cuda curand gpu-programming parallel-reduction

Last synced: 14 Feb 2026

https://github.com/srmlcn/spirals

The purpose of the Spirals script is to create a computer-generated image. The image maps to GPUs with CUDA support.

cgi cuda gpu numba nvidia python

Last synced: 28 Feb 2026

https://github.com/nagharjun17/mlir-to-ptx-cuda

Creating an MLIR dialect that fuses Addition + ReLU, lowers to NVVM and LLVM IR, and generates PTX to run the kernel on a CUDA GPU

cpp cuda deep-learning llvm mlir ptx

Last synced: 18 Apr 2026

https://github.com/anne-andresen/autoencoder_3d_c_cuda

3D Autoencoder training in raw C/CUDA

3d autoencoder c cuda nifti

Last synced: 28 Apr 2026

https://github.com/mattjesc/gpu-accelerated-fap

GPU-Accelerated Frequency Analysis Prototype using CUDA, Unit Testing, and User-Defined Settings

c cmake cpp cuda cufft googletest gpu gpu-acceleration gpu-computing gpu-programming nvidia signal-processing test test-automation testing unit-testing

Last synced: 16 Apr 2026

https://github.com/smoke-y/athena

Deep learning library

cuda deep-learning deep-learning-library

Last synced: 01 Mar 2026

https://github.com/aarid/cuda_operations

This project compares performance between CPU and GPU with CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.

conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication

Last synced: 02 Mar 2026

https://github.com/anselm67/cuda_mnist

A CUDA implementation of MNIST - for CUDA beginners.

cuda gpu gpu-computing gpu-programming mnist mnist-classification

Last synced: 02 Mar 2026

https://github.com/fedesky25/hpc-project-2024

Project for the 2024 HPC course: a streamplot generator for complex-valued functions

complex-numbers cuda openmp

Last synced: 30 Mar 2025

https://github.com/aledinola/ifp_cuda_mex

Solve the income fluctuation problem on the GPU

cuda gpu-computing matlab mex

Last synced: 14 May 2026

https://github.com/atticuszeller/pytorch-lightning-uv

📦 Zero-config Deep Learning template with PyTorch Lightning, UV package manager, W&B tracking, and modern Python tooling 🚀

classification cuda deep-learning machine-learning mnist-classification python pytorch pytorch-lightning typer uv

Last synced: 16 Apr 2026

https://github.com/ronaldsg20/compu-paralela

Example code for parallel and distributed computing

cuda opencv openmp posix-threads

Last synced: 14 May 2026

https://github.com/arya2004/parallel-computing

Parallel Computing Uni Course

cuda

Last synced: 18 May 2026

https://github.com/juntyr/necsim-rust-docs

Documentation of the spatially explicit biodiversity simulation necsim-rust

biodiversity cuda docs mpi necsim rust simulation

Last synced: 14 May 2026

https://github.com/cs550-epfl/review

Review of the paper A Formal Analysis of the NVIDIA PTX Memory Consistency Model

cuda formal-verification gpu memory-consistency ptx simt

Last synced: 30 Mar 2025

https://github.com/eagleeee2/ethminer

EthMiner is a powerful Ethereum mining software optimized for GPU performance using OpenCL and CUDA technologies. It provides easy setup, detailed performance metrics, and robust compatibility with major mining pools, ensuring maximum efficiency and profitability for both novice and experienced miners.

cryptocurrency cuda eth ethash ethereum ethereum-mining gpu-mining mining-pool mining-software open-source

Last synced: 16 Apr 2026

https://github.com/harmeshgv/gpu-powered-bert-finetuning

Efficient fine-tuning of BERT models using CUDA-powered GPUs, optimized for laptops and devices with NVIDIA RTX 3000/4000 series or CUDA-compatible GPUs. Ideal for fast NLP model training with PyTorch and Hugging Face Transformers.

bert-model cuda finetuning-llms pytorch

Last synced: 16 Apr 2026

https://github.com/td99/ai-sandbox

A collection of AI tools and prototypes.

ai cuda docker image-generation-ai nvidia python

Last synced: 08 Apr 2026

https://github.com/belrbez/ship-graphic-qt-qml-cuda-c

Client-Server application for Rocket driving in QML graphics

c client-server cpp cuda qml qt5 rocket

Last synced: 08 Apr 2026

https://github.com/uefi-code/bachelorgraduationdesign

I developed a PyTorch_For_PoorGuys framework and used it to train an LLM on an NVIDIA GeForce 2080Ti GPU as my bachelor's graduation design project

chatbot cuda gpu hacking large-language-models pytorch

Last synced: 03 May 2026

https://github.com/seanwevans/damnati

A CUDA-accelerated iterated prisoner's dilemma arena

arena cuda iterated-prisoners-dilemma prisoners-dilemma tournament

Last synced: 14 May 2026

https://github.com/dwain-barnes/llm-gguf-auto-converter

Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.

auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization

Last synced: 17 Jun 2025

https://github.com/sergeipapina/color2graycuda

NVIDIA CUDA kernel implementation of color-to-gray image conversion, compiled and linked using make or CMake

cmake cuda cuda-kernels cuda-programming link makefile nvidia

Last synced: 06 Apr 2025

https://github.com/ousscher/esi_2cs_hpc_tp

A collection of High-Performance Computing (HPC) codes showcasing parallel computing techniques. This repository includes implementations in CUDA, MPI, OpenMP, and threading ...

c cuda mpi openmp pthreads

Last synced: 18 Mar 2025

https://github.com/iebeid/cuda-particles

A simple visualization of particles calculated using CUDA

cuda opengl

Last synced: 17 Apr 2026

https://github.com/lbaf23/gpuinfo

cuda gpu

Last synced: 17 Apr 2026

https://github.com/jonmarty/pycuda-kmeans

A parallelized PyCUDA implementation of the KMeans clustering algorithm.

cuda kmeans pycuda

Last synced: 25 Apr 2026

https://github.com/jdibenes/game_of_life_cuda

OpenGL / CUDA implementation of Conway's Game of Life.

cpp cuda opengl qt6 simulation

Last synced: 02 Apr 2026

https://github.com/chrisdalvit/gpu-matrix-transpose

Implementation and benchmarking of different matrix transpose approaches with CUDA

c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu

Last synced: 17 Apr 2026

https://github.com/loreloc/triturus

A bunch of Triton kernels with increasing complexity for learning and exploring Triton and GPU programming

cuda pytorch triton

Last synced: 17 Apr 2026

https://github.com/stckvrflw/pem-spgemm

pemSpGEMM - An Improved SpGEMM Algorithm

cpp cuda

Last synced: 17 Apr 2026

https://github.com/void4main/bifurcation-diagram

These little Python scripts plot a bifurcation diagram to a PNG file (they work fine on a Raspberry Pi and are accelerated on an NVIDIA Jetson Nano), but there is still a lot of room for improvement ...

bifurcation cuda feigenbaum gpu jetson logistic map nano numba sequence vectorize

Last synced: 17 Apr 2026

https://github.com/bjornmelin/ml-production-engineering

⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯

cuda deployment docker fastapi gpu-computing kubernetes mlops production

Last synced: 17 Apr 2026

https://github.com/bjornmelin/nlp-engineering-hub

📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤

cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers

Last synced: 17 Apr 2026

https://github.com/vibesmiths/mcp-rvc

GPU service for voice cloning via Retrieval-based Voice Conversion (CUDA + ROCm)

cuda docker gpu rocm rvc tts voice-cloning

Last synced: 17 Apr 2026

https://github.com/vibesmiths/mcp-musicgen

GPU service for text-to-music generation via Meta AudioCraft (CUDA + ROCm)

audiocraft cuda docker gpu musicgen python rocm text-to-music

Last synced: 17 Apr 2026

https://github.com/briiqn/obj2schem

A CUDA enabled .obj model to schematic (Sponge V3) converter

cuda minecraft schematics wavefront-obj worldedit

Last synced: 17 Apr 2026

https://github.com/cs550-epfl/report

EPFL CS-550 project report

cuda formal-verification gpu memory-consistency ptx simt

Last synced: 03 Jun 2026

https://github.com/adesoji1/youtubesummaryai

Python script for YouTube summaries. The service should summarize a YouTube video given its URL. It should work for long videos and for different languages.

cuda googleapi python3 speech-recognition transformers youtube-api-v3 youtube-dl

Last synced: 04 Apr 2025

https://github.com/qompassai/qudaz

Qompass AI CUDA library for Zig

cuda zig

Last synced: 17 Apr 2026

https://github.com/tylerfaulkner/n-body_simulation

CUDA N-Body Gravitational Simulation with rendering in Python with Matplotlib

cuda simulation

Last synced: 20 May 2026

https://github.com/qompassai/cuda

Qompass AI on CUDA

cuda nvidia

Last synced: 17 Apr 2026

https://github.com/synapticore-io/torch-cuda

PyTorch CUDA 12.8 project template - GPU-accelerated ML with modern Python packaging (uv), checkpointing, early stopping, and reproducibility utilities.

cuda gpu project-template python pytorch

Last synced: 04 Apr 2026

https://github.com/seieric/pytorch-mpi-singularity

Singularity container including PyTorch with CUDA and an MPI backend for DistributedDataParallel

cuda hpc nvidia openmpi pytorch singularity utokyo

Last synced: 18 Apr 2026

https://github.com/thalesmg/haskell-accelerate-parconc

Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell

accelerate cuda gpu-computing haskell parallel-computing

Last synced: 18 Apr 2026

https://github.com/qanastek/concurency-tetravex

This software is a fast and reliable Tetravex solver based on C++ and CUDA.

c-plus-plus cuda parrallel-computing tetravex

Last synced: 18 Apr 2026

https://github.com/abdelrahman-amen/active_learning_in_nlp

I applied active learning to the IMDB dataset for sentiment analysis. Starting with a small labeled subset, I trained a model and used uncertainty sampling to select and label challenging reviews. This iterative process improved performance while reducing labeling effort.

activelearning cuda entropy imdb-dataset margin nlp python sklearnex torch uncertainty

Last synced: 18 Apr 2026

https://github.com/betarixm/csed490c

POSTECH: Heterogeneous Parallel Computing (Fall 2023)

cuda gpu parallel-computing postech

Last synced: 19 Apr 2026

https://github.com/evstigneevnm/slurm_gpu_mpi_docker

This repository contains a sample showing how to write a Dockerfile and compile a program that uses MPI for use with Slurm, using enroot and pyxis from NVIDIA.

cuda docker enroot mpi nvidia pyxis slurm

Last synced: 18 Apr 2026

https://github.com/cooliron2311/cumd5bf

CUDA-based MD5 password bruteforcer

cuda md5 python

Last synced: 18 Apr 2026

https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker

NVIDIA CUDA base image on Ubuntu Linux, used to run machine learning workloads

ai cuda docker docker-compose machine-learning ml nvidia-docker ubuntu

Last synced: 18 Apr 2026

https://github.com/dmmutua/cuda_projects

Implementations of a variety of algorithms and technical papers, mostly related to machine learning and deep learning, in CUDA C

c cuda cuda-programming deep-learning machine-learning machine-learning-algorithms

Last synced: 18 Apr 2026

https://github.com/genpat-it/ohe-rs

Ultra-fast one-hot encoding for bioinformatics and ML, powered by Rust + CUDA. Built for cgMLST allele profiles and large-scale categorical data.

bioinformatics cuda machine-learning one-hot-encoding performance pyo3 python rust

Last synced: 04 Jun 2026

https://github.com/liebemama/repo-fastapi

GPU-ready FastAPI AI inference server with plugin system, supporting CUDA, ROCm, CPU, and macOS MPS.

ai-server cuda fastapi gpu inference mps plugins pytorch rocm

Last synced: 05 Apr 2026

https://github.com/ex539/docker-dev-env

A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.

big-data cpp cuda docker docker-image docker-php docker-setup environment hadoop jenkins kubernetes qtcreator reproducibility x11

Last synced: 05 Apr 2026

https://github.com/sagar-brahaman/imagefilterpy

Example of custom image filter for MRTech IFF Python SDK

camera cuda dng genicam gpu h264 h265 image-processing jetson json mipi rest-api rtsp tiff

Last synced: 18 Apr 2026

https://github.com/aditiisaxena/cuda-accelerated-box-filter-for-texture-image-enhancement

Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.

cpp cuda gpu-programming linux nvidia opencv

Last synced: 18 Apr 2026

https://github.com/dougeeai/llama-cpp-python-wheels

Pre-built wheels for llama-cpp-python across platforms and CUDA versions

ampere cuda cuda13 gguf llama-cpp-python llm machine-learning prebuilt python313 rtx3060 rtx3070 rtx3080 rtx3090 wheels windows

Last synced: 18 Apr 2026

https://github.com/intelav/gpu-agent-opt

AI Agent Framework for GPU Kernel Autotuning & Optimization. Automate CUDA kernel exploration, profiling, and tuning with AI-driven agents for deep learning, geospatial AI, and HPC workloads.

ai-agents autotuning cuda deep-l edge-ai geospatial gpu hpc nvidia optimization performance pytorch

Last synced: 19 Apr 2026

https://github.com/vicen-te/tiny-nn

A tiny neural network framework for fully-connected layers with CPU and CUDA support

backpropagation cplusplus-20 cpu cuda cuda-12-8 kernel multi-threaded neural-network nn

Last synced: 19 Apr 2026

https://github.com/timanema/msc-thesis-public

Repository containing a GPU-accelerated compressor based on FSST

compression cpp cuda gpu thesis

Last synced: 19 Apr 2026

https://github.com/zjeffer/docker-arch-cuda

Arch Linux base image with the latest CUDA, cuDNN and LibTorch preinstalled.

archlinux cuda docker libtorch pytorch

Last synced: 19 Apr 2026

https://github.com/fatlipp/toyslam

SLAM implementation from scratch w/o external graph optimization libs

cuda gpu lidar-slam mapping odometry robotics slam

Last synced: 20 Apr 2026

https://github.com/ydkn/htw-progko-cuda

Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.

cuda image-transformations opencv

Last synced: 20 Apr 2026

https://github.com/tameronline/repo-fastapi

GPU-Ready FastAPI AI Inference Server with plugin system (CUDA/CPU/MPS/ROCm)

ai-server cuda deep-learning fastapi inference mps nlp plugins pytorch rocm

Last synced: 20 Apr 2026

https://github.com/rtfirst/voice-to-text

Cross-platform Push-to-Talk speech-to-text — local Whisper transcription (CUDA/MPS) with optional Anthropic API correction and live VU meter overlay. Windows 11 + macOS.

cuda macos push-to-talk python speech-to-text voice-input whisper windows

Last synced: 04 Jun 2026

https://github.com/amirbroker/cupydtw

Use CUDA for Dynamic Time Warping

cuda dtw dynamic-time-warping python

Last synced: 20 Apr 2026

https://github.com/alexkranias/triton_vs_cuda

Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.

cuda cuda-kernels gpu gpu-programming parallel-programming python triton

Last synced: 20 Apr 2026

https://github.com/rurumimic/candle

huggingface candle

cuda gpu huggingface nvidia transformer

Last synced: 05 May 2026

https://github.com/jusqua/dip-benchmark

Departmental undergraduate research project at UFS. Digital image processing benchmark using multiple tools to learn new ways to develop image processors.

benchmark cuda image-processing matlab opencv sycl visiongl

Last synced: 20 Apr 2026

https://github.com/bonevbs/cuknn

CUDA implementation of k-nearest neighbor search

cuda knn-search

Last synced: 20 Apr 2026

https://github.com/py-sandy/llama.cpp-windows-builder

Automated, reproducible build scripts for llama.cpp on Windows 10/11. Installs prerequisites, configures CMake and builds with CUDA.

ai build-scripts build-tool builder cuda llamacpp script scripts windows windows-10 windows-11

Last synced: 20 Apr 2026

https://github.com/larygwil/cuda-samples-old

Old NVIDIA CUDA samples (5.0 - 7.5)

cuda nvidia

Last synced: 03 May 2026

https://github.com/mrkct/cuda-raytracer

Simple CUDA-Accelerated raytracer

cuda gpu raytracing raytracing-one-weekend

Last synced: 21 Apr 2026

https://github.com/rai-project/dlperf

Déjà vu: Modeling DNN Performance by Recalling History

benchmark cuda deep-learning modeling onnx performance tensorflow

Last synced: 21 Apr 2026

https://github.com/musaibbashir/object-detection

PyTorch + CUDA implementation of several image classification and object detection models such as YOLO, Fast-CNN, and RF-DETR

cnn computer-vision cuda image-classification object-detection pytorch yolo

Last synced: 21 Apr 2026

https://github.com/jiaau/kernels

This repository showcases common optimization techniques for kernels.

cpp cuda cute cutlass hpc kernel

Last synced: 21 Apr 2026

https://github.com/fxzxmicah/fedora-llama-cpp

llama.cpp tools with OpenMP, CUDA, and OpenVINO support

cuda fedora llama-cpp openmp openvino rpm

Last synced: 05 Jun 2026

https://github.com/dimitrijkrstev/pp-cuda-fft

A parallelised CUDA implementation of the FFT Radix-2 algorithm and its execution time comparison to the DFT and non-parallelised Radix-2

cuda fft parallel-computing

Last synced: 22 Apr 2026

https://github.com/mdnpascual/judgebarmashvp

Error bar for the game called Mash VP

cuda emgucv screencapturer tesseract-ocr

Last synced: 22 Apr 2026

https://github.com/maxenceleguery/jat

Tensor library

computation cuda tensor

Last synced: 24 Apr 2026

https://github.com/bikemazzell/tuonella-sift

A high-performance, memory-efficient CSV deduplication tool

csv cuda deduplication logger osint rust

Last synced: 24 Apr 2026

https://github.com/bardifarsi/threadpoolmanager

ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.

cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe

Last synced: 24 Apr 2026

https://github.com/juntyr/necsim-rust-analysis

Analysis of the spatially explicit biodiversity simulation `necsim-rust`

analysis biodiversity cuda mpi necsim rust simulation

Last synced: 24 Apr 2026

https://github.com/alkaifaftab000/autonomous-maze-solver

Building an Autonomous Maze Solver using reinforcement learning to train agents for decision-making in dynamic grid-based environments

agent criticism cuda gymnasium-environment maze-solving-bot pytorch reinforcement-learning reward-functions

Last synced: 12 Apr 2026

https://github.com/0xsooki/extending-jax

JAX Custom Operations with C++ and CUDA (using Pybind11)

cuda jax pybind11 xla

Last synced: 25 Apr 2026

https://github.com/sangioai/torchpace

PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.

cuda pytorch transformer

Last synced: 25 Apr 2026

https://github.com/daviddavo/19gpu

Short exercises for GPU at Complutense University of Madrid. Mirror from GitLab

accelerator cuda gpu-programming

Last synced: 26 Apr 2026