An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/shashshukla/ee-210-signals-and-systems

Code for the assignments for EE-210, Signals and Systems, at IIT Bombay 2016.

cuda image-processing signal-processing

Last synced: 26 Apr 2026

https://github.com/countzero/windows_exllama

This is a playground to explore the ExLlama project in a Windows environment.

conda cuda exllama python torch

Last synced: 26 Apr 2026

https://github.com/alexyzha/cuda-bioinformatics

A CUDA-Accelerated Bioinformatics Toolchain

bioinformatics bioinformatics-tool cplusplus cuda

Last synced: 26 Apr 2026

https://github.com/mateuszk098/parallel-programming-examples

Simple parallel programming examples with CUDA, MPI and OpenMP.

cpp cuda mpi openmp parallel-programming

Last synced: 27 Apr 2026

https://github.com/kbredies/tgv_pycuda

Algorithms, examples and tests for denoising, deblurring, zooming, dequantization and compressive imaging with total variation (TV) and second-order total generalized variation (TGV) regularization. GPU-accelerated code using PyCUDA.

compressive-imaging cuda image-deblurring image-denoising image-dequantization image-zooming python3 total-generalized-variation total-variation

Last synced: 27 Apr 2026

https://github.com/notkartikye/cuda-image-box-filters

🖼️ CUDA-powered tool for applying box filters to a large amount of images

cuda cuda-library cuda-programming npp

Last synced: 27 Apr 2026

https://github.com/gladap/heterogeneous_computing_project

Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters

cuda heterogeneous-parallel-programming

Last synced: 27 Apr 2026

https://github.com/perhuepenbecker/cudyn

CUDA library for irregular tasks using a dynamic block-internal balancing mechanism

cpp cuda cuda-library cuda-programming gpu-computing gpu-programming irregular

Last synced: 28 Apr 2026

https://github.com/ncorgan/arrayfire-config-info

A small command-line utility that outputs all available ArrayFire devices

arrayfire cuda gpu opencl

Last synced: 28 Apr 2026

https://github.com/obsidianplusplus/yolov5-tensorrt-accelerator

High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization

cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5

Last synced: 28 Apr 2026

https://github.com/dlzou/rt-weekend

Ray Tracing in One Weekend, using CUDA

cuda ray-tracing

Last synced: 28 Apr 2026

https://github.com/rog0d/gpuss_watchers

"The GPU Watchers swore upon their shared memory hierarchy, from L1 to global memory, which also served as their mandate as lords of parallel computation."

cuda gpu-acceleration gpu-monitoring gpu-profiling

Last synced: 28 Apr 2026

https://github.com/axeloooo/pytorch

Collection of deep learning workflows in PyTorch, from fundamentals and classification to transfer learning and experiment tracking.

cuda python pytorch

Last synced: 28 Apr 2026

https://github.com/ltsyk/smart-snake-ai

Advanced Deep Q-Network AI for Snake Game with CUDA support and 700% performance boost

artificial-intelligence cuda deep-q-network dqn game-ai machine-learning pytorch reinforcement-learning snake-game

Last synced: 28 Apr 2026

https://github.com/atelierarith/julia_gpu_playground

For those who want to use Julia with a GPU

cuda docker docker-compose julia

Last synced: 28 Apr 2026

https://github.com/ccfelius/hpc

High Performance Computing (CUDA, MPI/OpenMP, high performance ML)

cuda high-performance-computing machine-learning mpi

Last synced: 28 Apr 2026

https://github.com/mathiasotnes/gemm

General Matrix Multiplication (GEMM) optimization in CUDA.

cuda gpu

Last synced: 26 Mar 2025

https://github.com/emanuelemessina/cuda-benchmark

Compare matrix calculation times between CPU and GPU (CUDA)

benchmark cuda matrix-calculations

Last synced: 28 Apr 2026

https://github.com/shermanlo77/modefilter

ImageJ plugin, Java and CuPy implementation of the mode filter and empirical null filter. The mode filter is an edge-preserving smoothing filter that takes the mode of the empirical density.

cuda cupy empirical-null fiji filter image-filter imagej jcuda mode-filter

Last synced: 28 Apr 2026

https://github.com/fedimser/aldyparen

Renders pictures and videos with algebraic fractals

cuda fractals graphics

Last synced: 29 Apr 2026

https://github.com/sandialabs/tenzing

Core library for optimizing CUDA+MPI programs as sequential decision problems.

cuda mpi scr-2759 sequential-decision-problem

Last synced: 29 Apr 2026

https://github.com/snandasena/cuda-at-scale-for-the-enterprise

Gauss Filter with CUDA and NPP

cpp cuda gpu nvidia

Last synced: 29 Apr 2026

https://github.com/apostolis1/parallel-processing-systems

Project of the undergrad course "Parallel Processing Systems" - NTUA

benchmark c cuda mpi openmp parallel-computing

Last synced: 29 Apr 2026

https://github.com/giog97/histogram_equalization_cuda

Performance comparison of sequential and parallel CUDA Histogram Equalization for image contrast enhancement.

cuda cuda-kernels cuda-programming histogram-equalization image-processing parallel-computing parallel-programming

Last synced: 29 Apr 2026

https://github.com/jonastoth/cuda_raytracer

University project to implement a basic Raytracer in CUDA

cpp14 cuda raytracer

Last synced: 29 Apr 2026

https://github.com/rdma-from-gpu/.github

Public code release for our paper "Toward GPU-centric Networking on Commodity Hardware"

cuda gpu linux network rdma research

Last synced: 29 Apr 2026

https://github.com/dogrego/gpgpu-rainbow-raytracer

A GPU-accelerated rainbow ray tracer with CPU reference implementation, CUDA for parallelized refraction/reflection, and OpenGL for interactive visualization

cuda gpgpu raytracing

Last synced: 29 Apr 2026

https://github.com/jeong-j/multicore

Multi Thread in Java / C / C++ / Pthread / CUDA

c cpp cuda java multicore pthread thread

Last synced: 29 Apr 2026

https://github.com/fikri-rouzan/cuda-c-program-part-2

CUDA C program from NVIDIA course.

c cuda

Last synced: 30 Apr 2026

https://github.com/fulvius31/triton-cache-tracker

A lightweight utility for monitoring and analyzing Triton kernel compilation cache behavior.

cache cuda gpu gpu-kernels triton triton-openai

Last synced: 30 Apr 2026

https://github.com/gaurisharan/cuda-ml-kernels

Repo for CUDA C++ GPU kernels for ML and HPC.

cpp cuda gpu hpc kernels ml parallel-computing systems-ml

Last synced: 30 Apr 2026

https://github.com/neel-dandiwala/npp_cudaatscale_project

For the enterprise course project, I have created a model that executes the histogram equalisation procedure on the given input image file.

cuda npp

Last synced: 30 Apr 2026

https://github.com/puzzlef/vector-multiplication-cuda

Comparing approaches for CUDA-based vector multiplication.

algorithm cuda map multiply operation pagerank primitive

Last synced: 30 Apr 2026

https://github.com/mahshid1378/piper-plus-3

Multilingual neural TTS (6 languages: JA/EN/ZH/ES/FR/PT, code supports SV) — C++, C#, Rust, Go, Python, npm (WASM). VITS + Prosody, streaming, CUDA/CoreML/DirectML. pip install piper-plus | npm install piper-plus | cargo install piper-plus-cli

cross-platform csharp cuda deep-learning dotnet japanese multilingual nuget onnx pytorch rust speech-synthesis streaming text-to-speech tts vits webassembly

Last synced: 08 Jun 2026

https://github.com/actepukc/uv-app-starter-pack

Bootstrap PySide6 GUI apps quickly using uv, with built-in PyTorch/CUDA handling.

astral-uv cross-platform cuda gui pyside6 python pytorch qt6 starter-kit template

Last synced: 30 Apr 2026

https://github.com/ivanbuccella/sf2bio

Deep reinforcement learning for de novo drug design: a ReLeaSe method execution on a Docker Environment

cuda deep-learning deep-reinforcement-learning docker docker-compose machine-learning nvidia-cuda nvidia-docker reinforcement-learning release release-method

Last synced: 01 May 2026

https://github.com/mrtejas/cv-sandbox

A collection of Computer Vision mini-projects tuned for a number of tasks, including face detection, object detection, image segmentation and CLIP. Trained on popular datasets and includes a comparative study of the methods. Done as part of the S24 course Computer Vision at IIIT Hyd

computer-vision cuda ml opencv pytorch yolo

Last synced: 01 May 2026

https://github.com/fikri-rouzan/cuda-c-program-part-3

CUDA C program from NVIDIA course.

c cuda

Last synced: 01 May 2026

https://github.com/darshanakgr/meanfiltergpu

A GPU implementation of mean filter in CUDA

c cuda image-processing

Last synced: 01 May 2026

https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-python

Explore how to use Numba—the just-in-time, type-specializing Python function compiler—to create and launch CUDA kernels to accelerate Python programs on massively parallel NVIDIA GPUs.

accelerated-computing cuda cuda-programming jit numba nvidia python

Last synced: 01 May 2026

https://github.com/andresvalle/ocr-extraction

Text extraction from images using EasyOCR and parallelization with PyTorch

cuda ocr pytorch

Last synced: 01 May 2026

https://github.com/marius311/cudadistributedtools.jl

A set of utility tools for multi-GPU + multi-process workflows

cuda distributed julia

Last synced: 01 May 2026

https://github.com/f14-bertolotti/torchess

CUDA Torch extension for a chess engine

chess cuda torch

Last synced: 01 May 2026

https://github.com/lionpsiuc/postgraduate

A collection of assignments and projects completed during my M.Sc. in High-Performance Computing at Trinity College Dublin.

c cpp cuda

Last synced: 01 May 2026

https://github.com/zepedroresende/matrixmultiplication

Matrix Multiplication optimizations on Intel and CUDA

c cpp cuda hpc matrix-multiplication omp optimization

Last synced: 01 May 2026

https://github.com/d-krylov/cuda_to_opengl

Simple examples for CUDA OpenGL interoperability

cuda cuda-opengl opengl

Last synced: 01 May 2026

https://github.com/xueeinstein/udacity-cs344-cuda8

Code for Udacity CS344 (Intro to Parallel Programming) using CUDA 8.0

cuda cuda-8 parallel-computing

Last synced: 02 May 2026

https://github.com/cserajdeep/dnn-iris-pytorch

Deep Neural Network with Batch normalization for tabular datasets.

batch batch-normalization classification cuda deep-learning dnn iris-dataset

Last synced: 02 May 2026

https://github.com/waz4/tinycomb

A lightweight C and CUDA library for efficiently calculating combinations with repetition. Jump to any combination much faster than bruteforce methods, leveraging precomputed factorials and `tiny-bignum-c` for big-number support.

c combinations-generator combinations-with-repetition cuda tiny-bignum-c tinycomb

Last synced: 02 May 2026

https://github.com/bjornmelin/edge-ai-engineering

📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖

cuda edge-computing embedded-systems gpu-optimization iot mobile-ml model-optimization python tflite

Last synced: 02 May 2026

https://github.com/moshidev/acap

Coursework for the course Arquitectura y Computación de Altas Prestaciones (High-Performance Architecture and Computing)

cuda homework-assignments mpi pthreads

Last synced: 30 Mar 2025

https://github.com/rajshrestha86/kmeans-clusterize-cuda

Implementation of K-Means algorithm from scratch using CUDA.

c cuda kmeans-clustering

Last synced: 18 May 2026

https://github.com/amruthapatil/nyu-cudaconvolution

Implementing convolution operations on an image using CUDA, exploiting different methodologies - basic, tiled, and cuDNN

cuda high-performance

Last synced: 13 Mar 2025

https://github.com/luchrist69/ascent

📄 Improve your resume with Ascent, a simple web app that provides instant feedback to help you land more interviews, all for free.

agentic-ai ascent cuda dapr dapr-pub-sub datalog differential-equations docker engine kafka mpi odeint openai openai-api rancher-desktop rendering simulation simulation-framework

Last synced: 02 May 2026

https://github.com/prateekshukla1108/thunderkittens-docs

Documentation for ThunderKittens framework

cuda deep-le

Last synced: 18 Mar 2025

https://github.com/yinguobing/opencv-docker

Dockerfiles for OpenCV build.

cuda docker ffmpeg opencv

Last synced: 10 Apr 2026

https://github.com/jiriklepl/bits-knn-jpdc2024

Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search

bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k

Last synced: 21 Mar 2025

https://github.com/nourmorsy/convolution-neural-network-cuda

Code for optimizing a CNN using CUDA

c cnn cuda

Last synced: 13 May 2026

https://github.com/brendanm12345/simple_renderer_cs149

Simple CUDA renderer implementation. 19th most efficient out of 150+ submissions

cpp cuda

Last synced: 18 May 2026

https://github.com/apws25/accelmoe

This repository contains a CUDA kernel re-implementation of a CPU-based MoE model.

cpp cuda mixture-of-experts

Last synced: 11 May 2026

https://github.com/daniilvorontsov/fourier-option-pricing

MSc thesis project concerned with option pricing for Levy Jump models. Package includes pricing implementations for European Call and Put options for Carr-Madan, COS and Fourier Time Stepping.

carr-madan cuda fourier-transform monte-carlo option-pricing

Last synced: 11 May 2026

https://github.com/theogravity/dual-rtx-6000-blackwell-gemma-4-31b-it-nvfp4

Optimized vLLM setup for Gemma 4 31B NVFP4 with MTP on dual RTX PRO 6000 Blackwell using vllm and docker: native FP4 Tensor Cores, Multi-Token Prediction (96.5% acceptance rate), and prefix caching. Includes benchmark results and replication scripts.

am5 amd blackwell cuda docker fp4 gemma gemma4 llm-inference multi-token-prediction nvfp4 prefix-caching rtx-6000 speculative-decoding tensor-parallel vllm

Last synced: 11 May 2026

https://github.com/realdougeubanks/unmanic.plugin.encoder_video_hevc_nvenc_gpu

Unmanic plugin: H.265/HEVC encoder using NVIDIA hevc_nvenc with a true end-to-end GPU pipeline. Fork of Josh5/unmanic.plugin.encoder_video_hevc_nvenc that adds -hwaccel_output_format cuda when NVDEC HW decoding is enabled, keeping decoded frames in GPU memory through NVENC. Drop-in replacement with sensible defaults and full settings parity.

cuda ffmpeg hardware-acceleration nvdec nvenc nvidia unmanic unmanic-plugin video-transcoding

Last synced: 12 May 2026

https://github.com/tomaszrewak/csgpathtracer

A constructive solid geometry path tracer.

computer-graphics cuda path-tracing rendering

Last synced: 12 May 2026

https://github.com/thesupercd/rainbow_table_builder

A high performance CUDA-based GPU accelerated Rainbow-Table maker, written in C++ without any external libraries or dependencies needed.

cpp cryptography cuda hash-table hashing parallel-processing rainbow-table sha3 sha3-512 uuid

Last synced: 12 May 2026

https://github.com/vishalanandv/small_scale_parallel_programming

The project describes the design and development of a sparse matrix-vector product kernel, implemented on a supercomputer.

clanguage cuda kernel

Last synced: 12 May 2026

https://github.com/brocbyte/cuball

CUDA-based implementation of "Real-Time Rigid Body Simulation on GPUs" [from GPU Gems 3]

cpp cuda

Last synced: 12 May 2026

https://github.com/aspragueumkc/hydra2dgpu

GPU-accelerated 2D shallow water equation solver for QGIS — CUDA finite-volume method with unstructured mesh support

cuda finite-volume-method gis gpu-computing hydraulic-modeling hydrodynamics qgis shallow-water-equations

Last synced: 11 Jun 2026

https://github.com/programmergnome/cuda-codes

Snippet repository for learning parallel GPU programming with CUDA.

c cpp-programming cuda cuda-kernel gpu-programming learning-materials parallel-programming parallelization

Last synced: 13 May 2026

https://github.com/rossbates/rummage

Rummage is a GPU accelerated npub miner for Nostr

cuda identity mining nostr

Last synced: 13 May 2026

https://github.com/nyxflower/mosaics-cuda-openmp

Simple image mosaic command line tool (CUDA-OpenMP-C Implementation)

c cuda gpu-programming mosaic mosaic-images openmp parallel-computing parallel-processing

Last synced: 13 May 2026

https://github.com/gianmariaromano/pmc-translated-notes

The repository contains translated notes for the course "Programmazione di Sistemi Multicore" given by Professor De Sensi for the "Informatica" course at Sapienza Università di Roma.

cuda cuda-programming mpi multicore openmp parallel-computing parallel-programming pthreads

Last synced: 14 May 2026

https://github.com/gcol33/resolve

Neural network framework for species distribution modelling (PyTorch/C++/CUDA)

cpp cuda deep-learning ecology machine-learning neural-network pytorch species-distribution

Last synced: 12 Jun 2026

https://github.com/kaierikniermann/hpc-uzh-notes

These are some notes for the High Performance Computing course taught at UZH

cuda high-performance-computing mpi openacc openmp

Last synced: 13 Jun 2026

https://github.com/g023/cuda_inf

A self-contained CUDA inference engine for LiquidAI/LFM2.5-8B-A1B (hybrid conv + GQA-attention MoE, 8.5B params, 1B active) targeting a single RTX 3060 (12 GB). No Python, no frameworks at runtime: a single .cu engine + a header-only byte-level BPE tokenizer.

3060 ai c cpp cuda fast-inference gpu inference inference-engine large-language-models lfm25 liquidai llm moe nvidia open-source rtx token

Last synced: 15 Jun 2026

https://github.com/p4suta/mojiokoshi

Local audio transcription tool with real-time progress, powered by faster-whisper and CUDA

audio-transcription cuda docker fastapi faster-whisper gpu python self-hosted speech-to-text sveltekit transcription whisper

Last synced: 16 Jun 2026

https://github.com/hailiang-wang/cuda-get-started

Get started with CUDA

cuda machine-learning nvidia

Last synced: 17 Jun 2026

https://github.com/angchen0325/cuda-learn

Ang's CUDA-learn project

cuda gpu-computing

Last synced: 18 Jun 2026

https://github.com/rurumimic/cuda

compute unified device architecture

cuda deep-learning gpu nvidia

Last synced: 18 Jun 2026

https://github.com/acuoci/pbe-fixed-pivot-cuda

Fast CUDA implementation of aggregation and breakage terms in Population Balance Equations using the fixed pivot sectional method

aggregation breakage cuda fixed-pivot pbe

Last synced: 18 Jun 2026

https://github.com/farukalamai/jetson-yolo-cpp

Real-time object detection, segmentation and tracking on NVIDIA Jetson using YOLO + TensorRT in C++

cpp cuda jetson object-detection tensorrt yolo26

Last synced: 19 Jun 2026

https://github.com/aeyage/intraday-prices

GPU-accelerated portfolio optimisation

cuda cupy nvidia-gpu

Last synced: 19 Jun 2026

https://github.com/sbstndb/neural_k

A simple Neural Network library using Kokkos enabling CUDA or OpenMP backend

ai cuda kokkos library neural-network openmp

Last synced: 22 Jun 2026

https://github.com/sebsop/kmeans-thesis-segmentation

Real-time hybrid quantum-classical K-means segmentation using C++ and CUDA. Bachelor's Thesis at BBU bridging HPC and Quantum Machine Learning (QML).

cpp cuda hpc imgui kmeans opencv quantum-computing

Last synced: 23 Jun 2026

https://github.com/sebsop/realtime-parallel-kmeans-segmentation

Real-time C++ K-means image segmentation on live video streams, using OpenCV, RCC trees, and 5D features, optimized for consumer hardware with Sequential, Multi-threaded, MPI, and CUDA backends.

cpp cuda k-means-clustering mpi multithreading opencv rcc real-time-stream-processing

Last synced: 23 Jun 2026

https://github.com/cfregly/claude-gpu-perf-tune

31 GPU inference profiling and optimization skills for Claude Code, with a bundled MCP server

agent-skills claude-code cuda gpu inference llm mcp performance

Last synced: 23 Jun 2026

https://github.com/llm-db/understanding-gpu-architecture-implications-on-llm-serving-workloads

Understanding GPU Architecture Implications on LLM Serving Workloads (Master Thesis, ETH Zürich, 2024)

cuda inference pytorch rocm transformer

Last synced: 05 May 2026

https://github.com/yablokolabs/bendkernels

Pure Bend parallel algorithm kernels and GPU-scaling examples

algorithms bend cuda gpu hvm parallel-computing

Last synced: 24 Jun 2026

https://github.com/kobinarth-panchalingam/parallel-and-concurrent-programming

Semester - 7 | CS4533 - Parallel and Concurrent Programming | Labs

c concurrent-programming cuda java openmp pthreads

Last synced: 05 May 2026

https://github.com/zelosleone/audiobook-generator

A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.

ai-audio audiobook cuda gpu-acceleration machine-learning pdf-converter python pytorch speech-synthesis text-processing text-to-speech

Last synced: 05 May 2026

https://github.com/hurbalurba/quick-llama.cpp-server

A framework for publishing a more modern CUDA image for llama.cpp, built with CUDA 13 for newer cards only, with RPC support. Started as a way to learn how to compile a custom llama.cpp build.

cuda cuda13 devops docker dockerbuild gguf llamacpp llm rpc

Last synced: 05 May 2026