An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/ronaldsg20/compu-paralela

Example code for parallel and distributed computing

cuda opencv openmp posix-threads

Last synced: 14 May 2026

https://github.com/alexkranias/triton_vs_cuda

Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.

cuda cuda-kernels gpu gpu-programming parallel-programming python triton

Last synced: 20 Apr 2026

https://github.com/arya2004/parallel-computing

Parallel Computing Uni Course

cuda

Last synced: 18 May 2026

https://github.com/jusqua/dip-benchmark

Departmental undergraduate research project at UFS. Digital image processing benchmark using multiple tools to learn new ways to develop image processors.

benchmark cuda image-processing matlab opencv sycl visiongl

Last synced: 20 Apr 2026

https://github.com/bonevbs/cuknn

CUDA implementation of k-nearest neighbor search

cuda knn-search

Last synced: 20 Apr 2026

https://github.com/py-sandy/llama.cpp-windows-builder

Automated, reproducible build scripts for llama.cpp on Windows 10/11. Installs prerequisites, configures CMake and builds with CUDA.

ai build-scripts build-tool builder cuda llamacpp script scripts windows windows-10 windows-11

Last synced: 20 Apr 2026

https://github.com/juntyr/necsim-rust-docs

Documentation of the spatially explicit biodiversity simulation necsim-rust

biodiversity cuda docs mpi necsim rust simulation

Last synced: 14 May 2026

https://github.com/mrkct/cuda-raytracer

Simple CUDA-accelerated raytracer

cuda gpu raytracing raytracing-one-weekend

Last synced: 21 Apr 2026

https://github.com/rai-project/dlperf

Déjà vu: Modeling DNN Performance by Recalling History

benchmark cuda deep-learning modeling onnx performance tensorflow

Last synced: 21 Apr 2026

https://github.com/musaibbashir/object-detection

PyTorch + CUDA implementation of several image classification and object detection models, such as YOLO, Fast-CNN, and RF-DETR

cnn computer-vision cuda image-classification object-detection pytorch yolo

Last synced: 21 Apr 2026

https://github.com/jiaau/kernels

This repository showcases common optimization techniques for kernels.

cpp cuda cute cutlass hpc kernel

Last synced: 21 Apr 2026

https://github.com/fxzxmicah/fedora-llama-cpp

llama.cpp tools with OpenMP, CUDA, and OpenVINO support

cuda fedora llama-cpp openmp openvino rpm

Last synced: 05 Jun 2026

https://github.com/dimitrijkrstev/pp-cuda-fft

A parallelised CUDA implementation of the FFT Radix-2 algorithm and its execution time comparison to the DFT and non-parallelised Radix-2

cuda fft parallel-computing

Last synced: 22 Apr 2026

https://github.com/mdnpascual/judgebarmashvp

Error bar for the game called Mash VP

cuda emgucv screencapturer tesseract-ocr

Last synced: 22 Apr 2026

https://github.com/maxenceleguery/jat

Tensor library

computation cuda tensor

Last synced: 24 Apr 2026

https://github.com/bikemazzell/tuonella-sift

A high-performance, memory-efficient CSV deduplication tool

csv cuda deduplication logger osint rust

Last synced: 24 Apr 2026

https://github.com/bardifarsi/threadpoolmanager

ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.

cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe

Last synced: 24 Apr 2026

https://github.com/juntyr/necsim-rust-analysis

Analysis of the spatially explicit biodiversity simulation `necsim-rust`

analysis biodiversity cuda mpi necsim rust simulation

Last synced: 24 Apr 2026

https://github.com/0xsooki/extending-jax

JAX Custom Operations with C++ and CUDA (using Pybind11)

cuda jax pybind11 xla

Last synced: 25 Apr 2026

https://github.com/sangioai/torchpace

PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.

cuda pytorch transformer

Last synced: 25 Apr 2026

https://github.com/daviddavo/19gpu

Short exercises for GPU at Complutense University of Madrid. Mirror from GitLab

accelerator cuda gpu-programming

Last synced: 26 Apr 2026

https://github.com/shashshukla/ee-210-signals-and-systems

Code for the assignments for EE-210, Signals and Systems, at IIT Bombay 2016.

cuda image-processing signal-processing

Last synced: 26 Apr 2026

https://github.com/seanwevans/damnati

A CUDA-accelerated iterated prisoner's dilemma arena

arena cuda iterated-prisoners-dilemma prisoners-dilemma tournament

Last synced: 14 May 2026

https://github.com/countzero/windows_exllama

This is a playground to explore the ExLlama project in a Windows environment.

conda cuda exllama python torch

Last synced: 26 Apr 2026

https://github.com/alexyzha/cuda-bioinformatics

A CUDA-Accelerated Bioinformatics Toolchain

bioinformatics bioinformatics-tool cplusplus cuda

Last synced: 26 Apr 2026

https://github.com/mateuszk098/parallel-programming-examples

Simple parallel programming examples with CUDA, MPI and OpenMP.

cpp cuda mpi openmp parallel-programming

Last synced: 27 Apr 2026

https://github.com/kbredies/tgv_pycuda

Algorithms, examples and tests for denoising, deblurring, zooming, dequantization and compressive imaging with total variation (TV) and second-order total generalized variation (TGV) regularization. GPU-accelerated code using PyCUDA.

compressive-imaging cuda image-deblurring image-denoising image-dequantization image-zooming python3 total-generalized-variation total-variation

Last synced: 27 Apr 2026

https://github.com/notkartikye/cuda-image-box-filters

🖼️ CUDA-powered tool for applying box filters to a large amount of images

cuda cuda-library cuda-programming npp

Last synced: 27 Apr 2026

https://github.com/gladap/heterogeneous_computing_project

Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters

cuda heterogeneous-parallel-programming

Last synced: 27 Apr 2026

https://github.com/perhuepenbecker/cudyn

CUDA library for irregular tasks using a dynamic block-internal balancing mechanism

cpp cuda cuda-library cuda-programming gpu-computing gpu-programming irregular

Last synced: 28 Apr 2026

https://github.com/ncorgan/arrayfire-config-info

A small command-line utility that outputs all available ArrayFire devices

arrayfire cuda gpu opencl

Last synced: 28 Apr 2026

https://github.com/obsidianplusplus/yolov5-tensorrt-accelerator

High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization

cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5

Last synced: 28 Apr 2026

https://github.com/dlzou/rt-weekend

Ray Tracing in One Weekend, using CUDA

cuda ray-tracing

Last synced: 28 Apr 2026

https://github.com/rog0d/gpuss_watchers

"The GPU Watchers swore upon their shared memory hierarchy, from L1 to global memory, which also served as their mandate as lords of parallel computation."

cuda gpu-acceleration gpu-monitoring gpu-profiling

Last synced: 28 Apr 2026

https://github.com/axeloooo/pytorch

Collection of deep learning workflows in PyTorch, from fundamentals and classification to transfer learning and experiment tracking.

cuda python pytorch

Last synced: 28 Apr 2026

https://github.com/ltsyk/smart-snake-ai

Advanced Deep Q-Network AI for Snake Game with CUDA support and 700% performance boost

artificial-intelligence cuda deep-q-network dqn game-ai machine-learning pytorch reinforcement-learning snake-game

Last synced: 28 Apr 2026

https://github.com/atelierarith/julia_gpu_playground

For those who want to use Julia with a GPU

cuda docker docker-compose julia

Last synced: 28 Apr 2026

https://github.com/ccfelius/hpc

High Performance Computing (CUDA, MPI/OpenMP, high performance ML)

cuda high-performance-computing machine-learning mpi

Last synced: 28 Apr 2026

https://github.com/dwain-barnes/llm-gguf-auto-converter

Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.

auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization

Last synced: 17 Jun 2025

https://github.com/emanuelemessina/cuda-benchmark

Compare matrix calculation times between CPU and GPU (CUDA)

benchmark cuda matrix-calculations

Last synced: 28 Apr 2026

https://github.com/shermanlo77/modefilter

ImageJ plugin, Java and CuPy implementation of the mode filter and empirical null filter. The mode filter is an edge-preserving smoothing filter by taking the mode of the empirical density.

cuda cupy empirical-null fiji filter image-filter imagej jcuda mode-filter

Last synced: 28 Apr 2026

https://github.com/baro-00/cpp-cuda-lab

Experimental C++ projects using NVIDIA CUDA for parallel computing. Learning & testing GPU kernels

cpp cuda

Last synced: 04 May 2026

https://github.com/fedimser/aldyparen

Renders pictures and videos with algebraic fractals

cuda fractals graphics

Last synced: 29 Apr 2026

https://github.com/sandialabs/tenzing

Core library for optimizing CUDA+MPI programs as sequential decision problems.

cuda mpi scr-2759 sequential-decision-problem

Last synced: 29 Apr 2026

https://github.com/snandasena/cuda-at-scale-for-the-enterprise

Gauss Filter with CUDA and NPP

cpp cuda gpu nvidia

Last synced: 29 Apr 2026

https://github.com/0x778/gaussian_filter_using_cuda

Implementation of a Gaussian filter using CUDA

cuda cuda-kernels cuda-programming image-processing

Last synced: 04 May 2026

https://github.com/apostolis1/parallel-processing-systems

Project of the undergrad course "Parallel Processing Systems" - NTUA

benchmark c cuda mpi openmp parallel-computing

Last synced: 29 Apr 2026

https://github.com/giog97/histogram_equalization_cuda

Performance comparison of sequential and parallel CUDA Histogram Equalization for image contrast enhancement.

cuda cuda-kernels cuda-programming histogram-equalization image-processing parallel-computing parallel-programming

Last synced: 29 Apr 2026

https://github.com/jonastoth/cuda_raytracer

University project to implement a basic Raytracer in CUDA

cpp14 cuda raytracer

Last synced: 29 Apr 2026

https://github.com/rdma-from-gpu/.github

Public code release for our paper "Toward GPU-centric Networking on Commodity Hardware"

cuda gpu linux network rdma research

Last synced: 29 Apr 2026

https://github.com/dogrego/gpgpu-rainbow-raytracer

A GPU-accelerated rainbow ray tracer with CPU reference implementation, CUDA for parallelized refraction/reflection, and OpenGL for interactive visualization

cuda gpgpu raytracing

Last synced: 29 Apr 2026

https://github.com/jeong-j/multicore

Multithreading in Java / C / C++ / Pthreads / CUDA

c cpp cuda java multicore pthread thread

Last synced: 29 Apr 2026

https://github.com/ousscher/esi_2cs_hpc_tp

A collection of High-Performance Computing (HPC) codes showcasing parallel computing techniques. This repository includes implementations in CUDA, MPI, OpenMP, and threading ...

c cuda mpi openmp pthreads

Last synced: 18 Mar 2025

https://github.com/fikri-rouzan/cuda-c-program-part-2

CUDA C program from NVIDIA course.

c cuda

Last synced: 30 Apr 2026

https://github.com/fulvius31/triton-cache-tracker

A lightweight utility for monitoring and analyzing Triton kernel compilation cache behavior.

cache cuda gpu gpu-kernels triton triton-openai

Last synced: 30 Apr 2026

https://github.com/gaurisharan/cuda-ml-kernels

Repo for CUDA C++ GPU kernels for ML and HPC.

cpp cuda gpu hpc kernels ml parallel-computing systems-ml

Last synced: 30 Apr 2026

https://github.com/neel-dandiwala/npp_cudaatscale_project

For the enterprise course project, I have created a model that executes the histogram equalisation procedure on the given input image file.

cuda npp

Last synced: 30 Apr 2026

https://github.com/puzzlef/vector-multiplication-cuda

Comparing approaches for CUDA-based vector multiplication.

algorithm cuda map multiply operation pagerank primitive

Last synced: 30 Apr 2026

https://github.com/mahshid1378/piper-plus-3

Multilingual neural TTS (6 languages: JA/EN/ZH/ES/FR/PT, code supports SV) — C++, C#, Rust, Go, Python, npm (WASM). VITS + Prosody, streaming, CUDA/CoreML/DirectML. pip install piper-plus | npm install piper-plus | cargo install piper-plus-cli

cross-platform csharp cuda deep-learning dotnet japanese multilingual nuget onnx pytorch rust speech-synthesis streaming text-to-speech tts vits webassembly

Last synced: 08 Jun 2026

https://github.com/actepukc/uv-app-starter-pack

Bootstrap PySide6 GUI apps quickly using uv, with built-in PyTorch/CUDA handling.

astral-uv cross-platform cuda gui pyside6 python pytorch qt6 starter-kit template

Last synced: 30 Apr 2026

https://github.com/alessiobugetti/histogram-equalization

Implements sequential and parallel histogram equalization in C++ and Python, utilizing CUDA for parallel computation on GPU

cuda gpu-acceleration histogram-equalization parallel-computing pycuda

Last synced: 04 May 2026

https://github.com/tiktokfnf33/rayleigh-taylor-instability-simulation

CUDA Rayleigh-Taylor Instability Simulation: this repository features a high-performance simulation of the Rayleigh-Taylor instability using CUDA, Python, and C. Explore the implementation and results to understand fluid dynamics in a parallel computing context. 🖥️🚀

c computational-fluid-dynamics cuda euler-method finite-difference gpu-computing hpc numerical-simulation parallel-computing physics-simulation python rayleigh-taylor-instability runge-kutta

Last synced: 04 May 2026

https://github.com/ivanbuccella/sf2bio

Deep reinforcement learning for de novo drug design: a ReLeaSe method execution on a Docker Environment

cuda deep-learning deep-reinforcement-learning docker docker-compose machine-learning nvidia-cuda nvidia-docker reinforcement-learning release release-method

Last synced: 01 May 2026

https://github.com/mrtejas/cv-sandbox

A collection of computer vision mini-projects tuned for a number of tasks, including face detection, object detection, image segmentation and CLIP. Trained on popular datasets, with a comparative study of the methods. Done as part of the S24 course Computer Vision at IIIT Hyd

computer-vision cuda ml opencv pytorch yolo

Last synced: 01 May 2026

https://github.com/fikri-rouzan/cuda-c-program-part-3

CUDA C program from NVIDIA course.

c cuda

Last synced: 01 May 2026

https://github.com/darshanakgr/meanfiltergpu

A GPU implementation of a mean filter in CUDA

c cuda image-processing

Last synced: 01 May 2026

https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-python

Explore how to use Numba—the just-in-time, type-specializing Python function compiler—to create and launch CUDA kernels to accelerate Python programs on massively parallel NVIDIA GPUs.

accelerated-computing cuda cuda-programming jit numba nvidia python

Last synced: 01 May 2026

https://github.com/andresvalle/ocr-extraction

Text extraction from images using EasyOCR and parallelization with PyTorch

cuda ocr pytorch

Last synced: 01 May 2026

https://github.com/marius311/cudadistributedtools.jl

A set of utility tools for multi-GPU + multi-process workflows

cuda distributed julia

Last synced: 01 May 2026

https://github.com/f14-bertolotti/torchess

CUDA torch extension for a chess engine

chess cuda torch

Last synced: 01 May 2026

https://github.com/lionpsiuc/postgraduate

A collection of assignments and projects completed during my M.Sc. in High-Performance Computing at Trinity College Dublin.

c cpp cuda

Last synced: 01 May 2026

https://github.com/zepedroresende/matrixmultiplication

Matrix multiplication optimizations on Intel and CUDA

c cpp cuda hpc matrix-multiplication omp optimization

Last synced: 01 May 2026

https://github.com/vladd12/libexecstd

Modern C++ library for using an execution context of computer devices

cpp cpp17 cuda gpu-acceleration gpu-computing

Last synced: 06 May 2026

https://github.com/proafxin/cuda-docker

High-performance computing images with PyCUDA and TensorRT preinstalled

cuda docker dockerfile libcudnn nvidia-tensorrt pycuda python tensorrt

Last synced: 11 Apr 2026

https://github.com/zhaocc1106/cuda-programming

Learning CUDA programming

cuda nvidia

Last synced: 23 Mar 2025

https://github.com/zhaocc1106/cuxx-programing

Samples for several CUDA libraries: cuda, cublas, cublaslt, cusparse...

cublas cublaslt cuda cusparse

Last synced: 23 Mar 2025

https://github.com/abhiram-kandiyana/cuda-blast-2024

Reimplementation of NCBI BLAST with CUDA backend for faster retrieval

blast cuda gpu-acceleration parallel-processing

Last synced: 15 Mar 2025

https://github.com/mvishiu11/kmeans-clustering

K-Means Clustering with both GPU (CUDA) and CPU implementations

cuda kmeans-clustering

Last synced: 15 Mar 2025

https://github.com/gammahazard/locate-anything

Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.

bounding-boxes computer-vision cuda docker fastapi gpu grounding locate-anything machine-learning nvidia object-detection ocr open-vocabulary-detection react self-hosted tailwindcss typescript vision-language-model web-ui

Last synced: 28 May 2026

https://github.com/sahil-rajwar-2004/vector-cuda

Vector calculation with GPU acceleration using CUDA

c cpp11 cuda cuda-kernels cuda-programming nvcc

Last synced: 15 May 2025

https://github.com/neel-dandiwala/cuda-programs

Miscellaneous programs that illustrate the concepts of parallel computing

cuda gpu-programming parallel-programming

Last synced: 16 May 2025

https://github.com/tchung1970/sd-cli-cuda

CUDA-accelerated Stable Diffusion plugin for wavespeed-desktop

cuda gpu linux nvidia stable-diffusion

Last synced: 09 May 2026

https://github.com/bikrammajhi/100-days-of-gpu

This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA kernels, Triton spells, and PTX sorcery.

cuda nsight-compute ptx triton

Last synced: 18 Jun 2025

https://github.com/usman619/pdc

Parallel and Distributed Computing

cuda distributed-computing distributed-systems nextcloud

Last synced: 11 Apr 2026