An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/potato3d/grid-rt

GPU-accelerated ray tracing using GLSL and CUDA

cuda glsl gpu ray-tracing real-time-rendering

Last synced: 15 Apr 2026

https://github.com/yashkathe/image-noise-reduction-with-cuda

This project analyzes an image denoising technique, median blur, comparing GPU-accelerated (Numba) and CPU-based (OpenCV) processing speeds.

cuda cuda-programming gpu-programming hardware-speed-analysis image-analysis image-processing numba nvidia nvidia-cuda nvidia-gpu opencv parallel-programming

Last synced: 14 May 2025

https://github.com/zhangge6/how-to-optimize-playground

High-performance computing (HPC) demos since I was a freshman.

cuda gemm x86

Last synced: 15 May 2026

https://github.com/egororachyov/spbench

Benchmark for sparse linear algebra libraries for CPU and GPU platforms.

benchmark cpp cpu cuda gpu-computing graphblas opencl sparse-matrices

Last synced: 15 May 2025

https://github.com/tristanpenman/cuda-examples

A collection of CUDA example code

cuda

Last synced: 10 Apr 2025

https://github.com/yunzhu-li/recognizer

An object recognizer mobile app based on deep convolutional neural networks

cnn cuda cudnn gpu ios python swift tensorflow

Last synced: 20 Apr 2026

https://github.com/kiwijuice56/cuda-mandelbox

Ray marching renderer of the 3D mandelbox fractal, accelerated with CUDA GPU code

3d 3d-graphics cpp cuda fractal fractal-images fractal-rendering mandelbox nvidia-cuda

Last synced: 02 May 2026

https://github.com/fabryprog/java-gpu

Support for offloading parallel-for loops in Java to NVIDIA CUDA compatible cards.

cuda gpu java nvidia parallel-computing

Last synced: 15 Apr 2026

https://github.com/hurricane1988/check-gpu-device

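✨ This project is an API service built on Flask + Gunicorn + NVIDIA CUDA that provides CUDA device information query and health check endpoints. It supports running on GPUs and can be used to deploy deep learning inference environments.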

cuda docker makefile nvidia python3 pytorch

Last synced: 10 Jul 2025

https://github.com/simmsb/p4haskell

P4 backend in haskell

compiler cuda gpu p4 p4c p4language

Last synced: 13 May 2026

https://github.com/larygwil/ffmpeg-static-cuda

ffmpeg static binaries for Linux that work on some older NVIDIA GPUs (not tested)

avc cuda cuvid ffmpeg h264 h265 hevc nvdec nvenc

Last synced: 06 May 2026

https://github.com/biodasturchi/gmx

🔬 Molecular modeling with GROMACS

cuda gpu gromacs mdp topology tpr trr

Last synced: 12 May 2026

https://github.com/misha-kis/python-plane-ransac

Parallel RANSAC for plane detection for multiple point clouds using Python and CUDA

cuda numba plane-detection python ransac

Last synced: 13 May 2026

https://github.com/deftruth/ptx-isa-8.2-zh

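🎉 Continuously updated: study notes on CUDA 12.2 PTX-ISA-8.2, with partial Chinese translation, personal commentary, and inline assembly examples, explaining the CUDA 12.2 PTX-ISA-8.2 assembly instructions; work in progress.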

asm cpp cuda ptx

Last synced: 13 May 2026

https://github.com/tthebc01/kawpow

Containerized KAWPOW miner.

cuda docker kawpow ravencoin

Last synced: 22 Jun 2026

https://github.com/lordmathis/cudanet

Convolutional Neural Network inference library running on CUDA

convolutional-neural-networks cpp cuda pytorch

Last synced: 08 May 2026

https://github.com/ran-2012/inversion

solve geophysics using CUDA & TensorFlow

cpp cuda geophysics inversion-method python

Last synced: 11 May 2026

https://github.com/pd2871/high-performance-computing

This repo contains the logs of the High Performance Computing module's final assignment

blurred-images c cuda gaussian-blur matrix-multiplication multi-threading parallel-computing pthreads pthreads-api

Last synced: 10 May 2026

https://github.com/mrglaster/cuda-acfcalc

Calculation of the smallest ACF for signals of length N using CUDA technology.

acf c calculations cpp cuda google-colaboratory google-colaboratory-notebooks isu

Last synced: 06 May 2026

https://github.com/nachovizzo/saxpy_openacc_cpp

My way of thinking about OpenACC, C++, and Parallel computing in general

cpp cuda gpu openacc

Last synced: 23 Jun 2026

https://github.com/tank3-tk3/parallel-processing-cuda

Parallel processing with CUDA C / C++

c cpp cuda parallel-computing parallel-programming

Last synced: 09 May 2026

https://github.com/tky823/bitlinear158compression

Compare compression models for inference by BitLinear158

cuda pytorch quantization

Last synced: 12 Jun 2026

https://github.com/dereklstinson/nccl

golang wrapper for nccl

cuda deep-learning go nccl parallel-computing

Last synced: 14 May 2026

https://github.com/debowin/gpu-parallel-recommender-system

GPGPU Parallel User-User Collaborative Filtering System in CUDA C

collaborative-filtering cuda gpu-programming movielens-dataset recommender-system

Last synced: 24 Apr 2026

https://github.com/nixos-cuda/cuda-legacy

Select CUDA package sets which have aged out of Nixpkgs. [maintainers=@ConnorBaker, @SomeoneSerge]

cuda nixpkgs nixpkgs-overlay

Last synced: 15 May 2026

https://github.com/acrlakshman/gradient-augmented-levelset-cuda

Implementation of Gradient Augmented Levelset method for CPU and GPU

cfd cuda levelset

Last synced: 17 Feb 2026

https://github.com/nellogan/distributed_compy

Distributed_compy is a distributed computing library that offers multi-threading, heterogeneous (CPU + multi-GPU), and multi-node support

cluster cuda heterogeneous-parallel-programming multi-threading multigpu openmp openmpi

Last synced: 16 Aug 2025

https://github.com/scarfy-sysu/rtx5060-pytorch-cuda129

Run PyTorch with CUDA 12.9 on RTX 50 series (e.g. RTX 5060)

cuda deep-learning pytorch rtx5060

Last synced: 20 Jul 2025

https://github.com/xlite-dev/HGEMM

⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API. 🎉🎉

cuda hgemm tensor-cores

Last synced: 30 Jul 2025

https://github.com/yingding/applyllm

A Python package for applying LLMs with LangChain and Hugging Face on a local CUDA/MPS host

accelerator batch cuda framework inference kubeflow langchain llm mps pipeline slurm transformers

Last synced: 24 Aug 2025

https://github.com/xkevio/cuda-raytracer

A simple ray tracer written with CUDA that saves its output in a .ppm file, CPU version included for reference.

cpu cuda cuda-raytracer gpu

Last synced: 25 Aug 2025

https://github.com/shreyansh26/mlsys-experiments

A collection of scripts for experimenting with and implementing MLSys-related techniques

cuda cuda-kernel gpu gpu-programming llm-inference profiling pytorch triton

Last synced: 30 Aug 2025

https://github.com/kim-hwiwon/t-espresso

A CUDA Library for Low-overhead Host-to-Device Transmission of Patterned Profile Data

cuda profiler

Last synced: 04 May 2026

https://github.com/navdeep-g/dimreduce4gpu

Dimensionality reduction ("dimreduce") on GPUs ("4gpu")

cplusplus cuda dimensionality-reduction gpu linear-algebra pca python svd unsupervised-learning

Last synced: 14 Apr 2025

https://github.com/pelayo-felgueroso/tensorflow-gpu-setup

Step-by-step guide to installing TensorFlow with GPU support on Conda.

artificial-intelligence cuda deep-learning gpu machine-learning nvidia nvidia-gpu setup-guide tensorflow

Last synced: 17 Feb 2026

https://github.com/aiday-mar/mpi-cuda-project

Using MPI and CUDA in order to accelerate the conjugate gradient algorithm execution in C++

c-plus-plus cuda gpu mpi university-project

Last synced: 02 May 2026

https://github.com/mu7annad0/100gpu

100 Days of CUDA: Optimizing My Life, One Kernel at a Time. 🔄🔥

cuda gpu

Last synced: 08 Mar 2026

https://github.com/B1-663R/docker-mining

Dockerfiles to build docker images to start mining with an NVIDIA Docker architecture

cryptocurrency cuda docker-image docker-nvidia mining

Last synced: 28 Mar 2025

https://github.com/ginkgo-project/cudaarchitectureselector

A CMake module simplifying the specification of CUDA architectures

cmake cmake-modules cuda

Last synced: 05 Nov 2025

https://github.com/cklxx/arle

Rust-native inference runtime for Qwen3 / Qwen3.5 — OpenAI-compatible serving + integrated agent, train, and self-evolution workflows. CUDA + Metal, no PyTorch on the hot path.

agent cuda flashinfer gspo inference infra kv-cache llm metal mlx openai-compatible qwen3 qwen35 rl rust

Last synced: 02 May 2026

https://github.com/dito97/gol

High-performance Computing (90535) final project at UniGe

cuda mpi openmp

Last synced: 02 May 2026

https://github.com/slesniew/parallel-processing-cpu-and-gpu-env-and-lib-with-powercap

(2024/2025) A library and environment for parallel processing in a power-limited CPU+GPU cluster environment.

c cpu cuda gpu mpi openmp parallel powercap

Last synced: 30 Mar 2025

https://github.com/superlinear-ai/scipy-notebook-gpu

jupyter/scipy-notebook with CUDA Toolkit, cuDNN, NCCL, and TensorRT

cuda cudnn docker nccl scipy-notebook tensorflow tensorrt

Last synced: 01 May 2026

https://github.com/bogdanminko/laperf

La Perf is a framework for AI performance benchmarking — covering LLMs, VLMs, embeddings, with power-metrics collection.

ai-benchmark ai-performance apple-silicon cuda lmstudio ml-benchmark mlx mps nvidia-gpu ollama open-source-benchmark

Last synced: 15 May 2026

https://github.com/lzyrapx/llm-grandmaster-notes

🎓The path to LLM mastery is paved with broken embeddings and resurrected gradients.

cuda deep-learning llm reinforcement-learning

Last synced: 14 May 2025

https://github.com/true-real-michael/python-plane-ransac

Parallel RANSAC for plane detection for multiple point clouds using Python and CUDA

cuda numba plane-detection python ransac

Last synced: 14 Mar 2025

https://github.com/galaxies99/inception-cuda

CUDA Implementation of Inception

cuda inception-v3

Last synced: 12 Apr 2025

https://github.com/huwzpf/parallel-processing-cpu-and-gpu-env-and-lib-with-powercap

(2024/2025) A library and environment for parallel processing in a power-limited CPU+GPU cluster environment.

c cpu cuda gpu mpi openmp parallel powercap

Last synced: 11 Apr 2025

https://github.com/dhruvsrikanth/cudann

A distributed implementation of a deep learning framework in CUDA.

cpp cuda deep-learning deep-learning-framework gpu-programming high-performance-computing hpc parallel-programming

Last synced: 01 May 2026

https://github.com/pnocera/cembedd

Embeddings rust API serving intfloat/multilingual-e5-large using huggingface/candle with CUDA enabled

bert cuda huggingface

Last synced: 12 Jan 2026

https://github.com/murrellgroup/conflux.jl

Single-node data parallelism in Julia with CUDA

cuda data-parallelism flux julia nccl

Last synced: 22 May 2026

https://github.com/steleman/pytorch-cuda-2.7.1

Clone of PyTorch: Tensors and Dynamic neural networks in Python and C++ with strong GPU acceleration.

cuda fedora macos pytorch sequoia

Last synced: 30 Apr 2026

https://github.com/dqbd/cuda-btree

Implementation of B-Trees on NVIDIA CUDA

b-tree cuda nvidia

Last synced: 30 Apr 2026

https://github.com/isazi/aoflagger

AOFlagger Radio Frequency Interference mitigation algorithm.

cuda gpu many-core rfi

Last synced: 30 Apr 2026

https://github.com/lintenn/cudaaddvectors-explicit-vs-unified-memory

Performance comparison of two different forms of memory management in CUDA

c cuda explicit memory memory-management performance unified-memory

Last synced: 17 May 2026

https://github.com/sthysel/jtx2-tools

nvidia jtx/xavier GPU monitor tool

cuda nvidia txt2 xavier

Last synced: 19 May 2026

https://github.com/xmas7/cudampi

A large hybrid CPU/GPU sorting network using CUDA and MPI. The sorting network uses a standard Quicksort for CPUs and a custom Bitonic Sort for GPUs. These two algorithms were the fastest in a number of prior benchmarks.

cpu cuda gpu hybrid mpi network

Last synced: 29 Apr 2026

https://github.com/capelliexp/sc2-im-pf-pathfinding-thesis

Master of Science thesis project. Uses CUDA on a system's GPU to create pathfinding data (IM+PF) usable by multiple agents in the same environment.

ai cplusplus cuda gpgpu pathfinding starcraft2

Last synced: 15 May 2026

https://github.com/grakshith/parallel-k-means

K-Means clustering for Image Colour Quantization and Image Compression

cuda image-color-quantization image-compression k-means mpi opencv openmp

Last synced: 28 Apr 2026

https://github.com/neoblizz/spmv

Efficient Sparse Matrix-Vector Multiplication (SpMV) using ModernGPU (MTX + CSR formats).

csr cuda gpgpu load-balancing mtx spmv

Last synced: 28 Apr 2026

https://github.com/terrylindev/image-to-ASCII

🖼️ A command-line tool for converting images to ASCII art

ascii ascii-art cli command-line cpp cuda docker image-processing image-to-ascii mpi opencv terminal

Last synced: 12 Jul 2025

https://github.com/hansalemaos/nvidiacheck

Monitors NVIDIA GPU information and logs the data into a pandas DataFrame - Windows only.

cuda log logging nvidia torch

Last synced: 27 Apr 2026

https://github.com/tmrob2/cuda2rust_sandpit

Minimal examples to get CUDA linear algebra programs working with Rust using CC & FFI.

cc clang cublas cuda cusparse rust

Last synced: 14 May 2025

https://github.com/neoblizz/cupti-plus-plus

CUPTI++ is a C++ interface to the CUDA Profiling Tools Interface (CUPTI).

cpp cuda cuda-profiler cupti profiler

Last synced: 26 Apr 2026

https://github.com/denzp/current

CUDA high-level Rust framework

cuda rust

Last synced: 26 Apr 2026

https://github.com/tiw302/mandelbrot-c

A simple Mandelbrot set explorer written in C. Crafted with SDL2 and multithreaded rendering for a smooth experience. ‹(•_•)›

c cuda fractal graphics mandelbrot multithreading sdl2 web webassembly

Last synced: 26 Apr 2026

https://github.com/lchsk/ney

A header-only parallel functions library for Intel Xeon/Xeon Phi/GPUs

cuda gpu linux parallel phi scientific xeon xeonphi

Last synced: 07 May 2026

https://github.com/csvancea/gpu-hashtable

GPU-backed linear-probing hash table implemented in CUDA. Supports batch operations such as insert and retrieval.

cuda hashtable

Last synced: 24 Apr 2026

https://github.com/teodutu/asc

Computer Systems Architecture (Arhitectura Sistemelor de Calcul) - UPB 2020

cache-optimization cuda parallel-programming profiling python-threading

Last synced: 24 Apr 2026

https://github.com/geekysuavo/gpufield

A CUDA-accelerated electromagnetostatics solver

cuda magnetic-fields magnetostatics

Last synced: 24 Dec 2025

https://github.com/puzzlef/pagerank-cuda-dynamic

Design of CUDA-based Parallel Dynamic PageRank algorithm for measuring importance.

algorithm cuda gpu graph pagerank static temporal

Last synced: 21 Feb 2026

https://github.com/kishore-narendran/eecs221-highperformancecomputing

Assignments done during the graduate course EECS 221 - Introduction to HPC that I took in the Spring Quarter of 2016 at University of California, Irvine. Involves assignments that use OpenMP, MPI and CUDA.

cuda hpc mpi openmp

Last synced: 17 May 2026

https://github.com/amruthapatil/nyu-cudamatrixoperations

Optimizing CUDA programs for vector addition and matrix multiplication

cuda high-performance-computing

Last synced: 21 May 2026

https://github.com/markdtw/parallel-programming

Basic Pthread, OpenMP, CUDA examples

cuda openmp parallel-programming pthreads

Last synced: 20 Apr 2026

https://github.com/peri044/cuda

GPU implementations of algorithms

cuda gauss-jordan parallel-programming

Last synced: 14 Jul 2025

https://github.com/kohulan/tensorflow-2.0-installation-with-cuda-support

A detailed step-by-step guide to installing Tensorflow-2.0-gpu with CUDA drivers on Ubuntu Server/Desktop LTS

cuda gpu nvidia ubuntu

Last synced: 07 May 2025

https://github.com/pothosware/pothosgpu

Pothos toolkit for ArrayFire API support

arrayfire cuda dataflow dataflow-programming gpu opencl pothos

Last synced: 19 Apr 2026

https://github.com/l30nardosv/reproduce-parcosi-moleculardocking

Reproducing paper: "Benchmarking the Performance of Irregular Computations in AutoDock-GPU Molecular Docking"

autodock-gpu cpu cuda gpu molecular-docking molecular-docking-scripts opencl paper reproducible-research

Last synced: 16 Feb 2026

https://github.com/avitase/fast_frechet

Comparison of different (fast) discrete Fréchet distance implementations in C++ and CUDA.

benchmark cpp cuda frechet-distance simd

Last synced: 18 May 2026

https://github.com/prithivsakthiur/vlm-parsing

VLM-Parsing is a Gradio-based web application for parsing documents and images into structured HTML and Markdown formats using advanced Vision Language Models (VLMs).

cuda gradio html huggingface-models huggingface-spaces huggingface-transformers logics markdown ocr-recognition pytorch qwen2-5-vl spaces vlm

Last synced: 05 Apr 2026

https://github.com/kilamper/matrix-multiplication

AC - Matrix multiplication using OpenMP, MPI and CUDA

cuda ms-mpi openmp

Last synced: 16 May 2026

https://github.com/artain-ai/ignite-ms

Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.

batch-inference batch-processing cuda embeddings gpu high-performance huggingface machine-learning multi-gpu nlp rag rust self-hosted semantic-search tensorrt text-embeddings vector-search

Last synced: 04 Jun 2026

https://github.com/szaghi/adam

Multi-physics AMR SDK and apps for High Performance Computing — from laptop to exascale device-accelerated superpc

amr cfd cuda fluid-dynamics fortran gas-dynamics hpc hydro-dynamics mpi openacc openmp plasma-dynamics

Last synced: 04 Apr 2026

https://github.com/agalue/sherpa-voice-assistant

Local AI-based voice assistant implemented using Sherpa, Whisper, Kokoro, and Ollama

coreml cuda golang kokoro-tts linux macos ollama onnx-runtime rust sherpa whisper-ai

Last synced: 04 Apr 2026