An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/shalithasuranga/cudaperformance

Compare the performance of matrix multiplication among GPU shared memory, GPU global memory and CPU

cuda cuda-demo matrix-multiplication nvidia

Last synced: 21 Jan 2026
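
Comparisons like this usually pit a naive global-memory kernel against a tiled one. In the tiled version, each thread block loads TILE x TILE sub-blocks of A and B into on-chip shared memory and reuses them for many multiply-adds. Below is a minimal sequential Python sketch of that loop structure, assuming matrices stored as lists of lists. The tile size `TILE` and both function names are hypothetical, and this is not the repository's code.

```python
TILE = 2  # hypothetical tile size; CUDA kernels commonly use 16 or 32

def matmul_naive(a, b):
    """Reference product: one full dot product per output element (global-memory style)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]

def matmul_tiled(a, b, tile=TILE):
    """Tiled product: local a_tile/b_tile buffers stand in for __shared__ memory."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for ti in range(0, n, tile):          # tile row of C (one thread block)
        for tj in range(0, m, tile):      # tile column of C
            for tk in range(0, k, tile):  # step along the shared dimension
                # "Load" the current tiles, zero-padding past the matrix edges
                a_tile = [[a[i][p] if i < n and p < k else 0 for p in range(tk, tk + tile)]
                          for i in range(ti, ti + tile)]
                b_tile = [[b[p][j] if p < k and j < m else 0 for j in range(tj, tj + tile)]
                          for p in range(tk, tk + tile)]
                # Each (i, j) pair corresponds to one thread of the block
                for i in range(tile):
                    for j in range(tile):
                        if ti + i < n and tj + j < m:
                            c[ti + i][tj + j] += sum(a_tile[i][p] * b_tile[p][j]
                                                     for p in range(tile))
    return c
```

Each tile element is reused `tile` times from fast memory instead of being reread from global memory. That reuse is where the shared-memory kernel's advantage in such benchmarks usually comes from.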

https://github.com/maximedebarbat/dolphin

Dolphin is a Python toolkit meant to speed up TensorRT inference by providing CUDA-accelerated processing.

cuda python tensorrt-inference

Last synced: 07 Jul 2025

https://github.com/hurricane1988/check-gpu-device

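✨ This project is an API service built on Flask + Gunicorn + NVIDIA CUDA that provides endpoints for querying CUDA device information and for health checks. It supports running on GPUs and can be used when deploying deep-learning inference environments.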

cuda docker makefile nvidia python3 pytorch

Last synced: 10 Jul 2025

https://github.com/zhangge6/how-to-optimize-playground

High-performance computing (HPC) demos since I was a freshman.

cuda gemm x86

Last synced: 15 May 2026

https://github.com/shmishtopher/cudnn-versions

A scoop bucket for installing NVIDIA cuDNN versions.

cuda cudnn scoop scoop-apps scoop-bucket

Last synced: 01 May 2026

https://github.com/kianenigma/pmms-heat-dissipation

A set of assignments with comprehensive documentation to demonstrate multiple approaches to parallel programming in multi-core and many-core systems

cuda openmp parallel-programming pthreads

Last synced: 11 Sep 2025

https://github.com/generic-matrix/node-js-cuda

CUDA Node.js binding using the nan API, with a working example.

binding cuda cuda-node node-js node-js-cuda nodejs nodejs-gpu nodejs-modules

Last synced: 24 Jul 2025

https://github.com/simmsb/p4haskell

P4 backend in Haskell

compiler cuda gpu p4 p4c p4language

Last synced: 13 May 2026

https://github.com/biodasturchi/gmx

🔬 Molecular modeling with Gromacs

cuda gpu gromacs mdp topology tpr trr

Last synced: 12 May 2026

https://github.com/larygwil/ffmpeg-static-cuda

FFmpeg static binaries for Linux that work on some older NVIDIA GPUs (not tested)

avc cuda cuvid ffmpeg h264 h265 hevc nvdec nvenc

Last synced: 06 May 2026

https://github.com/misha-kis/python-plane-ransac

Parallel RANSAC for plane detection for multiple point clouds using Python and CUDA

cuda numba plane-detection python ransac

Last synced: 13 May 2026

https://github.com/deftruth/ptx-isa-8.2-zh

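🎉 Continuously updated: study notes on CUDA 12.2 PTX-ISA-8.2, with a partial Chinese translation, personal commentary and inline assembly examples explaining the CUDA 12.2 PTX-ISA-8.2 assembly instructions; work in progress...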

asm cpp cuda ptx

Last synced: 13 May 2026

https://github.com/tthebc01/kawpow

Containerized KAWPOW miner.

cuda docker kawpow ravencoin

Last synced: 22 Jun 2026

https://github.com/lordmathis/cudanet

Convolutional Neural Network inference library running on CUDA

convolutional-neural-networks cpp cuda pytorch

Last synced: 08 May 2026

https://github.com/ran-2012/inversion

Solving geophysical inversion problems using CUDA & TensorFlow

cpp cuda geophysics inversion-method python

Last synced: 11 May 2026

https://github.com/pd2871/high-performance-computing

This repo contains the logs of the High Performance Computing module's final assignment

blurred-images c cuda gaussian-blur matrix-multiplication multi-threading parallel-computing pthreads pthreads-api

Last synced: 10 May 2026

https://github.com/tank3-tk3/parallel-processing-cuda

Parallel processing with CUDA C / C++

c cpp cuda parallel-computing parallel-programming

Last synced: 09 May 2026

https://github.com/nachovizzo/saxpy_openacc_cpp

My way of thinking about OpenACC, C++, and parallel computing in general

cpp cuda gpu openacc

Last synced: 23 Jun 2026

https://github.com/dereklstinson/nccl

Go wrapper for NCCL

cuda deep-learning go nccl parallel-computing

Last synced: 14 May 2026

https://github.com/tky823/bitlinear158compression

Compares model compression for inference using BitLinear158

cuda pytorch quantization

Last synced: 12 Jun 2026

https://github.com/mrglaster/cuda-acfcalc

Calculation of the smallest ACF for signals of length N using CUDA technology.

acf c calculations cpp cuda google-colaboratory google-colaboratory-notebooks isu

Last synced: 06 May 2026

https://github.com/l1cacheDell/CUDA_Code

Code for learning CUDA: implementations of multiple kernels.

cuda cuda-programming

Last synced: 10 Mar 2025

https://github.com/avitase/fast_frechet

Comparison of different (fast) discrete Fréchet distance implementations in C++ and CUDA.

benchmark cpp cuda frechet-distance simd

Last synced: 18 May 2026

https://github.com/hadv/vaneth

GPU-accelerated CREATE2 vanity address miner for Ethereum

create2-contract-deployment cuda ethereum gpu gpu-acceleration gpu-programming open-cl vanity-address

Last synced: 21 Jan 2026

https://github.com/slesniew/parallel-processing-cpu-and-gpu-env-and-lib-with-powercap

(2024/2025) A library and environment for parallel processing in a power-limited CPU+GPU cluster environment.

c cpu cuda gpu mpi openmp parallel powercap

Last synced: 30 Mar 2025

https://github.com/yosh-matsuda/gpu-ptr

Cross-platform GPU smart pointer with C++20 range support

cpp cpp20 cuda gpu header-only hip

Last synced: 17 Jan 2026

https://github.com/lzyrapx/llm-grandmaster-notes

🎓The path to LLM mastery is paved with broken embeddings and resurrected gradients.

cuda deep-learning llm reinforcement-learning

Last synced: 14 May 2025

https://github.com/kim-hwiwon/t-espresso

A CUDA Library for Low-overhead Host-to-Device Transmission of Patterned Profile Data

cuda profiler

Last synced: 04 May 2026

https://github.com/seungjaelim/cuda.tutorial

References content from the OLCF CUDA Training Series. (https://github.com/olcf/cuda-training-series)

cuda gpu-programming nsight-compute nsight-systems

Last synced: 07 Feb 2026

https://github.com/elftausend/sliced

Array operations with automatic differentiation on CPU and GPU

autograd automatic-differentiation cuda custos matrix opencl

Last synced: 31 Jan 2026

https://github.com/lukasboettcher/msc-code

This is the repo for my master's thesis on a GPU-accelerated Andersen analysis.

andersen-analysis clang cuda llvm static-analysis

Last synced: 16 Jan 2026

https://github.com/agalue/sherpa-voice-assistant

Local AI-based voice assistant implemented using Sherpa, Whisper, Kokoro, and Ollama

coreml cuda golang kokoro-tts linux macos ollama onnx-runtime rust sherpa whisper-ai

Last synced: 04 Apr 2026

https://github.com/pnocera/cembedd

Embeddings rust API serving intfloat/multilingual-e5-large using huggingface/candle with CUDA enabled

bert cuda huggingface

Last synced: 12 Jan 2026

https://github.com/zeloe/juce_cuda_convolution

GPU acceleration for efficient, high-quality audio processing.

audio audio-processing convolution cuda dsp juce

Last synced: 03 Mar 2026

https://github.com/murrellgroup/conflux.jl

Single-node data parallelism in Julia with CUDA

cuda data-parallelism flux julia nccl

Last synced: 22 May 2026

https://github.com/aiday-mar/mpi-cuda-project

Using MPI and CUDA in order to accelerate the conjugate gradient algorithm execution in C++

c-plus-plus cuda gpu mpi university-project

Last synced: 02 May 2026

https://github.com/huwzpf/parallel-processing-cpu-and-gpu-env-and-lib-with-powercap

(2024/2025) A library and environment for parallel processing in a power-limited CPU+GPU cluster environment.

c cpu cuda gpu mpi openmp parallel powercap

Last synced: 11 Apr 2025

https://github.com/dpbm/qml-course

Short course on quantum machine learning

cuda cuda-q cuquantum docker ml python qml quantum quantum-computing tensorflow

Last synced: 31 Jan 2026

https://github.com/gjbex/gpu-programming

Material for a training on portable GPU programming

cuda gpu kokkos openmp openmp-off stl thrust

Last synced: 08 Feb 2026

https://github.com/cklxx/arle

Rust-native inference runtime for Qwen3 / Qwen3.5 — OpenAI-compatible serving + integrated agent, train, and self-evolution workflows. CUDA + Metal, no PyTorch on the hot path.

agent cuda flashinfer gspo inference infra kv-cache llm metal mlx openai-compatible qwen3 qwen35 rl rust

Last synced: 02 May 2026

https://github.com/dito97/gol

High-performance Computing (90535) final project at UniGe

cuda mpi openmp

Last synced: 02 May 2026

https://github.com/sthysel/jtx2-tools

NVIDIA Jetson TX2/Xavier GPU monitor tool

cuda nvidia txt2 xavier

Last synced: 19 May 2026

https://github.com/superlinear-ai/scipy-notebook-gpu

jupyter/scipy-notebook with CUDA Toolkit, cuDNN, NCCL, and TensorRT

cuda cudnn docker nccl scipy-notebook tensorflow tensorrt

Last synced: 01 May 2026

https://github.com/bogdanminko/laperf

La Perf is a framework for AI performance benchmarking — covering LLMs, VLMs, embeddings, with power-metrics collection.

ai-benchmark ai-performance apple-silicon cuda lmstudio ml-benchmark mlx mps nvidia-gpu ollama open-source-benchmark

Last synced: 15 May 2026

https://github.com/orlandopalmeira/trabalho-cp-2023-2024

Repository for the practical assignment of the Parallel Computing (Computação Paralela, CP) course unit - Master's in Informatics Engineering (MEI/MIEI) - Universidade do Minho (UMinho)

computacao-paralela cp cuda cuda-programming mei miei nvidia nvidia-cuda openmp optimization optimization-problem parallelism performance uminho uminho-mei uminho-miei

Last synced: 18 May 2026

https://github.com/tmrob2/cuda2rust_sandpit

Minimal examples to get CUDA linear algebra programs working with Rust using CC & FFI.

cc clang cublas cuda cusparse rust

Last synced: 14 May 2025

https://github.com/wendylabsinc/tensorrt-swift

TensorRT Swift 6.2 Bindings for Linux

cuda nvidia swift tensor tensorrt

Last synced: 01 Feb 2026

https://github.com/geekysuavo/gpufield

A CUDA-accelerated electromagnetostatics solver

cuda magnetic-fields magnetostatics

Last synced: 24 Dec 2025

https://github.com/dhruvsrikanth/cudann

A distributed implementation of a deep learning framework in CUDA.

cpp cuda deep-learning deep-learning-framework gpu-programming high-performance-computing hpc parallel-programming

Last synced: 01 May 2026

https://github.com/steleman/pytorch-cuda-2.7.1

Clone of PyTorch: Tensors and Dynamic neural networks in Python and C++ with strong GPU acceleration.

cuda fedora macos pytorch sequoia

Last synced: 30 Apr 2026

https://github.com/digimortl/libguess

Patches that give Bitcoin Core the ability to mine with CUDA

bitcoin c-plus-plus cryptocurrency cuda

Last synced: 16 Apr 2026

https://github.com/artain-ai/ignite-ms

Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.

batch-inference batch-processing cuda embeddings gpu high-performance huggingface machine-learning multi-gpu nlp rag rust self-hosted semantic-search tensorrt text-embeddings vector-search

Last synced: 04 Jun 2026

https://github.com/dqbd/cuda-btree

Implementation of B-Trees on NVIDIA CUDA

b-tree cuda nvidia

Last synced: 30 Apr 2026

https://github.com/amruthapatil/nyu-cudamatrixoperations

Optimizing CUDA programs for vector addition and matrix multiplication

cuda high-performance-computing

Last synced: 21 May 2026

https://github.com/tawssie/zmpy3d_pt

Python implementation of 3D Zernike moments with PyTorch

3d-zernike cuda gpu protein-structure python pytorch structural-bioinformatics superposition zernike-moments

Last synced: 24 Oct 2025

https://github.com/isazi/aoflagger

AOFlagger Radio Frequency Interference mitigation algorithm.

cuda gpu many-core rfi

Last synced: 30 Apr 2026

https://github.com/trahay/mpi-wattmeter

MPI-Wattmeter measures the power consumption of MPI programs

carbon-emissions cuda energy-consumption energy-monitor gpu hpc mpi

Last synced: 17 May 2026

https://github.com/brocbyte/realtime-deformations

Snow simulation (Material Point Method)

cuda glm material-point-method opengl

Last synced: 10 Aug 2025

https://github.com/openspeedshop/cbtf-argonavis-gui

Baseline for the next-generation Open|SpeedShop Graphical User Interface (GUI). The primary focus of this GUI will be the processing and display of CUDA collector performance data. However, there will be refactoring phases to adapt the GUI to support the processing and display of any collector performance data.

cuda performance profiler profiling

Last synced: 18 Apr 2026

https://github.com/lchsk/ney

A header-only parallel functions library for Intel Xeon/Xeon Phi/GPUs

cuda gpu linux parallel phi scientific xeon xeonphi

Last synced: 07 May 2026

https://github.com/infotrend-inc/ctpo-demo_projects

Jupyter Notebook examples using CTPO as their source container.

cuda opencv pytroch tensorflow2

Last synced: 14 Apr 2026

https://github.com/pelayo-felgueroso/tensorflow-gpu-setup

Step-by-step guide to installing TensorFlow with GPU support on Conda.

artificial-intelligence cuda deep-learning gpu machine-learning nvidia nvidia-gpu setup-guide tensorflow

Last synced: 17 Feb 2026

https://github.com/headless-start/data-augmentation-impact

This repository examines the effect of augmenting the training set during model training.

augmented-images cuda data gpu keras matplotlib mnist opencv-python python3 tensorflow training-data

Last synced: 05 Apr 2026

https://github.com/nixos-cuda/cuda-legacy

Select CUDA package sets which have aged out of Nixpkgs. [maintainers=@ConnorBaker, @SomeoneSerge]

cuda nixpkgs nixpkgs-overlay

Last synced: 15 May 2026

https://github.com/xmas7/cudampi

A large hybrid CPU/GPU sorting network using CUDA and MPI. The sorting network uses a standard Quicksort for CPUs and a custom Bitonic Sort for GPUs. These two algorithms were the fastest in a number of prior benchmarks.

cpu cuda gpu hybrid mpi network

Last synced: 29 Apr 2026

https://github.com/grakshith/parallel-k-means

K-Means clustering for Image Colour Quantization and Image Compression

cuda image-color-quantization image-compression k-means mpi opencv openmp

Last synced: 28 Apr 2026

https://github.com/neoblizz/spmv

Efficient Sparse Matrix-Vector Multiplication (SpMV) using ModernGPU (MTX + CSR formats).

csr cuda gpgpu load-balancing mtx spmv

Last synced: 28 Apr 2026
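
In the CSR format, `row_ptr[r]` to `row_ptr[r + 1]` delimits row r's entries in the `col_idx` and `vals` arrays. Below is a minimal Python sketch of the simplest CSR SpMV scheme, in which each row maps to one GPU thread. It is not the load-balanced ModernGPU approach the repository describes; that approach splits the nonzeros evenly across threads so that long rows do not stall a warp. The helper `dense_to_csr` and both function names are hypothetical.

```python
def dense_to_csr(dense):
    """Convert a dense list-of-lists matrix to (row_ptr, col_idx, vals) CSR arrays."""
    row_ptr, col_idx, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))  # end of this row = start of the next
    return row_ptr, col_idx, vals

def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for A in CSR form, one "thread" per row."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for r in range(n_rows):  # each iteration would be one CUDA thread
        acc = 0.0
        for idx in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[idx] * x[col_idx[idx]]
        y[r] = acc
    return y
```

For example, `[[10, 0, 0], [0, 20, 30], [0, 0, 0]]` becomes `row_ptr = [0, 1, 3, 3]`, `col_idx = [0, 1, 2]`, `vals = [10, 20, 30]`. The empty last row shows up as two equal consecutive `row_ptr` entries.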

https://github.com/alexjmercer/fractal-art

Generating Fractals in C++ using SFML. For the ultimate visual stimulation and in-depth code!

cmake cmakelists cpp20 cuda cuda-programming fractal-rendering graphics mandelbrot multithreading sfml2

Last synced: 05 Mar 2026

https://github.com/shreyansh26/mlsys-experiments

A collection of scripts on experimenting and implementing MLSys-related stuff

cuda cuda-kernel gpu gpu-programming llm-inference profiling pytorch triton

Last synced: 30 Aug 2025

https://github.com/kohulan/tensorflow-2.0-installation-with-cuda-support

A detailed step-by-step guide to installing TensorFlow 2.0 GPU with CUDA drivers on Ubuntu Server/Desktop LTS

cuda gpu nvidia ubuntu

Last synced: 07 May 2025

https://github.com/fattorib/thunderkittens-simple-gemm

Simple Tensorcore GEMM in ThunderKittens

cuda gemm gpu thunderkittens

Last synced: 09 Feb 2026

https://github.com/toxy4ny/artaxerxes

Artaxerxes - Adaptive High-Performance Stress Tester v.1.0. A rebuild of the old Xerxes DDoS tool. Supports GPU+io_uring, DPDK and eBPF/XDP with intelligent fallbacks. Educational tool for advanced cybersecurity labs.

cuda cuda-programming cybersecurity cybersecurity-education cybersecurity-tools dpdk ebpf educational high-performance network-security network-security-tool penetration-testing penetration-testing-framework penetration-testing-tools security-tools stress-testing

Last synced: 08 Oct 2025

https://github.com/hansalemaos/nvidiacheck

Monitors NVIDIA GPU information and logs the data into a pandas DataFrame (Windows only).

cuda log logging nvidia torch

Last synced: 27 Apr 2026

https://github.com/scarfy-sysu/rtx5060-pytorch-cuda129

Run PyTorch with CUDA 12.9 on RTX 50 series (e.g. RTX 5060)

cuda deep-learning pytorch rtx5060

Last synced: 20 Jul 2025

https://github.com/babak2/optimizedsum

Optimized Parallel Sum program demonstrating CPU vs GPU performance

cuda cuda-programming gpu-acceleration gpu-computing gpu-parallelism visual-studio

Last synced: 27 Mar 2025
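
A GPU parallel sum is commonly written as a tree reduction. In each round, thread t adds element t + stride into element t, and the stride halves until one value remains. On a GPU, all additions within a round run concurrently, separated by `__syncthreads()`. Below is a sequential Python sketch of that pattern, not the repository's code. Padding to a power of two with zeros is an assumption; real kernels often bounds-check instead.

```python
def tree_reduce_sum(values):
    """Sum values by pairwise halving, mirroring a shared-memory block reduction."""
    data = list(values)
    if not data:
        return 0
    # Pad with the additive identity so every round pairs elements evenly
    size = 1
    while size < len(data):
        size *= 2
    data.extend([0] * (size - len(data)))
    stride = size // 2
    while stride > 0:
        for t in range(stride):  # conceptually parallel: one thread per t
            data[t] += data[t + stride]
        stride //= 2
    return data[0]
```

The reduction takes log2(n) rounds instead of the n - 1 sequential steps of a plain loop. That depth reduction is what a GPU can exploit.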

https://github.com/hanzhi713/bitonic-sort

In-place GPU sort with bitonic sort

bitonic-sort cuda gpu in-place sorting

Last synced: 09 Feb 2026
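
Bitonic sort suits GPUs because it is a fixed compare-exchange network. In common CUDA implementations, each (k, j) pass below is one kernel launch, and each index i is a thread that compares its element with partner `i ^ j` independently of every other thread. Below is a minimal in-place Python sketch for power-of-two lengths; the repository's handling of other sizes is not reproduced here.

```python
def bitonic_sort(a):
    """Sort list a in place (ascending) with a bitonic network; len(a) must be a power of two."""
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:               # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:            # compare distance within the merge step
            for i in range(n):  # conceptually parallel: one thread per index
                partner = i ^ j
                if partner > i:  # let the lower index of each pair do the work
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The sort is in place because every step only swaps two existing slots. This matches the "in-place" property named in the description.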

https://github.com/mazharuddin-mohammed/semidgfem

High-performance TCAD Simulator Using Discontinuous Galerkin FEM

cuda discontinuous-galerkin-method tcad tcad-device-simulator

Last synced: 15 Jun 2025

https://github.com/neoblizz/cupti-plus-plus

CUPTI++ is a C++ interface to the CUDA Profiling Tools Interface (CUPTI).

cpp cuda cuda-profiler cupti profiler

Last synced: 26 Apr 2026

https://github.com/denzp/current

CUDA high-level Rust framework

cuda rust

Last synced: 26 Apr 2026

https://github.com/xkevio/cuda-raytracer

A simple ray tracer written with CUDA that saves its output in a .ppm file, CPU version included for reference.

cpu cuda cuda-raytracer gpu

Last synced: 25 Aug 2025

https://github.com/maliknaik16/parallel-computing

CUDA programming in C++ for high-performance computing on NVIDIA GPUs, optimized for tasks such as machine learning or image processing

cores cpp cuda gpu makefile matrix nvcc optimization

Last synced: 10 Jun 2025

https://github.com/tiw302/mandelbrot-c

A simple Mandelbrot set explorer written in C. Crafted with SDL2 and multithreaded rendering for a smooth experience. ‹(•_•)›

c cuda fractal graphics mandelbrot multithreading sdl2 web webassembly

Last synced: 26 Apr 2026

https://github.com/csvancea/gpu-hashtable

GPU-backed linear-probing hash table implemented in CUDA. Supports batch operations such as insert and retrieval.

cuda hashtable

Last synced: 24 Apr 2026
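
With linear probing, a key goes to slot `hash(key) % capacity`; if that slot holds a different key, the next slot is tried, wrapping around. A lookup stops at the key or at the first empty slot. GPU versions typically process one key per thread and claim a slot with `atomicCAS` on the key array. Below is a sequential Python sketch with batch insert and retrieval. The class name, the `None` sentinel and the use of Python's `hash` are all hypothetical choices, not the repository's design.

```python
EMPTY = None  # free-slot sentinel; CUDA tables usually reserve a key value instead

class LinearProbingTable:
    """Fixed-capacity open-addressing hash table with linear probing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = [EMPTY] * capacity
        self.values = [EMPTY] * capacity

    def _slot(self, key):
        return hash(key) % self.capacity  # hypothetical hash function

    def insert_batch(self, keys, values):
        for key, value in zip(keys, values):  # one "thread" per pair
            slot = self._slot(key)
            for _ in range(self.capacity):
                if self.keys[slot] is EMPTY or self.keys[slot] == key:
                    self.keys[slot] = key       # atomicCAS(&keys[slot], EMPTY, key) on a GPU
                    self.values[slot] = value   # new entry, or overwrite an existing key
                    break
                slot = (slot + 1) % self.capacity  # probe the next slot
            else:
                raise RuntimeError("table is full")

    def get_batch(self, keys):
        out = []
        for key in keys:  # one "thread" per lookup
            slot = self._slot(key)
            result = EMPTY
            for _ in range(self.capacity):
                if self.keys[slot] is EMPTY:
                    break  # reached a free slot: key is absent
                if self.keys[slot] == key:
                    result = self.values[slot]
                    break
                slot = (slot + 1) % self.capacity
            out.append(result)
        return out
```

For example, with capacity 8 the integer keys 1, 9 and 17 all hash to slot 1, so they end up in slots 1, 2 and 3. A lookup of 17 walks that chain.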

https://github.com/teodutu/asc

Computer Systems Architecture (Arhitectura Sistemelor de Calcul) - UPB 2020

cache-optimization cuda parallel-programming profiling python-threading

Last synced: 24 Apr 2026

https://github.com/puzzlef/pagerank-cuda-dynamic

Design of CUDA-based Parallel Dynamic PageRank algorithm for measuring importance.

algorithm cuda gpu graph pagerank static temporal

Last synced: 21 Feb 2026
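
Dynamic PageRank algorithms update ranks incrementally as the graph changes. Their baseline is static PageRank computed by power iteration. Below is a minimal Python sketch of that static baseline only; the repository's dynamic algorithm is not reproduced. It uses the "pull" formulation, in which each vertex reads only its in-neighbours' previous ranks, so a GPU can update all vertices in parallel (one thread per vertex) without atomics. The function name, the damping factor of 0.85 and the convergence settings are conventional assumptions.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=100):
    """Static PageRank by power iteration; out_links[u] lists the successors of vertex u (n >= 1)."""
    n = len(out_links)
    in_links = [[] for _ in range(n)]
    for u, succs in enumerate(out_links):
        for v in succs:
            in_links[v].append(u)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by dangling vertices (no out-links) is spread uniformly
        dangling = sum(rank[u] for u in range(n) if not out_links[u])
        base = (1.0 - damping) / n + damping * dangling / n
        # Pull step: each vertex depends only on the previous iteration's ranks
        new = [base + damping * sum(rank[u] / len(out_links[u]) for u in in_links[v])
               for v in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank
```

Redistributing dangling mass keeps the ranks summing to 1 after every iteration. This makes the L1 change `delta` a meaningful stopping test.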

https://github.com/yingding/applyllm

A python package for applying LLM with LangChain and Hugging Face on local CUDA/MPS host

accelerator batch cuda framework inference kubeflow langchain llm mps pipeline slurm transformers

Last synced: 24 Aug 2025