An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/ophoperhpo/dcgan-lentach-logo-generator

The Lentach logo generator. #MachineLearningFun

cuda dcgan dcgan-tensorflow keras lentach machinelearning ml

Last synced: 26 Jun 2026

https://github.com/willigarneau/object-detection-cuda

🕺 Put my knowledge of OpenCV and Cuda into practice to create an object detection system. 💻

camera cplusplus cuda detector filter opencv

Last synced: 08 May 2026

https://github.com/kayuii/ironfish-miner

Docker image for hpool-dev/ironfish-miner with NVIDIA/AMD GPU support

amdgpu cuda docker gpu nvidia rocm

Last synced: 07 May 2026

https://github.com/igorcosta/deep-docker

Docker image for Deep Learning on AWS Cloud

cuda deep-learning docker docker-image tensorflow

Last synced: 05 May 2026

https://github.com/daaboulex/unsloth-nix

Unsloth (git main) packaged for NixOS — CPU/CUDA/ROCm LoRA fine-tuning envs

cuda fine-tuning flake lora machine-learning nix nixos nixos-module pytorch rocm unsloth

Last synced: 10 Jun 2026

https://github.com/garciparedes/cuda-examples

CUDA examples that I developed to learn GPU-based HPC

c c-plus-plus cuda examples gpgpu gpu hpc

Last synced: 09 May 2026

https://github.com/abhans/archdev

Arch Linux-based container with NVIDIA driver and CUDA support, with PyTorch and TensorFlow built in.

archlinux container cuda docker

Last synced: 07 May 2026

https://github.com/gmfatcat/ai-photoviewer

AI that sorts your old photos for you

ai cuda local-first photo

Last synced: 16 Jun 2026

https://github.com/abdulfatir/subkmeans

Numpy and pyCUDA implementation of subKmeans

clustering cuda kdd kmeans numpy pycuda python subspace-clustering

Last synced: 09 May 2026

https://github.com/thesupercd/rainbow_table_builder

A high-performance, CUDA-based, GPU-accelerated rainbow table maker, written in C++ with no external libraries or dependencies.

cpp cryptography cuda hash-table hashing parallel-processing rainbow-table sha3 sha3-512 uuid

Last synced: 12 May 2026

https://github.com/vishalanandv/small_scale_parallel_programming

The project describes the design and development of a sparse matrix-vector product kernel, implemented on a supercomputer.

clanguage cuda kernel

Last synced: 12 May 2026

https://github.com/brocbyte/cuball

CUDA-based implementation of "Real-Time Rigid Body Simulation on GPUs" [from GPU Gems 3]

cpp cuda

Last synced: 12 May 2026

https://github.com/aspragueumkc/hydra2dgpu

GPU-accelerated 2D shallow water equation solver for QGIS — CUDA finite-volume method with unstructured mesh support

cuda finite-volume-method gis gpu-computing hydraulic-modeling hydrodynamics qgis shallow-water-equations

Last synced: 11 Jun 2026

https://github.com/programmergnome/cuda-codes

Snippet repository for learning parallel GPU programming with CUDA.

c cpp-programming cuda cuda-kernel gpu-programming learning-materials parallel-programming parallelization

Last synced: 13 May 2026

https://github.com/rossbates/rummage

Rummage is a GPU accelerated npub miner for Nostr

cuda identity mining nostr

Last synced: 13 May 2026

https://github.com/nyxflower/mosaics-cuda-openmp

Simple image mosaic command-line tool (CUDA-OpenMP-C implementation)

c cuda gpu-programming mosaic mosaic-images openmp parallel-computing parallel-processing

Last synced: 13 May 2026

https://github.com/gianmariaromano/pmc-translated-notes

The repository contains translated notes for the course "Programmazione di Sistemi Multicore" given by Professor De Sensi for the "Informatica" course at Sapienza Università di Roma.

cuda cuda-programming mpi multicore openmp parallel-computing parallel-programming pthreads

Last synced: 14 May 2026

https://github.com/gcol33/resolve

Neural network framework for species distribution modelling (PyTorch/C++/CUDA)

cpp cuda deep-learning ecology machine-learning neural-network pytorch species-distribution

Last synced: 12 Jun 2026

https://github.com/kaierikniermann/hpc-uzh-notes

These are some notes for the High Performance Computing course taught at UZH

cuda high-performance-computing mpi openacc openmp

Last synced: 13 Jun 2026

https://github.com/g023/cuda_inf

A self-contained CUDA inference engine for LiquidAI/LFM2.5-8B-A1B (hybrid conv + GQA-attention MoE, 8.5B params, 1B active) targeting a single RTX 3060 (12 GB). No Python, no frameworks at runtime: a single .cu engine + a header-only byte-level BPE tokenizer.

3060 ai c cpp cuda fast-inference gpu inference inference-engine large-language-models lfm25 liquidai llm moe nvidia open-source rtx token

Last synced: 15 Jun 2026

https://github.com/p4suta/mojiokoshi

Local audio transcription tool with real-time progress, powered by faster-whisper and CUDA

audio-transcription cuda docker fastapi faster-whisper gpu python self-hosted speech-to-text sveltekit transcription whisper

Last synced: 16 Jun 2026

https://github.com/hailiang-wang/cuda-get-started

Get started with CUDA

cuda machine-learning nvidia

Last synced: 17 Jun 2026

https://github.com/angchen0325/cuda-learn

Ang's CUDA-learn project

cuda gpu-computing

Last synced: 18 Jun 2026

https://github.com/rurumimic/cuda

compute unified device architecture

cuda deep-learning gpu nvidia

Last synced: 18 Jun 2026

https://github.com/acuoci/pbe-fixed-pivot-cuda

Fast CUDA implementation of aggregation and breakage terms in Population Balance Equations using the fixed pivot sectional method

aggregation breakage cuda fixed-pivot pbe

Last synced: 18 Jun 2026

https://github.com/farukalamai/jetson-yolo-cpp

Real-time object detection, segmentation and tracking on NVIDIA Jetson using YOLO + TensorRT in C++

cpp cuda jetson object-detection tensorrt yolo26

Last synced: 19 Jun 2026

https://github.com/aeyage/intraday-prices

gpu-accelerated portfolio optimisation

cuda cupy nvidia-gpu

Last synced: 19 Jun 2026

https://github.com/sbstndb/neural_k

A simple Neural Network library using Kokkos enabling CUDA or OpenMP backend

ai cuda kokkos library neural-network openmp

Last synced: 22 Jun 2026

https://github.com/sebsop/kmeans-thesis-segmentation

Real-time hybrid quantum-classical K-means segmentation using C++ and CUDA. Bachelor's Thesis at BBU bridging HPC and Quantum Machine Learning (QML).

cpp cuda hpc imgui kmeans opencv quantum-computing

Last synced: 23 Jun 2026

https://github.com/sebsop/realtime-parallel-kmeans-segmentation

Real-time C++ K-means image segmentation on live video streams, using OpenCV, RCC trees, and 5D features, optimized for consumer hardware with Sequential, Multi-threaded, MPI, and CUDA backends.

cpp cuda k-means-clustering mpi multithreading opencv rcc real-time-stream-processing

Last synced: 23 Jun 2026

https://github.com/cfregly/claude-gpu-perf-tune

31 GPU inference profiling and optimization skills for Claude Code, with a bundled MCP server

agent-skills claude-code cuda gpu inference llm mcp performance

Last synced: 23 Jun 2026

https://github.com/yablokolabs/bendkernels

Pure Bend parallel algorithm kernels and GPU-scaling examples

algorithms bend cuda gpu hvm parallel-computing

Last synced: 24 Jun 2026

https://github.com/awaldis/cuda-experiments

A place to explore the capabilities and limits of CUDA parallel processing.

cuda cuda-kernels cuda-programming

Last synced: 25 Jun 2026

https://github.com/angelnicolasc/meridian

Phase-aware vLLM scheduler for reasoning models: output-first dispatch, entropy-gated think termination, tiered KV eviction, and TTOT-focused benchmarking.

cuda inference kv-cache llm observability pyo3 python reasoning-models rust scheduler vllm

Last synced: 27 Jun 2026

https://github.com/kamb-code/sha256-r19-preimage

Oracle-free preimage attack on 19-round reduced SHA-256 — paper, solver, and independent verifier

cryptanalysis cryptography cuda gpu hash-functions preimage-attack security-research sha256

Last synced: 27 Jun 2026

https://github.com/huggon1/ml-algorithm-implementations

Educational implementations for ML, DL, LLM blocks, ViT, and CUDA.

cuda machine-learning numpy pytorch vision-transformer

Last synced: 28 Jun 2026

https://github.com/lk/gpu-nbody

GPU-accelerated n-body engine for t-SNE and physics simulation

cuda gpu n-body n-body-simulator

Last synced: 29 Jun 2026

https://github.com/thc1006/ct2-maxwell-final

Frozen CTranslate2 build re-introducing NVIDIA Compute Capability 5.0 (sm_50, Maxwell) so faster-whisper runs on Maxwell GPUs (Quadro K2200, GTX 750/9xx, 940MX, 960M). Packages upstream PR #1766; CUDA 12.9 + cuDNN 9.10; validated on a K2200.

ctranslate2 cuda faster-whisper maxwell sm50 speech-to-text whisper

Last synced: 29 Jun 2026

https://github.com/hurbalurba/quick-llama.cpp-server

A framework for publishing a more modern CUDA image for llama.cpp, built with CUDA 13 for newer cards only, with RPC support. Started as an exercise in learning how to compile a custom llama.cpp build.

cuda cuda13 devops docker dockerbuild gguf llamacpp llm rpc

Last synced: 05 May 2026

https://github.com/xaionaro/cufft-grpc

Export cuFFT through gRPC

cmake cuda cufft fft fourier go golang gpu grpc transformation

Last synced: 05 May 2026

https://github.com/j89103138/yolov11-traffic-sign

This repository contains a YOLOv11 project for training, detection, and benchmarking of traffic signs. The project utilizes CUDA acceleration to enhance performance and efficiency in real-time traffic sign detection and evaluation.

cuda opencv python pytorch traffic traffic-sign yolov11

Last synced: 05 May 2026

https://github.com/barrrry1/claymore-s-dual-miner

Claymore's Dual Miner is a powerful GPU mining software designed for Ethereum (ETH) and simultaneous dual mining of coins like Decred, Siacoin, Pascal, and Lbry. It supports AMD and NVIDIA GPUs, leveraging OpenCL and CUDA optimization for maximum hashrate. Features include automatic GPU tuning, detailed statistics, and stability watchdog.

blockchain crypto-mining cryptocurrency cuda eth ethereum gpu-mining mining mining-pool opencl

Last synced: 05 May 2026

https://github.com/jakubfr4czek/concurrent-gauss-elimination

Concurrent Gaussian elimination algorithm implemented using trace theory. Parallelism is achieved with CUDA cores.

agh agh-ust agh-wi conda cuda cuda-kernels cuda-toolkit diekert-graph graphviz java python python3 traces-theory

Last synced: 05 May 2026

https://github.com/abdelrahman-amen/active_learning_with_different_query_strategies

This project explores the implementation of active learning techniques, focusing on various query strategies to optimize the selection of informative data points for model training. It aims to reduce the amount of labeled data required while improving model performance, especially in scenarios with limited labeled data.

activelearning cuda entropy kldivergence margin numpy python pyto uncertainty

Last synced: 06 May 2026

https://github.com/insanelywicked1/literate-dollop

A fully automated PowerShell script to compile PyTorch from source with CUDA 12.1 support for NVIDIA RTX 50-series GPUs, optimized for Windows 11.

blackwell cuda gpu-build pytorch rtx5080 rtx5090 windows

Last synced: 06 May 2026

https://github.com/hritiksauw199/human-face-to-cartoon-conversion-using-optimized-cyclegan

Transform real human faces into cartoon-style images using a reduced CycleGAN architecture optimized for efficiency and quality.

cuda cyclegan data-science deep-learning deep-neural-networks gan human-cartoon matplotlib neural-network python pytorch torchvision

Last synced: 06 May 2026

https://github.com/iglee/jax-cuda-eicl-exp-docker

Docker for getting jax to work with cuda, for reproducing ml experiments like eicl. Sure, let's NOT make a compatibility matrix and let people fight for their lives on cuda

cuda docker jax jaxline ml-engineering ml-experiments tensorflow

Last synced: 06 May 2026

https://github.com/smilu97/system-hyu

Repository for submitting Hanyang University system programming assignments

c cuda linux matrix

Last synced: 06 May 2026

https://github.com/raiszo/cs334

Journey through Intro to Parallel Programming

cmake cs334 cuda msbuild

Last synced: 06 May 2026

https://github.com/r00tens/text-classifier

Naive Bayes classifier for text classification with CPU and GPU (CUDA)

classification classifier cpp cuda machine-learning naive-bayes

Last synced: 06 May 2026

https://github.com/iamfaham/model-inference-profiler

A PyTorch-based tool for profiling deep learning model inference performance, analyzing computational bottlenecks, and visualizing resource utilization.

cuda memory pytorch visualizations

Last synced: 06 May 2026

https://github.com/rosnavigator/parallelkmeansimagecompressor

Parallel KMeans-based image quantization compressor that reduces the number of colors in an image while preserving visual quality. It uses KMeans clustering for color quantization and supports sequential, OpenMP, MPI, and CUDA implementations for performance and scalability. PoliMi - Advanced Methods for Scientific Computing (2023-2024)

boost clustering colors compression cuda image-quantization kmeans kmeans-clustering lossy-compression mpi odette opencv openmp parallel-computing parallel-programming performance polimi scalability sl-train

Last synced: 06 May 2026

https://github.com/jamesnulliu/learning-programming-massively-parallel-processors

Learning notes on Programming Massively Parallel Processors, 4th edition.

cuda notes pytorch

Last synced: 06 May 2026

https://github.com/mka-codelake/wispy

Minimalist push-to-talk dictation tool for Windows. Faster Whisper, local, offline.

cuda dictation faster-whisper local offline portable push-to-talk python speech-to-text stt transcription voice-input whisper windows

Last synced: 06 May 2026

https://github.com/sebp/vscode-sycl-dpcpp-cuda

Sample project to use the VS Code Remote - Containers extension to develop SYCL applications for NVIDIA GPUs using the oneAPI DPC++ compiler.

cuda dpcpp fedora gpu-computing podman sycl vscode

Last synced: 06 May 2026

https://github.com/jpuigcerver/prob-phoc

Probabilistic relevance scores from PHOC embeddings

cuda keyword-spotting kws phoc pytorch

Last synced: 07 May 2026

https://github.com/drilonaliu/parallel-sierpinski-triangle

GPU-accelerated Sierpinski Triangle generation with CUDA and OpenGL interoperability.

cuda fractals gpu parallel-programming sierpinski-triangle

Last synced: 07 May 2026

https://github.com/yuuuuurei/yolo-sibi

Real-time SIBI hand gesture detection using YOLOv8 and deep learning classifiers.

bahasa-indonesia bahasa-isyarat cuda deep-learning hand-gesture hand-gesture-recognition pytorch real-time sibi sign-language yolo yolov8

Last synced: 07 May 2026

https://github.com/drilonaliu/parallel-koch-snowflake

GPU-accelerated Koch Snowflake generation with CUDA and OpenGL interoperability.

cuda fractals gpu koch-snowflake parallel-programming

Last synced: 07 May 2026

https://github.com/noorkhokhar99/how-to-setup-nvidia-gpu-for-object-detection-installing-cuda-toolkit-and-cudnn

How to set up an NVIDIA GPU for object detection | Installing CUDA Toolkit and cuDNN

computer cuda nividia opencv python roboflow vision

Last synced: 07 May 2026

https://github.com/muhamadajiw/parallel-matrix-inversion

A parallel program for matrix inversion using MPI, OpenMP, and CUDA

cpp cuda mpi openmp

Last synced: 07 May 2026

https://github.com/shreya888/learning-cuda-with-cpp-and-pytorch

My notes, code, & insights will be recorded here while learning CUDA with C++ and PyTorch

cpp cuda pytorch

Last synced: 07 May 2026

https://github.com/stevenchang5/canny_edge

Implementation of Canny edge detection, with an option to use CUDA to improve performance

cuda edge-detection opencv

Last synced: 07 May 2026

https://github.com/rssr25/cuda

Following the CUDA by Example book.

cpp cuda cuda-programming hpc shaders

Last synced: 07 May 2026

https://github.com/wpjunior/cuda-numba-playground

Some uses of CUDA with the Numba framework

cuda numba python

Last synced: 07 May 2026

https://github.com/not-ml/ml-3

A PyTorch-based Convolutional Neural Network (CNN) for image classification using the CIFAR-10 dataset, featuring advanced architecture, data augmentation, GPU support, and dynamic learning rate scheduling.

ai cifar10 cnn cuda gpu image-classification machine-learning modeltraining python pytorch torchvision

Last synced: 08 May 2026

https://github.com/jimmygizmo/tensorpup

Machine-learning model training using parallelization strategies on multiple serverless GPU instances.

ai cuda cudnn distributed gpu serverless tensorflow

Last synced: 08 May 2026

https://github.com/popke523/rybki

A 3D shoal of fish animation using the boids algorithm, OpenGL for rendering and CUDA for parallel processing.

boids cuda opengl

Last synced: 08 May 2026

https://github.com/sydney-informatics-hub/computer-vision-fine-tuning

Fine-tune a computer vision model to solve your task locally, on HPC, in a container, or in the cloud!

computer-vision cuda deep-learning python

Last synced: 09 May 2026

https://github.com/sugarcane-mk/finetuning_wav2vec2

This repo provides a step-by-step process, from scratch, for fine-tuning Facebook's wav2vec2-large model using transformers

asr asr-model cuda facebook fairseq fine-tuning finetuning huggingface librosa python torch transformers wav2vec2 wav2vec2-large-960h

Last synced: 09 May 2026

https://github.com/ginkobalboa/parfis

Particles and field simulator. Written in C++ with Python bindings. The algorithm is based on the particle-in-cell (PIC) method used for interacting many-particle systems.

cpp cuda physics-simulation python

Last synced: 09 May 2026

https://github.com/dbklim/optimized_tensorflow_wheels

Optimized versions of TensorFlow and TensorFlow-GPU for specific CPUs and GPUs (both old and new).

cuda nvidia-cuda nvidia-gpu tensorflow tensorflow-community-wheels tensorflow-gpu tensorflow-packages tensorflow-whells wheels

Last synced: 09 May 2026

https://github.com/lfrati/subpair

Fast pairwise cosine distance calculation and numba accelerated evolutionary matrix subset extraction 🍐🚀

cosine-distance cuda numba

Last synced: 09 May 2026

https://github.com/starlitdreams/lunar-landing

This project implements a DQN agent using PyTorch to solve the LunarLander-v2 environment from OpenAI Gym. The agent learns to control the lunar lander using experience replay and a target network, aiming to maximize rewards by landing smoothly. Uses CUDA for computation.

artificial-intelligence cuda deep-learning gymnasium neural-network neural-networks numpy nvidia-gpu python python3 torch

Last synced: 09 May 2026

https://github.com/donaurelio/ansible-playbooks

A bunch of Ansible playbooks that automate computer infrastructure provisioning

ansible-playbooks cuda docker gromacs openmpi

Last synced: 09 May 2026

https://github.com/michaelfranzl/image_fah-client

Dockerfile for Folding@home client with AMD and Nvidia GPGPU support

container cuda debian docker foldingathome gpu-computing opencl

Last synced: 09 May 2026

https://github.com/edumucelli/build-tensorflow

Build Tensorflow from source using a Dockerfile

cuda cudnn docker tensorflow

Last synced: 10 May 2026

https://github.com/sebftw/interp2gpu

GPU-accelerated 2D spline interpolation, à la interp2(..., "spline"), in MATLAB.

cuda gpu gpu-acceleration matlab spline spline-interpolation

Last synced: 10 May 2026

https://github.com/cashcon57/open-supersampling

OpenSuperSampling (OSS) — vendor-agnostic open-source RT denoising, upscaling, and frame extrapolation

cuda deep-learning dlss frame-generation fsr game-engine gaussian-splatting open-source real-time-rendering super-resolution upscaling

Last synced: 10 Jun 2026

https://github.com/dlr-amr/t8gpu

Header-only finite volume library targeting GPUs using t8code as meshing backend.

adaptive-mesh-refinement cuda finite-volume gpgpu-computing hpc mesh mpi parallel-computing simulation

Last synced: 10 May 2026

https://github.com/jeremywildsmith/shadowhash-distributed

Distributed shadow-file password cracker written in Elixir, with GPU-accelerated cracking for the md5crypt hashing algorithm.

cracking-hash cracking-hashes cracking-password cuda distributed-systems elixir erlang hashing nx security

Last synced: 11 May 2026

https://github.com/fabulani/360ip-with-cuda

360° Image Processing with CUDA and OpenCV.

360-image 360-video cpp cuda image-processing opencv

Last synced: 11 May 2026

https://github.com/apws25/accelmoe

This repository contains a CUDA kernel re-implementation of a CPU-based MoE model.

cpp cuda mixture-of-experts

Last synced: 11 May 2026

https://github.com/daniilvorontsov/fourier-option-pricing

MSc thesis project on option pricing for Lévy jump models. The package includes pricing implementations for European call and put options using the Carr-Madan, COS, and Fourier time-stepping methods.

carr-madan cuda fourier-transform monte-carlo option-pricing

Last synced: 11 May 2026

https://github.com/theogravity/dual-rtx-6000-blackwell-gemma-4-31b-it-nvfp4

Optimized vLLM setup for Gemma 4 31B NVFP4 with MTP on dual RTX PRO 6000 Blackwell, running in Docker: native FP4 Tensor Cores, Multi-Token Prediction (96.5% acceptance rate), and prefix caching. Includes benchmark results and replication scripts.

am5 amd blackwell cuda docker fp4 gemma gemma4 llm-inference multi-token-prediction nvfp4 prefix-caching rtx-6000 speculative-decoding tensor-parallel vllm

Last synced: 11 May 2026

https://github.com/realdougeubanks/unmanic.plugin.encoder_video_hevc_nvenc_gpu

Unmanic plugin: H.265/HEVC encoder using NVIDIA hevc_nvenc with a true end-to-end GPU pipeline. Fork of Josh5/unmanic.plugin.encoder_video_hevc_nvenc that adds -hwaccel_output_format cuda when NVDEC HW decoding is enabled, keeping decoded frames in GPU memory through NVENC. Drop-in replacement with sensible defaults and full settings parity.

cuda ffmpeg hardware-acceleration nvdec nvenc nvidia unmanic unmanic-plugin video-transcoding

Last synced: 12 May 2026

https://github.com/tomaszrewak/csgpathtracer

A constructive solid geometry path tracer.

computer-graphics cuda path-tracing rendering

Last synced: 12 May 2026