An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/mhaseeb123/gcb

GCB includes a suite of benchmarks and basic tests for CUDA-aware MPI and C++ compilers.

cpp cpp23 cuda mpi partitioned-communication st-mpi

Last synced: 17 May 2026

https://github.com/pjueon/cuda_intellisense

A simple Python script to fix CUDA C++ IntelliSense for Visual Studio.

cuda visual-studio

Last synced: 09 Apr 2026

https://github.com/antonioberna/nn-gpu-logic-gates

Neural network implementation on GPU using CUDA C++ to learn logic gate operations

cpp cuda gpu logic-gates neural-networks nvidia

Last synced: 01 May 2026

https://github.com/rajarsheya/real-time-traffic-analysis-with-cuda-object-detection

Implemented CUDA-accelerated object detection (YOLO) to analyze a sample image dataset. Performed vehicle counting and simulated speed estimation to demonstrate real-time traffic analysis capabilities.

cpp cuda opencv python yolo

Last synced: 12 Apr 2026

https://github.com/0xhilsa/pynum

A small Python library for 1D and 2D arrays with GPU support

array c cuda nvcc python3

Last synced: 18 Apr 2026

https://github.com/enriquebdel/clases-cuda-programacion-paralela-en-c-

In this repository you will find several lessons I created about the CUDA library in C. The program I use for programming is MobaXterm.

c cuda cuda-programming gnu-linux googlecolab mobaxterm nvidia parallel-programming ubuntu university

Last synced: 19 May 2026

https://github.com/satyajitghana/gpu-programming

Contains the contents of the GPU Architecture and Programming course on NPTEL

c cpp cuda cuda-programming gpu-programming nptel nvidia

Last synced: 09 Mar 2026

https://github.com/brendanbignell/cuda_montecarlooptionpricer

CUDA Monte Carlo Barrier Option Pricing Demo & JupyterLab ML models

cuda deep-learning ml pytorch quantitative-finance xgboost-regression

Last synced: 19 Apr 2026

https://github.com/romaingrx/ml-nix-flake

A simple Nix flake to start an ML environment with uv and CUDA out of the box

cuda ml nix nix-flake uv

Last synced: 30 Apr 2026

https://github.com/tensorbfs/cutropicalgemm.jl

The fastest Tropical number matrix multiplication on GPU

cuda gemm tropical-algebra

Last synced: 20 Jan 2026

https://github.com/graiphic/graiphic-documentation

Graiphic Toolkits for LabVIEW provide advanced AI, GPU, and graph-oriented computing capabilities directly inside LabVIEW. Built on ONNX Runtime, they enable seamless integration of SOTA, Accelerator, and Deep Learning Toolkit for high-performance execution across CPUs, GPUs, and edge devices.

accelerator-toolkit ai-orchestration computer-vision cuda deep-learning directml edge-ai graph-computing hardware-acceleration high-performance-computing inference labview neural-networks onednn onnx onnxruntime openvino sota tensorrt training

Last synced: 22 Nov 2025

https://github.com/rhysdg/whisper-onnx-python

A low-footprint, GPU-accelerated speech-to-text Python package for the JetPack 5 era, bolstered by an optimized graph

ai chatbot cuda machine-learning onnxruntime speech-to-text whisper

Last synced: 16 Feb 2026

https://github.com/dhruvsrikanth/fastconv

Distributed and serial implementations of the 2D convolution operation in C++ and CUDA.

convolution-filters cpp cuda gpu-programming high-performance-computing hpc image-editor image-processing nvidia parallel-programming

Last synced: 04 May 2026

https://github.com/aliyoussef97/triton-hub

A container of various PyTorch neural network modules written in Triton.

cuda deep-learning openai pytorch triton triton-lang

Last synced: 30 Mar 2025

https://github.com/thomasonzhou/minitorch

Rebuilding PyTorch: from autograd to convolutions in CUDA

cuda numba numpy

Last synced: 02 Feb 2026

https://github.com/han-minhee/sgemm_hip

SGEMM implementations in HIP for NVIDIA / AMD GPUs

cuda gpgpu gpu hip rocm

Last synced: 27 Apr 2026

https://github.com/gabrielmaialva33/enton

Autonomous AI Robot Assistant — Vision, Voice, and Soul

ai autonomous-agent computer-vision cuda llm python pytorch robot stt tts whisper yolo

Last synced: 01 Apr 2026

https://github.com/mcp-tool-shop-org/backpropagate

Headless LLM fine-tuning in 3 lines — smart defaults, VRAM-aware batch sizing, multi-run SLAO, GGUF export for Ollama.

api cuda fine-tuning headless llm lora machine-learning ollama python qlora training unsloth web-security windows

Last synced: 31 May 2026

https://github.com/manishklach/intent-attention-kernel

Intent-aware attention research prototype that treats long-context inference as structured semantic blocks instead of a flat token stream, proving CPU-first correctness and analytical KV/FLOP savings before GPU kernel implementation.

agentic-ai ai-infrastructure attention block-attention cost-model cuda gpu-kernels inference kernel-research kv-cache llm-inference long-context python pytorch research semantic-attention sparse-attention systems transformers triton

Last synced: 28 May 2026

https://github.com/vietdoo/seam-carving-cuda

CUDA Seam Carving: Accelerating Image Resizing with GPU Computing

cc cuda cuda-programming gpu-computing parrallel-computing seam-carving

Last synced: 02 May 2026

https://github.com/rajarsheya/real-time-audio-feature-extraction-with-cuda-for-speech-recognition

This project accelerates MFCC extraction using CUDA for real-time speech recognition. Offloading the process to the GPU reduces latency and speeds up processing, enabling fast, local speech-to-text transcription for applications like virtual assistants, without cloud reliance.

audio-processing cpp cuda fourier-transform python

Last synced: 10 May 2026

https://github.com/Programmer-RD-AI/DetectX

A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.

coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet

Last synced: 04 May 2025

https://github.com/hartorn/docker-python

Repository to build a Python image based on Ubuntu and CUDA

cuda docker mkl-dnn onednn python3 ubuntu ubuntu1804

Last synced: 05 May 2026

https://github.com/xza85hrf/ml-framework_checker

ML Framework and CUDA Checker is a Python-based GUI application for checking PyTorch, TensorFlow, and CUDA installations. It provides detailed system specs, compatibility checks, advanced GPU management, and offers options to view instructions, export logs, and update machine learning frameworks.

compatibility cuda gpu-management gui-application machine-learning python pytorch system-checker system-specs tensorflow

Last synced: 28 Apr 2026

https://github.com/ezroot/gacc

GIACC - Generate Images, Art, Code and Conversations

ai codegen cuda huggingface image imagegeneration python rust stablediffusion

Last synced: 06 Apr 2026

https://github.com/programmer-rd-ai/digivis

A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.

cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb

Last synced: 10 Jun 2025

https://github.com/ergonomech/comfyui-windows-installer

Automated setup for ComfyUI on Windows with CUDA, custom plugins, and optimized PyTorch settings. Made to run as a server and correct errors. Easy installation and launch using Miniconda.

automation comfy conda conda-environment cuda hosting-deployment setup windows

Last synced: 31 Mar 2025

https://github.com/tommaso-dognini/polimi_gpu101_courseproject

Polimi Passion In Action GPU101 course project. CUDA implementation of the BFS algorithm

cpp cuda cuda-programming parallel-computing

Last synced: 10 Apr 2026

https://github.com/sleeepyjack/multisplit

Simple multisplit for CUDA accelerators

cpp cuda gpu nvidia parallel-programming primitive split

Last synced: 20 May 2026

https://github.com/mala13f/statistical-learning-in-finance

This repository contains all the code, papers, and related data for assignments done during the course.

cuda gpu-acceleration jupyter-notebook machine-learning python statistical-learning

Last synced: 12 Apr 2026

https://github.com/bjornmelin/deep-learning-evolution

🧠 Deep-Learning Evolution: Unified collection of TensorFlow & PyTorch projects, featuring custom CUDA kernels, distributed training, memory‑efficient methods, and production‑ready pipelines. Showcases advanced GPU optimizations, from foundational models to cutting‑edge architectures. 🚀

ai-research cuda data-science deep-learning distributed-training gan gpu-acceleration machine-learning model-optimization neural-networks python pytorch tensorflow training-pipeline transformers

Last synced: 09 May 2026

https://github.com/tortillazhawaii/fishes_cuda

3D boid simulation with GPU.

cuda opengl

Last synced: 04 May 2026

https://github.com/vishwamartur/btc_recovery

High-performance Bitcoin wallet password recovery system with GPU acceleration and integrated graphics support. Recover Bitcoin Core wallet.dat files without blockchain download using advanced algorithms and blockchain APIs.

bitcoin bitcoin-core blockchain blockchain-api cpp cryptocurrency cuda electrum gpu-acceleration integrated-graphics multithreading opencl password-recovery private-keys recovery-tools wallet-dat wallet-recovery

Last synced: 14 Apr 2026

https://github.com/zhangjun/my_notes

Daily stuffs

cuda gpu

Last synced: 17 Apr 2026

https://github.com/saiccoumar/cuda-programming-exercises

Brief collection of GPU exercises (my reimplementation). Comes with relevant resources.

cuda cuda-programming nvcc nvidia

Last synced: 25 May 2026

https://github.com/bl33h/pythagoreantheorem

A program that calculates the Pythagorean theorem for a large number of elements using GPU parallel processing.

arrays cuda kernel parallel-programming pythagoras pythagorean-theorem

Last synced: 19 May 2026

https://github.com/matx64/rs-netbot

Old School RuneScape bot with a CNN for object identification

cuda numpy python pytorch

Last synced: 04 May 2026

https://github.com/straightchlorine/quantum-pipeline

A Python module for executing and monitoring quantum algorithms across local simulators and IBM Quantum platforms. Seamlessly handles data collection, organization, and streaming to Apache Kafka

apache-kafka apache-spark aws-s3 cuda docker gpu-acceleration ibm-cloud ibm-quantum minio qiskit qiskit-aer qiskit-nature quantum-computing visualizations vqe

Last synced: 08 Oct 2025

https://github.com/haleelrah/Vision-pro-MAX

A Raspberry Pi-based object detection system for assisting visually impaired individuals. This project utilizes YOLO object detection and a Hailo 8L TPU to identify obstacles like manholes, potholes, and bumps, providing real-time audio feedback to aid navigation.

bash computer-vision cuda fine-tuning jupyter-notebook object-detection opencv python pytorch raspberry-pi rpi-camera ssh text-to-speech ultralytics yolo yolov8

Last synced: 30 Dec 2025

https://github.com/pintamonas4575/tfg-diffusion-model-customdataset

PyTorch implementation of a diffusion model for unconditional image generation with a custom dataset.

attention-mechanism cnn cosine-scheduler cuda custom-dataset ddim deep-learning diffusion-models gpu image-generation pytorch

Last synced: 17 Apr 2026

https://github.com/kchristin22/ising_model

Implementation of a cellular automaton on GPU using different features of CUDA

cellular-automaton cuda gpu-programming hpc ising-model parallel-computing

Last synced: 15 Mar 2025

https://github.com/dvhh/masscorrelation

An exercise in writing an efficient correlation calculator

calculations correlation-calculation cuda matrix multi-threading openmp

Last synced: 15 May 2026

https://github.com/microo8/micronn

Simple neural network library with backpropagation using CUDA

c cuda neural-network

Last synced: 19 May 2026

https://github.com/jtompuri/weighted-voronoi-stippling

High-performance weighted Voronoi stippling implementation. Exports PNG and TSP files. Visualizes TSP tours as continuous line drawings.

computer-graphics cuda gpu-acceleration lloyd-relaxation numba python stippling traveling-salesman tsp voronoi

Last synced: 18 May 2026

https://github.com/rnabla/cuda-des

Bruteforcing DES using CUDA

bruteforce cuda data des encryption gpu parallel standard

Last synced: 27 Oct 2025

https://github.com/shahed-chy-suzan/psd-to-html--cuda

Cuda is a single page creative portfolio psd to html template which is built with HTML5 & CSS3. The site can be customized easily to suit your needs.

cuda portfolio psd-to-html

Last synced: 18 Jan 2026

https://github.com/liuyuweitarek/pytorch-docker-builder

Automate PyTorch Docker image builds with compatible Python, CUDA, and Poetry versions, including CI/CD for testing.

cicd containerd cuda docker docker-image poetry-python python python3 pytorch pytorch-docker

Last synced: 06 Feb 2026

https://github.com/bhattbhavesh91/rapids-cudf-cuml-example

Running the KNN algorithm much faster on GPU for free using RAPIDS packages like cuML and cuDF

cuda cuml deep-learning nvidia-gpu rapids rapidsai

Last synced: 17 Apr 2026

https://github.com/hubenchang0515/fft-benchmark

Performance benchmarks of several FFT libraries

cuda fft

Last synced: 27 Oct 2025

https://github.com/alwaysai/jetpack-46-hacky-hour

NVIDIA’s JetPack 4.6 capabilities and how to use them with EdgeIQ, the alwaysAI computer vision framework.

alwaysai computer-vision cuda edge-computing jetpack tensorrt

Last synced: 01 May 2026

https://github.com/xihuai18/image-processing-in-cuda

Implementation of image processing methods

cuda imageprocessing

Last synced: 04 Oct 2025

https://github.com/ayoussf/triton-hub

A container of various PyTorch neural network modules written in Triton.

cuda deep-learning openai pytorch triton triton-lang

Last synced: 14 Apr 2025

https://github.com/andrewboessen/bitonic-merge-sort

Bitonic Merge Sort algorithm optimized for GPU execution

bitonic-merge-sort cuda sorting-network

Last synced: 16 May 2026

https://github.com/dansolombrino/gphungarian

A GPU-accelerated implementation of the Hungarian Algorithm, written in CUDA

cuda gpu hpc opencl

Last synced: 31 Aug 2025

https://github.com/bl33h/productoftwovectors

This code uses CUDA for parallel vector multiplication on a GPU, demonstrating the GPU's acceleration capabilities.

cuda gpu kernel paralelism parallel-programming product vector

Last synced: 16 May 2026

https://github.com/nikolaydubina/basic-openai-pytorch-server

Minimal HTTP inference server in the OpenAI API format with PyTorch and CUDA

cuda docker llm openai pytorch server

Last synced: 12 Apr 2026

https://github.com/ehsanmok/cs-521

UBC CS 521: Parallel Computing and Architectures

cuda erlang parallel-algorithm parallel-computing

Last synced: 16 May 2026

https://github.com/alekseyscorpi/vacancies_server

This is a server for vacancies generation using LLM (Saiga3)

code cuda cuda-toolkit docker dockerfile flask llama3 llamacpp llm ngrok pydantic saiga

Last synced: 06 Feb 2026

https://github.com/ashwani-rathee/imagesgpu.jl

Image Processing on GPU in Julia

cuda gpu image image-processing julia

Last synced: 11 Jul 2025

https://github.com/gunrock/template

Template repository for essentials applications to get you started asap!

cpp cuda essentials gpu graph-algorithms graph-analytics gunrock

Last synced: 15 May 2026

https://github.com/bjornmelin/ml-vision-lab

👁️ Production-grade computer vision implementations. Real-world applications in image processing, object detection, and video analytics with GPU acceleration. 📸

computer-vision cuda deep-learning image-processing object-detection opencv pytorch video-analytics

Last synced: 04 Apr 2026

https://github.com/emmanuelmess/firstcollisiontimesteprarefiedgassimulator

This simulator computes all possible intersections for a very small timestep for a particle model

cpp20 cuda simulator

Last synced: 17 Apr 2026

https://github.com/dafadey/GPGPU_OpenCL_vs_CUDA

This is a repository with sample code for testing memory bandwidth, arithmetic latency hiding, and shared/local memory performance on AMD and NVIDIA devices

cuda gpgpu gpgpu-computing opencl

Last synced: 16 May 2025

https://github.com/michaelfranzl/image_debian-gpgpu

Dockerfile for a Debian base image with AMD and NVIDIA GPGPU support

amd container container-image cuda debian docker gpgpu nvidia opencl

Last synced: 10 May 2026

https://github.com/shivendrra/axgrad

Lightweight tensor library that contains its own auto-diff engine, like PyTorch

autograd cuda pytorch scratch-implementation tinygrad

Last synced: 08 May 2026

https://github.com/vipaka2/sdforge-docker

Latest SD Forge Docker image.

cuda docker nvidia python

Last synced: 24 Jul 2025

https://github.com/enp1s0/curand_fp16

FP16 pseudo random number generator on GPU

cuda gpu half-precision random-number-generators

Last synced: 20 Aug 2025

https://github.com/mre/talks

...mostly Computer Science related.

computer-science cuda talks tech-talks

Last synced: 28 Apr 2026

https://github.com/andih/cuda-fortran-stream

Variant of STREAM Benchmark in CUDA Fortran

cuda cuda-fortran gpu stream-benchmarks variants

Last synced: 02 Mar 2025

https://github.com/iag-geo/image-classification

Image classification scripts using YOLOv5 with aerial imagery

cuda image-classification python pytorch swimming-pools yolov5

Last synced: 22 Feb 2026

https://github.com/ezamagni/knapsack-simd

A genetic 0/1 knapsack problem solver in CUDA

cuda knapsack-problem knapsack01

Last synced: 09 May 2026

https://github.com/poyea/lollipop

🍭 Sweet GPU compute kernels in CUDA, wrapped via CuPy

cuda cuda-kernel cuda-kernels cuda-programming gpu-kernels gpu-programming python

Last synced: 17 Jun 2026

https://github.com/seieric/gst-dsobjectsmask

📀 NVIDIA DeepStream integrated GStreamer plugin. Mask objects with CUDA cores on Jetson boards. Fast and smooth since everything is done on NVMM. 🏎

cuda cuda-programming deepstream gpu gstreamer gstreamer-plugins instance-segmentation jetson-agx-orin jetson-agx-xavier jetson-tx1 jetson-tx2 jetson-xavier maskrcnn nvidia-jetson nvidia-jetson-nano opencv opencv4 resnet resnet50

Last synced: 06 May 2026

https://github.com/manishklach/gpu-resident-inference-lab

Research lab for GPU-resident LLM inference loops: persistent kernels, sparse KV selection, tiered residency, speculative decode, and trace-driven scheduling.

cuda gpu-systems kv-cache llm-inference mega-kernel model-systems persistent-kernel runtime speculative-decoding

Last synced: 19 Jun 2026

https://github.com/skillfulelectro/integral-solver

Simple integral solver

c cpp cuda math mathematics

Last synced: 08 May 2026

https://github.com/jayemscript/llm-systems-from-scratch

A hands-on learning project for building the core systems behind Large Language Models using C++, Rust, and optional Python/JavaScript bindings. Includes tensor operations, autograd, neural networks, tokenization, and a minimal transformer pipeline.

ai-systems autograd c-language cpp cuda educational-project high-performance-computing inference-engine machine-learning neural-networks-from-scratch pybind11 tensor-library tokenization transformers wasm

Last synced: 19 Jun 2026

https://github.com/speedcell4/torchdevice

Setup CUDA_VISIBLE_DEVICES

cuda deep-learning gpu machine-learning pytorch

Last synced: 07 May 2026

https://github.com/seongwon980/htop-gpu

Terminal dashboard for NVIDIA GPUs, system CPU/memory, and processes — clickable, with conda env / docker container / cwd info per process.

btop cli cuda dashboard gpu htop machine-learning monitor nvidia nvtop python sysadmin terminal tui

Last synced: 22 Jun 2026

https://github.com/alextmjugador/rust-cuda-quickstart

Bring the Rust-CUDA project back to life under modern Linux environments.

cuda cuda-programming cuda-rust cuda-support docker rust

Last synced: 06 May 2026

https://github.com/daelsepara/hipslm

CPU and GPU (using HIP) implementations of phase pattern generators for use with spatial light modulators

computer-generated-holography cuda gpu hip hologram holography phase phase-pattern slm spatial-light-modulator

Last synced: 22 Jun 2026

https://github.com/poodarchu/vision-lab

Computer Vision Experiments in all.

computer-vision cuda object-detection

Last synced: 07 May 2026

https://github.com/pedro-avalos/cuda-samples-snap

Unofficial snap for CUDA Samples

cuda gpu gpu-test linux nvidia package snap snapcraft

Last synced: 08 May 2026

https://github.com/kibotu/llm-windows-server

Turn your Windows GPU into a private, low-latency LLM server. Docker-based, OpenAI-compatible API.

agentic cuda docker gguf llma-cpp local-llm nvidia-gpu openai-api opencode qwen self-hosted windows

Last synced: 10 Jun 2026

https://github.com/jblaschke/pynvtx

Thin pybind11 wrapper for NVTX wrappers -- with some bells and whistles attached.

cuda nvtx nvtx-markers

Last synced: 23 Jun 2026

https://github.com/xebastex/sfw-python

Python package designed to provide the essential tools for off-the-grid inverse problems. This is the bedrock for a future GUI implementation.

blasso cuda frank-wolfe pytorch

Last synced: 09 May 2026

https://github.com/uefi-code/msra_thepracticespaceproject_pytorchcuda

My repo for the MSRA Practice Space Project 2022: CUDA implementation and optimization

ann cuda pytorch

Last synced: 06 May 2026

https://github.com/timothystewart6/ubuntu-gb10

Ubuntu 24.04 + NVIDIA stack setup guide for GB10 / DGX Spark systems

ansible ansible-playbook arm64 blackwell cuda dgx gpu grace-blackwell homelab nvidia nvidia-driver ubuntu

Last synced: 26 Jun 2026

https://github.com/sun-zhenxing/fast-neural-style

Fast style transfer deployment

cuda cv2 fast-neural-style opencv

Last synced: 05 May 2026