An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/rajarsheya/real-time-traffic-analysis-with-cuda-object-detection

Implemented CUDA-accelerated object detection (YOLO) to analyze a sample image dataset. Performed vehicle counting and simulated speed estimation to demonstrate real-time traffic analysis capabilities.

cpp cuda opencv python yolo

Last synced: 12 Apr 2026

https://github.com/bokutotu/cudnn_graph_api_example

cuDNN graph API example

cuda cudnn cudnn-v8

Last synced: 04 May 2026

https://github.com/alekseyscorpi/vacancies_server

This is a server for vacancy generation using an LLM (Saiga3)

code cuda cuda-toolkit docker dockerfile flask llama3 llamacpp llm ngrok pydantic saiga

Last synced: 06 Feb 2026

https://github.com/brendanbignell/cuda_montecarlooptionpricer

CUDA Monte Carlo Barrier Option Pricing Demo & JupyterLab ML models

cuda deep-learning ml pytorch quantitative-finance xgboost-regression

Last synced: 19 Apr 2026

https://github.com/tortillazhawaii/fishes_cuda

3D boid simulation with GPU.

cuda opengl

Last synced: 04 May 2026

https://github.com/senli1073/docker-gpu-monitor

A lightweight GPU monitor designed for real-time web-based viewing of GPU server status.

container cuda docker flask gpu gpu-monitoring linux memory-usage nvidia-smi web

Last synced: 05 Apr 2026

https://github.com/hubenchang0515/fft-benchmark

Performance benchmarks of several FFT libraries

cuda fft

Last synced: 27 Oct 2025

https://github.com/0xhilsa/pynum

A small Python library for 1D and 2D arrays with GPU support

array c cuda nvcc python3

Last synced: 18 Apr 2026

https://github.com/5had3z/torch-discounted-cumsum-nd

PyTorch Discounted Cumsum with Autograd (CPU + CUDA)

cuda machine-learning pytorch

Last synced: 18 Apr 2026

https://github.com/sohhamseal/scalable-systems-programs

A little less effort to learn parallel programming...

cuda mpi openmp

Last synced: 18 Apr 2026

https://github.com/franciscoda/psvm

R package and C++ library that allows training SVM models in a GPU using CUDA and predicting out-of-sample data. A support vector machine (SVM) is a type of machine learning model that is trained using supervised data to classify samples.

cpp cpp17 cuda machine-learning r svm-classifier svm-training

Last synced: 18 Apr 2026

https://github.com/ergonomech/comfyui-windows-installer

Automated setup for ComfyUI on Windows with CUDA, custom plugins, and optimized PyTorch settings. Made to run as a server with error correction. Easy installation and launch using Miniconda.

automation comfy conda conda-environment cuda hosting-deployment setup windows

Last synced: 31 Mar 2025

https://github.com/ashwani-rathee/imagesgpu.jl

Image Processing on GPU in Julia

cuda gpu image image-processing julia

Last synced: 11 Jul 2025

https://github.com/sleeepyjack/multisplit

Simple multisplit for CUDA accelerators

cpp cuda gpu nvidia parallel-programming primitive split

Last synced: 20 May 2026

https://github.com/dafadey/GPGPU_OpenCL_vs_CUDA

This is a repository with sample code for testing memory bandwidth, arithmetic latency hiding, and shared/local memory performance on AMD and NVIDIA devices

cuda gpgpu gpgpu-computing opencl

Last synced: 16 May 2025

https://github.com/rjected/cuda-timelock

Solving a large number of timelock puzzles in parallel using GPU acceleration

c cgbn concurrent cpp cuda gmp graphics nvidia parallel puzzle timelock

Last synced: 14 Apr 2026

https://github.com/lhldev/rust-neural-network

Neural network implementation in Rust

cuda feedforward-neural-network

Last synced: 16 May 2026

https://github.com/saiccoumar/cuda-programming-exercises

Brief collection of GPU exercises (my reimplementation). Comes with relevant resources.

cuda cuda-programming nvcc nvidia

Last synced: 25 May 2026

https://github.com/himeyama/cuda-nmf

NMF calculations are performed on NVIDIA GPUs using the CUDA API. (Gem released)

cublas cuda gem nmf ruby

Last synced: 13 Apr 2026

https://github.com/liuyuweitarek/pytorch-docker-builder

Automate PyTorch Docker image builds with compatible Python, CUDA, and Poetry versions, including CI/CD for testing.

cicd containerd cuda docker docker-image poetry-python python python3 pytorch pytorch-docker

Last synced: 06 Feb 2026

https://github.com/adamczykpiotr/cudamatrixlibrary

Matrix operation library that runs on a single thread, n threads, or a CUDA-supported GPU

agh agh-ust cpp cuda cuda-library matrix matrix-computations matrix-functions matrix-multiplication

Last synced: 19 Apr 2026

https://github.com/rnabla/cuda-des

Bruteforcing DES using CUDA

bruteforce cuda data des encryption gpu parallel standard

Last synced: 27 Oct 2025

https://github.com/dvhh/masscorrelation

An exercise in writing an efficient correlation calculator

calculations correlation-calculation cuda matrix multi-threading openmp

Last synced: 15 May 2026

https://github.com/graiphic/graiphic-documentation

Graiphic Toolkits for LabVIEW provide advanced AI, GPU, and graph-oriented computing capabilities directly inside LabVIEW. Built on ONNX Runtime, they enable seamless integration of SOTA, Accelerator, and Deep Learning Toolkit for high-performance execution across CPUs, GPUs, and edge devices.

accelerator-toolkit ai-orchestration computer-vision cuda deep-learning directml edge-ai graph-computing hardware-acceleration high-performance-computing inference labview neural-networks onednn onnx onnxruntime openvino sota tensorrt training

Last synced: 22 Nov 2025

https://github.com/dotblueshoes/robertscross

The Roberts cross operator is used in image processing and computer vision for edge detection.

cuda edge-detection image-processing

Last synced: 30 Mar 2025

https://github.com/rajarsheya/real-time-audio-feature-extraction-with-cuda-for-speech-recognition

This project accelerates MFCC extraction using CUDA for real-time speech recognition. Offloading the process to the GPU reduces latency and speeds up processing, enabling fast, local speech-to-text transcription for applications like virtual assistants, without cloud reliance.

audio-processing cpp cuda fourier-transform python

Last synced: 10 May 2026

https://github.com/inventwithdean/cuda_mlp

Implementation of a simple Multilayer Perceptron in pure CUDA

cuda cuda-programming deep-learning neural-networks

Last synced: 30 Mar 2025

https://github.com/sartajbhuvaji/cuda

Developed CUDA kernel functions to load and train a Convolutional Neural Network from scratch.

cuda cuda-programming gpu-programming neural-network nvidia-cuda

Last synced: 30 Mar 2025

https://github.com/manishklach/intent-attention-kernel

Intent-aware attention research prototype that treats long-context inference as structured semantic blocks instead of a flat token stream, proving CPU-first correctness and analytical KV/FLOP savings before GPU kernel implementation.

agentic-ai ai-infrastructure attention block-attention cost-model cuda gpu-kernels inference kernel-research kv-cache llm-inference long-context python pytorch research semantic-attention sparse-attention systems transformers triton

Last synced: 28 May 2026

https://github.com/vishwamartur/btc_recovery

High-performance Bitcoin wallet password recovery system with GPU acceleration and integrated graphics support. Recover Bitcoin Core wallet.dat files without blockchain download using advanced algorithms and blockchain APIs.

bitcoin bitcoin-core blockchain blockchain-api cpp cryptocurrency cuda electrum gpu-acceleration integrated-graphics multithreading opencl password-recovery private-keys recovery-tools wallet-dat wallet-recovery

Last synced: 14 Apr 2026

https://github.com/varun-1703/eu-act-navigator-rag-qabot

An interactive, privacy-first application for querying the European Union’s AI Act using a local Retrieval-Augmented Generation (RAG) pipeline. Combines semantic search (FAISS) and a quantized TinyLlama LLM for fast, accurate, and context-aware answers—all running on your own hardware.

cuda faiss hugging-face-transformers langchain legal-tech local-slm machine-learning nlp open-source privacy rag-chatbot sentence-transformers streamlit tinyllama

Last synced: 03 May 2026

https://github.com/croko22/vit-cpp

An implementation of the Transformer model architecture ("Attention Is All You Need") in pure C++17 from scratch

cpp cuda deep-learning machine-learning neural-network transformer

Last synced: 17 Jan 2026

https://github.com/fandreuz/parallel-programming-for-hpc

Scientific codes in C/C++ with CUDA, OpenACC, FFTW, (cu)BLAS

cpp cuda hpc mpi

Last synced: 20 Apr 2026

https://github.com/renatomaynard/a-multiple-population-coarse-grained-genetic-algorithm-to-solve-the-quadratic-assignment-problem-

A Multiple-population coarse-grained Genetic Algorithm to solve the Quadratic Assignment Problem

c cuda genetic-algorithm quadratic-assignment-problem

Last synced: 09 May 2026

https://github.com/jonathanraiman/mini_cuda_rtc

Miniature CUDA Array library with Runtime Compilation

cpp11 cuda jit runtime-compilation

Last synced: 14 Apr 2026

https://github.com/jakubriegel/game_of_life_3d

3D game of life implemented in CUDA

concurency cuda gameoflife nvidia put-poznan

Last synced: 21 Apr 2026

https://github.com/mark0011astra/simplecuda

A library that allows users to easily perform GPU operations using CUDA with a NumPy-like interface.

cuda cupy gpu machine-learning numpy python vector

Last synced: 02 May 2026

https://github.com/patrickm663/localglmnet.jl

This is a WIP implementation of Richman & Wüthrich (2022) using Julia's Flux.jl + CUDA.jl

cuda deep-learning flux julia neural-networks symbolic-regression xai

Last synced: 22 Apr 2026

https://github.com/ayoussf/triton-hub

A container of various PyTorch neural network modules written in Triton.

cuda deep-learning openai pytorch triton triton-lang

Last synced: 14 Apr 2025

https://github.com/alpha74/hungarianalgocuda

Hungarian Algorithm for Linear Assignment Problem implemented using CUDA.

cuda nvcc parallel-computing parallel-programming

Last synced: 01 Jun 2026

https://github.com/programmer-rd-ai/digivis

A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.

cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb

Last synced: 10 Jun 2025

https://github.com/hariprashad-ravikumar/accelerated-computing-in-cuda-c

This repo contains my code for the problem sets in NVIDIA's Getting Started with Accelerated Computing in CUDA C/C++

c cuda cuda-kernels cuda-toolkit

Last synced: 24 Apr 2026

https://github.com/perl-openmp/p5-openmp-environment

Perl interface for manipulating OpenMP's environmental runtime execution variables

compiler cuda gcc gpu hpc openmp perl pthreads

Last synced: 19 Feb 2026

https://github.com/Programmer-RD-AI/DetectX

A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.

coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet

Last synced: 04 May 2025

https://github.com/alegau03/parallel-k-means

Implementation of C programs for the K-Means algorithm for parallel computing.

c c-programming cuda parallel parallel-programming

Last synced: 24 Apr 2026

https://github.com/mcp-tool-shop-org/backpropagate

Headless LLM fine-tuning in 3 lines — smart defaults, VRAM-aware batch sizing, multi-run SLAO, GGUF export for Ollama.

api cuda fine-tuning headless llm lora machine-learning ollama python qlora training unsloth web-security windows

Last synced: 31 May 2026

https://github.com/torotoki/simple-paged-attention

A simple implementation of PagedAttention purely written in CUDA and C++.

attention cpp cuda llm transformer

Last synced: 18 May 2026

https://github.com/emilienmendes/gpgpu

Parallelization and optimization of point recognition in an image

cuda gpgpu parallel-programming

Last synced: 28 Oct 2025

https://github.com/jxlarrea/homeassistant-voice-recipes

GPU/CUDA-accelerated voice control stack for Home Assistant. Runs on x86/x64 and ARM64 (including the NVIDIA DGX Spark). 100% Local - No Cloud, No Subscriptions.

arm64 cuda dgx-spark gb10 gpu-acceleration home-assistant local-llm qwen3 speech-to-text text-to-speech voice-assistant x86-64

Last synced: 26 May 2026

https://github.com/ophoperhpo/dcgan-lentach-logo-generator

The Lentach logo generator. #MachineLearningFun

cuda dcgan dcgan-tensorflow keras lentach machinelearning ml

Last synced: 23 Feb 2025

https://github.com/enp1s0/curand_fp16

FP16 pseudo random number generator on GPU

cuda gpu half-precision random-number-generators

Last synced: 20 Aug 2025

https://github.com/a-nau/python-cuda-envs

Script to automatically map a specific CUDA version to a Conda Python environment.

anaconda anaconda-environment cuda installation installation-script python python-environment python3

Last synced: 18 Apr 2026

https://github.com/david-palma/cuda-programming

Educational CUDA C/C++ programming repository with commented examples on GPU parallel computing, matrix operations, and performance profiling. Requires a CUDA-enabled NVIDIA GPU.

c-cpp cpp cuda cuda-toolkit education gpu gpu-programming kernel matrix-operations nvcc nvidia parallel-computing parallel-programming practice profiling threads

Last synced: 25 Apr 2026

https://github.com/davidalgis/godot_cuda

Demonstration that it is possible to use CUDA directly from Godot engine.

cuda godot modules

Last synced: 03 May 2026

https://github.com/crcrpar/dev-chainer

Dockerfile for Chainer Development in VSCode

chainer cuda docker nvidia-docker vscode

Last synced: 26 Apr 2026

https://github.com/lightshade12/kittlespt

A hobby CUDA pathtracing renderer.

3d-graphics computer-graphics cuda gpu path-tracing ray-tracing

Last synced: 18 Mar 2025

https://github.com/mala13f/statistical-learning-in-finance

This repository contains all the code, papers, and related data for assignments done during the course.

cuda gpu-acceleration jupyter-notebook machine-learning python statistical-learning

Last synced: 12 Apr 2026

https://github.com/gravitytwog/electromagneticfield

Electro-magnetic field simulation made with CUDA

c cuda cuda-kernels cuda-programming

Last synced: 26 Apr 2026

https://github.com/pvdberg1998/cufft_rust

A safe Rust wrapper around a subset of cuFFT.

cuda cufft fft rust

Last synced: 19 Apr 2025

https://github.com/thunder-compute/thunder-compute-documentation

Documentation for Thunder Compute, a cloud platform creating technology to virtualize GPUs over TCP

ai artificial-intelligence cloud cloud-computing cuda gpu llm machine-learning nvidia pytorch tensorflow thunder-compute virtualization

Last synced: 15 Oct 2025

https://github.com/mre/talks

...mostly Computer Science related.

computer-science cuda talks tech-talks

Last synced: 28 Apr 2026

https://github.com/vietdoo/seam-carving-cuda

CUDA Seam Carving: Accelerating Image Resizing with GPU Computing

cc cuda cuda-programming gpu-computing parrallel-computing seam-carving

Last synced: 02 May 2026

https://github.com/rkv0id/automata-vtk

Multi-dimensional Cellular Automata visualization using Python's VTK bindings on top of CUDA-parallel grid updates.

cellular-automata cuda game-of-life python vtk

Last synced: 19 Apr 2026

https://github.com/pharmcat/metidacu.jl

CUDA solver for Metida.jl

cuda julia-language metida mixed-models

Last synced: 27 Apr 2026

https://github.com/codingrule/cuda-mbrot

Just another Mandelbrot with CUDA

cuda cuda-toolkit cupy fractal mandelbrot mathematics nvidia

Last synced: 27 Apr 2026

https://github.com/axel-ex/seame-ads-autonomous-lane-detection-24-25

🚗 Real-time lane detection and autonomous steering for JetRacer, powered by ROS2 and GPU-accelerated CV on Jetson Nano.

cuda jetson-nano ros2 tensorrt

Last synced: 27 Apr 2026

https://github.com/r3tr056/loc-ai-ly

Locaily - Making Large Language Model Inference Accessible on Consumer Hardware

cuda deepseek inference llama3 llamacpp llm

Last synced: 13 Apr 2026

https://github.com/tudasc/cusan-tests

A test suite for CUDA-aware MPI race detection

cuda dataracebench-cuda mpi

Last synced: 03 May 2026

https://github.com/satyajitghana/gpu-programming

Contains material from the GPU Architecture and Programming course taken on NPTEL

c cpp cuda cuda-programming gpu-programming nptel nvidia

Last synced: 09 Mar 2026

https://github.com/jtompuri/weighted-voronoi-stippling

High-performance weighted Voronoi stippling implementation. Exports PNG and TSP files. Visualizes TSP tours as continuous line drawings.

computer-graphics cuda gpu-acceleration lloyd-relaxation numba python stippling traveling-salesman tsp voronoi

Last synced: 18 May 2026

https://github.com/shahed-chy-suzan/psd-to-html--cuda

Cuda is a single-page creative portfolio PSD-to-HTML template built with HTML5 & CSS3. The site can be customized easily to suit your needs.

cuda portfolio psd-to-html

Last synced: 18 Jan 2026

https://github.com/dansolombrino/gphungarian

A GPU-accelerated implementation of the Hungarian Algorithm, written in CUDA

cuda gpu hpc opencl

Last synced: 31 Aug 2025

https://github.com/maelstrom6/mandelpy

A Mandelbrot and Buddhabrot viewer with GPU acceleration

buddhabrot cuda gpu mandelbrot python3

Last synced: 27 Apr 2026

https://github.com/xusworld/tars

Tars is a cool deep learning framework.

avx2 avx512 cuda deep-learning

Last synced: 27 Apr 2026

https://github.com/katpercent/raytracing

A foundation for ray tracing using CUDA and parallel computing techniques.

3d cuda engine game parrallel-computing ray raytracing

Last synced: 01 Nov 2025

https://github.com/mortafix/quickshift

A working implementation of the Quickshift algorithm in CUDA, GPU-compatible.

cuda gpu-computing quickshift

Last synced: 08 May 2026

https://github.com/le-ander/msc_bioinfo-experimental_design

Using information theory to inform experimental design with GPU acceleration. Computing group project as part of the MSc in Bioinformatics and Theoretical Systems Biology at Imperial College London 2016/2017.

cuda experimental-design gpu-computing information-theory pycuda systems-biology

Last synced: 26 Apr 2026

https://github.com/shivendrra/axgrad

Lightweight tensor library that contains its own auto-diff engine, like PyTorch

autograd cuda pytorch scratch-implementation tinygrad

Last synced: 08 May 2026

https://github.com/dolongbien/cuda

CUDA and Caffe/Caffe2 installation Ubuntu 16.04

c3d-intel-caffe caffe caffe2 cuda cudnn deep-learning ubuntu

Last synced: 28 Apr 2026

https://github.com/tensorbfs/cutropicalgemm.jl

The fastest Tropical number matrix multiplication on GPU

cuda gemm tropical-algebra

Last synced: 20 Jan 2026

https://github.com/vipaka2/sdforge-docker

Latest SD Forge Docker image.

cuda docker nvidia python

Last synced: 24 Jul 2025

https://github.com/bolner/totally-diffused

Debian/NVIDIA Docker image for AUTOMATIC1111's Stable Diffusion application.

automatic1111 cuda debian docker-image nvidia stable-diffusion xformers

Last synced: 11 Apr 2026

https://github.com/daelsepara/hipmandelbrot

GPU Implementation of Mandelbrot Fractal Generator with Benchmarking

amd cuda fractal gpu gpu-compute gpu-computing hip mandelbrot parallel-computing rocm sdk

Last synced: 20 Feb 2026

https://github.com/lcsb-biocore/cufluxsampler.jl

GPU-accelerated algorithms for flux sampling in CUDA.jl

cobra cuda gpu julia metabolic-network metabolism sampling

Last synced: 02 May 2026

https://github.com/abhisheknair10/occupancy.nn

A multi-step pipeline for training and running inference with Occupancy Networks

3d-reconstruction cuda vision

Last synced: 20 Jul 2025