An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/jakubriegel/game_of_life_3d

3D game of life implemented in CUDA

concurency cuda gameoflife nvidia put-poznan

Last synced: 21 Apr 2026

https://github.com/ehsanmok/cs-521

UBC CS 521: Parallel Computing and Architectures

cuda erlang parallel-algorithm parallel-computing

Last synced: 16 May 2026

https://github.com/patrickm663/localglmnet.jl

This is a WIP implementation of Richman & Wüthrich (2022) using Julia's Flux.jl + CUDA.jl

cuda deep-learning flux julia neural-networks symbolic-regression xai

Last synced: 22 Apr 2026

https://github.com/michaelfranzl/image_debian-gpgpu

Dockerfile for a Debian base image with AMD and Nvidia GPGPU support

amd container container-image cuda debian docker gpgpu nvidia opencl

Last synced: 10 May 2026

https://github.com/hariprashad-ravikumar/accelerated-computing-in-cuda-c

This repo contains my code for the problem sets in NVIDIA's Getting Started with Accelerated Computing in CUDA C/C++

c cuda cuda-kernels cuda-toolkit

Last synced: 24 Apr 2026

https://github.com/saiccoumar/cuda-programming-exercises

Brief collection of GPU exercises (my reimplementation). Comes with relevant resources.

cuda cuda-programming nvcc nvidia

Last synced: 25 May 2026

https://github.com/alegau03/parallel-k-means

Implementation of C programs for the K-Means algorithm for parallel computing.

c c-programming cuda parallel parallel-programming

Last synced: 24 Apr 2026

https://github.com/dansolombrino/gphungarian

A GPU-accelerated implementation of the Hungarian Algorithm, written in CUDA

cuda gpu hpc opencl

Last synced: 31 Aug 2025

https://github.com/mayukhdeb/patrick

Tiny neural net library written from scratch with cupy :warning: under construction :warning:

cuda deep-learning gpu-computing machine-learning neural-network regression

Last synced: 08 May 2026

https://github.com/david-palma/cuda-programming

Educational CUDA C/C++ programming repository with commented examples on GPU parallel computing, matrix operations, and performance profiling. Requires a CUDA-enabled NVIDIA GPU.

c-cpp cpp cuda cuda-toolkit education gpu gpu-programming kernel matrix-operations nvcc nvidia parallel-computing parallel-programming practice profiling threads

Last synced: 25 Apr 2026

https://github.com/steleman/llvm-21.1.8

LLVM 21.1.8 on Fedora 41/43 and some additions for Torch-MLIR, ONNX-MLIR and IREE

clang cuda fedora fedora-41 fedora-43 iree llvm mlir onnx-mlir torch-mlir

Last synced: 07 Apr 2026

https://github.com/crcrpar/dev-chainer

Dockerfile for Chainer Development in VSCode

chainer cuda docker nvidia-docker vscode

Last synced: 26 Apr 2026

https://github.com/pvdberg1998/cufft_rust

A safe Rust wrapper around a subset of cuFFT.

cuda cufft fft rust

Last synced: 19 Apr 2025

https://github.com/gravitytwog/electromagneticfield

Electro-magnetic field simulation made with CUDA

c cuda cuda-kernels cuda-programming

Last synced: 26 Apr 2026

https://github.com/programmer-rd-ai/object-detection-framework

A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.

coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet

Last synced: 24 Sep 2025

https://github.com/r3tr056/loc-ai-ly

Locaily - Making Large Language Model Inference Accessible on Consumer Hardware

cuda deepseek inference llama3 llamacpp llm

Last synced: 13 Apr 2026

https://github.com/pharmcat/metidacu.jl

CUDA solver for Metida.jl

cuda julia-language metida mixed-models

Last synced: 27 Apr 2026

https://github.com/codingrule/cuda-mbrot

Just another Mandelbrot with CUDA

cuda cuda-toolkit cupy fractal mandelbrot mathematics nvidia

Last synced: 27 Apr 2026

https://github.com/axel-ex/seame-ads-autonomous-lane-detection-24-25

🚗 Real-time lane detection and autonomous steering for JetRacer, powered by ROS2 and GPU-accelerated CV on Jetson Nano.

cuda jetson-nano ros2 tensorrt

Last synced: 27 Apr 2026

https://github.com/le-ander/msc_bioinfo-experimental_design

Using information theory to inform experimental design with GPU acceleration. Computing group project as part of the MSc in Bioinformatics and Theoretical Systems Biology at Imperial College London 2016/2017.

cuda experimental-design gpu-computing information-theory pycuda systems-biology

Last synced: 26 Apr 2026

https://github.com/maelstrom6/mandelpy

A Mandelbrot and Buddhabrot viewer with GPU acceleration

buddhabrot cuda gpu mandelbrot python3

Last synced: 27 Apr 2026

https://github.com/xusworld/tars

Tars is a cool deep learning framework.

avx2 avx512 cuda deep-learning

Last synced: 27 Apr 2026

https://github.com/dafadey/GPGPU_OpenCL_vs_CUDA

A repository of sample code for testing memory bandwidth, arithmetic latency hiding, and shared/local memory performance on AMD and NVIDIA devices

cuda gpgpu gpgpu-computing opencl

Last synced: 16 May 2025

https://github.com/dolongbien/cuda

CUDA and Caffe/Caffe2 installation on Ubuntu 16.04

c3d-intel-caffe caffe caffe2 cuda cudnn deep-learning ubuntu

Last synced: 28 Apr 2026

https://github.com/vipaka2/sdforge-docker

Latest SD Forge Docker image.

cuda docker nvidia python

Last synced: 24 Jul 2025

https://github.com/xavierjiezou/gpu-compute-capability

An application for querying the compute capability of each GPU released by NVIDIA.

cuda gpu nvidia

Last synced: 28 Apr 2026

https://github.com/quantum-integrated-technologies/deepforge

DeepForge: a framework for working with machine learning.

ai artificial-intelligence cuda library machine-learning ml neural-network

Last synced: 31 Jul 2025

https://github.com/abhisheknair10/occupancy.nn

A multi-step pipeline to train and run inference with Occupancy Networks

3d-reconstruction cuda vision

Last synced: 20 Jul 2025

https://github.com/leocelente/basic_cuda

My CUDA source files while learning

cpp cuda gpgpu

Last synced: 29 Apr 2026

https://github.com/asadiahmad/gesture-detection

Real-time Gesture Detection using CUDA-accelerated OpenCV in Python.

computer-vision cuda gesture-recognition gpu-acceleration open-pose opencv opencv-cuda pose-detection real-time

Last synced: 29 Apr 2026

https://github.com/anras5/parallel-computing

Comparing CPU and GPU

cuda gpu openmp

Last synced: 29 Apr 2026

https://github.com/nofaralfasi/parallel-sequence-alignment

A parallelized version of multiple DNA sequence alignment algorithm with MPI, OpenMP and CUDA

cuda mpi openmp sequence-alignment

Last synced: 29 Apr 2026

https://github.com/nickolasrm/gpuvscpumatrixmultiplication

CPU and GPU optimized matrix multiplication (AVX, transposition, CUDA and other)

avx comparison cuda hpc matrix multiplication

Last synced: 06 Sep 2025

https://github.com/mhaseeb123/gcb

GCB includes a suite of benchmarks and basic tests for CUDA-aware MPI and C++ compilers.

cpp cpp23 cuda mpi partitioned-communication st-mpi

Last synced: 17 May 2026

https://github.com/ismailtekin05/caloriedetectingai

🍎🔍 Smart AI system that identifies food items in photos and calculates their calorie content automatically. Built with TensorFlow, YOLOv8, CUDA and computer vision for accurate nutrition tracking.

ai aimodel calorie-calculator computer-vision cuda data-analysis data-science data-segmentation data-visualization dataset dataset-generation image-processing image-recognition python segmentation-models tensorflow ultralytics yaml yolo yolov8

Last synced: 29 Apr 2026

https://github.com/steleman/openai-triton

Fork of OpenAI's Triton compiler v3.4.0 using LLVM 21.1.0 / 21.1.1 on Fedora 41+

cuda fedora linux llvm mlir mlir-dialect openai rocm triton

Last synced: 08 Apr 2026

https://github.com/kartavyaantani/cuda_image_processing

A CUDA-accelerated image processing project featuring multiple GPU-based filters and enhancement techniques. Implements convolution, edge detection, Non-Local Means (NLM) denoising, K-Nearest Neighbors (KNN), and pixelization. Each operation is optimized using CUDA kernels for real-time performance on large images. The project supports command-line

cuda cuda-kernels cuda-programming cuda-toolkit gpu-programming high-performance-computing image-manipulation image-processing nvidia-cuda nvidia-gpu

Last synced: 30 Apr 2026

https://github.com/blazekill/hello-cuda

Cpp + Vcpkg + CUDA + VsCode starter project.

cpp cuda vcpkg vscode

Last synced: 18 May 2026

https://github.com/graiphic/graiphic-documentation

Graiphic Toolkits for LabVIEW provide advanced AI, GPU, and graph-oriented computing capabilities directly inside LabVIEW. Built on ONNX Runtime, they enable seamless integration of SOTA, Accelerator, and Deep Learning Toolkit for high-performance execution across CPUs, GPUs, and edge devices.

accelerator-toolkit ai-orchestration computer-vision cuda deep-learning directml edge-ai graph-computing hardware-acceleration high-performance-computing inference labview neural-networks onednn onnx onnxruntime openvino sota tensorrt training

Last synced: 22 Nov 2025

https://github.com/romaingrx/ml-nix-flake

A simple nix flake to start ML env with uv and cuda out of the box

cuda ml nix nix-flake uv

Last synced: 30 Apr 2026

https://github.com/tensorbfs/cutropicalgemm.jl

The fastest Tropical number matrix multiplication on GPU

cuda gemm tropical-algebra

Last synced: 20 Jan 2026

https://github.com/eric900115/parallelprogramming

The repository contains the coursework for CS5422, NTHU's Parallel Programming Course.

cuda mpi openmp ucx

Last synced: 26 May 2026

https://github.com/jessetg/cuda-practice

Working through the chapters of CUDA by Example

c cpp cuda cuda-by-example gpgpu

Last synced: 01 May 2026

https://github.com/maawad/ptx_bcht

Bucketed Cuckoo hash set written in PTX and JIT-compiled.

cuckoo cuda gpu hash hashset ptx

Last synced: 01 May 2026

https://github.com/antonioberna/nn-gpu-logic-gates

Neural Network implementation on GPU using CUDA C++ to learn logic gates operations

cpp cuda gpu logic-gates neural-networks nvidia

Last synced: 01 May 2026

https://github.com/alwaysai/jetpack-46-hacky-hour

NVIDIA’s JetPack 4.6 capabilities and how to use them with EdgeIQ, the alwaysAI computer vision framework.

alwaysai computer-vision cuda edge-computing jetpack tensorrt

Last synced: 01 May 2026

https://github.com/dhruvsrikanth/monte-carlo-ray-tracing

In this repository, you will find a serial and distributed GPU-based implementation of the ray tracing simulation.

c cpp cuda gpu-computing gpu-programming high-performance-computing parallel-programming raytracing unified-memory-parallelism

Last synced: 01 May 2026

https://github.com/aliyoussef97/triton-hub

A container of various PyTorch neural network modules written in Triton.

cuda deep-learning openai pytorch triton triton-lang

Last synced: 30 Mar 2025

https://github.com/mala13f/statistical-learning-in-finance

This repository contains all the code, papers, and related data for assignments done during the course.

cuda gpu-acceleration jupyter-notebook machine-learning python statistical-learning

Last synced: 12 Apr 2026

https://github.com/linux-alex/geep

GEEP (Genetic Evolutionary Engineering Platform) - a C++/Qt framework for genetic programming, optimized with CUDA acceleration. GEEP enables large-scale population-based optimization, ideal for solving high-dimensional problems using evolutionary algorithms and GPU computing.

cpp cuda framework genetic-programming

Last synced: 18 May 2026

https://github.com/fblupi/grado_informatica-ppr

Lab assignments for the Parallel Programming (Programación Paralela) course at UGR

cuda mpi openmp parallel-computing

Last synced: 22 Apr 2026

https://github.com/katpercent/raytracing

A foundation for ray tracing using CUDA and parallel computing techniques.

3d cuda engine game parrallel-computing ray raytracing

Last synced: 01 Nov 2025

https://github.com/thisalmandula/gpu_accelerated_lpt_cfd_code

This repository contains a GPU-accelerated version of the particle tracking model developed by Merel Kooi for biofouled microplastic particles (available at: https://pubs.acs.org/doi/10.1021/acs.est.6b04702), written in CUDA Fortran and CUDA Python. This repository is intended as a learning tool for GPU programming.

biofouling computational-fluid-dynamics cuda fortran lagrangian-particle-tracking microplastics python

Last synced: 02 May 2026

https://github.com/xlisp/learn-vllm

vllm learning

cuda nvidia pytorch vllm

Last synced: 10 May 2026

https://github.com/gvvsnrnaveen/cuda

This repository contains various programs that can be written using the CUDA Toolkit.

c cpp cuda nvcc nvidia-cuda nvidia-gpu

Last synced: 17 Jan 2026

https://github.com/straightchlorine/quantum-pipeline

A Python module for executing and monitoring quantum algorithms across local simulators and IBM Quantum platforms. Seamlessly handles data collection, organization, and streaming to Apache Kafka

apache-kafka apache-spark aws-s3 cuda docker gpu-acceleration ibm-cloud ibm-quantum minio qiskit qiskit-aer qiskit-nature quantum-computing visualizations vqe

Last synced: 08 Oct 2025

https://github.com/erosiv/silt

simple immediate lightweight tensors

cmake cuda simulation tensor

Last synced: 31 Oct 2025

https://github.com/han-minhee/sgemm_hip

SGEMM implementations in HIP for NVIDIA / AMD GPUs

cuda gpgpu gpu hip rocm

Last synced: 27 Apr 2026

https://github.com/xihuai18/image-processing-in-cuda

Implementation of Image Processing Method

cuda imageprocessing

Last synced: 04 Oct 2025

https://github.com/emilienmendes/gpgpu

Parallelization and optimization of point recognition in an image

cuda gpgpu parallel-programming

Last synced: 28 Oct 2025

https://github.com/nikolaydubina/basic-openai-pytorch-server

Minimal HTTP inference server implementing the OpenAI API with PyTorch and CUDA

cuda docker llm openai pytorch server

Last synced: 12 Apr 2026

https://github.com/jtompuri/weighted-voronoi-stippling

High-performance weighted Voronoi stippling implementation. Exports PNG and TSP files. Visualizes TSP tours as continuous line drawings.

computer-graphics cuda gpu-acceleration lloyd-relaxation numba python stippling traveling-salesman tsp voronoi

Last synced: 18 May 2026

https://github.com/miniex/maidenx

Rust-based CUDA library designed for learning purposes and for building my AI engines, named Maiden Engine

ai cuda rust

Last synced: 20 Mar 2025

https://github.com/duskvirkus/ofxarrayfire

An openFrameworks addon with pre-compiled binaries of ArrayFire.

arrayfire cuda ofxaddon openframeworks openframeworks-addon

Last synced: 09 May 2026

https://github.com/mortafix/quickshift

A working, GPU-compatible implementation of the Quickshift algorithm in CUDA.

cuda gpu-computing quickshift

Last synced: 08 May 2026

https://github.com/sarah627/horus_eye_fcih_graduation_project

An AI-powered tourism website using YOLOv7 for real-time landmark detection in images. Built with Flask, PyTorch, and Roboflow for seamless tourist interaction.

computer-vision cuda flask jupyter-notebook kaggle matplotlib object-detection opencv python pytorch roboflow

Last synced: 14 Apr 2026

https://github.com/tudasc/cusan-tests

A test suite for CUDA-aware MPI race detection

cuda dataracebench-cuda mpi

Last synced: 03 May 2026

https://github.com/rjected/cuda-timelock

Solving a large number of timelock puzzles in parallel using GPU acceleration

c cgbn concurrent cpp cuda gmp graphics nvidia parallel puzzle timelock

Last synced: 14 Apr 2026

https://github.com/lhldev/rust-neural-network

Neural network implementation in Rust

cuda feedforward-neural-network

Last synced: 16 May 2026

https://github.com/poyea/lollipop

🍭 Sweet GPU compute kernels in CUDA, wrapped via CuPy

cuda cuda-kernel cuda-kernels cuda-programming gpu-kernels gpu-programming python

Last synced: 17 Jun 2026

https://github.com/ezamagni/knapsack-simd

A genetic 0-1 knapsack problem solver in CUDA

cuda knapsack-problem knapsack01

Last synced: 09 May 2026

https://github.com/sun-zhenxing/fast-neural-style

Fast style transfer deployment

cuda cv2 fast-neural-style opencv

Last synced: 05 May 2026

https://github.com/manishklach/gpu-resident-inference-lab

Research lab for GPU-resident LLM inference loops: persistent kernels, sparse KV selection, tiered residency, speculative decode, and trace-driven scheduling.

cuda gpu-systems kv-cache llm-inference mega-kernel model-systems persistent-kernel runtime speculative-decoding

Last synced: 19 Jun 2026

https://github.com/jayemscript/llm-systems-from-scratch

A hands-on learning project for building the core systems behind Large Language Models using C++, Rust, and optional Python/JavaScript bindings. Includes tensor operations, autograd, neural networks, tokenization, and a minimal transformer pipeline.

ai-systems autograd c-language cpp cuda educational-project high-performance-computing inference-engine machine-learning neural-networks-from-scratch pybind11 tensor-library tokenization transformers wasm

Last synced: 19 Jun 2026

https://github.com/speedcell4/torchdevice

Set up CUDA_VISIBLE_DEVICES

cuda deep-learning gpu machine-learning pytorch

Last synced: 07 May 2026

https://github.com/skillfulelectro/integral-solver

Simple integral solver

c cpp cuda math mathematics

Last synced: 08 May 2026

https://github.com/seongwon980/htop-gpu

Terminal dashboard for NVIDIA GPUs, system CPU/memory, and processes — clickable, with conda env / docker container / cwd info per process.

btop cli cuda dashboard gpu htop machine-learning monitor nvidia nvtop python sysadmin terminal tui

Last synced: 22 Jun 2026

https://github.com/daaboulex/unsloth-nix

Unsloth (git main) packaged for NixOS — CPU/CUDA/ROCm LoRA fine-tuning envs

cuda fine-tuning flake lora machine-learning nix nixos nixos-module pytorch rocm unsloth

Last synced: 10 Jun 2026

https://github.com/daelsepara/hipslm

CPU and GPU (using HIP) implementations of phase pattern generators for use with spatial light modulators

computer-generated-holography cuda gpu hip hologram holography phase phase-pattern slm spatial-light-modulator

Last synced: 22 Jun 2026

https://github.com/alextmjugador/rust-cuda-quickstart

Bring the Rust-CUDA project back to life under modern Linux environments.

cuda cuda-programming cuda-rust cuda-support docker rust

Last synced: 06 May 2026

https://github.com/abhans/archdev

Container built on Arch Linux with NVIDIA driver and CUDA support, with PyTorch and TensorFlow built in.

archlinux container cuda docker

Last synced: 07 May 2026

https://github.com/xebastex/sfw-python

Python package designed to provide the essential tools for off-the-grid inverse problems. This is the bedrock for a future GUI implementation.

blasso cuda frank-wolfe pytorch

Last synced: 09 May 2026

https://github.com/pedro-avalos/cuda-samples-snap

Unofficial snap for CUDA Samples

cuda gpu gpu-test linux nvidia package snap snapcraft

Last synced: 08 May 2026

https://github.com/jblaschke/pynvtx

Thin pybind11 wrapper for NVTX wrappers -- with some bells and whistles attached.

cuda nvtx nvtx-markers

Last synced: 23 Jun 2026

https://github.com/kibotu/llm-windows-server

Turn your Windows GPU into a private, low-latency LLM server. Docker-based, OpenAI-compatible API.

agentic cuda docker gguf llma-cpp local-llm nvidia-gpu openai-api opencode qwen self-hosted windows

Last synced: 10 Jun 2026

https://github.com/timothystewart6/ubuntu-gb10

Ubuntu 24.04 + NVIDIA stack setup guide for GB10 / DGX Spark systems

ansible ansible-playbook arm64 blackwell cuda dgx gpu grace-blackwell homelab nvidia nvidia-driver ubuntu

Last synced: 26 Jun 2026