An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/lanl/stcuda

StCUDA allows Smalltalk to call CUDA Driver APIs to do GPU computing

cuda smalltalk visualworks

Last synced: 12 Apr 2025

https://github.com/bencardoen/singularity_slurm_cuda

Example of how to get started with Singularity and CUDA on a SLURM cluster

cuda nvidia singularity-container slurm-cluster tensorflow

Last synced: 15 Oct 2025

https://github.com/nyo16/llama_cpp_ex

Elixir bindings for llama.cpp: run LLMs locally with Metal, CUDA, Vulkan, or CPU. Streaming, chat templates, embeddings, structured output, and concurrent batched inference.

cuda elixir llamacpp llm

Last synced: 04 Jun 2026

https://github.com/aresio/cupsoda

cupSODA is a CUDA-powered coarse-grain deterministic simulator of mass-action kinetics models

biochemical cuda gpu-computing mass-action simulation

Last synced: 21 Feb 2026

https://github.com/abus-aikorea/aria-coversong

The best Gradio web UI for creating cover songs, using MDX-Net and RVC. Easy one-click installation. Fully portable.

cuda demucs gradio karaoke mdx-net nvidia python pytorch rvc song-covers uvr vocal-remover voice-conversion

Last synced: 25 Apr 2025

https://github.com/giovaneiwamoto/cuda-shortest-paths

🧩 Cuda Shortest Paths - Parallel Dijkstra and Floyd algorithms using Nvidia CUDA to calculate All-Pairs Shortest Path (APSP) in a given graph represented by its adjacency matrix.

all-pairs-shortest-path cuda nvidia

Last synced: 29 Apr 2025

https://github.com/radenmuaz/slope-ad

A small automatic differentiation engine, supporting higher-order derivatives

array autograd automatic-differentiation cuda gradient iree jvp machine-learning metal mlir onnx onnxruntime tensor vjp

Last synced: 26 Jun 2025

https://github.com/ashvardanian/scaling-democracy

GPU-accelerated Schulze voting method in Python, Numba, and CUDA, using ideas from Algebraic Graph Theory

cuda cuda-kernels dynamic-programming gpgpu graph-algorithms graph-theory pybind11 python voting

Last synced: 12 Apr 2025

https://github.com/phineas-pta/nvidia-win

NVIDIA's deep learning stack on Windows: CUDA toolkit + cuDNN + TensorRT

cuda cudnn guide tensorrt tutorial windows

Last synced: 12 Apr 2025

https://github.com/potato3d/grid

GPU-accelerated uniform grid construction for ray tracing

cuda glsl gpu grid ray-tracing

Last synced: 06 May 2025

https://github.com/neoheartbeats/neoheartbeats-kernel

An architecture for LLMs' continual-learning and long-term memories

cuda fine-tuning llama-factory llm

Last synced: 05 May 2025

https://github.com/rocm/hipmm

HIP Memory Manager (ROCm-DS)

amd cuda gpu hip memory-management radeon-instinct-mi-series rocm

Last synced: 12 Apr 2025

https://github.com/almirneeto99/leetgpu-challenges

This repository contains the solution for LeetGPU Challenges

cpp cuda gpu hpc

Last synced: 18 Apr 2026

https://github.com/sbaldu/neural_network_hep

Implementation of a neural network framework from scratch in C++ applied to particle physics

cpp cuda high-energy-physics neural-networks

Last synced: 20 Jul 2025

https://github.com/pyhf/cuda-images

pyhf Docker images built on Nvidia Container Toolkit enabled base images

cuda jax nvidia nvidia-cuda nvidia-docker pyhf

Last synced: 15 Jul 2025

https://github.com/lukoshkin/hpc

Skoltech HPC course

cuda curand hpc mpi omp

Last synced: 10 Apr 2025

https://github.com/neoblizz/cudagl

CUDA based Graphics Library for NVIDIA's GPUs.

cuda graphics-library graphics-programming opengl

Last synced: 18 Jun 2025

https://github.com/rapidsai/cugraph-docs

cuGraph Docs - RAPIDS Graph Analytics Documentation

cuda cugraph documentation graph rapids

Last synced: 12 Sep 2025

https://github.com/coderonion/cuda-beginner-course-rust-version

Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Rust Edition)"

candle cpp cublas cuda cuda-programming cudarc cudnn gpu gpu-programming nvcc nvidia parellel-programming python rust

Last synced: 15 Jun 2025

https://github.com/taeguk/dist-prog-assignment

Sogang Univ. Distributed Programming (CSE5414) Assignments.

assignment cuda distributed mpi-library openmp parallel pthreads sogang

Last synced: 13 Jun 2025

https://github.com/belval/raytracing

Using CUDA to implement "Raytracing in one weekend" by Peter Shirley

cuda raytracing raytracing-in-one-weekend

Last synced: 12 Apr 2025

https://github.com/antoniopelusi/lu-solver

Assignments for the High Performance Computing exam at Unimore, Modena, IT.

cuda lu-decomposition openmp

Last synced: 27 Feb 2026

https://github.com/pratikvn/schwarz-lib

Repository for testing asynchronous schwarz methods.

asynchronous cuda domain-decomposition ginkgo schwarz

Last synced: 14 Apr 2025

https://github.com/eunomia-bpf/basic-cuda-tutorial

A collection of CUDA programming examples to learn GPU programming

cuda tutorial

Last synced: 15 Jun 2025

https://github.com/anantzoid/cuda-genetic-algorithm-travelling-salesman-problem

Implementation of Parallel Genetic Algorithm in CUDA to solve TSP (Berlin52)

c cuda genetic-algorithm tsp tsp-solver

Last synced: 25 Jul 2025

https://github.com/rogerallen/smandelbrotr

SDL2 CUDA OpenGL Mandelbrot explorer.

cuda mandelbrot-viewer opengl sdl2

Last synced: 08 Mar 2026

https://github.com/harrydobbs/torch_ransac3d

A high-performance implementation of 3D RANSAC (Random Sample Consensus) algorithm using PyTorch and CUDA.

3d cloud cubiod cuda cylinder plane plane-detection point point-cloud ransac segmentation

Last synced: 03 Oct 2025

https://github.com/skizzy-create/ayurvedic_his

🩺 A personalized app that serves as your personal Ayurvedic assistant, providing tailored advice and guidance based on Ayurvedic principles. 🩺

cuda gpt python pytorch transformers

Last synced: 04 Oct 2025

https://github.com/pietroglyph/argustag

A C++17 wrapper for Nvidia Argus with support for zero-copy frame transfers to CUDA kernels and CUDA-accelerated AprilTag detection with ISAAC (no ROS required).

apriltag argus cuda isaac nvargus

Last synced: 04 Oct 2025

https://github.com/tudasc/cusan

A data race detector for CUDA C and C++ based on ThreadSanitizer

c cpp cuda datarace threadsanitizer

Last synced: 12 Aug 2025

https://github.com/nikhilrout/thegemmcoreproject

SystemVerilog Implementation of Nvidia's CUDA/Tensor Core GEMM Operations

cuda floating-point gemm gpgpu hybrid-precision-training sparse-matrix systolic-array tensorcore tpu

Last synced: 17 Aug 2025

https://github.com/pkestene/tsp

Traveling salesman problem solved with different programming models

cea cpp cuda kokkos nvidia-gpu openacc openmp performance-portability stdpar sycl

Last synced: 19 Aug 2025

https://github.com/coderonion/cuda-beginner-course-python-version

Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Python Edition)"

cpp cublas cuda cuda-programming cudnn cupy gpu gpu-programming nvcc nvidia parallel-programming python rust

Last synced: 19 Oct 2025

https://github.com/rocm/numba-hip

HIP backend patch for Numba, the NumPy aware dynamic Python compiler using LLVM.

ai compiler cuda gpu hip hpc jit ml numba python radeon-instinct-mi-series rocm

Last synced: 31 Aug 2025

https://github.com/jpuigcerver/nnutils

CPU & CUDA implementation of several neural network utils

cuda deep-learning neural-networks openmp pytorch

Last synced: 11 Apr 2025

https://github.com/lanzani/opencv-cuda-docker

Docker with opencv with cuda support.

cuda docker nvidia-docker nvidia-gpu opencv opencv-cuda opencv-dnn

Last synced: 12 Oct 2025

https://github.com/k-hengzhou/hphoto

An AI-based smart photo management tool that supports face recognition, automatic clustering of similar faces, and NSFW detection

cuda insightface nsfw nsfw-detection nudenet photos

Last synced: 26 Feb 2025

https://github.com/jmuwrobotics/libbicos

GPU-Accelerated Binary Correspondence Search for Multishot Stereo Vision

computer-vision cuda depth-map stereo-camera stereo-matching stereo-vision

Last synced: 14 Oct 2025

https://github.com/fatlipp/cuda-tree

CUDA-based Tree builder

algorithms cpp cuda octree quadtree tree

Last synced: 19 Jun 2025

https://github.com/cascadingradium/cuda-hungarian-clustering

A GPU-Accelerated Clustering Algorithm that uses the Hungarian method

clustering cpp cuda gpu hungarian-algorithm parallel-computing

Last synced: 16 May 2025

https://github.com/neural-bits/ai-programming-hub

Learn and experiment with new techniques and programming languages with a focus on ML

cpp cuda cython openai-triton python rust

Last synced: 12 Apr 2025

https://github.com/gapi505/sparky-2

This is a Discord bot running on llama.cpp with the Llama 3 model and image generation

ai cuda llama3 llamacpp stable-diffusion torch transformers

Last synced: 07 Oct 2025

https://github.com/axnjr/snn_be_pro

A state of the art AI framework for no/low-code (visually - drag & drop) building, testing, deploying, integrating latest deep learning models with privacy & security compliance using ollama, as a final year project!

ai cplusplus cpp cuda deep-neural-networks kernel-driver ml mlops python

Last synced: 06 Oct 2025

https://github.com/pinto0309/realsense-cuda-opengl-docker

RealSense execution environment built on a Docker container on Ubuntu 20.04. NVIDIA GPU and OpenGL capable. CUDA 11.4.

cuda docker opengl realsense realsense2 ubuntu wsl2

Last synced: 24 Mar 2025

https://github.com/andydevs/cudafractal

Fractal Generator using Nvidia's CUDA framework

cplusplus cuda nvidia

Last synced: 23 Apr 2025

https://github.com/wzqvip/jetson-pytorch-builder

Build PyTorch with CUDA for Jetson Orin and Thor.

cuda jetson pytorch

Last synced: 01 Dec 2025

https://github.com/abaksy/cuda-examples

A repository of examples coded in CUDA C/C++

cuda

Last synced: 31 May 2026

https://github.com/hbseong97/tf-c-api

Using tensorflow c api, c++ api, tf lite, tf js, model conversion in Windows

bazel checkpoint cuda cudnn tensorflow

Last synced: 09 Apr 2025

https://github.com/timothystewart6/vllm-gb10

Bleeding edge vLLM Docker image for the NVIDIA DGX Spark (GB10 / sm_121a).

arm64 cuda dgx-spark docker gb10 inference llm nvidia pytorch vllm

Last synced: 26 Jun 2026

https://github.com/hope2333/tsac-ng

Multi-backend neural audio codec. CPU (AVX/AVX2/AVX-512, NEON/SVE, RVV), GPU (CUDA, HIP/ROCm, Vulkan), LLVM JIT. Clean-room implementation.

arm64 audio-codec avx c cuda dac hip llvm-jit neural-audio riscv simd vulkan

Last synced: 29 Jun 2026

https://github.com/donpablonows/coin

🪙 Crypto Optimization Interface Network (aka COIN) is a high-performance Bitcoin address generator using CUDA acceleration and multi-threading. It optimizes GPU and CPU resources for fast address generation, ensures secure private key creation, and includes real-time monitoring and automatic system optimizations.

bitcoin blockchain cryptography cuda gpu-acceleration

Last synced: 07 May 2026

https://github.com/fynv/curandrtc

CURandRTC is a GPU random number generation module based on ThrustRTC.

cuda nvrtc random-number-generators thrust

Last synced: 05 May 2025

https://github.com/zeloe/rtconvolver

A realtime convolution VST3

c convolution cplusplus cuda juce

Last synced: 22 Apr 2025

https://github.com/jackeylea/cuda_linux

CUDA/Qt tutorial on Linux

cpp cuda cudnn qt5

Last synced: 26 Jul 2025

https://github.com/pnnl/cuvite

Multi-GPU Graph Community Detection using CUDA

community-detection cuda graph-clustering mpi

Last synced: 25 Jul 2025

https://github.com/official-imvoiid/portable-miniconda-setup-for-window

Portable Miniconda Setup for Windows. Easily create a portable Conda environment with automated scripts for flexible Python version management and CUDA support. 🚀

conda conda-environment cuda datascience machinelearning nvidia nvidia-cuda portable python

Last synced: 16 Apr 2026

https://github.com/rogerallen/qtmandelbrotr

Qt CUDA Mandelbrot explorer

cuda cuda-opengl mandelbrot-viewer qt5

Last synced: 02 Aug 2025

https://github.com/sean-bradley/cudalookupsha256

SHA256 lookup using parallel processing on an NVIDIA CUDA-compatible graphics card

cuda parallel-processing sha256

Last synced: 05 May 2025

https://github.com/648trindade/sbac-pad-marathon-problems

Repository containing problems of the SBAC-PAD Marathon of Parallel Programming and some parallel solutions to them.

cuda high-performance-computing mpi openmp parallel-computing

Last synced: 01 May 2025

https://github.com/mre/cudampi

Large hybrid CPU/GPU sorting network using CUDA and MPI

algorithms bucket bucketsort cuda filesystem gpu hybrid-cpu mpi parallel sorting-network

Last synced: 18 Apr 2026

https://github.com/rfsantacruz/mycudasamples

This is a series of CUDA C++ programming samples developed to study CUDA technology and its parallel programming model.

cpp cuda gpgpu

Last synced: 13 Apr 2025

https://github.com/sean-bradley/cudalookupripemd60

RIPEMD-160 lookup using parallel processing on an NVIDIA CUDA graphics card

cuda parallel-processing ripemd160

Last synced: 05 May 2025

https://github.com/webis-de/pytorch-window-matmul

A custom CUDA kernel for windowed matrix multiplication

cuda cuda-kernel pytorch

Last synced: 31 Oct 2025

https://github.com/luismisanve/gguf-to-pytorchtensor

Simple Python script that converts the weights of a GGUF model to a PyTorch tensor

cuda gguf-models huggingface llamacpp numpy python pytorch tensor

Last synced: 20 Apr 2026

https://github.com/jaxony/pynvidia

⚙️ NVIDIA GPU utilities for Python 🔧

cuda deep-learning nvidia-gpu pip python utility

Last synced: 07 May 2025

https://github.com/caps-umu/fideslib

A server-side CKKS GPU library fully interoperable with OpenFHE.

ckks cuda gpu homomorphic-encryption openfhe

Last synced: 08 Oct 2025

https://github.com/xiaohaoo/yolo_tensorrt

Deploy the YOLOv8 model for inference using OpenCV and TensorRT in C/C++.

c cuda opencv tensorrt yolov8

Last synced: 16 Jul 2025

https://github.com/brosnanyuen/raybnn_raytrace

Ray tracing library using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI

arrayfire cuda gpu gpu-computing opencl parallel parallel-computing ray ray-tracing raybnn raylib raytracer raytracing rust

Last synced: 26 Aug 2025

https://github.com/kareimgazer/mat-transpose-cuda

A series of trials for optimizing matrix transpose with CUDA

cuda hpc matrix parallel-computing simd

Last synced: 29 Mar 2025

https://github.com/aespinosadev/opengl-renderer

OpenGL renderer showcasing all basic functionality to render 3D scenes.

computer-graphics cuda gpgpu graphics-engine graphics-programming opengl rendering rendering-3d-graphics shaders video-game

Last synced: 24 Jul 2025

https://github.com/rupeshs/anomalydetection

Anomaly Detection Using Anomalib and OpenVINO: Step-by-Step Guide

anomalib anomaly anomalydetection computer-vision cpu cuda gpu intel onnx opencv pytorch

Last synced: 13 Apr 2025

https://github.com/ellite/anchor-sub-sync

Anchor: A universal, hardware-accelerated CLI tool for subtitle synchronization (Whisper) and context-aware translation (NLLB)

ai audio-transcription automation cli cuda nllb python pytorch srt subtitle-sync subtitle-translation subtitles synchronization translation whisper

Last synced: 24 Feb 2026

https://github.com/cascadingradium/air-traffic-distribution

A GPU-Accelerated Multi-Objective Genetic Algorithm for Air Traffic Management

air-traffic-control air-traffic-management c cuda genetic-algorithm gpu-acceleration

Last synced: 16 May 2025

https://github.com/hyeonsangjeon/pdf2llm-tuning-studio

A solution that automatically generates high-quality question-answering (QA) data from PDF documents using GPU-accelerated processing and efficiently fine-tunes LLMs. It generates domain-specific QA pairs with the Unstructured library and AWS Bedrock Claude, and trains lightweight models using LoRA.

aws bedrock claude cuda data-argumantation data-extraction distillation docker finetuning gpu llm pdf-generation pdf-text-extraction processing processing-job sagemaker text-disti unsloth unstructured

Last synced: 15 Jun 2025

https://github.com/ergus/gpukalmanfilter

Kalman Filter test code using C, C++, Cuda and OpenCL.

cpp cuda gpgpu kalman-filter makefile opencl performance vectorization

Last synced: 28 Oct 2025

https://github.com/skailasa/pyrsvd

Accelerated Randomised SVD in Python

cuda numba python3 randomised-algorithms svd

Last synced: 07 May 2025

https://github.com/pkestene/cuda_mpi_autotools_proj_template

A template project for CUDA+MPI with autotools build system

automake autotools cuda cuda-mpi mpi

Last synced: 25 Oct 2025

https://github.com/bokutotu/curs

CUDA, cuBLAS, and cuDNN wrapper for Rust

cuda deep-learning high-performance-computing hpc rust

Last synced: 20 May 2026

https://github.com/amirhoseinmasoumi/onnx-cuda-inference

A C++ project for running CUDA-accelerated ONNX model inference, using ONNX Runtime and OpenCV for image segmentation tasks.

cpp cuda inference onnxruntime onnxruntime-gpu opencv segmentation

Last synced: 12 Apr 2025

https://github.com/kuroko1t/gocuda

Go binding for Cuda Driver API

cuda go golang

Last synced: 02 May 2026

https://github.com/matthewhaynesonline/ai-server-setup

Setup AWS EC2 instance from scratch with NVIDIA CUDA, Docker, Packer for AI / ML.

ai ami aws cuda devops docker ml mlops packer

Last synced: 12 Apr 2025

https://github.com/arminms/p2rng

A modern header-only C++ library for parallel algorithmic (pseudo) random number generation supporting OpenMP, CUDA, ROCm and oneAPI

cpp cuda cxx gpu header-only library linux macos multiplatorm oneapi openmp parallel pcg-random prng pseudorandom-number-generator random-number-distributions random-number-generation rocm stl-algorithms windows

Last synced: 04 Apr 2025