An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/abus-aikorea/aria-coversong

The best Gradio web UI for creating cover songs using MDX-Net and RVC. Easy one-click installation. Fully portable.

cuda demucs gradio karaoke mdx-net nvidia python pytorch rvc song-covers uvr vocal-remover voice-conversion

Last synced: 25 Apr 2025

https://github.com/ncar/micm

A model-independent chemistry module for atmosphere models

atmospheric-chemistry atmospheric-modeling atmospheric-science cuda gpu gpu-acceleration hpc ode-solver

Last synced: 11 Apr 2025

https://github.com/raymondcm/blockmatching

CPU and CUDA implementation of Full Exhaustive Block Matching Algorithm using Integral Images

block-matching-algorithm cuda integral-image parallel vision

Last synced: 27 Apr 2025

https://github.com/torinos-yt/nnonnx

Using CUDA for Faster Machine Learning Inference on Unity

cuda machine-learning onnxruntime unity

Last synced: 09 Jul 2025

https://github.com/ashvardanian/scaling-democracy

GPU-accelerated Schulze voting method in Python, Numba, and CUDA, using ideas from Algebraic Graph Theory

cuda cuda-kernels dynamic-programming gpgpu graph-algorithms graph-theory pybind11 python voting

Last synced: 12 Apr 2025

https://github.com/giovaneiwamoto/cuda-shortest-paths

🧩 Cuda Shortest Paths - Parallel Dijkstra and Floyd algorithms using Nvidia CUDA to calculate All-Pairs Shortest Path (APSP) in a given graph represented by its adjacency matrix.

all-pairs-shortest-path cuda nvidia

Last synced: 29 Apr 2025

https://github.com/bencardoen/singularity_slurm_cuda

Example of how to get started with Singularity and CUDA on a SLURM cluster

cuda nvidia singularity-container slurm-cluster tensorflow

Last synced: 15 Oct 2025

https://github.com/pliablepixels/simpleyolo

A dead simple python wrapper for darknet that works with OpenCV 4.1, CUDA 10.1

cuda darknet opencv python3 yolov3

Last synced: 26 Oct 2025

https://github.com/jonasricker/autocvd

Tool to automatically set CUDA_VISIBLE_DEVICES based on GPU utilization. Usable from command line and code.

cuda cuda-visible-devices gpu keras machine-learning nvidia python pytorch tensorflow

Last synced: 26 Feb 2026

https://github.com/nyo16/llama_cpp_ex

Elixir bindings for llama.cpp — run LLMs locally with Metal, CUDA, Vulkan, or CPU. Streaming, chat templates, embeddings, structured output, and concurrent batched inference.

cuda elixir llamacpp llm

Last synced: 04 Jun 2026

https://github.com/pinto0309/realsense-cuda-opengl-docker

RealSense execution environment built on a Docker container on Ubuntu 20.04. NVIDIA GPU and OpenGL capable. CUDA 11.4.

cuda docker opengl realsense realsense2 ubuntu wsl2

Last synced: 24 Mar 2025

https://github.com/rocm/hipmm

HIP Memory Manager (ROCm-DS)

amd cuda gpu hip memory-management radeon-instinct-mi-series rocm

Last synced: 12 Apr 2025

https://github.com/taeguk/dist-prog-assignment

Sogang Univ. Distributed Programming (CSE5414) Assignments.

assignment cuda distributed mpi-library openmp parallel pthreads sogang

Last synced: 13 Jun 2025

https://github.com/axnjr/snn_be_pro

A state-of-the-art AI framework for no/low-code (visual drag-and-drop) building, testing, deploying, and integrating the latest deep learning models with privacy and security compliance using Ollama. Built as a final-year project.

ai cplusplus cpp cuda deep-neural-networks kernel-driver ml mlops python

Last synced: 06 Oct 2025

https://github.com/skizzy-create/ayurvedic_his

🩺 A personalized app that serves as your personal Ayurvedic assistant, providing tailored advice and guidance based on Ayurvedic principles. 🩺

cuda gpt python pytorch transformers

Last synced: 04 Oct 2025

https://github.com/pratikvn/schwarz-lib

Repository for testing asynchronous Schwarz methods.

asynchronous cuda domain-decomposition ginkgo schwarz

Last synced: 14 Apr 2025

https://github.com/andydevs/cudafractal

Fractal Generator using Nvidia's CUDA framework

cplusplus cuda nvidia

Last synced: 23 Apr 2025

https://github.com/rogerallen/smandelbrotr

SDL2 CUDA OpenGL Mandelbrot explorer.

cuda mandelbrot-viewer opengl sdl2

Last synced: 08 Mar 2026

https://github.com/nikhilrout/thegemmcoreproject

SystemVerilog Implementation of Nvidia's CUDA/Tensor Core GEMM Operations

cuda floating-point gemm gpgpu hybrid-precision-training sparse-matrix systolic-array tensorcore tpu

Last synced: 17 Aug 2025

https://github.com/jpuigcerver/nnutils

CPU & CUDA implementation of several neural network utils

cuda deep-learning neural-networks openmp pytorch

Last synced: 11 Apr 2025

https://github.com/k-hengzhou/hphoto

An AI-based intelligent photo management tool that supports face recognition, automatic clustering of similar faces, and NSFW detection

cuda insightface nsfw nsfw-detection nudenet photos

Last synced: 26 Feb 2025

https://github.com/rapidsai/cugraph-docs

cuGraph Docs - RAPIDS Graph Analytics Documentation

cuda cugraph documentation graph rapids

Last synced: 12 Sep 2025

https://github.com/coderonion/cuda-beginner-course-python-version

Companion code for the Bilibili video course "Introduction to CUDA 12.x Parallel Programming (Python Edition)"

cpp cublas cuda cuda-programming cudnn cupy gpu gpu-programming nvcc nvidia parallel-programming python rust

Last synced: 19 Oct 2025

https://github.com/cascadingradium/cuda-hungarian-clustering

A GPU-Accelerated Clustering Algorithm that uses the Hungarian method

clustering cpp cuda gpu hungarian-algorithm parallel-computing

Last synced: 16 May 2025

https://github.com/lukoshkin/hpc

Skoltech HPC course

cuda curand hpc mpi omp

Last synced: 10 Apr 2025

https://github.com/pyhf/cuda-images

pyhf Docker images built on Nvidia Container Toolkit enabled base images

cuda jax nvidia nvidia-cuda nvidia-docker pyhf

Last synced: 15 Jul 2025

https://github.com/neural-bits/ai-programming-hub

Learn and experiment with new techniques and programming languages with a focus on ML

cpp cuda cython openai-triton python rust

Last synced: 12 Apr 2025

https://github.com/rocm/numba-hip

HIP backend patch for Numba, the NumPy aware dynamic Python compiler using LLVM.

ai compiler cuda gpu hip hpc jit ml numba python radeon-instinct-mi-series rocm

Last synced: 31 Aug 2025

https://github.com/anantzoid/cuda-genetic-algorithm-travelling-salesman-problem

Implementation of Parallel Genetic Algorithm in CUDA to solve TSP (Berlin52)

c cuda genetic-algorithm tsp tsp-solver

Last synced: 25 Jul 2025

https://github.com/neoblizz/cudagl

CUDA based Graphics Library for NVIDIA's GPUs.

cuda graphics-library graphics-programming opengl

Last synced: 18 Jun 2025

https://github.com/belval/raytracing

Using CUDA to implement "Raytracing in one weekend" by Peter Shirley

cuda raytracing raytracing-in-one-weekend

Last synced: 12 Apr 2025

https://github.com/lanzani/opencv-cuda-docker

Docker with OpenCV and CUDA support.

cuda docker nvidia-docker nvidia-gpu opencv opencv-cuda opencv-dnn

Last synced: 12 Oct 2025

https://github.com/eunomia-bpf/basic-cuda-tutorial

A collection of CUDA programming examples to learn GPU programming

cuda tutorial

Last synced: 15 Jun 2025

https://github.com/fatlipp/cuda-tree

CUDA-based Tree builder

algorithms cpp cuda octree quadtree tree

Last synced: 19 Jun 2025

https://github.com/tudasc/cusan

A data race detector for CUDA C and C++ based on ThreadSanitizer

c cpp cuda datarace threadsanitizer

Last synced: 12 Aug 2025

https://github.com/sbaldu/neural_network_hep

Implementation of a neural network framework from scratch in C++ applied to particle physics

cpp cuda high-energy-physics neural-networks

Last synced: 20 Jul 2025

https://github.com/almirneeto99/leetgpu-challenges

This repository contains the solution for LeetGPU Challenges

cpp cuda gpu hpc

Last synced: 18 Apr 2026

https://github.com/pkestene/tsp

Traveling salesman problem solved with different programming models

cea cpp cuda kokkos nvidia-gpu openacc openmp performance-portability stdpar sycl

Last synced: 19 Aug 2025

https://github.com/antoniopelusi/lu-solver

Assignments for the High Performance Computing exam at Unimore, Modena, IT.

cuda lu-decomposition openmp

Last synced: 27 Feb 2026

https://github.com/gapi505/sparky-2

A Discord bot running on llama.cpp with the Llama 3 model and image generation

ai cuda llama3 llamacpp stable-diffusion torch transformers

Last synced: 07 Oct 2025

https://github.com/harrydobbs/torch_ransac3d

A high-performance implementation of 3D RANSAC (Random Sample Consensus) algorithm using PyTorch and CUDA.

3d cloud cubiod cuda cylinder plane plane-detection point point-cloud ransac segmentation

Last synced: 03 Oct 2025

https://github.com/wzqvip/jetson-pytorch-builder

Build PyTorch with CUDA for Jetson Orin and Thor.

cuda jetson pytorch

Last synced: 01 Dec 2025

https://github.com/jmuwrobotics/libbicos

GPU-Accelerated Binary Correspondence Search for Multishot Stereo Vision

computer-vision cuda depth-map stereo-camera stereo-matching stereo-vision

Last synced: 14 Oct 2025

https://github.com/hbseong97/tf-c-api

Using the TensorFlow C API, C++ API, TF Lite, TF.js, and model conversion on Windows

bazel checkpoint cuda cudnn tensorflow

Last synced: 09 Apr 2025

https://github.com/abaksy/cuda-examples

A repository of examples coded in CUDA C/C++

cuda

Last synced: 31 May 2026

https://github.com/pietroglyph/argustag

A C++17 wrapper for Nvidia Argus with support for zero-copy frame transfers to CUDA kernels and CUDA-accelerated AprilTag detection with ISAAC (no ROS required).

apriltag argus cuda isaac nvargus

Last synced: 04 Oct 2025

https://github.com/caps-umu/fideslib

A server-side CKKS GPU library fully interoperable with OpenFHE.

ckks cuda gpu homomorphic-encryption openfhe

Last synced: 08 Oct 2025

https://github.com/hyeonsangjeon/pdf2llm-tuning-studio

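A solution that automatically generates high-quality question-answer (QA) data from PDF documents with GPU-accelerated processing and efficiently fine-tunes LLMs. It generates domain-specific QA pairs with the Unstructured library and AWS Bedrock Claude, and trains lightweight models using the LoRA technique.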

aws bedrock claude cuda data-argumantation data-extraction distillation docker finetuning gpu llm pdf-generation pdf-text-extraction processing processing-job sagemaker text-disti unsloth unstructured

Last synced: 15 Jun 2025

https://github.com/guilt/rocm-programming-masterclass

Udemy's CUDA Programming Masterclass with examples in ROCm/HIP.

cuda easy hip learning-by-doing masterclass rocm

Last synced: 04 Aug 2025

https://github.com/skailasa/pyrsvd

Accelerated Randomised SVD in Python

cuda numba python3 randomised-algorithms svd

Last synced: 07 May 2025

https://github.com/rogerallen/qtmandelbrotr

Qt CUDA Mandelbrot explorer

cuda cuda-opengl mandelbrot-viewer qt5

Last synced: 02 Aug 2025

https://github.com/pmeier/tox-ltt

Install PyTorch distributions with light-the-torch

cuda install light-the-torch pip plugin pytorch tox

Last synced: 25 Aug 2025

https://github.com/sean-bradley/cudalookupripemd60

RIPEMD-160 lookup using parallel processing on an NVIDIA CUDA graphics card

cuda parallel-processing ripemd160

Last synced: 05 May 2025

https://github.com/xiaohaoo/yolo_tensorrt

Deploy the YOLOv8 model for inference using OpenCV and TensorRT in C/C++.

c cuda opencv tensorrt yolov8

Last synced: 16 Jul 2025

https://github.com/chiang-yuan/culsm

CUDA C++ code implementing GPU-accelerated Lattice Spring Model (CuLSM) simulations.

cuda gpu parallel-computing particles

Last synced: 07 Sep 2025

https://github.com/sean-bradley/cudalookupsha256

SHA-256 lookup using parallel processing on an NVIDIA CUDA-compatible graphics card

cuda parallel-processing sha256

Last synced: 05 May 2025

https://github.com/official-imvoiid/portable-miniconda-setup-for-window

Portable Miniconda Setup for Windows 🐍 Easily create a portable Conda environment with automated scripts for flexible Python version management and CUDA support. 🚀

conda conda-environment cuda datascience machinelearning nvidia nvidia-cuda portable python

Last synced: 16 Apr 2026

https://github.com/rupeshs/anomalydetection

Anomaly Detection Using Anomalib and OpenVINO: Step-by-Step Guide

anomalib anomaly anomalydetection computer-vision cpu cuda gpu intel onnx opencv pytorch

Last synced: 13 Apr 2025

https://github.com/yuvix25/py2cuda

Convert Python 3 code to CUDA code.

converter cuda gpu gpu-acceleration python python3

Last synced: 11 Sep 2025

https://github.com/cascadingradium/air-traffic-distribution

A GPU-Accelerated Multi-Objective Genetic Algorithm for Air Traffic Management

air-traffic-control air-traffic-management c cuda genetic-algorithm gpu-acceleration

Last synced: 16 May 2025

https://github.com/rfsantacruz/mycudasamples

This is a series of CUDA C++ programming samples developed to study CUDA technology and its parallel programming model.

cpp cuda gpgpu

Last synced: 13 Apr 2025

https://github.com/enfiskutensykkel/cuda-rdma-bench

NVIDIA GPU direct RDMA using SISCI API

cuda dma gpudirect-rdma pcie rdma sisci

Last synced: 30 Mar 2025

https://github.com/ergus/gpukalmanfilter

Kalman Filter test code using C, C++, CUDA and OpenCL.

cpp cuda gpgpu kalman-filter makefile opencl performance vectorization

Last synced: 28 Oct 2025

https://github.com/amirhoseinmasoumi/onnx-cuda-inference

A C++ project for running CUDA-accelerated ONNX model inference, using ONNX Runtime and OpenCV for image segmentation tasks.

cpp cuda inference onnxruntime onnxruntime-gpu opencv segmentation

Last synced: 12 Apr 2025

https://github.com/matthewhaynesonline/ai-server-setup

Setup AWS EC2 instance from scratch with NVIDIA CUDA, Docker, Packer for AI / ML.

ai ami aws cuda devops docker ml mlops packer

Last synced: 12 Apr 2025

https://github.com/bokutotu/curs

CUDA, cuBLAS, and cuDNN wrapper for Rust

cuda deep-learning high-performance-computing hpc rust

Last synced: 20 May 2026

https://github.com/pkestene/cuda_mpi_autotools_proj_template

A template project for CUDA+MPI with autotools build system

automake autotools cuda cuda-mpi mpi

Last synced: 25 Oct 2025

https://github.com/appsolves/lanepilot

The world's first real-time AI-powered traffic management system, featuring automated vehicle detection, lane allocation optimization, and dynamic control for (autonomous) cars!

ai ai-traffic-management autonomous-driving computer-vision cuda edge-computing embedded-systems jetson-orin-nano-super lane-detection pytorch

Last synced: 29 Apr 2026

https://github.com/statikfintechllc/godcore

All-in-one local AI stack for Mistral-13B and Llama.cpp, with one-step CUDA wheel install, OpenAI-compatible API, and modern web dashboard. Switch between local and cloud chat, run on your own GPU, and deploy instantly—no API keys or paywalls. Designed for easy install, custom builds, and fast remote access. Enjoy!

ai chatbot chatgpt cuda dashboard fastapi llama-cpp llm local-ai mistral openai-compatible react selfhosted webui

Last synced: 25 Jun 2025

https://github.com/fynv/curandrtc

CURandRTC is a GPU random number generation module based on ThrustRTC.

cuda nvrtc random-number-generators thrust

Last synced: 05 May 2025

https://github.com/coderonion/moblas

BLAS (Basic Linear Algebra Subprograms) library written in the Mojo programming language.

blas blis cublas cuda eigen fortran gemm gonum hpc lapack linear-algebra math mkl mojo numpy openblas pytorch scientific-computing simd tensor

Last synced: 15 Jun 2025

https://github.com/vorticity-inc/vtensor

VTensor, a C++ library, facilitates tensor manipulation on GPUs, emulating the python-numpy style for ease of use. It leverages RMM (RAPIDS Memory Manager) for efficient device memory management. It also supports xtensor for host memory operations.

cublas cuda curand cusolver gpu numpy rmm tensor xarray xtensor

Last synced: 14 Apr 2025

https://github.com/aespinosadev/opengl-renderer

OpenGL renderer showcasing all basic functionality to render 3D scenes.

computer-graphics cuda gpgpu graphics-engine graphics-programming opengl rendering rendering-3d-graphics shaders video-game

Last synced: 24 Jul 2025

https://github.com/willigarneau/astar-pathfinding

🗺📌 Implementation of the A* pathfinding algorithm with OpenCV and CUDA in C++ 💪

a-star algorithm axis-camera cuda detection implementation opencv pathfinding

Last synced: 14 Jul 2025

https://github.com/tk-yoshimura/tensorshader

Deep learning .NET library for regression.

complex cuda deep-learning dotnet6 gpgpu net6 quaternion

Last synced: 15 Oct 2025

https://github.com/mark1626/road-to-plus-plus

This repo is a list of experiments that I tried out to learn C++ and HPC

cpp cpp11 cuda hpc openmp simd

Last synced: 24 Apr 2026

https://github.com/codingonion/moblas

BLAS (Basic Linear Algebra Subprograms) library written in the Mojo programming language.

blas blis cublas cuda eigen fortran gemm gonum hpc lapack linear-algebra math mkl mojo numpy openblas pytorch scientific-computing simd tensor

Last synced: 25 Apr 2025

https://github.com/pnnl/cuvite

Multi-GPU Graph Community Detection using CUDA

community-detection cuda graph-clustering mpi

Last synced: 25 Jul 2025

https://github.com/mre/cudampi

Large hybrid CPU/GPU sorting network using CUDA and MPI

algorithms bucket bucketsort cuda filesystem gpu hybrid-cpu mpi parallel sorting-network

Last synced: 18 Apr 2026

https://github.com/648trindade/sbac-pad-marathon-problems

Repository containing problems of the SBAC-PAD Marathon of Parallel Programming and some parallel solutions to them.

cuda high-performance-computing mpi openmp parallel-computing

Last synced: 01 May 2025