An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
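
The programming model can be made concrete with a minimal sketch in plain Python (no GPU involved) that emulates how a one-dimensional CUDA grid assigns array elements to threads. The names `vector_add_kernel` and `launch` are illustrative assumptions. On real hardware the kernel would be a `__global__` function launched with `<<<blocks, threadsPerBlock>>>`, and the blocks and threads would run in parallel rather than in nested loops.

```python
# Pure-Python emulation of a 1D CUDA grid; no GPU is used.
# The nested loops stand in for blocks and threads that CUDA runs in parallel.

def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out, n):
    # Global index, as in blockIdx.x * blockDim.x + threadIdx.x
    i = block_idx * block_dim + thread_idx
    # Guard: the last block may contain threads past the end of the data
    if i < n:
        out[i] = a[i] + b[i]


def launch(a, b, threads_per_block=4):
    n = len(a)
    out = [0] * n
    # Ceiling division so that every element is covered by some thread
    blocks = (n + threads_per_block - 1) // threads_per_block
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            vector_add_kernel(block_idx, threads_per_block, thread_idx, a, b, out, n)
    return out


print(launch([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]))  # [11, 22, 33, 44, 55, 66]
```

The ceiling division and the `i < n` guard are the usual pattern for an array whose length is not a multiple of the block size.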

https://github.com/xiangronglin/grayscale-conversion

grayscale conversion optimized with OpenMP, SIMD and CUDA

cuda grayscale hpc openmp simd

Last synced: 23 Mar 2025
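
Grayscale conversion parallelizes well because every output pixel depends only on its own input pixel. The following Python sketch assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). The repository may use a different formula, and `to_grayscale` is a hypothetical name.

```python
def to_grayscale(pixels):
    """Convert (R, G, B) tuples with 0-255 channels to 8-bit gray values.

    Assumes ITU-R BT.601 luma weights; other formulas are also common.
    """
    gray = []
    for r, g, b in pixels:
        # Weighted sum; the weights add up to 1.0, so the result stays in 0-255
        y = 0.299 * r + 0.587 * g + 0.114 * b
        gray.append(int(round(y)))
    return gray


print(to_grayscale([(255, 0, 0), (0, 255, 0), (0, 0, 255)]))  # [76, 150, 29]
```

Because the loop iterations are independent, the same loop maps directly onto OpenMP threads, SIMD lanes, or one CUDA thread per pixel.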

https://github.com/bhattbhavesh91/cudf-rapids-demo

A simple demo of cuDF which is a RAPIDS GPU-Accelerated Dataframe Library!

arrow cuda cudf demo gpu gpu-dataframe pandas python rapids

Last synced: 17 Apr 2025

https://github.com/aresio/cupsoda

cupSODA is a CUDA-powered, coarse-grain deterministic simulator of mass-action kinetics models.

biochemical cuda gpu-computing mass-action simulation

Last synced: 21 Feb 2026

https://github.com/marcogarlet/cuda_cubeattack

CUDA implementation of Cube Attack

cryptography cubeattack cuda

Last synced: 28 Oct 2025

https://github.com/lanl/stcuda

StCUDA allows Smalltalk to call CUDA Driver APIs to do GPU computing

cuda smalltalk visualworks

Last synced: 12 Apr 2025

https://github.com/abus-aikorea/aria-coversong

The best Gradio web UI for creating cover songs using MDX-Net and RVC. Easy one-click installation. Fully portable.

cuda demucs gradio karaoke mdx-net nvidia python pytorch rvc song-covers uvr vocal-remover voice-conversion

Last synced: 25 Apr 2025

https://github.com/raymondcm/blockmatching

CPU and CUDA implementation of Full Exhaustive Block Matching Algorithm using Integral Images

block-matching-algorithm cuda integral-image parallel vision

Last synced: 27 Apr 2025
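
Block matching with integral images relies on a summed-area table, which returns the sum over any rectangular window in constant time. Below is a minimal Python sketch with hypothetical function names, assuming the image is a 2D list of numbers.

```python
def integral_image(img):
    """Summed-area table padded with a zero row and column.

    s[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Everything above this row plus the running sum of this row
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s


def box_sum(s, top, left, height, width):
    """Sum of the height x width window whose top-left pixel is (top, left), in O(1)."""
    bottom, right = top + height, left + width
    return s[bottom][right] - s[top][right] - s[bottom][left] + s[top][left]


s = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(box_sum(s, 1, 1, 2, 2))  # 28
```

One common way to apply this to exhaustive block matching is to build the table over per-pixel absolute differences for a candidate displacement. Each block's sum of absolute differences then costs four lookups. Whether the listed repository does exactly this is an assumption.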

https://github.com/giovaneiwamoto/cuda-shortest-paths

🧩 CUDA Shortest Paths: parallel Dijkstra and Floyd algorithms using NVIDIA CUDA to calculate all-pairs shortest paths (APSP) in a graph represented by its adjacency matrix.

all-pairs-shortest-path cuda nvidia

Last synced: 29 Apr 2025
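
For reference, the Floyd (Floyd–Warshall) recurrence behind the shortest-paths entry can be sketched sequentially in Python. This is a CPU sketch, not the repository's CUDA code. It covers only the Floyd half, not Dijkstra, and `INF` marks a missing edge.

```python
import math

INF = math.inf


def floyd_warshall(adj):
    """All-pairs shortest paths from an adjacency matrix (INF = no edge)."""
    n = len(adj)
    dist = [row[:] for row in adj]  # copy so the input is not modified
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0)
    for k in range(n):
        # With k fixed, every (i, j) update is independent of the others;
        # this is the loop a GPU version typically spreads over threads.
        dk = dist[k]
        for i in range(n):
            dik = dist[i][k]
            if dik == INF:
                continue
            row = dist[i]
            for j in range(n):
                if dik + dk[j] < row[j]:
                    row[j] = dik + dk[j]
    return dist


adj = [[0, 3, INF, 7], [8, 0, 2, INF], [5, INF, 0, 1], [2, INF, INF, 0]]
print(floyd_warshall(adj))  # [[0, 3, 5, 6], [5, 0, 2, 3], [3, 6, 0, 1], [2, 5, 7, 0]]
```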

https://github.com/bencardoen/singularity_slurm_cuda

Example of how to get started with Singularity and CUDA on a SLURM cluster

cuda nvidia singularity-container slurm-cluster tensorflow

Last synced: 15 Oct 2025

https://github.com/pliablepixels/simpleyolo

A dead simple python wrapper for darknet that works with OpenCV 4.1, CUDA 10.1

cuda darknet opencv python3 yolov3

Last synced: 26 Oct 2025

https://github.com/jonasricker/autocvd

Tool to automatically set CUDA_VISIBLE_DEVICES based on GPU utilization. Usable from command line and code.

cuda cuda-visible-devices gpu keras machine-learning nvidia python pytorch tensorflow

Last synced: 26 Feb 2026
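
The idea behind autocvd can be sketched as follows. `pick_gpus` is a hypothetical helper, not autocvd's API. It takes utilization percentages as a plain dictionary, whereas real code would query the driver (for example through NVML). The variable must be set before the CUDA runtime initializes in the process, or it has no effect.

```python
import os


def pick_gpus(utilization, num_gpus=1, max_util=50):
    """Return a CUDA_VISIBLE_DEVICES value naming the least-busy GPUs.

    `utilization` maps GPU index -> utilization percent. Hypothetical helper,
    not autocvd's API; real code would query the driver (e.g. through NVML).
    """
    # Least-utilized first; ties broken by the lower GPU index
    ranked = sorted(utilization.items(), key=lambda kv: (kv[1], kv[0]))
    idle = [idx for idx, util in ranked if util <= max_util]
    if len(idle) < num_gpus:
        raise RuntimeError(f"only {len(idle)} GPU(s) at or below {max_util}% utilization")
    return ",".join(str(idx) for idx in idle[:num_gpus])


# Set before anything in this process initializes the CUDA runtime
os.environ["CUDA_VISIBLE_DEVICES"] = pick_gpus({0: 95, 1: 10, 2: 30, 3: 10}, num_gpus=2)
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 1,3
```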

https://github.com/nyo16/llama_cpp_ex

Elixir bindings for llama.cpp — run LLMs locally with Metal, CUDA, Vulkan, or CPU. Streaming, chat templates, embeddings, structured output, and concurrent batched inference.

cuda elixir llamacpp llm

Last synced: 04 Jun 2026

https://github.com/pkestene/tsp

Traveling salesman problem solved with different programming models

cea cpp cuda kokkos nvidia-gpu openacc openmp performance-portability stdpar sycl

Last synced: 19 Aug 2025

https://github.com/anantzoid/cuda-genetic-algorithm-travelling-salesman-problem

Implementation of Parallel Genetic Algorithm in CUDA to solve TSP (Berlin52)

c cuda genetic-algorithm tsp tsp-solver

Last synced: 25 Jul 2025

https://github.com/harrydobbs/torch_ransac3d

A high-performance implementation of 3D RANSAC (Random Sample Consensus) algorithm using PyTorch and CUDA.

3d cloud cubiod cuda cylinder plane plane-detection point point-cloud ransac segmentation

Last synced: 03 Oct 2025

https://github.com/jmuwrobotics/libbicos

GPU-Accelerated Binary Correspondence Search for Multishot Stereo Vision

computer-vision cuda depth-map stereo-camera stereo-matching stereo-vision

Last synced: 14 Oct 2025

https://github.com/pinto0309/realsense-cuda-opengl-docker

RealSense execution environment built on a Docker container on Ubuntu 20.04. NVIDIA GPU and OpenGL capable. CUDA 11.4.

cuda docker opengl realsense realsense2 ubuntu wsl2

Last synced: 24 Mar 2025

https://github.com/hbseong97/tf-c-api

Using the TensorFlow C API, C++ API, TF Lite, TF.js, and model conversion on Windows

bazel checkpoint cuda cudnn tensorflow

Last synced: 09 Apr 2025

https://github.com/skizzy-create/ayurvedic_his

🩺 A personalized app that serves as your personal Ayurvedic assistant, providing tailored advice and guidance based on Ayurvedic principles. 🩺

cuda gpt python pytorch transformers

Last synced: 04 Oct 2025

https://github.com/pyhf/cuda-images

pyhf Docker images built on Nvidia Container Toolkit enabled base images

cuda jax nvidia nvidia-cuda nvidia-docker pyhf

Last synced: 15 Jul 2025

https://github.com/pietroglyph/argustag

A C++17 wrapper for Nvidia Argus with support for zero-copy frame transfers to CUDA kernels and CUDA-accelerated AprilTag detection with ISAAC (no ROS required).

apriltag argus cuda isaac nvargus

Last synced: 04 Oct 2025

https://github.com/jpuigcerver/nnutils

CPU & CUDA implementation of several neural network utils

cuda deep-learning neural-networks openmp pytorch

Last synced: 11 Apr 2025

https://github.com/lukoshkin/hpc

Skoltech HPC course

cuda curand hpc mpi omp

Last synced: 10 Apr 2025

https://github.com/fatlipp/cuda-tree

CUDA-based Tree builder

algorithms cpp cuda octree quadtree tree

Last synced: 19 Jun 2025

https://github.com/belval/raytracing

Using CUDA to implement "Raytracing in one weekend" by Peter Shirley

cuda raytracing raytracing-in-one-weekend

Last synced: 12 Apr 2025

https://github.com/wzqvip/jetson-pytorch-builder

build PyTorch with CUDA for Jetson Orin and Thor.

cuda jetson pytorch

Last synced: 01 Dec 2025

https://github.com/almirneeto99/leetgpu-challenges

This repository contains solutions to the LeetGPU challenges.

cpp cuda gpu hpc

Last synced: 18 Apr 2026

https://github.com/nikhilrout/thegemmcoreproject

SystemVerilog Implementation of Nvidia's CUDA/Tensor Core GEMM Operations

cuda floating-point gemm gpgpu hybrid-precision-training sparse-matrix systolic-array tensorcore tpu

Last synced: 17 Aug 2025

https://github.com/sbaldu/neural_network_hep

Implementation of a neural network framework from scratch in C++ applied to particle physics

cpp cuda high-energy-physics neural-networks

Last synced: 20 Jul 2025

https://github.com/eunomia-bpf/basic-cuda-tutorial

A collection of CUDA programming examples to learn GPU programming

cuda tutorial

Last synced: 15 Jun 2025

https://github.com/neural-bits/ai-programming-hub

Learn and experiment with new techniques and programming languages with a focus on ML

cpp cuda cython openai-triton python rust

Last synced: 12 Apr 2025

https://github.com/neoblizz/cudagl

CUDA based Graphics Library for NVIDIA's GPUs.

cuda graphics-library graphics-programming opengl

Last synced: 18 Jun 2025

https://github.com/rapidsai/cugraph-docs

cuGraph Docs - RAPIDS Graph Analytics Documentation

cuda cugraph documentation graph rapids

Last synced: 12 Sep 2025

https://github.com/rocm/numba-hip

HIP backend patch for Numba, the NumPy aware dynamic Python compiler using LLVM.

ai compiler cuda gpu hip hpc jit ml numba python radeon-instinct-mi-series rocm

Last synced: 31 Aug 2025

https://github.com/andydevs/cudafractal

Fractal Generator using Nvidia's CUDA framework

cplusplus cuda nvidia

Last synced: 23 Apr 2025

https://github.com/antoniopelusi/lu-solver

Assignments for the High Performance Computing exam at Unimore, Modena, Italy.

cuda lu-decomposition openmp

Last synced: 27 Feb 2026

https://github.com/cascadingradium/cuda-hungarian-clustering

A GPU-Accelerated Clustering Algorithm that uses the Hungarian method

clustering cpp cuda gpu hungarian-algorithm parallel-computing

Last synced: 16 May 2025

https://github.com/rocm/hipmm

HIP Memory Manager (ROCm-DS)

amd cuda gpu hip memory-management radeon-instinct-mi-series rocm

Last synced: 12 Apr 2025

https://github.com/pratikvn/schwarz-lib

Repository for testing asynchronous Schwarz methods.

asynchronous cuda domain-decomposition ginkgo schwarz

Last synced: 14 Apr 2025

https://github.com/abaksy/cuda-examples

A repository of examples coded in CUDA C/C++

cuda

Last synced: 31 May 2026

https://github.com/tudasc/cusan

A data race detector for CUDA C and C++ based on ThreadSanitizer

c cpp cuda datarace threadsanitizer

Last synced: 12 Aug 2025

https://github.com/rogerallen/smandelbrotr

SDL2 CUDA OpenGL Mandelbrot explorer.

cuda mandelbrot-viewer opengl sdl2

Last synced: 08 Mar 2026

https://github.com/taeguk/dist-prog-assignment

Sogang Univ. Distributed Programming (CSE5414) Assignments.

assignment cuda distributed mpi-library openmp parallel pthreads sogang

Last synced: 13 Jun 2025

https://github.com/k-hengzhou/hphoto

An AI-based smart photo management tool supporting face recognition, automatic clustering of similar faces, and NSFW detection

cuda insightface nsfw nsfw-detection nudenet photos

Last synced: 26 Feb 2025

https://github.com/axnjr/snn_be_pro

A state-of-the-art AI framework for no-code/low-code (visual drag-and-drop) building, testing, deployment, and integration of the latest deep learning models with privacy and security compliance, using Ollama; developed as a final-year project.

ai cplusplus cpp cuda deep-neural-networks kernel-driver ml mlops python

Last synced: 06 Oct 2025

https://github.com/gapi505/sparky-2

A Discord bot running on llama.cpp with the Llama 3 model and image generation.

ai cuda llama3 llamacpp stable-diffusion torch transformers

Last synced: 07 Oct 2025

https://github.com/coderonion/cuda-beginner-course-python-version

Companion code for the Bilibili video course "CUDA 12.x Parallel Programming for Beginners (Python Edition)"

cpp cublas cuda cuda-programming cudnn cupy gpu gpu-programming nvcc nvidia parallel-programming python rust

Last synced: 19 Oct 2025

https://github.com/lanzani/opencv-cuda-docker

Docker image for OpenCV with CUDA support.

cuda docker nvidia-docker nvidia-gpu opencv opencv-cuda opencv-dnn

Last synced: 12 Oct 2025

https://github.com/timothystewart6/vllm-gb10

Bleeding edge vLLM Docker image for the NVIDIA DGX Spark (GB10 / sm_121a).

arm64 cuda dgx-spark docker gb10 inference llm nvidia pytorch vllm

Last synced: 26 Jun 2026

https://github.com/hope2333/tsac-ng

Neural audio codec with multiple backends: CPU (AVX/AVX2/AVX-512, NEON/SVE, RVV), GPU (CUDA, HIP/ROCm, Vulkan), and LLVM JIT. Clean-room implementation.

arm64 audio-codec avx c cuda dac hip llvm-jit neural-audio riscv simd vulkan

Last synced: 29 Jun 2026

https://github.com/donpablonows/coin

🪙 Crypto Optimization Interface Network (aka COIN) is a high-performance Bitcoin address generator using CUDA acceleration and multi-threading. It optimizes GPU and CPU resources for fast address generation, ensures secure private key creation, and includes real-time monitoring and automatic system optimizations.

bitcoin blockchain cryptography cuda gpu-acceleration

Last synced: 07 May 2026

https://github.com/coderonion/moblas

BLAS (Basic Linear Algebra Subprograms) library written in the Mojo programming language.

blas blis cublas cuda eigen fortran gemm gonum hpc lapack linear-algebra math mkl mojo numpy openblas pytorch scientific-computing simd tensor

Last synced: 15 Jun 2025

https://github.com/brosnanyuen/raybnn_raytrace

Ray tracing library using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI

arrayfire cuda gpu gpu-computing opencl parallel parallel-computing ray ray-tracing raybnn raylib raytracer raytracing rust

Last synced: 26 Aug 2025

https://github.com/statikfintechllc/godcore

All-in-one local AI stack for Mistral-13B and Llama.cpp, with one-step CUDA wheel install, OpenAI-compatible API, and modern web dashboard. Switch between local and cloud chat, run on your own GPU, and deploy instantly—no API keys or paywalls. Designed for easy install, custom builds, and fast remote access. Enjoy!

ai chatbot chatgpt cuda dashboard fastapi llama-cpp llm local-ai mistral openai-compatible react selfhosted webui

Last synced: 25 Jun 2025

https://github.com/jaxony/pynvidia

⚙️ NVIDIA GPU utilities for Python 🔧

cuda deep-learning nvidia-gpu pip python utility

Last synced: 07 May 2025

https://github.com/rfsantacruz/mycudasamples

This is a series of CUDA C++ programming samples developed to study CUDA technology and its parallel programming model.

cpp cuda gpgpu

Last synced: 13 Apr 2025

https://github.com/jackeylea/cuda_linux

CUDA/Qt tutorial for Linux

cpp cuda cudnn qt5

Last synced: 26 Jul 2025

https://github.com/gpuengineering/gputils

A C++ header-only library for parallel linear algebra on GPUs (CUDA/cuBLAS under the hood)

cplusplus-17 cplusplus-20 cpp cuda cuda-c cuda-cpp cuda-programming header-only linear-algebra

Last synced: 13 Aug 2025

https://github.com/amirhoseinmasoumi/onnx-cuda-inference

A C++ project for running CUDA-accelerated ONNX model inference, using ONNX Runtime and OpenCV for image segmentation tasks.

cpp cuda inference onnxruntime onnxruntime-gpu opencv segmentation

Last synced: 12 Apr 2025

https://github.com/matthewhaynesonline/ai-server-setup

Set up an AWS EC2 instance from scratch with NVIDIA CUDA, Docker, and Packer for AI/ML.

ai ami aws cuda devops docker ml mlops packer

Last synced: 12 Apr 2025

https://github.com/pnnl/cuvite

Multi-GPU Graph Community Detection using CUDA

community-detection cuda graph-clustering mpi

Last synced: 25 Jul 2025

https://github.com/vorticity-inc/vtensor

VTensor, a C++ library, facilitates tensor manipulation on GPUs, emulating the python-numpy style for ease of use. It leverages RMM (RAPIDS Memory Manager) for efficient device memory management. It also supports xtensor for host memory operations.

cublas cuda curand cusolver gpu numpy rmm tensor xarray xtensor

Last synced: 14 Apr 2025

https://github.com/bokutotu/curs

CUDA, cuBLAS, and cuDNN wrapper for Rust

cuda deep-learning high-performance-computing hpc rust

Last synced: 20 May 2026

https://github.com/caps-umu/fideslib

A server-side CKKS GPU library fully interoperable with OpenFHE.

ckks cuda gpu homomorphic-encryption openfhe

Last synced: 08 Oct 2025

https://github.com/arminms/p2rng

A modern header-only C++ library for parallel algorithmic (pseudo) random number generation supporting OpenMP, CUDA, ROCm and oneAPI

cpp cuda cxx gpu header-only library linux macos multiplatorm oneapi openmp parallel pcg-random prng pseudorandom-number-generator random-number-distributions random-number-generation rocm stl-algorithms windows

Last synced: 04 Apr 2025

https://github.com/sean-bradley/cudalookupsha256

SHA-256 lookup using parallel processing on an NVIDIA CUDA-compatible graphics card

cuda parallel-processing sha256

Last synced: 05 May 2025

https://github.com/sean-bradley/cudalookupripemd60

RIPEMD-160 lookup using parallel processing on an NVIDIA CUDA-compatible graphics card

cuda parallel-processing ripemd160

Last synced: 05 May 2025

https://github.com/aespinosadev/opengl-renderer

OpenGL renderer showcasing all basic functionality to render 3D scenes.

computer-graphics cuda gpgpu graphics-engine graphics-programming opengl rendering rendering-3d-graphics shaders video-game

Last synced: 24 Jul 2025

https://github.com/tk-yoshimura/tensorshader

Deep learning .NET library for regression.

complex cuda deep-learning dotnet6 gpgpu net6 quaternion

Last synced: 15 Oct 2025

https://github.com/zeloe/rtconvolver

A realtime convolution VST3

c convolution cplusplus cuda juce

Last synced: 22 Apr 2025

https://github.com/webis-de/pytorch-window-matmul

a custom CUDA kernel for windowed matrix multiplication

cuda cuda-kernel pytorch

Last synced: 31 Oct 2025

https://github.com/mre/cudampi

Large hybrid CPU/GPU sorting network using CUDA and MPI

algorithms bucket bucketsort cuda filesystem gpu hybrid-cpu mpi parallel sorting-network

Last synced: 18 Apr 2026

https://github.com/pkestene/cuda_mpi_autotools_proj_template

A template project for CUDA+MPI with autotools build system

automake autotools cuda cuda-mpi mpi

Last synced: 25 Oct 2025

https://github.com/cloudmercato/python-fpb

Python Floating Point Benchmark

benchmark cuda floating-point numpy pandas python

Last synced: 19 Apr 2026

https://github.com/ivanrs297/pycuda-covariance-matrix

A PyCUDA covariance matrix parallel implementation

covariance-matrix cuda pycuda

Last synced: 25 Oct 2025

https://github.com/ellite/anchor-sub-sync

Anchor: A universal, hardware-accelerated CLI tool for subtitle synchronization (Whisper) and context-aware translation (NLLB)

ai audio-transcription automation cli cuda nllb python pytorch srt subtitle-sync subtitle-translation subtitles synchronization translation whisper

Last synced: 24 Feb 2026

https://github.com/rupeshs/anomalydetection

Anomaly Detection Using Anomalib and OpenVINO – Step-by-Step Guide

anomalib anomaly anomalydetection computer-vision cpu cuda gpu intel onnx opencv pytorch

Last synced: 13 Apr 2025

https://github.com/ergus/gpukalmanfilter

Kalman filter test code using C, C++, CUDA, and OpenCL.

cpp cuda gpgpu kalman-filter makefile opencl performance vectorization

Last synced: 28 Oct 2025

https://github.com/cascadingradium/air-traffic-distribution

A GPU-Accelerated Multi-Objective Genetic Algorithm for Air Traffic Management

air-traffic-control air-traffic-management c cuda genetic-algorithm gpu-acceleration

Last synced: 16 May 2025

https://github.com/648trindade/sbac-pad-marathon-problems

Repository containing problems of the SBAC-PAD Marathon of Parallel Programming and some parallel solutions to them.

cuda high-performance-computing mpi openmp parallel-computing

Last synced: 01 May 2025

https://github.com/official-imvoiid/portable-miniconda-setup-for-window

Portable Miniconda Setup for Windows 🐍 Easily create a portable Conda environment with automated scripts for flexible Python version management and CUDA support. 🚀

conda conda-environment cuda datascience machinelearning nvidia nvidia-cuda portable python

Last synced: 16 Apr 2026

https://github.com/hyeonsangjeon/pdf2llm-tuning-studio

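A solution that automatically generates high-quality question-answering (QA) data from PDF documents with GPU-accelerated processing and efficiently fine-tunes LLMs. It generates domain-specific QA pairs with the Unstructured library and AWS Bedrock Claude, and trains lightweight models using LoRA.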

aws bedrock claude cuda data-argumantation data-extraction distillation docker finetuning gpu llm pdf-generation pdf-text-extraction processing processing-job sagemaker text-disti unsloth unstructured

Last synced: 15 Jun 2025

https://github.com/pmeier/tox-ltt

Install PyTorch distributions with light-the-torch

cuda install light-the-torch pip plugin pytorch tox

Last synced: 25 Aug 2025