An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/lord-turmoil/cudacmakedemo

A demo for building CUDA program with CMake

cuda tutorial

Last synced: 16 Mar 2025

https://github.com/delusionary/histoptimizer

Solves a minimum variance cost of the partition problem.

cuda numba python

Last synced: 14 Jan 2026

https://github.com/dgcnz/nvtx-vscode

Create NVIDIA NVTX ranges directly in VS Code, then profile with Nsight Systems without modifying source code.

cuda nvtx pytorch vscode

Last synced: 13 Apr 2026

https://github.com/ran-2012/cuda-practice

cuda practice code for nvidia programming guide

cuda

Last synced: 27 Feb 2025

https://github.com/avicted/hip_fm_synthesis

This project demonstrates FM Synthesis (Frequency Modulation) using HIP (Heterogeneous Compute Interface), enabling high-performance sound generation on both AMD and NVIDIA GPUs.

amd audio-processing cuda fm-synthesis hip nvidia rocm

Last synced: 16 Mar 2025

https://github.com/nel-s/vein-cracker

Recovers which internal generator states could have generated a provided set of Minecraft Java b1.6-1.12.2 veins. Those can then be used to recover 3/4ths of any worldseeds that could have generated them.

cuda minecraft seedcracking veins

Last synced: 16 Mar 2025

https://github.com/maltsev-andrey/cuda-nn-inference

GPU-accelerated neural network inference using custom CUDA kernels. Achieves 97.82% accuracy on MNIST.

cuda deep-learning gpu-programming neural-networks numba nvidia parallel-computing parallel-programming performance-optimization python3 pytorch rhel9 tesla-p100

Last synced: 07 Mar 2026

https://github.com/gama1903/cuda_programming

Practice of cuda programming

cuda parallel-computing

Last synced: 01 Nov 2025

https://github.com/cripterhack/business-address-scrapper

Python+Scrapy - Distributed scraping system with cache for business information extraction.

cuda ollama postgresql python redis scraper scraping scrapy tesseract

Last synced: 14 Jun 2025

https://github.com/andreasholt/cuda-matmul-benchmarking

Implementing and benchmarking various matmul implementations in CUDA

cuda matrix-multiplication

Last synced: 01 Nov 2025

https://github.com/nxoti1/points-reader-ocr

🖥️ Extract text from images easily with POINTS-Reader OCR, a high-accuracy application for seamless document conversion and processing.

cuda gradio huggingface-transformers ocr open-source points-reader reportlab spaces tencent vision-language-model vlm

Last synced: 20 May 2026

https://github.com/rainlumostaipei/cuda-qnet-a2c

Qnet and A2C impl in cuda

a2c cuda qnet

Last synced: 26 Jun 2025

https://github.com/ludekcizinsky/fast-cg-solver

Implementation of Conjugate Gradient (CG) algorithm for solving sparse linear systems using MPI and CUDA.

conjugate-gradient cuda mpi

Last synced: 17 May 2026

https://github.com/myselfaryan/attention-mechanism

Accelerating Scaled Dot-Product Attention using OpenMP and CUDA

cuda openmp

Last synced: 27 Apr 2026

https://github.com/juliankarrer/reyn

CUDA-based Implementation of Smoothed Particle Hydrodynamics for Fluid Simulation

cuda fluid lagrangian simulation sph

Last synced: 31 Oct 2025

https://github.com/rugleb/cuda

A simple example of a program that uses parallel GPU computing on an NVIDIA graphics card using CUDA technology

cuda gpu nvidia

Last synced: 10 Apr 2025

https://github.com/nabilshadman/cuda-4-dummies

Lecture slides and exercise files of the CUDA 4 Dummies course (2025)

cuda gpu-computing high-performance-computing nsight-systems nvidia-gpu parallel-computing

Last synced: 31 Oct 2025

https://github.com/flosmume/cpp-cuda-streams-and-pinned-mem

A CUDA C++ demo showing how to overlap data transfer and kernel execution using multiple streams and pinned (page-locked) host memory. This project illustrates asynchronous memcpy, event timing, and performance benefits of concurrent GPU execution — essential for building high-throughput pipelines.

asynchronous-execution cuda cuda-streams gpu parallel-programming performance-optimization pinned-memory

Last synced: 13 May 2026

https://github.com/uva-trasgo/controllers

Read-only mirror of the official repository: https://gitlab.com/trasgo-group-valladolid/controllers. Controllers is a library written in C11 that provides a simplified way to program applications that can exploit heterogeneous computational platforms including accelerators and/or multi-core CPUs.

cuda heterogeneous-computing heterogeneous-parallel-programming hip opencl openmp

Last synced: 12 May 2026

https://github.com/zalo/matmul_cuda

A simple learning example for CUDA

cuda

Last synced: 07 Jul 2025

https://github.com/mahdi-hasan-shuvo/ml-opensource-project

is an open source repository focused on providing practical and educational machine learning resources. The project aims to make learning and applying machine learning more accessible through well-documented code, tutorials, and real-world examples.

cuda machine-learning machine-learning-algorithms ml-projects open-source python

Last synced: 19 May 2026

https://github.com/eastonman/tensorrt-pytorch-wrapper

A wrapper makes TensorRT engine accept PyTorch Cuda Tensor.

cuda pytorch tensorrt

Last synced: 06 May 2026

https://github.com/naetherm/derelictcublas

Dynamic bindings to the CuBLAS library for the D Programming Language.

cublas cuda d derelict dlang

Last synced: 25 Jun 2026

https://github.com/ramyacp14/document-based-question-and-answers

Developed a document question answering system that utilizes Llama and LangChain for contextual and accurate answers. The system supports .txt documents, intelligent text splitting, and context-aware querying through an easy-to-use Streamlit interface.

chroma cuda hugging-face langchain llama python recursivecharactertextsplitter streamlit

Last synced: 07 Mar 2026

https://github.com/storterald/neural-network

Simple neural network implementation in C++ and CUDA

asm asmx86 c-plus-plus cmake cpp cuda machine-learning neural-network

Last synced: 28 Mar 2025

https://github.com/naidezhujimo/cuda-learning-just-record-the-learning-process-

just record the learning process,There are notes,Welcome to learn.

cuda

Last synced: 26 Mar 2025

https://github.com/amypad/miutil

Basic functionality needed for AMYPAD

cuda matlab medical-imaging python

Last synced: 13 May 2025

https://github.com/ivanfioravanti/tflops_mps

TFLOPs testing on MPS and CUDA

cuda mps tflops

Last synced: 19 May 2026

https://github.com/grindelfp/cuda-n-body-simulation

Simulation of N-Body movement using CUDA.

cuda n-body-simulation

Last synced: 06 Apr 2025

https://github.com/drilonaliu/parallel-fractal-tree

GPU-accelerated fractal tree generation with CUDA and OpenGL interoperability.

cuda fractal-tree fractals gpu

Last synced: 19 May 2026

https://github.com/patriciobcs/mini-aevol

Parallel implementation of a reduced version of the Aevol simulator

aevol cuda simulation

Last synced: 19 May 2026

https://github.com/muneeb706/cuda

sample programs implemented using cuda (gpu)

cplusplus cuda gpu-programming

Last synced: 19 May 2026

https://github.com/andresvalle/ocr-extraction

Text extraction from images using EasyOCR and parallelization with PyTorch

cuda ocr pytorch

Last synced: 01 May 2026

https://github.com/chiragajain/gpu-optimization-roadmap

This repository is part of a structured curriculum designed to master GPU optimization, Triton, Deep Learning, and LLMs. This section focuses on GPU fundamentals, CUDA programming, and PyTorch optimizations.

cuda deeplearning gpu-acceleration learning python pytorch triton

Last synced: 18 Feb 2026

https://github.com/sevilze/folderesque

Python Script to process and upscale images in specified folders using RRDB models.

cuda esrgan scripts upscaler

Last synced: 02 Mar 2026

https://github.com/kar-dim/CAS-2D

Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA, for sharpening static images.

cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen

Last synced: 01 Nov 2025

https://github.com/kenwuqianghao/c4ai-cuda-birds

Homework assignments for C4AI Beginners in Research-Driven Studies

cuda machine-learning pytorch

Last synced: 18 Apr 2026

https://github.com/TheodoreAI/monte-carlo-simulator

CUDA application for Monte Carlo simulation is used to determine the range of outcomes for a series of parameters, each of which has a probability distribution showing how likely each option is to happen. This is using CUDA.

cuda gpu-computing monte-carlo-simulation parallel-computing

Last synced: 06 Oct 2025

https://github.com/mxm-tr/docker-darknet-opencv

Accelerated objects detection on streams and files, using a Docker darknet YOLO container

cuda docker docker-compose object-recognition opencv-python python3 yolo

Last synced: 10 Apr 2026

https://github.com/kirubhakaranm/vision-pipeline-cuda

High-performance camera processing pipeline with CUDA GPU acceleration, CPU multithreading, and real-time TCP/IP telemetry monitoring (1,200+ FPS, <1ms latency)

computer-vision cpp17 cuda edge-detection gpu-acceleration image-processing multithreading networking opencv performance-optimization real-time robotics tcp-ip telemetry

Last synced: 12 Apr 2026

https://github.com/hshshshshsh12e/gpumkat

Gpumkat is a shader debugger for metal which is designed to do what instruments can't do

alternative api control cuda darwin debugger debugging gpumkat macos management profiler release shaders threads

Last synced: 14 Apr 2026

https://github.com/sangioai/sph

CUDA and OpenMP versions of SPH (Smoothed Particle Hydrodynamics) serial algorithm.

cuda openmp

Last synced: 27 Apr 2026

https://github.com/kanttouchthis/cuda_schem

script for voxelization of 3d models to minecraft .schem schematics with texture support powered by numba cuda.

cuda minecraft numba voxelization

Last synced: 07 Oct 2025

https://github.com/lruizap/testcuda

Guide to install and use cuda for programming

cuda cudnn nvidia pytorch

Last synced: 12 May 2026

https://github.com/amitkumarj441/deep-learning-on-your-finger

A rich collection of dockerfiles for installing deep learning dependecies on your way :rocket:

cuda cudnn gcp

Last synced: 18 Apr 2026

https://github.com/debanjan06/spatial-streamio

An optimized, out-of-core asynchronous data streaming pipeline for high-throughput 3D point cloud training loops. Features low-level numpy.memmap zero-copy reads and multi-threaded ring prefetching to eliminate I/O bottlenecks, delivering a 33.33% throughput efficiency gain on PyTorch CUDA workloads.

asynchronous-programming cuda data-engineering deep-learning-pipelines io-optimization memory-mapping point-cloud pytorch

Last synced: 11 Jun 2026

https://github.com/matteopolak/stock-predict

Stock prediction with LSTM using TensorFlow and TypeScript.

ai artificial-intelligence cuda lstm machine-learning stock tensorflow typescript

Last synced: 09 May 2026

https://github.com/dreoporto/tensorflow-gpu-docker

An example project to run TensorFlow with CUDA-enabled GPU acceleration using Windows, Docker and WSL2.

artificial-intelligence cuda deep-learning docker docker-compose jupyter machine-learning nvidia-docker python windows wsl2

Last synced: 27 Jan 2026

https://github.com/xstupi00/N-Body-CUDA

PCG - Parallel Computations on GPU - Project - N-Body-CUDA

cuda gpu-acceleration gpu-computing nbody-simulation optimization parallel-computing pcg vut vut-fit

Last synced: 11 Mar 2025

https://github.com/marius311/cudadistributedtools.jl

A set of utility tools for multi-GPU + multi-process workflows

cuda distributed julia

Last synced: 01 May 2026

https://github.com/brendanm12345/simple_renderer_cs149

Simple CUDA renderer implementation. 19th most efficient out of 150+ submissions

cpp cuda

Last synced: 18 May 2026

https://github.com/rajshrestha86/kmeans-clusterize-cuda

Implementation of K-Means algorithm from scratch using CUDA.

c cuda kmeans-clustering

Last synced: 18 May 2026

https://github.com/artheioupfat/mini-gpt-wiki

Projet visant à créer un mini LLM entraîné sur des données Wikipédia et à interagir avec lui via une interface Streamlit.

cuda gpt language llm model mps nlp pytorch scraping streamlit transformer wikipedia

Last synced: 08 Oct 2025

https://github.com/amruthapatil/nyu-cudaconvolution

Implementing convolution operations on an image using CUDA, exploiting different methodologies - basic, tiled, and cuDNN

cuda high-performance

Last synced: 13 Mar 2025

https://github.com/thanduriel/cuda_hip_comparison

performance study of atomics on GPUs

atomics cuda hip

Last synced: 09 Oct 2025

https://github.com/jiriklepl/bits-knn-jpdc2024

Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search

bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k

Last synced: 21 Mar 2025

https://github.com/enesdoruk/opencv-cpp

Opencv CPP tutorials

computer-vision cpp cuda opencv

Last synced: 09 Oct 2025

https://github.com/edcalderin/huggingface_ragflow

This project implements a classic Retrieval-Augmented Generation (RAG) system using HuggingFace models with quantization techniques. The system processes PDF documents, extracts their content, and enables interactive question-answering through a Streamlit web application.

bitsandbytes cuda huggingface huggingface-embeddings langchain langchain-community large-language-models llm nf4 python qdrant quantization rag retrieval-augmented-generation ruff streamlit text-generation

Last synced: 15 Jul 2025