An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/ray-chew/modified_ch

Density functional theory (DFT) and self-consistent field theory (SCFT) simulation of diblock copolymers

cuda density-functional-theory diblock-copolymer numerical-analysis numerical-methods self-consistent-field-theory

Last synced: 11 May 2026

https://github.com/hr-fahim/transformer-model-optimization

Sample GPT Transformer Model from Scratch.

cuda few-shot-learning transfomers

Last synced: 02 May 2026

https://github.com/fedimser/aldyparen

Renders pictures and videos with algebraic fractals

cuda fractals graphics

Last synced: 29 Apr 2026

https://github.com/nikhilrout/thetensorcoreproject

Microarchitecture implementation of Nvidia's Tensor Cores

cuda floating-point gpgpu hybrid-precision-training tensorcore

Last synced: 01 Apr 2025

https://github.com/aurelienperez/gpu-heston-monte-carlo

GPU-accelerated Monte Carlo simulation for option pricing under the Heston model using CUDA.

cuda gpu heston-model

Last synced: 01 Apr 2025

https://github.com/drilonaliu/parallel-mandelbrot-set

GPU-accelerated Mandelbrot Set generation with CUDA and OpenGL interoperability.

cuda fractals gpu mandelbrot-fractal parallel-programming

Last synced: 12 Apr 2026

https://github.com/TeamBipartite/bipartite-gemm

High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor Cores

cuda data-parallelism gemm

Last synced: 14 Jan 2026

https://github.com/sephiroth7712/k-nearest-neigbours

Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.

cuda cuda-programming hadoop-mapreduce java mpi multiprocessing multithreading openmp pthreads scala spark

Last synced: 12 Apr 2026

https://github.com/tomtolleson/cuda-kernel-benchmarking-tool

A benchmarking tool in C++ that creates CUDA kernels and compares overall system performance between CPU and GPU

cuda cuda-kernels cuda-support cuda-toolkit nvidia nvidia-cuda nvidia-gpu

Last synced: 30 Mar 2025

https://github.com/githubfoam/cuda-travisci

cuda miniconda pytorch

cuda miniconda pytroch

Last synced: 30 Mar 2025

https://github.com/ysl1016/cudadigitfilter

CUDA-based parallel image filtering system for MNIST dataset

computer-vision cuda deep-learning gpu-acceleration image-processing mnist parallel-computing

Last synced: 28 Mar 2025

https://github.com/asadiahmad/100_sports_image_classification

A deep learning project for sport image classification using a custom VGG19-based architecture with integrated Grad-CAM heatmap visualization for model interpretability.

computer-vision cuda data-augmentation deep-learning explainable-ai gpu-acceleration grad-cam heatmap-visualization image-classification mixed-precision-training pytorch pytorch-grad-cam sports-analytics sports-classification transfer-learning vgg19

Last synced: 11 Jun 2025

https://github.com/xza85hrf/flux_pipeline

FluxPipeline is a prototype experimental project that provides a framework for working with the FLUX.1-schnell image generation model. This project is intended for educational and experimental purposes only.

ai cuda docker educational experimental flux1 flux1-schnell flux1ai gradio image-generation model non-commercial python pytorch research transformer-model

Last synced: 05 Jul 2025

https://github.com/doxakis/cosinesimilaritydistancesongpu

Compute cosine similarity distances for all combinations of the dataset on the GPU with CUDA

cuda

Last synced: 13 Apr 2026

https://github.com/kronbii/thermal-super-resolution

State-of-the-art thermal super-resolution system (IMDN) with RGB→thermal adaptation, custom multi-component loss, 29.6 dB PSNR, 0.713 SSIM, 250+ FPS, production-ready PyTorch + CUDA implementation.

computer-vision cuda deep-learning image-enhancement imdn model-optimization production-machine-learning pytorch real-time real-time-processing research super-resolution thermal-imaging

Last synced: 18 Apr 2026

https://github.com/voltr0x/raytracing-cuda

Raytracing in a weekend using CUDA

cpp11 cuda raytracing sdl2

Last synced: 01 Apr 2026

https://github.com/codename-detective/cuda_gpgpus_shared_memory_systems_pdp

CUDA GPGPUs Shared Memory Systems Parallel & Distributed Programming

cuda cuda-programming numa parallel-programming

Last synced: 30 Mar 2025

https://github.com/kataglyphis/machinelearningalgorithms

Basic Machine Learning Algorithms

cuda machine-learning python tensorflow

Last synced: 31 Mar 2025

https://github.com/Neuro-Mechatronics-Interfaces/python-intan

Tools and demos for working with EMG data from Intan using Python

circuitpython cuda emg pico python realtime tensorflow

Last synced: 13 Jan 2026

https://github.com/eyelor/text-to-image-item-generator

A Python workflow for generating random item images using models from Hugging Face.

ai conda cuda flux-schnell generator huggingface item llama python pytorch text-to-image

Last synced: 13 Apr 2026

https://github.com/sandialabs/tenzing

Core library for optimizing CUDA+MPI programs as sequential decision problems.

cuda mpi scr-2759 sequential-decision-problem

Last synced: 29 Apr 2026

https://github.com/shtrophic/wicuvanity

Generate wireguard vanity keys on your Nvidia GPU

cuda gpu vanity-address vanity-addresses vanitygen wireguard

Last synced: 10 Mar 2025

https://github.com/cuda8/brainwords2

GPU brainflayer for sale $250

brain brainflayer brainwords cuda gpu key pass passphrase private

Last synced: 10 Mar 2025

https://github.com/ionmich/cs149-local-dev

Provides `conda` installation instructions for Stanford's CS149 (Parallel Computing) programming assignments

conda cs149 cuda ispc parallel-computing

Last synced: 31 Mar 2025

https://github.com/ndgigliotti/torch-ipca

GPU-accelerated Incremental PCA for PyTorch

cuda dimensionality-reduction gpu incremental-pca machine-learning pca pytorch

Last synced: 26 Jan 2026

https://github.com/moesio-f/cla

C Linear Algebra (CLA) library. A simple toy library for basic vector/matrix operations with CUDA support and Python bindings.

c cuda linear-algebra python

Last synced: 09 May 2026

https://github.com/minseoc03/cuda-100-days

A 100-day journey to master CUDA programming, inspired by the CUDA-120-DAYS--CHALLENGE project. This repo contains daily CUDA exercises and code folders, with learning notes hosted on Notion. Practicing on leetgpu.com due to lack of local NVIDIA GPU.

100daysofcode cuda deeplearning gpgpu gpu hpc nvidia parallel-computing

Last synced: 19 Apr 2025

https://github.com/himeyama/cuda-convolve

convolve + cuda + ruby (1D only)

cuda filter gem ruby

Last synced: 19 Apr 2026

https://github.com/mattjesc/federated-learning-simulation-1gpu-mi-is

Federated Learning Simulation on a Single GPU with Model Interpretability and Interactive Visualization

ai cuda deep-learning distributed-systems federated-learning gpu hpc keras machine-learning ml model-interpretability python pytorch simulation streamlit tensorflow

Last synced: 05 Jan 2026

https://github.com/Parxd/cuda-optim

various CUDA kernels optimized for specific ML algos

cuda machine-learning

Last synced: 02 Sep 2025

https://github.com/snandasena/cuda-at-scale-for-the-enterprise

Gauss Filter with CUDA and NPP

cpp cuda gpu nvidia

Last synced: 29 Apr 2026

https://github.com/efecaliskannn/pneumonia-detection-with-cnn--vgg16--and-resnet50-deep-learning-models

This project aims to detect pneumonia using deep learning, a subset of artificial intelligence. It examines the performance of deep learning models, including CNN, VGG16, and ResNet50, in detecting pneumonia.

artificial-intelligence convolutional-neural-networks cuda deep-learning keras-tensorflow nvidia-cuda pyhton transfer-learning

Last synced: 13 Jun 2025

https://github.com/cmazakas/cuda-stuff

A CUDA-based playground

cmake cuda delaunay-triangulation vscode

Last synced: 24 Mar 2025

https://github.com/akshaysinhaaa/emova

A deep learning framework designed for emotion and sentiment recognition using text, audio, and video modalities. This project leverages the MELD (Multimodal EmotionLines Dataset) to train a robust and flexible model that reflects human communication more accurately than unimodal models.

bert cnn cuda deep-learning multimodal python pytorch resnet-18 tensorboard transformers

Last synced: 05 May 2026

https://github.com/lucatedeschini/feedforwardnn

This project is my submission for the exam "Project Work in Architecture and Platform for Artificial Intelligence"

c cuda neural-networks openmp scratch-implementation

Last synced: 20 Apr 2026

https://github.com/h4ck3r-04/fpassword

Fpassword merges Hashcat's hash-cracking precision with Hydra's parallelized network login, offering penetration testers a powerful tool for swift hash deciphering and simultaneous login attempts across diverse protocols.

brute-force brute-force-attacks c cracking cuda gpgpu hashcat hashes hydra network-security opencl password penetration-testing

Last synced: 16 Jan 2026

https://github.com/neugence/acehub

AI Champions for Excellence: Fresh, informative courses and content designed to help developers, researchers, and leaders advance in the field of AI.

ai cuda cv ml mlops nlp pytorch rl rlhf tensorflow

Last synced: 05 Jan 2026

https://github.com/lk/gpu-nbody

GPU-accelerated n-body engine for t-SNE and physics simulation

cuda gpu n-body n-body-simulator

Last synced: 02 Sep 2025

https://github.com/bikrammajhi/100-days-of-gpu

This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA kernels, Triton spells, and PTX sorcery.

cuda nsight-compute ptx triton

Last synced: 18 Jun 2025

https://github.com/tchung1970/sd-cli-cuda

CUDA-accelerated Stable Diffusion plugin for wavespeed-desktop

cuda gpu linux nvidia stable-diffusion

Last synced: 09 May 2026

https://github.com/mrgkanev/tensorflow-gpu-docker-setup

A Docker environment for TensorFlow GPU development with optimized configurations for WSL2, troubleshooting guides, and common error fixes

cuda cuda-toolkit deep-learning dev-environment development-tools docker gpu-acceleration machine-learning nvidia-docker nvidia-docker-support python tensorflow

Last synced: 13 Apr 2026

https://github.com/neel-dandiwala/cuda-programs

Miscellaneous programs that illustrate the concepts of parallel computing

cuda gpu-programming parallel-programming

Last synced: 16 May 2025

https://github.com/sahil-rajwar-2004/vector-cuda

vector calculation with GPU acceleration using CUDA

c cpp11 cuda cuda-kernels cuda-programming nvcc

Last synced: 15 May 2025

https://github.com/hrshl212/custom-cuda-kernels-with-neural-network-implementation

The repository contains custom CUDA kernels for a linear layer, softmax, and ReLU, which are integrated with Python to build a neural network

cuda neural-network python pytorch

Last synced: 08 May 2026

https://github.com/gammahazard/locate-anything

Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.

bounding-boxes computer-vision cuda docker fastapi gpu grounding locate-anything machine-learning nvidia object-detection ocr open-vocabulary-detection react self-hosted tailwindcss typescript vision-language-model web-ui

Last synced: 28 May 2026

https://github.com/parxd/cuda-optim

optimizing CUDA kernels

cuda machine-learning

Last synced: 26 Mar 2025

https://github.com/viktor-akusoff/chernabogpy

ChernabogPy is a Python package for visualizing gravitational distortions caused by black holes using nonlinear ray tracing.

cuda gpu physics-simulation python3 relativity-of-space-and-time torch

Last synced: 15 May 2026

https://github.com/vladd12/libexecstd

Modern C++ library for using an execution context of computer devices

cpp cpp17 cuda gpu-acceleration gpu-computing

Last synced: 06 May 2026

https://github.com/lord-turmoil/cudacmakedemo

A demo for building CUDA program with CMake

cuda tutorial

Last synced: 16 Mar 2025

https://github.com/delusionary/histoptimizer

Solves the minimum-variance-cost partition problem.

cuda numba python

Last synced: 14 Jan 2026

https://github.com/dgcnz/nvtx-vscode

Create NVIDIA NVTX ranges directly in VS Code, then profile with Nsight Systems without modifying source code.

cuda nvtx pytorch vscode

Last synced: 13 Apr 2026

https://github.com/ran-2012/cuda-practice

CUDA practice code for the NVIDIA programming guide

cuda

Last synced: 27 Feb 2025

https://github.com/BardiFarsi/ThreadPoolManager

ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.

cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe

Last synced: 15 May 2025

https://github.com/avicted/hip_fm_synthesis

This project demonstrates FM (frequency modulation) synthesis using HIP (Heterogeneous-computing Interface for Portability), enabling high-performance sound generation on both AMD and NVIDIA GPUs.

amd audio-processing cuda fm-synthesis hip nvidia rocm

Last synced: 16 Mar 2025

https://github.com/jeremywildsmith/shadowhash-distributed

Distributed shadow-file password cracker written in Elixir, with GPU-accelerated cracking for the md5crypt hashing algorithm.

cracking-hash cracking-hashes cracking-password cuda distributed-systems elixir erlang hashing nx security

Last synced: 11 May 2026

https://github.com/fabulani/360ip-with-cuda

360° Image Processing with CUDA and OpenCV.

360-image 360-video cpp cuda image-processing opencv

Last synced: 11 May 2026

https://github.com/apws25/accelmoe

This repository is for CUDA kernel re-implementation of CPU-based MoE model.

cpp cuda mixture-of-experts

Last synced: 11 May 2026

https://github.com/daniilvorontsov/fourier-option-pricing

MSc thesis project concerned with option pricing for Levy Jump models. Package includes pricing implementations for European Call and Put options for Carr-Madan, COS and Fourier Time Stepping.

carr-madan cuda fourier-transform monte-carlo option-pricing

Last synced: 11 May 2026

https://github.com/theogravity/dual-rtx-6000-blackwell-gemma-4-31b-it-nvfp4

Optimized vLLM setup for Gemma 4 31B NVFP4 with MTP on dual RTX PRO 6000 Blackwell using vllm and docker: native FP4 Tensor Cores, Multi-Token Prediction (96.5% acceptance rate), and prefix caching. Includes benchmark results and replication scripts.

am5 amd blackwell cuda docker fp4 gemma gemma4 llm-inference multi-token-prediction nvfp4 prefix-caching rtx-6000 speculative-decoding tensor-parallel vllm

Last synced: 11 May 2026

https://github.com/realdougeubanks/unmanic.plugin.encoder_video_hevc_nvenc_gpu

Unmanic plugin: H.265/HEVC encoder using NVIDIA hevc_nvenc with a true end-to-end GPU pipeline. Fork of Josh5/unmanic.plugin.encoder_video_hevc_nvenc that adds -hwaccel_output_format cuda when NVDEC HW decoding is enabled, keeping decoded frames in GPU memory through NVENC. Drop-in replacement with sensible defaults and full settings parity.

cuda ffmpeg hardware-acceleration nvdec nvenc nvidia unmanic unmanic-plugin video-transcoding

Last synced: 12 May 2026

https://github.com/tomaszrewak/csgpathtracer

A constructive solid geometry path tracer.

computer-graphics cuda path-tracing rendering

Last synced: 12 May 2026

https://github.com/thesupercd/rainbow_table_builder

A high performance CUDA-based GPU accelerated Rainbow-Table maker, written in C++ without any external libraries or dependencies needed.

cpp cryptography cuda hash-table hashing parallel-processing rainbow-table sha3 sha3-512 uuid

Last synced: 12 May 2026

https://github.com/vishalanandv/small_scale_parallel_programming

The project describes the design and development of a sparse matrix-vector product kernel, implemented on a supercomputer.

clanguage cuda kernel

Last synced: 12 May 2026

https://github.com/brocbyte/cuball

CUDA-based implementation of "Real-Time Rigid Body Simulation on GPUs" [from GPU Gems 3]

cpp cuda

Last synced: 12 May 2026

https://github.com/aspragueumkc/hydra2dgpu

GPU-accelerated 2D shallow water equation solver for QGIS — CUDA finite-volume method with unstructured mesh support

cuda finite-volume-method gis gpu-computing hydraulic-modeling hydrodynamics qgis shallow-water-equations

Last synced: 11 Jun 2026

https://github.com/programmergnome/cuda-codes

Snippet repository for learning parallel GPU programming with CUDA.

c cpp-programming cuda cuda-kernel gpu-programming learning-materials parallel-programming parallelization

Last synced: 13 May 2026

https://github.com/rossbates/rummage

Rummage is a GPU accelerated npub miner for Nostr

cuda identity mining nostr

Last synced: 13 May 2026

https://github.com/nyxflower/mosaics-cuda-openmp

Simple image mosaic command-line tool (CUDA-OpenMP-C implementation)

c cuda gpu-programming mosaic mosaic-images openmp parallel-computing parallel-processing

Last synced: 13 May 2026

https://github.com/gianmariaromano/pmc-translated-notes

The repository contains translated notes for the course "Programmazione di Sistemi Multicore" given by Professor De Sensi for the "Informatica" course at Sapienza Università di Roma.

cuda cuda-programming mpi multicore openmp parallel-computing parallel-programming pthreads

Last synced: 14 May 2026

https://github.com/gcol33/resolve

Neural network framework for species distribution modelling (PyTorch/C++/CUDA)

cpp cuda deep-learning ecology machine-learning neural-network pytorch species-distribution

Last synced: 12 Jun 2026

https://github.com/kaierikniermann/hpc-uzh-notes

These are some notes for the High Performance Computing course taught at UZH

cuda high-performance-computing mpi openacc openmp

Last synced: 13 Jun 2026

https://github.com/g023/cuda_inf

A self-contained CUDA inference engine for LiquidAI/LFM2.5-8B-A1B (hybrid conv + GQA-attention MoE, 8.5B params, 1B active) targeting a single RTX 3060 (12 GB). No Python, no frameworks at runtime: a single .cu engine + a header-only byte-level BPE tokenizer.

3060 ai c cpp cuda fast-inference gpu inference inference-engine large-language-models lfm25 liquidai llm moe nvidia open-source rtx token

Last synced: 15 Jun 2026

https://github.com/p4suta/mojiokoshi

Local audio transcription tool with real-time progress, powered by faster-whisper and CUDA

audio-transcription cuda docker fastapi faster-whisper gpu python self-hosted speech-to-text sveltekit transcription whisper

Last synced: 16 Jun 2026

https://github.com/hailiang-wang/cuda-get-started

Get started with CUDA

cuda machine-learning nvidia

Last synced: 17 Jun 2026

https://github.com/angchen0325/cuda-learn

Ang's CUDA-learn project

cuda gpu-computing

Last synced: 18 Jun 2026

https://github.com/rurumimic/cuda

compute unified device architecture

cuda deep-learning gpu nvidia

Last synced: 18 Jun 2026

https://github.com/acuoci/pbe-fixed-pivot-cuda

Fast CUDA implementation of aggregation and breakage terms in Population Balance Equations using the fixed pivot sectional method

aggregation breakage cuda fixed-pivot pbe

Last synced: 18 Jun 2026

https://github.com/farukalamai/jetson-yolo-cpp

Real-time object detection, segmentation and tracking on NVIDIA Jetson using YOLO + TensorRT in C++

cpp cuda jetson object-detection tensorrt yolo26

Last synced: 19 Jun 2026

https://github.com/aeyage/intraday-prices

gpu-accelerated portfolio optimisation

cuda cupy nvidia-gpu

Last synced: 19 Jun 2026

https://github.com/sbstndb/neural_k

A simple neural network library built on Kokkos, with a CUDA or OpenMP backend

ai cuda kokkos library neural-network openmp

Last synced: 22 Jun 2026

https://github.com/sebsop/kmeans-thesis-segmentation

Real-time hybrid quantum-classical K-means segmentation using C++ and CUDA. Bachelor's Thesis at BBU bridging HPC and Quantum Machine Learning (QML).

cpp cuda hpc imgui kmeans opencv quantum-computing

Last synced: 23 Jun 2026

https://github.com/ironjr/minimal-cuda-pytorch

Repository-level snippet for minimal implementation of a PyTorch CUDA extension.

cuda minimal pytorch

Last synced: 04 May 2026

https://github.com/sebsop/realtime-parallel-kmeans-segmentation

Real-time C++ K-means image segmentation on live video streams, using OpenCV, RCC trees, and 5D features, optimized for consumer hardware with Sequential, Multi-threaded, MPI, and CUDA backends.

cpp cuda k-means-clustering mpi multithreading opencv rcc real-time-stream-processing

Last synced: 23 Jun 2026

https://github.com/rurumimic/candle

huggingface candle

cuda gpu huggingface nvidia transformer

Last synced: 05 May 2026

https://github.com/llm-db/understanding-gpu-architecture-implications-on-llm-serving-workloads

Understanding GPU Architecture Implications on LLM Serving Workloads (Master Thesis, ETH Zürich, 2024)

cuda inference pytorch rocm transformer

Last synced: 05 May 2026

https://github.com/kobinarth-panchalingam/parallel-and-concurrent-programming

Semester - 7 | CS4533 - Parallel and Concurrent Programming | Labs

c concurrent-programming cuda java openmp pthreads

Last synced: 05 May 2026

https://github.com/zelosleone/audiobook-generator

A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.

ai-audio audiobook cuda gpu-acceleration machine-learning pdf-converter python pytorch speech-synthesis text-processing text-to-speech

Last synced: 05 May 2026