An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/thesupercd/cuda_sort

A simple project implementing and measuring the runtime performance metrics related to massively parallel algorithms (radix sort) on an NVIDIA GPU device.

benchmarking c cpp cuda cuda-programming gpu-acceleration gpu-programming multithreading parallel-processing radix-sort sorting-algorithms

Last synced: 10 May 2026

https://github.com/grindelfp/cuda-texture-memory

Exercise on using texture memory in CUDA.

cuda texture-memory

Last synced: 30 Mar 2025

https://github.com/thesoenke/deeplearning-docker

Setup for Deep Learning experiments in Docker with Cuda

cuda docker fastai jupyter

Last synced: 11 May 2026

https://github.com/h4ck3r-04/fpassword

Fpassword merges Hashcat's hash-cracking precision with Hydra's parallelized network login, offering penetration testers a powerful tool for swift hash deciphering and simultaneous login attempts across diverse protocols.

brute-force brute-force-attacks c cracking cuda gpgpu hashcat hashes hydra network-security opencl password penetration-testing

Last synced: 16 Jan 2026

https://github.com/dragonscypher/prompty

Tool for generating smart and secure prompts for language models!

autotokenizer bert-model cuda google-t5 llm python3 tensorflow threading

Last synced: 02 Jan 2026

https://github.com/lucatedeschini/feedforwardnn

This project is my submission for the exam "Project Work in Architecture and Platform for Artificial Intelligence"

c cuda neural-networks openmp scratch-implementation

Last synced: 20 Apr 2026

https://github.com/akshaysinhaaa/emova

A deep learning framework designed for emotion and sentiment recognition using text, audio, and video modalities. This project leverages the MELD (Multimodal EmotionLines Dataset) to train a robust and flexible model that reflects human communication more accurately than unimodal models.

bert cnn cuda deep-learning multimodal python pytorch resnet-18 tensorboard transformers

Last synced: 05 May 2026

https://github.com/maltsev-andrey/julia_set_cuda

High-performance Julia set fractal computation in pure CUDA C, achieving 2.78 billion pixels/second on Tesla P100. Demonstrates GPU kernel programming, memory optimization, and massive parallelization (16M+ threads).

cuda fractals gpu-programming high-performance-computing nvidia parallel-computing science visualization

Last synced: 03 Nov 2025

https://github.com/rbuj-uoc/m1.209

PAC 1, PAC 2, PAC 3, and PAC 4 for the High-Performance Computing (Computació d'altes prestacions) course of the MUEI

cuda mpi openmp sge

Last synced: 21 May 2026

https://github.com/cmazakas/cuda-stuff

A CUDA-based playground

cmake cuda delaunay-triangulation vscode

Last synced: 24 Mar 2025

https://github.com/maxenceleguery/jare

3D Render engine accelerated with CUDA

3d cuda engine raytracing

Last synced: 21 May 2026

https://github.com/sbstndb/nbody_k

A simple, naïve 3D N-body simulation using Kokkos with a CUDA or OpenMP backend

cuda kokkos nbody openmp simulation

Last synced: 21 May 2026

https://github.com/Parxd/cuda-optim

Various CUDA kernels optimized for specific ML algorithms

cuda machine-learning

Last synced: 02 Sep 2025

https://github.com/mattjesc/federated-learning-simulation-1gpu-mi-is

Federated Learning Simulation on a Single GPU with Model Interpretability and Interactive Visualization

ai cuda deep-learning distributed-systems federated-learning gpu hpc keras machine-learning ml model-interpretability python pytorch simulation streamlit tensorflow

Last synced: 05 Jan 2026

https://github.com/shermanlo77/poisson_icing

Gibbs sampling on the Poisson-Ising model. The Poisson-Ising model is a 2D image of Poisson-distributed random variables, each of which depends on its four neighbours. This dependency causes the Poisson random variables to be similar (or dissimilar) to their neighbours.

cuda cupy gibbs-sampling gpu ising-model mcmc monte-carlo poisson poisson-ising

Last synced: 21 May 2026

https://github.com/himeyama/cuda-convolve

convolve + cuda + ruby (1-D only)

cuda filter gem ruby

Last synced: 19 Apr 2026

https://github.com/bjornmelin/ml-algorithm-playground

🧪 Core ML algorithm implementations with GPU acceleration. Featuring optimized implementations across various libraries with comprehensive analysis. 📈

algorithms cuda gpu-computing lightgbm machine-learning python scikit-learn xgboost

Last synced: 13 May 2026

https://github.com/minseoc03/cuda-100-days

A 100-day journey to master CUDA programming, inspired by the CUDA-120-DAYS--CHALLENGE project. This repo contains daily CUDA exercises and code folders, with learning notes hosted on Notion. Practicing on leetgpu.com due to lack of local NVIDIA GPU.

100daysofcode cuda deeplearning gpgpu gpu hpc nvidia parallel-computing

Last synced: 19 Apr 2025

https://github.com/moesio-f/cla

C Linear Algebra (CLA) library. A simple toy library for basic vector/matrix operations with CUDA support and Python bindings.

c cuda linear-algebra python

Last synced: 09 May 2026

https://github.com/ndgigliotti/torch-ipca

GPU-accelerated Incremental PCA for PyTorch

cuda dimensionality-reduction gpu incremental-pca machine-learning pca pytorch

Last synced: 26 Jan 2026

https://github.com/ionmich/cs149-local-dev

Provides `conda` installation instructions for Stanford's CS149 (Parallel Computing) programming assignments

conda cs149 cuda ispc parallel-computing

Last synced: 31 Mar 2025

https://github.com/dasbd72/nthu-ipc-2022

National Tsing Hua University - Introduction to Parallel Computing - 2022

cuda cuda-programming hpc mpi openmp pthreads

Last synced: 30 Mar 2025

https://github.com/daelsepara/hipnewton

GPU Implementation of Newton Fractal Generator with Benchmarking

amd cuda fractal gpu gpu-compute gpu-computing hip newton parallel-computing rocm sdk

Last synced: 03 May 2026

https://github.com/anne-andresen/autoencoder_3d_c_cuda

3D Autoencoder training in raw C/CUDA

3d autoencoder c cuda nifti

Last synced: 28 Apr 2026

https://github.com/fedesky25/hpc-project-2024

Project for the 2024 HPC course: a streamplot generator for complex-valued functions

complex-numbers cuda openmp

Last synced: 30 Mar 2025

https://github.com/cs550-epfl/review

Review of the paper A Formal Analysis of the NVIDIA PTX Memory Consistency Model

cuda formal-verification gpu memory-consistency ptx simt

Last synced: 30 Mar 2025

https://github.com/td99/ai-sandbox

A collection of AI tools and prototypes.

ai cuda docker image-generation-ai nvidia python

Last synced: 08 Apr 2026

https://github.com/belrbez/ship-graphic-qt-qml-cuda-c

Client-Server application for Rocket driving in QML graphics

c client-server cpp cuda qml qt5 rocket

Last synced: 08 Apr 2026

https://github.com/cuda8/brainwords2

GPU brainflayer for sale $250

brain brainflayer brainwords cuda gpu key pass passphrase private

Last synced: 10 Mar 2025

https://github.com/shtrophic/wicuvanity

Generate wireguard vanity keys on your Nvidia GPU

cuda gpu vanity-address vanity-addresses vanitygen wireguard

Last synced: 10 Mar 2025

https://github.com/Neuro-Mechatronics-Interfaces/python-intan

Tools and demos for working with EMG data from Intan using Python

circuitpython cuda emg pico python realtime tensorflow

Last synced: 13 Jan 2026

https://github.com/uefi-code/bachelorgraduationdesign

I developed a PyTorch_For_PoorGuys framework and used it to train an LLM on an NVIDIA GeForce 2080Ti GPU as my bachelor's graduation design project

chatbot cuda gpu hacking large-language-models pytorch

Last synced: 03 May 2026

https://github.com/sergeipapina/color2graycuda

Color-to-grayscale image conversion implemented as an NVIDIA CUDA kernel, compiled and linked with make or CMake

cmake cuda cuda-kernels cuda-programming link makefile nvidia

Last synced: 06 Apr 2025

https://github.com/kataglyphis/machinelearningalgorithms

Basic Machine Learning Algorithms

cuda machine-learning python tensorflow

Last synced: 31 Mar 2025

https://github.com/codename-detective/cuda_gpgpus_shared_memory_systems_pdp

CUDA GPGPUs Shared Memory Systems Parallel & Distributed Programming

cuda cuda-programming numa parallel-programming

Last synced: 30 Mar 2025

https://github.com/voltr0x/raytracing-cuda

Raytracing in a weekend using CUDA

cpp11 cuda raytracing sdl2

Last synced: 01 Apr 2026

https://github.com/AndreasKaratzas/orin

Setting up the NVIDIA Jetson Orin Nano Developer Kit

cuda cudnn jetpack6 nvidia-jetson nvidia-sdkmanager orin-nano

Last synced: 25 Feb 2025

https://github.com/adesoji1/youtubesummaryai

Python script for YouTube summaries. The service summarizes a YouTube video given its URL. It should work for long videos and for different languages.

cuda googleapi python3 speech-recognition transformers youtube-api-v3 youtube-dl

Last synced: 04 Apr 2025

https://github.com/alkaifaftab000/autonomous-maze-solver

Building an Autonomous Maze Solver using reinforcement learning to train agents for decision-making in dynamic grid-based environments

agent criticism cuda gymnasium-environment maze-solving-bot pytorch reinforcement-learning reward-functions

Last synced: 12 Apr 2026

https://github.com/larygwil/cuda-samples-old

Old NVIDIA CUDA samples (5.0 - 7.5)

cuda nvidia

Last synced: 03 May 2026

https://github.com/tylerfaulkner/n-body_simulation

CUDA N-Body Gravitational Simulation with rendering in Python with MatPlotLib

cuda simulation

Last synced: 20 May 2026

https://github.com/kronbii/thermal-super-resolution

State-of-the-art thermal super-resolution system (IMDN) with RGB→thermal adaptation, custom multi-component loss, 29.6 dB PSNR, 0.713 SSIM, 250+ FPS, production-ready PyTorch + CUDA implementation.

computer-vision cuda deep-learning image-enhancement imdn model-optimization production-machine-learning pytorch real-time real-time-processing research super-resolution thermal-imaging

Last synced: 18 Apr 2026

https://github.com/asadiahmad/100_sports_image_classification

A deep learning project for sport image classification using a custom VGG19-based architecture with integrated Grad-CAM heatmap visualization for model interpretability.

computer-vision cuda data-augmentation deep-learning explainable-ai gpu-acceleration grad-cam heatmap-visualization image-classification mixed-precision-training pytorch pytorch-grad-cam sports-analytics sports-classification transfer-learning vgg19

Last synced: 11 Jun 2025

https://github.com/ysl1016/cudadigitfilter

CUDA-based parallel image filtering system for MNIST dataset

computer-vision cuda deep-learning gpu-acceleration image-processing mnist parallel-computing

Last synced: 28 Mar 2025

https://github.com/githubfoam/cuda-travisci

cuda miniconda pytorch

cuda miniconda pytroch

Last synced: 30 Mar 2025

https://github.com/ojaswithag/opencv-doc

A comprehensive Turkish-language guide to image and video processing, machine learning, and project applications with OpenCV. 🐙 Learn with step-by-step code examples and build projects.

arm-architecture cuda cuda-support deployment django docker-image docker-images heroku image-processing javascript nodejs nvidia opencv-contrib opencv3 production python scanner tutorial

Last synced: 08 Apr 2026

https://github.com/yangfengzzz/tardis

Travel space and time by using autodiff and codegen

autodiff codegen cuda

Last synced: 03 May 2026

https://github.com/ergus/cuda-ts-mode

An Emacs CUDA mode supported by tree-sitter

cuda emacs treesitter

Last synced: 20 May 2026

https://github.com/branebb/nn-framework

Framework for creating neural networks using C++ and the CUDA platform. This project is part of my final university assignment for my bachelor's degree.

cmake cpp cuda cuda-programming

Last synced: 20 Jan 2026

https://github.com/voduchuy/cudafsp

CUDA-based implementation of the Finite State Projection (FSP) algorithm.

chemical-master-equation cuda stochastic-reaction-networks sundials

Last synced: 20 Jan 2026

https://github.com/maltsev-andrey/cuda-nn-inference

GPU-accelerated neural network inference using custom CUDA kernels. Achieves 97.82% accuracy on MNIST.

cuda deep-learning gpu-programming neural-networks numba nvidia parallel-computing parallel-programming performance-optimization python3 pytorch rhel9 tesla-p100

Last synced: 07 Mar 2026

https://github.com/gama1903/cuda_programming

CUDA programming practice

cuda parallel-computing

Last synced: 01 Nov 2025

https://github.com/andreasholt/cuda-matmul-benchmarking

Implementing and benchmarking various matmul implementations in CUDA

cuda matrix-multiplication

Last synced: 01 Nov 2025

https://github.com/nxoti1/points-reader-ocr

🖥️ Extract text from images easily with POINTS-Reader OCR, a high-accuracy application for seamless document conversion and processing.

cuda gradio huggingface-transformers ocr open-source points-reader reportlab spaces tencent vision-language-model vlm

Last synced: 20 May 2026

https://github.com/rainlumostaipei/cuda-qnet-a2c

Qnet and A2C implementation in CUDA

a2c cuda qnet

Last synced: 26 Jun 2025

https://github.com/ludekcizinsky/fast-cg-solver

Implementation of Conjugate Gradient (CG) algorithm for solving sparse linear systems using MPI and CUDA.

conjugate-gradient cuda mpi

Last synced: 17 May 2026

https://github.com/tomtolleson/cuda-kernel-benchmarking-tool

A benchmarking tool in C++ that creates CUDA kernels and compares overall system performance between the CPU and GPU

cuda cuda-kernels cuda-support cuda-toolkit nvidia nvidia-cuda nvidia-gpu

Last synced: 30 Mar 2025

https://github.com/myselfaryan/attention-mechanism

Accelerating Scaled Dot-Product Attention using OpenMP and CUDA

cuda openmp

Last synced: 27 Apr 2026

https://github.com/juliankarrer/reyn

CUDA-based Implementation of Smoothed Particle Hydrodynamics for Fluid Simulation

cuda fluid lagrangian simulation sph

Last synced: 31 Oct 2025

https://github.com/nabilshadman/cuda-4-dummies

Lecture slides and exercise files of the CUDA 4 Dummies course (2025)

cuda gpu-computing high-performance-computing nsight-systems nvidia-gpu parallel-computing

Last synced: 31 Oct 2025

https://github.com/flosmume/cpp-cuda-streams-and-pinned-mem

A CUDA C++ demo showing how to overlap data transfer and kernel execution using multiple streams and pinned (page-locked) host memory. This project illustrates asynchronous memcpy, event timing, and performance benefits of concurrent GPU execution — essential for building high-throughput pipelines.

asynchronous-execution cuda cuda-streams gpu parallel-programming performance-optimization pinned-memory

Last synced: 13 May 2026

https://github.com/sephiroth7712/k-nearest-neigbours

Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.

cuda cuda-programming hadoop-mapreduce java mpi multiprocessing multithreading openmp pthreads scala spark

Last synced: 12 Apr 2026

https://github.com/uva-trasgo/controllers

Read-only mirror of the official repository: https://gitlab.com/trasgo-group-valladolid/controllers. Controllers is a library written in C11 that provides a simplified way to program applications that can exploit heterogeneous computational platforms including accelerators and/or multi-core CPUs.

cuda heterogeneous-computing heterogeneous-parallel-programming hip opencl openmp

Last synced: 12 May 2026

https://github.com/mahdi-hasan-shuvo/ml-opensource-project

An open source repository focused on providing practical and educational machine learning resources. The project aims to make learning and applying machine learning more accessible through well-documented code, tutorials, and real-world examples.

cuda machine-learning machine-learning-algorithms ml-projects open-source python

Last synced: 19 May 2026

https://github.com/eastonman/tensorrt-pytorch-wrapper

A wrapper that lets a TensorRT engine accept PyTorch CUDA tensors.

cuda pytorch tensorrt

Last synced: 06 May 2026

https://github.com/TeamBipartite/bipartite-gemm

High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor Cores

cuda data-parallelism gemm

Last synced: 14 Jan 2026

https://github.com/naetherm/derelictcublas

Dynamic bindings to the CuBLAS library for the D Programming Language.

cublas cuda d derelict dlang

Last synced: 25 Jun 2026

https://github.com/drilonaliu/parallel-mandelbrot-set

GPU-accelerated Mandelbrot Set generation with CUDA and OpenGL interoperability.

cuda fractals gpu mandelbrot-fractal parallel-programming

Last synced: 12 Apr 2026

https://github.com/aurelienperez/gpu-heston-monte-carlo

GPU-accelerated Monte Carlo simulation for option pricing under the Heston model using CUDA.

cuda gpu heston-model

Last synced: 01 Apr 2025

https://github.com/nikhilrout/thetensorcoreproject

Microarchitecture implementation of Nvidia's Tensor Cores

cuda floating-point gpgpu hybrid-precision-training tensorcore

Last synced: 01 Apr 2025

https://github.com/akira4o4/cuda-program

CUDA YOLO Processing

cuda yolo

Last synced: 22 Jul 2025

https://github.com/ramyacp14/document-based-question-and-answers

Developed a document question answering system that utilizes Llama and LangChain for contextual and accurate answers. The system supports .txt documents, intelligent text splitting, and context-aware querying through an easy-to-use Streamlit interface.

chroma cuda hugging-face langchain llama python recursivecharactertextsplitter streamlit

Last synced: 07 Mar 2026

https://github.com/jaderock/cuda-by-example

Sample CUDA projects for the CUDA by Example book

bazel c cpp cuda gpu

Last synced: 05 May 2026

https://github.com/storterald/neural-network

Simple neural network implementation in C++ and CUDA

asm asmx86 c-plus-plus cmake cpp cuda machine-learning neural-network

Last synced: 28 Mar 2025

https://github.com/yutakseo/docker_ubuntu-cuda_environment

🐳 A ready-to-use Docker environment for deep learning development with Ubuntu 22.04 and CUDA 11.8.

container cuda docker environment ubuntu

Last synced: 12 Apr 2026

https://github.com/amypad/miutil

Basic functionality needed for AMYPAD

cuda matlab medical-imaging python

Last synced: 13 May 2025

https://github.com/ivanfioravanti/tflops_mps

TFLOPs testing on MPS and CUDA

cuda mps tflops

Last synced: 19 May 2026

https://github.com/isquicha/cuda-parallel-studies

Learning CUDA programming here =D

cuda cuda-programming cuda-toolkit

Last synced: 03 Jul 2025

https://github.com/grindelfp/cuda-n-body-simulation

Simulation of N-Body movement using CUDA.

cuda n-body-simulation

Last synced: 06 Apr 2025

https://github.com/drilonaliu/parallel-fractal-tree

GPU-accelerated fractal tree generation with CUDA and OpenGL interoperability.

cuda fractal-tree fractals gpu

Last synced: 19 May 2026

https://github.com/fikri-rouzan/cuda-c-program-part-1

CUDA C program from NVIDIA course.

c cuda

Last synced: 12 Apr 2026

https://github.com/lixk28/knn-cuda

cuda knn

Last synced: 01 Apr 2025

https://github.com/patriciobcs/mini-aevol

Parallel implementation of a reduced version of the Aevol simulator

aevol cuda simulation

Last synced: 19 May 2026

https://github.com/alpinebuster/meshlib

Mesh processing library with extra `C/C#/JS/TS/PYTHON` bindings.

cuda dicom electron emscripten mesh mesh-modelling pybind11 stl stomatology threejs wasm

Last synced: 03 Jul 2025

https://github.com/muneeb706/cuda

Sample programs implemented using CUDA (GPU)

cplusplus cuda gpu-programming

Last synced: 19 May 2026

https://github.com/pipecruz/cuda-flocking-sim

CPU and GPU (CUDA) implementations of naive/optimized flocking algorithms

cuda

Last synced: 07 May 2026

https://github.com/chiragajain/gpu-optimization-roadmap

This repository is part of a structured curriculum designed to master GPU optimization, Triton, Deep Learning, and LLMs. This section focuses on GPU fundamentals, CUDA programming, and PyTorch optimizations.

cuda deeplearning gpu-acceleration learning python pytorch triton

Last synced: 18 Feb 2026

https://github.com/kar-dim/CAS-2D

Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA, for sharpening static images.

cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen

Last synced: 01 Nov 2025

https://github.com/mxm-tr/docker-darknet-opencv

Accelerated object detection on streams and files, using a Docker darknet YOLO container

cuda docker docker-compose object-recognition opencv-python python3 yolo

Last synced: 10 Apr 2026