An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/xueeinstein/udacity-cs344-cuda8

Code for Udacity CS344 (Intro to Parallel Programming) using CUDA 8.0

cuda cuda-8 parallel-computing

Last synced: 02 May 2026

https://github.com/jiaau/kernels

This repository showcases common optimization techniques for kernels.

cpp cuda cute cutlass hpc kernel

Last synced: 21 Apr 2026

https://github.com/fxzxmicah/fedora-llama-cpp

llama.cpp tools with OpenMP, CUDA, and OpenVINO support

cuda fedora llama-cpp openmp openvino rpm

Last synced: 05 Jun 2026

https://github.com/juntyr/necsim-rust-docs

Documentation of the spatially explicit biodiversity simulation necsim-rust

biodiversity cuda docs mpi necsim rust simulation

Last synced: 14 May 2026

https://github.com/shermanlo77/oxwasp_phd

Code for the PhD thesis. The topic was on defect detection of 3D printing using x-rays. The repository includes an implementation of the mode filter and empirical null filter.

3d-printing applied-statistics computational-statistics cuda empirical-null imagej mode-filter statistics xray-projection

Last synced: 27 Mar 2025

https://github.com/dimitrijkrstev/pp-cuda-fft

A parallelised CUDA implementation of the radix-2 FFT algorithm, with an execution-time comparison against the DFT and a non-parallelised radix-2 FFT

cuda fft parallel-computing

Last synced: 22 Apr 2026

https://github.com/mdnpascual/judgebarmashvp

Error bar for the game called Mash VP

cuda emgucv screencapturer tesseract-ocr

Last synced: 22 Apr 2026

https://github.com/bergolho/sycl

Repository with simple programs to learn SYCL.

cpp cuda sycl

Last synced: 16 May 2026

https://github.com/maxenceleguery/jat

Tensor library

computation cuda tensor

Last synced: 24 Apr 2026

https://github.com/bikemazzell/tuonella-sift

A high-performance, memory-efficient CSV deduplication tool

csv cuda deduplication logger osint rust

Last synced: 24 Apr 2026

https://github.com/cserajdeep/dnn-iris-pytorch

Deep Neural Network with Batch normalization for tabular datasets.

batch batch-normalization classification cuda deep-learning dnn iris-dataset

Last synced: 02 May 2026

https://github.com/bardifarsi/threadpoolmanager

ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.

cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe

Last synced: 24 Apr 2026

https://github.com/fanziyang-v/parallel-computing

Parallel Computing course materials from Harbin Institute of Technology (Shenzhen).

cuda openmp openmpi parallel-computing

Last synced: 27 Mar 2025

https://github.com/juntyr/necsim-rust-analysis

Analysis of the spatially explicit biodiversity simulation `necsim-rust`

analysis biodiversity cuda mpi necsim rust simulation

Last synced: 24 Apr 2026

https://github.com/tzervas/unsloth-rs

Memory-optimized GPU kernels for LLM fine-tuning in Rust (2-5x speedup, 70-80% less VRAM)

cuda gpu machine-learning optimization rust

Last synced: 25 Jan 2026

https://github.com/illagrenan/cuda-80-cudnn6-runtime-1604-py36

Ubuntu 16.04 with Python 3.6 and CUDA Dockerfile

cuda dockerfile ubuntu

Last synced: 22 Jun 2025

https://github.com/0xsooki/extending-jax

JAX Custom Operations with C++ and CUDA (using Pybind11)

cuda jax pybind11 xla

Last synced: 25 Apr 2026

https://github.com/danieljvickers/fluid_simulation

An educational example for learning the Navier-Stokes equations. Also included is a C++ and CUDA shared object library, buildable with CMake, for use in your personal projects.

cpp cuda differential-equations navier-stokes numpy physics python simulation

Last synced: 04 May 2026

https://github.com/sangioai/torchpace

PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.

cuda pytorch transformer

Last synced: 25 Apr 2026

https://github.com/shineiarakawa/particle-stabilizer

A C++ and CUDA-based program for simulating the motion of particles.

cpp cuda n-body particles

Last synced: 12 May 2026

https://github.com/daviddavo/19gpu

Short GPU programming exercises at Complutense University of Madrid. Mirrored from GitLab.

accelerator cuda gpu-programming

Last synced: 26 Apr 2026

https://github.com/oaslananka/cv_cuda_cpp_sample

This is a sample project demonstrating how to use OpenCV and CUDA in C++ for detecting people in drone footage with YOLO. The project aims to be simple and understandable for those who want to learn how to use OpenCV and CUDA in C++.

computervision cpp cuda opencv

Last synced: 01 May 2026

https://github.com/waz4/tinycomb

A lightweight C and CUDA library for efficiently calculating combinations with repetition. It can jump to any combination much faster than brute-force methods, using precomputed factorials and `tiny-bignum-c` for big-number support.

c combinations-generator combinations-with-repetition cuda tiny-bignum-c tinycomb

Last synced: 02 May 2026

https://github.com/shashshukla/ee-210-signals-and-systems

Code for the assignments of EE-210, Signals and Systems, at IIT Bombay (2016).

cuda image-processing signal-processing

Last synced: 26 Apr 2026

https://github.com/sergiomarquezdev/yt-transcriber

🛠️ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.

ai cli cuda gemini python transcription whisper youtube

Last synced: 15 May 2026

https://github.com/countzero/windows_exllama

This is a playground to explore the ExLlama project in a Windows environment.

conda cuda exllama python torch

Last synced: 26 Apr 2026

https://github.com/alexyzha/cuda-bioinformatics

A CUDA-Accelerated Bioinformatics Toolchain

bioinformatics bioinformatics-tool cplusplus cuda

Last synced: 26 Apr 2026

https://github.com/separatrixxx/pgp_labs_7_sem

👓 Laboratory work for the 7th semester at MAI on PGP and PDP

cpp cuda nvidia

Last synced: 15 May 2026

https://github.com/bjornmelin/edge-ai-engineering

📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖

cuda edge-computing embedded-systems gpu-optimization iot mobile-ml model-optimization python tflite

Last synced: 02 May 2026

https://github.com/mathiasotnes/gemm

General Matrix Multiplication (GEMM) optimization in CUDA.

cuda gpu

Last synced: 26 Mar 2025

https://github.com/baro-00/cpp-cuda-lab

Experimental C++ projects using NVIDIA CUDA for parallel computing, for learning and testing GPU kernels.

cpp cuda

Last synced: 04 May 2026

https://github.com/mateuszk098/parallel-programming-examples

Simple parallel programming examples with CUDA, MPI and OpenMP.

cpp cuda mpi openmp parallel-programming

Last synced: 27 Apr 2026

https://github.com/kbredies/tgv_pycuda

Algorithms, examples and tests for denoising, deblurring, zooming, dequantization and compressive imaging with total variation (TV) and second-order total generalized variation (TGV) regularization. GPU-accelerated code using PyCUDA.

compressive-imaging cuda image-deblurring image-denoising image-dequantization image-zooming python3 total-generalized-variation total-variation

Last synced: 27 Apr 2026

https://github.com/notkartikye/cuda-image-box-filters

🖼️ CUDA-powered tool for applying box filters to a large number of images

cuda cuda-library cuda-programming npp

Last synced: 27 Apr 2026

https://github.com/tornikeo/minimal-vscode-cuda-meson

Minimal sample of using VSCode and Meson to build CUDA applications

cuda meson template vscode

Last synced: 08 Sep 2025

https://github.com/lablup/backend.ai-accelerator-cuda

The Backend.AI CUDA Accelerator Plugin

backendai cuda

Last synced: 16 May 2026

https://github.com/luchrist69/ascent

📄 Improve your resume with Ascent, a simple web app that provides instant feedback to help you land more interviews, all for free.

agentic-ai ascent cuda dapr dapr-pub-sub datalog differential-equations docker engine kafka mpi odeint openai openai-api rancher-desktop rendering simulation simulation-framework

Last synced: 02 May 2026

https://github.com/0x778/gaussian_filter_using_cuda

Implementation of a Gaussian filter using CUDA

cuda cuda-kernels cuda-programming image-processing

Last synced: 04 May 2026

https://github.com/seanwevans/damnati

A CUDA-accelerated iterated prisoner's dilemma arena

arena cuda iterated-prisoners-dilemma prisoners-dilemma tournament

Last synced: 14 May 2026

https://github.com/gladap/heterogeneous_computing_project

Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters

cuda heterogeneous-parallel-programming

Last synced: 27 Apr 2026

https://github.com/perhuepenbecker/cudyn

CUDA library for irregular tasks using a dynamic block-internal balancing mechanism

cpp cuda cuda-library cuda-programming gpu-computing gpu-programming irregular

Last synced: 28 Apr 2026

https://github.com/timxor/c_code

Some of my C code

c cuda m4 parallel-programming

Last synced: 03 May 2026

https://github.com/ncorgan/arrayfire-config-info

A small command-line utility that outputs all available ArrayFire devices

arrayfire cuda gpu opencl

Last synced: 28 Apr 2026

https://github.com/dwain-barnes/llm-gguf-auto-converter

Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.

auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization

Last synced: 17 Jun 2025

https://github.com/obsidianplusplus/yolov5-tensorrt-accelerator

High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization

cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5

Last synced: 28 Apr 2026

https://github.com/rajkamalsah/flow-hpc-shocktrack

GPU-accelerated, fault-tolerant Schlieren/PIV shock tracking with interactive ROI, 1-px edges, and resumable training.

ai-ml computer-vision cuda fluid-dynamics hpc mlsystem opencv piv pytorch schlieren scientific-ml smalldata transformer

Last synced: 03 May 2026

https://github.com/dlzou/rt-weekend

Ray Tracing in One Weekend, using CUDA

cuda ray-tracing

Last synced: 28 Apr 2026

https://github.com/rog0d/gpuss_watchers

"The GPU Watchers swore upon their shared memory hierarchy, from L1 to global memory, which also served as their mandate as lords of parallel computation."

cuda gpu-acceleration gpu-monitoring gpu-profiling

Last synced: 28 Apr 2026

https://github.com/axeloooo/pytorch

Collection of deep learning workflows in PyTorch, from fundamentals and classification to transfer learning and experiment tracking.

cuda python pytorch

Last synced: 28 Apr 2026

https://github.com/ltsyk/smart-snake-ai

Advanced Deep Q-Network AI for Snake Game with CUDA support and 700% performance boost

artificial-intelligence cuda deep-q-network dqn game-ai machine-learning pytorch reinforcement-learning snake-game

Last synced: 28 Apr 2026

https://github.com/elcruzo/cuda-conv

Lightweight CUDA kernel for 2D image convolution achieving 20x+ speedup. Built with CuPy for the NVIDIA Hackathon.

computer-vision convolution cuda cupy gpu-computing hackathon high-performance-computing image-processing nvidia python

Last synced: 15 May 2026

https://github.com/atelierarith/julia_gpu_playground

For those who want to use Julia with a GPU

cuda docker docker-compose julia

Last synced: 28 Apr 2026

https://github.com/ccfelius/hpc

High Performance Computing (CUDA, MPI/OpenMP, high-performance ML)

cuda high-performance-computing machine-learning mpi

Last synced: 28 Apr 2026

https://github.com/lehoangan2906/cuda_basics

A simple implementation of vector and matrix operations, optimized to run on NVIDIA GPUs with CUDA

cpp cuda cuda-programming

Last synced: 16 Jun 2025

https://github.com/emanuelemessina/cuda-benchmark

Evaluates matrix computation times on the CPU versus the GPU (CUDA)

benchmark cuda matrix-calculations

Last synced: 28 Apr 2026

https://github.com/shermanlo77/modefilter

ImageJ plugin, Java and CuPy implementation of the mode filter and empirical null filter. The mode filter is an edge-preserving smoothing filter that takes the mode of the empirical density.

cuda cupy empirical-null fiji filter image-filter imagej jcuda mode-filter

Last synced: 28 Apr 2026

https://github.com/0xhilsa/tenop

A lightweight & minimalist tensor computation library with CUDA backend

bash c cuda python3 tensor

Last synced: 13 Apr 2026

https://github.com/timvgl/cuxrft

Performs FFTs on xarray objects using CUDA

cuda cupy fft python xarray

Last synced: 07 Jan 2026

https://github.com/lionpsiuc/cflow

A computational model for heat propagation in a cylindrical radiator using both CPU and GPU parallel processing. The simulation uses finite difference methods to model the directional flow of heat through a cylindrical pipe system with specific boundary conditions and cyclic connections between pipe segments.

c cuda parallel-programming

Last synced: 29 May 2026

https://github.com/bjornmelin/cuda-core-projects

🎯 Essential CUDA programming patterns and optimizations. Showcasing parallel computing expertise through matrix operations, memory management, and advanced kernel implementations. 💻

cpp cuda cuda-kernels gpu-computing high-performance-computing nvidia optimization parallel-computing

Last synced: 12 Apr 2026

https://github.com/karusb/2dca-cuda

Two-dimensional cellular automata visualisation (Game of Life)

algorithm-flowchart cellular-automata cuda game game-of-life glut visual-studio

Last synced: 12 Apr 2026

https://github.com/emanuelemessina/gigacheck

ABFT Matrix Multiplication of any size in CUDA

abft cuda matrix-multiplication

Last synced: 28 Feb 2025

https://github.com/enapiuz/logic-circuit-simulator

Logic circuit (based on NAND gates) simulator using OpenCL

c circuit-simulator cuda digital-logic gpgpu logic-gates opencl simulator

Last synced: 03 May 2026

https://github.com/deltatecs/voses

Volatile Secret Searcher - massively parallel, brute force memory dump analysis for (D)TLS secret extraction

cuda memory-hacking reverse-engineering tls

Last synced: 15 Jun 2025

https://github.com/fmigneault/dockers

Collection of Docker setups with common libraries for image processing and machine learning.

boost cuda docker image-processing opencv python

Last synced: 12 Apr 2026

https://github.com/mohammadshabazuddin/text_to_speech_generation_with_llm_with_hugging_face

Build a text-to-speech generation system using LLMs and Hugging Face to convert text into natural audio speech.

cuda huggingface-transformers llms nlp

Last synced: 03 May 2026

https://github.com/boned-fruitwood759/whisperx-asr-with-fastapi

🎤 Enable real-time speech recognition with WhisperX using FastAPI for efficient, scalable audio processing.

asr ctranslate2 cuda fastapi openai python speech-recognition torch transformers whisper whisperx

Last synced: 12 Apr 2026

https://github.com/occisor2/fluidsimulation

Second project of my parallel algorithms course

cuda high-performance-computing

Last synced: 28 Feb 2025

https://github.com/prdai/mnist-digit-recognition

A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.

cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb

Last synced: 12 Apr 2026

https://github.com/boohohoo/shamining

Shamining is a cloud mining service that allows users to mine cryptocurrencies without the need for personal hardware. By renting computing power from eco-friendly data centers, users can mine efficiently. The platform offers an easy-to-use interface, flexible contracts, and daily payouts.

cryptocurrency cryptomining cuda gpu-mining mining mining-software open-source opencl

Last synced: 04 Jul 2025

https://github.com/marcorentap/kokkos-docker-cluster

Deploy Docker containers with Kokkos, OpenMP, OpenMPI and CUDA as a Docker swarm.

cuda docker hpc kokkos

Last synced: 10 Mar 2025

https://github.com/pintamonas4575/rlgan-project-maadm-upm

Neuroevolution to learn Lunar Lander from Gymnasium and a GAN to learn to color images. Coursework for the ML and BD master's degree at UPM.

cifar10 cuda dcgan deep-learning flappy-bird gan genetic-algorithm lunar-lander machine-learning mlp python3 pytorch reinforcement-learning tensorflow wgan-gp

Last synced: 12 Apr 2026

https://github.com/pipecruz/cuda-flocking-sim

CPU and GPU (CUDA) implementations of naive/optimized flocking algorithms

cuda

Last synced: 07 May 2026

https://github.com/hrolive/data-analytics-in-the-era-of-large-scale-machine-learning

Slides and other material for the Cyprus NCC training event about "Data analytics in the era of large-scale machine learning".

cuda deep-learning gpu-acceleration gradient-boosting large-language-models machine-learning preprocessing python pytorch

Last synced: 13 Apr 2026

https://github.com/alpinebuster/meshlib

Mesh processing library with extra `C/C#/JS/TS/PYTHON` bindings.

cuda dicom electron emscripten mesh mesh-modelling pybind11 stl stomatology threejs wasm

Last synced: 03 Jul 2025

https://github.com/lixk28/knn-cuda

cuda knn

Last synced: 01 Apr 2025

https://github.com/9prady9/archdock

Arch Linux Docker image for app development

arch-linux arrayfire cuda docker-image forge opencl

Last synced: 03 May 2026

https://github.com/fikri-rouzan/cuda-c-program-part-1

CUDA C program from NVIDIA course.

c cuda

Last synced: 12 Apr 2026

https://github.com/isquicha/cuda-parallel-studies

Learning CUDA programming here =D

cuda cuda-programming cuda-toolkit

Last synced: 03 Jul 2025

https://github.com/yutakseo/docker_ubuntu-cuda_environment

🐳 A ready-to-use Docker environment for deep learning development with Ubuntu 22.04 and CUDA 11.8.

container cuda docker environment ubuntu

Last synced: 12 Apr 2026

https://github.com/jaderock/cuda-by-example

Sample CUDA projects for the CUDA by Example book

bazel c cpp cuda gpu

Last synced: 05 May 2026

https://github.com/matthewfeickert/report-urssi-fellowship-2025

Report on URSSI 2025 Early-Career Fellowship

cuda pixi urssi

Last synced: 17 Jan 2026

https://github.com/akira4o4/cuda-program

CUDA YOLO Processing

cuda yolo

Last synced: 22 Jul 2025

https://github.com/alessiobugetti/histogram-equalization

Implements sequential and parallel histogram equalization in C++ and Python, utilizing CUDA for parallel computation on GPU

cuda gpu-acceleration histogram-equalization parallel-computing pycuda

Last synced: 04 May 2026