An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/amypad/miutil

Basic functionality needed for AMYPAD

cuda matlab medical-imaging python

Last synced: 13 May 2025

https://github.com/f-koehler/itesol

WIP: Iterative eigensolvers for C++20, Python and CUDA

cpp20 cuda eigenvalues linear-algebra python

Last synced: 08 Nov 2025

https://github.com/storterald/neural-network

Simple neural network implementation in C++ and CUDA

asm asmx86 c-plus-plus cmake cpp cuda machine-learning neural-network

Last synced: 28 Mar 2025

https://github.com/cerit-sc/scipion-docker

Scipion (a cryo-EM image processing framework, https://scipion.i2pc.es/) adapted to run in Kubernetes.

cryo-em cryoem cuda desktop kubernetes scipion vnc

Last synced: 02 Aug 2025

https://github.com/pintamonas4575/rlgan-project-maadm-upm

Neuroevolution to learn Lunar Lander from Gymnasium, and a GAN that learns to color images. Coursework for the ML and BD master's degree at UPM.

cifar10 cuda dcgan deep-learning flappy-bird gan genetic-algorithm lunar-lander machine-learning mlp python3 pytorch reinforcement-learning tensorflow wgan-gp

Last synced: 12 Apr 2026

https://github.com/empenoso/doorcam-face-report

Example face-recognition project with CUDA acceleration. Includes scripts for automatically building dlib and analyzing video on the GPU.

cuda dlib dlib-face-detection

Last synced: 19 May 2026

https://github.com/marcorentap/kokkos-docker-cluster

Deploy Docker containers with Kokkos, OpenMP, OpenMPI and CUDA as a Docker swarm.

cuda docker hpc kokkos

Last synced: 10 Mar 2025

https://github.com/antoniakras/semantic-video-search

GPU-optimized semantic search on video transcripts, with benchmarking of FAISS, Pinecone, and PostgreSQL vector databases. Deployed via Docker on FORTH’s GPU infrastructure.

bert-embeddings bert-fine-tuning cuda dokcer embedding-models embeddings-word2vec faiss-vector-database gpu-computing huggingface-transformers nlp-machine-learning pgvector pineconedb postgresql python pytorch retrieval-augmented-generation similarity-search vector-database whisper-ai

Last synced: 03 May 2026

https://github.com/boohohoo/shamining

Shamining is a cloud mining service that allows users to mine cryptocurrencies without the need for personal hardware. By renting computing power from eco-friendly data centers, users can mine efficiently. The platform offers an easy-to-use interface, flexible contracts, and daily payouts.

cryptocurrency cryptomining cuda gpu-mining mining mining-software open-source opencl

Last synced: 04 Jul 2025

https://github.com/naetherm/derelictcurand

Dynamic bindings to the CuRAND library for the D Programming Language.

cuda curand d derelict dlang

Last synced: 27 Mar 2025

https://github.com/prdai/mnist-digit-recognition

A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.

cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb

Last synced: 12 Apr 2026

https://github.com/occisor2/fluidsimulation

Second project of my parallel algorithms course

cuda high-performance-computing

Last synced: 28 Feb 2025

https://github.com/ramyacp14/document-based-question-and-answers

Developed a document question answering system that utilizes Llama and LangChain for contextual and accurate answers. The system supports .txt documents, intelligent text splitting, and context-aware querying through an easy-to-use Streamlit interface.

chroma cuda hugging-face langchain llama python recursivecharactertextsplitter streamlit

Last synced: 07 Mar 2026

https://github.com/boned-fruitwood759/whisperx-asr-with-fastapi

🎤 Enable real-time speech recognition with WhisperX using FastAPI for efficient, scalable audio processing.

asr ctranslate2 cuda fastapi openai python speech-recognition torch transformers whisper whisperx

Last synced: 12 Apr 2026

https://github.com/bd2720/accesspatterns

Comparing chunked vs. striped memory access patterns for CPU and GPU code using the CUDA toolkit in C.

c cache cuda cuda-toolkit performance-analysis performance-testing profiling

Last synced: 16 May 2026

https://github.com/yash-1335/qwen600

🚀 Build a fast inference engine for the QWEN3-0.6B model using CUDA, optimizing performance with minimal dependencies for efficient learning and practice.

cuda cuda-programming gpu llamacpp llm llm-inference qwen qwen3 transformer

Last synced: 16 May 2026

https://github.com/lanceberge/cuda-newton-fractals

Parallelize and visualize the Newton Iteration

cpp cuda mathematical-modelling visualization

Last synced: 16 May 2026

https://github.com/fmigneault/dockers

Collection of Docker setups with common libraries for image processing and machine learning.

boost cuda docker image-processing opencv python

Last synced: 12 Apr 2026

https://github.com/tornikeo/sample-openmp-in-cuda

Sample of using OpenMP and CUDA: single GPU, multiple CPU

cuda meson openmp

Last synced: 01 Aug 2025

https://github.com/emanuelemessina/gigacheck

ABFT (algorithm-based fault tolerance) matrix multiplication of any size in CUDA

abft cuda matrix-multiplication

Last synced: 28 Feb 2025

https://github.com/karusb/2dca-cuda

Two-dimensional cellular automata visualisation (Game of Life)

algorithm-flowchart cellular-automata cuda game game-of-life glut visual-studio

Last synced: 12 Apr 2026

https://github.com/naetherm/derelictcublas

Dynamic bindings to the CuBLAS library for the D Programming Language.

cublas cuda d derelict dlang

Last synced: 31 Oct 2025

https://github.com/9prady9/imageconvolve

Qt app for previewing image convolution; uses CUDA for the convolution.

c-plus-plus convolution cuda desktop-app qt

Last synced: 03 May 2026

https://github.com/voschezang/holographic-projector-simulations

Optimizations of Simulations of Holographic Projectors using CUDA

cuda gpu holography parallel-computing photonics

Last synced: 16 May 2026

https://github.com/jarmak-personal/vibespatial

GPU-first spatial analytics for Python. Drop-in GeoPandas replacement powered by runtime-compiled CUDA kernels

cccl cuda geodataframe geopandas geospatial gpu gpu-computing nvrtc python spatial-analytics

Last synced: 21 Apr 2026

https://github.com/bjornmelin/cuda-core-projects

🎯 Essential CUDA programming patterns and optimizations. Showcasing parallel computing expertise through matrix operations, memory management, and advanced kernel implementations. 💻

cpp cuda cuda-kernels gpu-computing high-performance-computing nvidia optimization parallel-computing

Last synced: 12 Apr 2026

https://github.com/mcp-tool-shop-org/gpu-container

Model-aware inference memory-placement planner for single-GPU rigs: profile hardware + model, generate explicit VRAM/RAM/NVMe placement plans across runtimes (llama.cpp/vLLM/...), and prove them with a measured receipt. Not VRAM overflow, but declared placement.

cuda gpu inference llama-cpp llm moe offload vram wsl2

Last synced: 09 Jun 2026

https://github.com/shambac/shamboflow

Fierce TensorFlow competitor

cuda cupy machine-learning numpy pypi-package

Last synced: 19 Feb 2026

https://github.com/lionpsiuc/cflow

A computational model for heat propagation in a cylindrical radiator using both CPU and GPU parallel processing. The simulation uses finite difference methods to model the directional flow of heat through a cylindrical pipe system with specific boundary conditions and cyclic connections between pipe segments.

c cuda parallel-programming

Last synced: 29 May 2026

https://github.com/illagrenan/cuda-90-cudnn7-runtime-1604-py36

Dockerfile for Ubuntu 16.04 with Python 3.6 and CUDA 9

cuda dockerfile python ubuntu

Last synced: 03 May 2026

https://github.com/luis-kr/depthmap

Depth map estimation tool using Depth-Anything-V2. Generate accurate depth maps from images with support for both relative and metric depth measurements.

cuda depth-anything depth-estimation depth-map image-processing python pytorch

Last synced: 08 Feb 2026

https://github.com/eastonman/tensorrt-pytorch-wrapper

A wrapper that makes TensorRT engines accept PyTorch CUDA tensors.

cuda pytorch tensorrt

Last synced: 06 May 2026

https://github.com/psteinb/gtc2017

Slides for my presentation at GTC 2017 from May 8-11 in Silicon Valley

compression cuda ffmpeg gpu gpu-computing h264 h265 microscopes spim

Last synced: 03 May 2026

https://github.com/0xhilsa/tenop

A lightweight & minimalist tensor computation library with a CUDA backend

bash c cuda python3 tensor

Last synced: 13 Apr 2026

https://github.com/timvgl/cuxrft

Performs FFTs on xarray objects using CUDA

cuda cupy fft python xarray

Last synced: 07 Jan 2026

https://github.com/shermanlo77/oxwasp_phd

Code for the PhD thesis. The topic was on defect detection of 3D printing using x-rays. The repository includes an implementation of the mode filter and empirical null filter.

3d-printing applied-statistics computational-statistics cuda empirical-null imagej mode-filter statistics xray-projection

Last synced: 27 Mar 2025

https://github.com/macaycz/nn

A lightweight, GPU-accelerated machine learning library built with CUDA.

cuda deep-learning gpu machine-learning neural-network

Last synced: 25 Jul 2025

https://github.com/bergolho/sycl

Repository with simple programs to learn SYCL.

cpp cuda sycl

Last synced: 16 May 2026

https://github.com/mahdi-hasan-shuvo/ml-opensource-project

This is an open source repository focused on providing practical and educational machine learning resources. The project aims to make learning and applying machine learning more accessible through well-documented code, tutorials, and real-world examples.

cuda machine-learning machine-learning-algorithms ml-projects open-source python

Last synced: 19 May 2026

https://github.com/hrolive/data-analytics-in-the-era-of-large-scale-machine-learning

Slides and other material for the Cyprus NCC training event about "Data analytics in the era of large-scale machine learning".

cuda deep-learning gpu-acceleration gradient-boosting large-language-models machine-learning preprocessing python pytorch

Last synced: 13 Apr 2026

https://github.com/malolm/football-player-detection-with-yolov8

Football player detection by fine-tuning YOLOv8

cuda jupyterlab python3 yolov8-detection

Last synced: 07 May 2026

https://github.com/faresargus/artaxerxes

"artaxerxes" is an adaptive high-performance stress tester that supports GPU, io_uring, DPDK, and eBPF/XDP for advanced cybersecurity labs. Ideal for network testing. 🚀🛠️

cuda cuda-programming cybersecurity cybersecurity-education cybersecurity-tools dpdk ebpf educational github-config high-performance network-security network-security-tool penetration-testing penetration-testing-framework penetration-testing-tools stress-testing

Last synced: 24 Jul 2025

https://github.com/matthewfeickert/report-urssi-fellowship-2025

Report on the URSSI 2025 Early-Career Fellowship

cuda pixi urssi

Last synced: 17 Jan 2026

https://github.com/ray-chew/modified_ch

Density functional theory (DFT) and self-consistent field theory (SCFT) simulation of diblock copolymers

cuda density-functional-theory diblock-copolymer numerical-analysis numerical-methods self-consistent-field-theory

Last synced: 11 May 2026

https://github.com/hr-fahim/transformer-model-optimization

Sample GPT Transformer Model from Scratch.

cuda few-shot-learning transfomers

Last synced: 02 May 2026

https://github.com/xza85hrf/flux_pipeline

FluxPipeline is a prototype experimental project that provides a framework for working with the FLUX.1-schnell image generation model. This project is intended for educational and experimental purposes only.

ai cuda docker educational experimental flux1 flux1-schnell flux1ai gradio image-generation model non-commercial python pytorch research transformer-model

Last synced: 05 Jul 2025

https://github.com/doxakis/cosinesimilaritydistancesongpu

Computes cosine similarity distances for all pairwise combinations in a dataset on the GPU with CUDA

cuda

Last synced: 13 Apr 2026

https://github.com/uva-trasgo/controllers

Read-only mirror of the official repository: https://gitlab.com/trasgo-group-valladolid/controllers. Controllers is a library written in C11 that provides a simplified way to program applications that can exploit heterogeneous computational platforms including accelerators and/or multi-core CPUs.

cuda heterogeneous-computing heterogeneous-parallel-programming hip opencl openmp

Last synced: 12 May 2026

https://github.com/fanziyang-v/parallel-computing

Parallel Computing course materials from Harbin Institute of Technology (Shenzhen).

cuda openmp openmpi parallel-computing

Last synced: 27 Mar 2025

https://github.com/manishklach/gb300-rl-runtime

Close-to-metal C/CUDA lab for RL inference fast paths: persistent GPU workers, hugepage KV arenas, cacheline-aware command rings, and async reward handoff. Goal: remove page faults, malloc/free, scheduler wakeups, CPU round-trips, and KV migration from the per-token path.

ai-infrastructure close-to-metal cuda gb300 gpu-inference hpc lock-free nvlink reinforcement-learning spsc-queue

Last synced: 09 Jun 2026

https://github.com/tzervas/unsloth-rs

Memory-optimized GPU kernels for LLM fine-tuning in Rust (2-5x speedup, 70-80% less VRAM)

cuda gpu machine-learning optimization rust

Last synced: 25 Jan 2026

https://github.com/eyelor/text-to-image-item-generator

A Python workflow for generating random item images using models from Hugging Face.

ai conda cuda flux-schnell generator huggingface item llama python pytorch text-to-image

Last synced: 13 Apr 2026

https://github.com/flosmume/cpp-cuda-streams-and-pinned-mem

A CUDA C++ demo showing how to overlap data transfer and kernel execution using multiple streams and pinned (page-locked) host memory. The project illustrates asynchronous memcpy, event timing, and the performance benefits of concurrent GPU execution, which are essential for building high-throughput pipelines.

asynchronous-execution cuda cuda-streams gpu parallel-programming performance-optimization pinned-memory

Last synced: 13 May 2026

https://github.com/nabilshadman/cuda-4-dummies

Lecture slides and exercise files of the CUDA 4 Dummies course (2025)

cuda gpu-computing high-performance-computing nsight-systems nvidia-gpu parallel-computing

Last synced: 31 Oct 2025

https://github.com/drc0ns0le/rtxvideoprocessor

CLI tool to apply NVIDIA RTX VSR and TrueHDR processing to video files

cuda ffmpeg hdr nvidia rtx upscale

Last synced: 20 Apr 2026

https://github.com/illagrenan/cuda-80-cudnn6-runtime-1604-py36

Dockerfile for Ubuntu 16.04 with Python 3.6 and CUDA

cuda dockerfile ubuntu

Last synced: 22 Jun 2025

https://github.com/danieljvickers/fluid_simulation

An educational example for learning the Navier-Stokes equations. Also included is a C++ and CUDA shared object library, buildable with CMake, for use in your personal projects.

cpp cuda differential-equations navier-stokes numpy physics python simulation

Last synced: 04 May 2026

https://github.com/shineiarakawa/particle-stabilizer

A C++ and CUDA-based program for simulating the motion of particles.

cpp cuda n-body particles

Last synced: 12 May 2026

https://github.com/efecaliskannn/pneumonia-detection-with-cnn--vgg16--and-resnet50-deep-learning-models

This project aims to detect pneumonia using deep learning, a subset of artificial intelligence, and examines the performance of CNN, VGG16, and ResNet50 models in detecting pneumonia.

artificial-intelligence convolutional-neural-networks cuda deep-learning keras-tensorflow nvidia-cuda pyhton transfer-learning

Last synced: 13 Jun 2025

https://github.com/lttofu/cosmic

Fast, lightweight GUI-based C++ Ethereum ERC918 token miner for Win64 | CUDA GPUs | CPUs | Pool | Solo Mining

0xbitcoin 0xbtc cplusplus cplusplus-cli cpuminer cuda erc20 erc918 ethereum ethereum-token gpuminer gui pool-mining solo-mining windows windows-10 windows-7 windows-gui winforms

Last synced: 08 Apr 2026

https://github.com/oaslananka/cv_cuda_cpp_sample

This is a sample project demonstrating how to use OpenCV and CUDA in C++ for detecting people in drone footage with YOLO. The project aims to be simple and understandable for those who want to learn how to use OpenCV and CUDA in C++.

computervision cpp cuda opencv

Last synced: 01 May 2026

https://github.com/awikramanayake/optimized-matrix-mult

Optimizing matrix multiplication using parallelism and SIMD (AVX2, CUDA)

avx2 cuda matrix-multiplication

Last synced: 22 May 2026

https://github.com/sergiomarquezdev/yt-transcriber

🛠️ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.

ai cli cuda gemini python transcription whisper youtube

Last synced: 15 May 2026

https://github.com/juliankarrer/reyn

CUDA-based Implementation of Smoothed Particle Hydrodynamics for Fluid Simulation

cuda fluid lagrangian simulation sph

Last synced: 31 Oct 2025

https://github.com/mrgkanev/tensorflow-gpu-docker-setup

A Docker environment for TensorFlow GPU development with optimized configurations for WSL2, troubleshooting guides, and common error fixes

cuda cuda-toolkit deep-learning dev-environment development-tools docker gpu-acceleration machine-learning nvidia-docker nvidia-docker-support python tensorflow

Last synced: 13 Apr 2026

https://github.com/hrshl212/custom-cuda-kernels-with-neural-network-implementation

The repository contains custom CUDA kernels for a linear layer, softmax, and ReLU, which are integrated with Python to build a neural network.

cuda neural-network python pytorch

Last synced: 08 May 2026

https://github.com/parxd/cuda-optim

Optimizing CUDA kernels

cuda machine-learning

Last synced: 26 Mar 2025

https://github.com/separatrixxx/pgp_labs_7_sem

👓 Laboratory work for the 7th semester at MAI on PGP and PDP

cpp cuda nvidia

Last synced: 15 May 2026

https://github.com/lord-turmoil/cudacmakedemo

A demo of building a CUDA program with CMake

cuda tutorial

Last synced: 16 Mar 2025

https://github.com/delusionary/histoptimizer

Solves the partition problem by minimizing a variance cost.

cuda numba python

Last synced: 14 Jan 2026

https://github.com/dgcnz/nvtx-vscode

Create NVIDIA NVTX ranges directly in VS Code, then profile with Nsight Systems without modifying source code.

cuda nvtx pytorch vscode

Last synced: 13 Apr 2026

https://github.com/ran-2012/cuda-practice

CUDA practice code following the NVIDIA programming guide

cuda

Last synced: 27 Feb 2025

https://github.com/myselfaryan/attention-mechanism

Accelerating Scaled Dot-Product Attention using OpenMP and CUDA

cuda openmp

Last synced: 27 Apr 2026

https://github.com/avicted/hip_fm_synthesis

This project demonstrates FM (frequency modulation) synthesis using HIP (Heterogeneous-computing Interface for Portability), enabling high-performance sound generation on both AMD and NVIDIA GPUs.

amd audio-processing cuda fm-synthesis hip nvidia rocm

Last synced: 16 Mar 2025

https://github.com/nel-s/vein-cracker

Recovers the internal generator states that could have produced a given set of Minecraft Java b1.6-1.12.2 veins. These states can then be used to recover three quarters of any world seed that could have generated the veins.

cuda minecraft seedcracking veins

Last synced: 16 Mar 2025

https://github.com/curiousci/wind

Multicore Systems Programming project

cuda mpi openmp pthreads

Last synced: 25 Dec 2025

https://github.com/baro-00/cpp-cuda-lab

Experimental C++ projects using NVIDIA CUDA for parallel computing. Learning & testing GPU kernels

cpp cuda

Last synced: 04 May 2026

https://github.com/cripterhack/business-address-scrapper

Python + Scrapy: a distributed scraping system with caching for extracting business information.

cuda ollama postgresql python redis scraper scraping scrapy tesseract

Last synced: 14 Jun 2025

https://github.com/ludekcizinsky/fast-cg-solver

Implementation of Conjugate Gradient (CG) algorithm for solving sparse linear systems using MPI and CUDA.

conjugate-gradient cuda mpi

Last synced: 17 May 2026

https://github.com/deep-1704/coa_lab_repo_grp01

COA Lab assignments

cuda gpgpu-sim

Last synced: 24 Dec 2025

https://github.com/0x778/gaussian_filter_using_cuda

Implementation of a Gaussian filter using CUDA

cuda cuda-kernels cuda-programming image-processing

Last synced: 04 May 2026

https://github.com/rainlumostaipei/cuda-qnet-a2c

Qnet and A2C implementations in CUDA

a2c cuda qnet

Last synced: 26 Jun 2025

https://github.com/rugleb/cuda

A simple example of a program that uses parallel GPU computing on an NVIDIA graphics card using CUDA technology

cuda gpu nvidia

Last synced: 10 Apr 2025

https://github.com/tornikeo/minimal-vscode-cuda-meson

Minimal sample of using VSCode and Meson to build CUDA applications

cuda meson template vscode

Last synced: 08 Sep 2025

https://github.com/zalo/matmul_cuda

A simple learning example for CUDA

cuda

Last synced: 07 Jul 2025

https://github.com/alessiobugetti/histogram-equalization

Implements sequential and parallel histogram equalization in C++ and Python, utilizing CUDA for parallel computation on GPU

cuda gpu-acceleration histogram-equalization parallel-computing pycuda

Last synced: 04 May 2026

https://github.com/lablup/backend.ai-accelerator-cuda

The Backend.AI CUDA Accelerator Plugin

backendai cuda

Last synced: 16 May 2026

https://github.com/tiktokfnf33/rayleigh-taylor-instability-simulation

CUDA Rayleigh-Taylor Instability Simulation: this repository features a high-performance simulation of the Rayleigh-Taylor instability using CUDA, Python, and C. Explore the implementation and results to understand fluid dynamics in a parallel computing context. 🖥️🚀

c computational-fluid-dynamics cuda euler-method finite-difference gpu-computing hpc numerical-simulation parallel-computing physics-simulation python rayleigh-taylor-instability runge-kutta

Last synced: 04 May 2026