An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/qervas/cn_chess_ai

chinese chess(Xiangqi) AI

ai cpp cuda dqn qt6

Last synced: 13 Feb 2026

https://github.com/programmer-rd-ai/digivis

A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.

cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb

Last synced: 10 Jun 2025

https://github.com/brendanbignell/cuda_montecarlooptionpricer

CUDA Monte Carlo Barrier Option Pricing Demo & Jupyer lab ML models

cuda deep-learning ml pytorch quantitative-finance xgboost-regression

Last synced: 19 Apr 2026

https://github.com/exprays/atlas

Atlas is a specialized convolutional neural network designed for satellite image change detection

alembic celery cnn-for-visual-recognition cuda geospatial-visualization python pytorch tensors

Last synced: 28 Feb 2026

https://github.com/han-minhee/sgemm_hip

SGEMM implementations in HIP for NVIDIA / AMD GPUs

cuda gpgpu gpu hip rocm

Last synced: 27 Apr 2026

https://github.com/orgh0/highperformancecnn

Implementation of a High Performance CNN for MNIST dataset

cnn cpp cuda

Last synced: 18 May 2026

https://github.com/rhysdg/whisper-onnx-python

A low-footprint GPU accelerated Speech to Text Python package for the Jetpack 5 era bolstered by an optimized graph

ai chatbot cuda machine-learning onnxruntime speech-to-text whisper

Last synced: 16 Feb 2026

https://github.com/enkerewpo/talaria

AI Voice Assistant for Dialogue and IoT Control Powered by GPT4o

cuda gpt-4 python3 pytorch stt tts

Last synced: 16 Apr 2026

https://github.com/davidalgis/godot_cuda

Demonstration that it is possible to use CUDA directly from Godot engine.

cuda godot modules

Last synced: 03 May 2026

https://github.com/sbstndb/grayscott_k

A simple 3D GrayScott simulation using Kokkos enabling CUDA or OpenMP backend

cuda finite-difference grayscott grid kokkos laplacian openmp simulation visualisation

Last synced: 16 May 2026

https://github.com/jxlarrea/homeassistant-voice-recipes

GPU/CUDA-accelerated voice control stack for Home Assistant. Runs on x86/x64 and ARM64 (including the NVIDIA DGX Spark). 100% Local - No Cloud, No Subscriptions.

arm64 cuda dgx-spark gb10 gpu-acceleration home-assistant local-llm qwen3 speech-to-text text-to-speech voice-assistant x86-64

Last synced: 26 May 2026

https://github.com/manishklach/intent-attention-kernel

Intent-aware attention research prototype that treats long-context inference as structured semantic blocks instead of a flat token stream, proving CPU-first correctness and analytical KV/FLOP savings before GPU kernel implementation.

agentic-ai ai-infrastructure attention block-attention cost-model cuda gpu-kernels inference kernel-research kv-cache llm-inference long-context python pytorch research semantic-attention sparse-attention systems transformers triton

Last synced: 28 May 2026

https://github.com/thomasonzhou/minitorch

rebuilding pytorch: from autograd to convolutions in CUDA

cuda numba numpy

Last synced: 02 Feb 2026

https://github.com/le-ander/msc_bioinfo-experimental_design

Using information theory to inform experimental design with GPU acceleration. Computing group project as part of the MSc in Bioinformatics and Theorectical Systems Biology at Imperial College London 2016/2017.

cuda experimental-design gpu-computing information-theory pycuda systems-biology

Last synced: 26 Apr 2026

https://github.com/tudasc/cusan-tests

A test suite for CUDA-aware MPI race detection

cuda dataracebench-cuda mpi

Last synced: 03 May 2026

https://github.com/Programmer-RD-AI/DetectX

A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.

coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet

Last synced: 04 May 2025

https://github.com/bolner/totally-diffused

Debian/NVIDIA Docker image for AUTOMATIC1111's Stable Diffusion application.

automatic1111 cuda debian docker-image nvidia stable-diffusion xformers

Last synced: 11 Apr 2026

https://github.com/hartorn/docker-python

Repository to build python image, based on ubuntu and CUDA

cuda docker mkl-dnn onednn python3 ubuntu ubuntu1804

Last synced: 05 May 2026

https://github.com/fynv/cudainline

A CUDA interface for Python. A distillation of the engine part of ThrustRTC.

cuda gpu nvrtc pyhton

Last synced: 18 May 2026

https://github.com/sakurabtc888/btc-eth-evm-ltc-trx-collision

针对BTC、ETH(EVM)、LTC、TRX链的私钥、公钥CPU+GPU碰撞工具

btc cuda eth evm ltc trx

Last synced: 04 Jul 2025

https://github.com/ezroot/gacc

GIACC - Generate Images, Art, Code and Conversations

ai codegen cuda huggingface image imagegeneration python rust stablediffusion

Last synced: 06 Apr 2026

https://github.com/naidezhujimo/cuda-rewrite-fast-matrix-multiplication

This repository contains an optimized implementation of matrix multiplication using CUDA. The goal of this project is to provide a high-performance solution for matrix multiplication operations on NVIDIA GPUs.

cuda

Last synced: 26 Mar 2025

https://github.com/stanczakdominik/cuda_poisson

A 2D poisson solver via CUDA

cuda electromagnetism pde

Last synced: 29 Jun 2025

https://github.com/chintak/theano-lasagne-docker

Dockerfile for Lasagne with Cuda support. Look at the branches for relevant Dockerfiles - ``cpu`` and ``gpu``.

caffe cuda docker dockerfile install-script lasagne machine-learning machine-learning-library theano

Last synced: 10 Apr 2025

https://github.com/gunrock/template

Template repository for essentials applications to get you started asap!

cpp cuda essentials gpu graph-algorithms graph-analytics gunrock

Last synced: 15 May 2026

https://github.com/shivendrra/axgrad

lightweight tensor library that contains it's own auto-diff engine like pytorch

autograd cuda pytorch scratch-implementation tinygrad

Last synced: 08 May 2026

https://github.com/gabrielmaialva33/enton

Autonomous AI Robot Assistant — Vision, Voice, and Soul

ai autonomous-agent computer-vision cuda llm python pytorch robot stt tts whisper yolo

Last synced: 01 Apr 2026

https://github.com/torotoki/simple-paged-attention

A simple implementation of PagedAttention purely written in CUDA and C++.

attention cpp cuda llm transformer

Last synced: 18 May 2026

https://github.com/eshibusawa/cupy-cuda

Learn CUDA programming essentials with CuPy, from basic kernels to advanced memory patterns

cooperative-thread-array cub cuda cupy gpu parallel-computing python

Last synced: 15 Jun 2025

https://github.com/kchristin22/ising_model

Implementation of a cellular automaton on GPU using different features of CUDA

cellular-automaton cuda gpu-programming hpc ising-model parallel-computing

Last synced: 15 Mar 2025

https://github.com/sd7campeon/yelp-sentiment-analysis-with-python-bs4-and-llm

A scalable pipeline for automated extraction, preprocessing, and sentiment analysis of Yelp reviews. Uses advanced HTTP requests, HTML parsing, and text normalization (tokenization, stopword removal, lemmatization) to enable precise polarity and subjectivity analysis for consumer insights and business analytics.

beautifulsoup beautifulsoup4 business-analytics cuda data-analysis nlp-machine-learning nltk opinion-mining pandas python python3 requests-library-python sentiment-analysis text-preprocessing textblob torch web-scraping yelp-reviews

Last synced: 06 May 2026

https://github.com/memergamer/cuda-fluid-simulation-with-interactive-visualization

A real-time fluid dynamics simulation implemented in Python using CUDA for GPU acceleration, featuring interactive ASCII visualization and automated movement patterns.

colab-notebook cuda liquid-simulations navier-stokes

Last synced: 18 May 2026

https://github.com/zhangjun/my_notes

Daily stuffs

cuda gpu

Last synced: 17 Apr 2026

https://github.com/pintamonas4575/tfg-diffusion-model-customdataset

Creación en Pytorch de un modelo de difusión para generación incondicional de imágenes con un dataset propio.

attention-mechanism cnn cosine-scheduler cuda custom-dataset ddim deep-learning diffusion-models gpu image-generation pytorch

Last synced: 17 Apr 2026

https://github.com/tortillazhawaii/fishes_cuda

3D boid simulation with GPU.

cuda opengl

Last synced: 04 May 2026

https://github.com/mayukhdeb/patrick

Tiny neural net library written from scratch with cupy :warning: under construction :warning:

cuda deep-learning gpu-computing machine-learning neural-network regression

Last synced: 08 May 2026

https://github.com/rkv0id/automata-vtk

Multi-dimensional Cellular Automata visualization using Python's VTK bindings on top of a CUDA-parallel grid updates.

cellular-automata cuda game-of-life python vtk

Last synced: 19 Apr 2026

https://github.com/satyajitghana/gpu-programming

Contains the contents of GPU Architecture and Programming course done on NPTEL

c cpp cuda cuda-programming gpu-programming nptel nvidia

Last synced: 09 Mar 2026

https://github.com/steleman/openai-triton

Fork of OpenAI's Triton compiler v3.4.0 using LLVM 21.1.0 / 21.1.1 on Fedora 41+

cuda fedora linux llvm mlir mlir-dialect openai rocm triton

Last synced: 08 Apr 2026

https://github.com/bhattbhavesh91/rapids-cudf-cuml-example

Running KNN algorithm much faster on GPU for free using RAPIDS packages like cuML and cuDF

cuda cuml deep-learning nvidia-gpu rapids rapidsai

Last synced: 17 Apr 2026

https://github.com/mcp-tool-shop-org/backpropagate

Headless LLM fine-tuning in 3 lines — smart defaults, VRAM-aware batch sizing, multi-run SLAO, GGUF export for Ollama.

api cuda fine-tuning headless llm lora machine-learning ollama python qlora training unsloth web-security windows

Last synced: 31 May 2026

https://github.com/bjornmelin/ml-vision-lab

👁️ Production-grade computer vision implementations. Real-world applications in image processing, object detection, and video analytics with GPU acceleration. 📸

computer-vision cuda deep-learning image-processing object-detection opencv pytorch video-analytics

Last synced: 04 Apr 2026

https://github.com/emmanuelmess/firstcollisiontimesteprarefiedgassimulator

This simulator computes all possible intersections for a very small timestep for a particle model

cpp20 cuda simulator

Last synced: 17 Apr 2026

https://github.com/ergonomech/comfyui-windows-installer

Automated setup for ComfyUI on Windows with CUDA, custom plugins, and optimized PyTorch settings. Made to Run as Server and Error Correct,. Easy installation and launch using Miniconda.

automation comfy conda conda-environment cuda hosting-deployment setup windows

Last synced: 31 Mar 2025

https://github.com/m15kh/cuda_programming

CUDA programming enables parallel computing on NVIDIA GPUs for high-performance tasks like deep learning and scientific computing

cuda cuda-programming gpu nvidia parallel-computing practice-programming

Last synced: 03 Apr 2025

https://github.com/umer-farooq-cs/canny-edge-detector

High-performance Canny edge detector with CPU and CUDA implementations. Loads PGM images, performs Gaussian smoothing, gradients, non-max suppression, and hysteresis. Benchmarks both paths, outputs edge maps, and reports speedup. Simple Makefile, sample images included.

c canny-edge-detection computer-vision cpp cuda gpu high-performance-computing image-processing nvcc pgm

Last synced: 18 Apr 2026

https://github.com/bjornmelin/deep-learning-evolution

🧠 Deep-Learning Evolution: Unified collection of TensorFlow & PyTorch projects, featuring custom CUDA kernels, distributed training, memory‑efficient methods, and production‑ready pipelines. Showcases advanced GPU optimizations, from foundational models to cutting‑edge architectures. 🚀

ai-research cuda data-science deep-learning distributed-training gan gpu-acceleration machine-learning model-optimization neural-networks python pytorch tensorflow training-pipeline transformers

Last synced: 09 May 2026

https://github.com/vietdoo/seam-carving-cuda

CUDA Seam Carving: Accelerating Image Resizing with GPU Computing

cc cuda cuda-programming gpu-computing parrallel-computing seam-carving

Last synced: 02 May 2026

https://github.com/yooodleee/hello-cuda

👽Nice to meet you, CUDA!👽

c cc cuda gpgpu multiprocessing

Last synced: 09 Apr 2026

https://github.com/sleeepyjack/multisplit

Simple multisplit for CUDA accelerators

cpp cuda gpu nvidia parallel-programming primitive split

Last synced: 20 May 2026

https://github.com/hyunjinno/multicore_computing

A repository of multicore programming in Java and C.

c cpp cuda java multithreading openmp thread thrust

Last synced: 18 Apr 2026

https://github.com/wallneradam/docker-ccminer

CCMiner (tpruvot version) Docker Builder

ccminer cuda docker gpu litecoin miner monero nvidia nvidia-docker

Last synced: 18 Apr 2026

https://github.com/straightchlorine/quantum-pipeline

A Python module for executing and monitoring quantum algorithms across local simulators and IBM Quantum platforms. Seamlessly handles data collection, organization, and streaming to Apache Kafka

apache-kafka apache-spark aws-s3 cuda docker gpu-acceleration ibm-cloud ibm-quantum minio qiskit qiskit-aer qiskit-nature quantum-computing visualizations vqe

Last synced: 08 Oct 2025

https://github.com/steleman/llvm-21.1.8

LLVM 21.1.8 on Fedora 41/43 and some additions for Torch-MLIR, ONNX-MLIR and IREE

clang cuda fedora fedora-41 fedora-43 iree llvm mlir onnx-mlir torch-mlir

Last synced: 07 Apr 2026

https://github.com/mala13f/statistical-learning-in-finance

This Repository contains all the codes, papers and related data for assignments done during the course.

cuda gpu-acceleration jupyter-notebook machine-learning python statistical-learning

Last synced: 12 Apr 2026

https://github.com/senli1073/docker-gpu-monitor

A lightweight GPU monitor designed for real-time web-based viewing of GPU server status.

container cuda docker flask gpu gpu-monitoring linux memory-usage nvidia-smi web

Last synced: 05 Apr 2026

https://github.com/makischristou/mandelbrot

Mandelbrot set visualizer using CUDA.

cpp cuda gpu mandelbrot nvidia renderer rust

Last synced: 09 Apr 2026

https://github.com/5had3z/torch-discounted-cumsum-nd

PyTorch Discounted Cumsum with Autograd (CPU + CUDA)

cuda machine-learning pytorch

Last synced: 18 Apr 2026

https://github.com/sohhamseal/scalable-systems-programs

A little less effort to learn parallel programming...

cuda mpi openmp

Last synced: 18 Apr 2026

https://github.com/franciscoda/psvm

R package and C++ library that allows training SVM models in a GPU using CUDA and predicting out-of-sample data. A support vector machine (SVM) is a type of machine learning model that is trained using supervised data to classify samples.

cpp cpp17 cuda machine-learning r svm-classifier svm-training

Last synced: 18 Apr 2026

https://github.com/rajarsheya/real-time-audio-feature-extraction-with-cuda-for-speech-recognition

This project accelerates MFCC extraction using CUDA for real-time speech recognition. Offloading the process to the GPU reduces latency and speeds up processing, enabling fast, local speech-to-text transcription for applications like virtual assistants, without cloud reliance.

audio-processing cpp cuda fourier-transform python

Last synced: 10 May 2026

https://github.com/himeyama/cuda-nmf

NMF calculations are performed on NVIDIA GPUs using the Cuda API. (GEM released)

cublas cuda gem nmf ruby

Last synced: 13 Apr 2026

https://github.com/ayoussf/triton-hub

A container of various PyTorch neural network modules written in Triton.

cuda deep-learning openai pytorch triton triton-lang

Last synced: 14 Apr 2025

https://github.com/abhisheknair10/occupancy.nn

An multi-step pipeline to train and inference Occupancy Networks

3d-reconstruction cuda vision

Last synced: 20 Jul 2025

https://github.com/adamczykpiotr/cudamatrixlibrary

Matrix operation library using single, n-threads or CUDA supported GPU

agh agh-ust cpp cuda cuda-library matrix matrix-computations matrix-functions matrix-multiplication

Last synced: 19 Apr 2026

https://github.com/programmer-rd-ai/object-detection-framework

A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.

coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet

Last synced: 24 Sep 2025

https://github.com/dotblueshoes/robertscross

The Roberts cross operator is used in image processing and computer vision for edge detection.

cuda edge-detection image-processing

Last synced: 30 Mar 2025

https://github.com/inventwithdean/cuda_mlp

Implementation of a simple Multilayer Perceptron in pure CUDA

cuda cuda-programming deep-learning neural-networks

Last synced: 30 Mar 2025

https://github.com/quantum-integrated-technologies/deepforge

DeepForge : framework for working with machine learning.

ai artificial-intelligence cuda library machine-learning ml neural-network

Last synced: 31 Jul 2025

https://github.com/mhaseeb123/gcb

GCB includes a suite of benchmarks and basic tests for CUDA-aware MPI and C++ compilers.

cpp cpp23 cuda mpi partitioned-communication st-mpi

Last synced: 17 May 2026

https://github.com/varun-1703/eu-act-navigator-rag-qabot

An interactive, privacy-first application for querying the European Union’s AI Act using a local Retrieval-Augmented Generation (RAG) pipeline. Combines semantic search (FAISS) and a quantized TinyLlama LLM for fast, accurate, and context-aware answers—all running on your own hardware.

cuda faiss hugging-face-transformers langchain legal-tech local-slm machine-learning nlp open-source privacy rag-chatbot sentence-transformers streamlit tinyllama

Last synced: 03 May 2026

https://github.com/fandreuz/parallel-programming-for-hpc

Scientific codes in C/C++ with CUDA, OpenACC, FFTW, (cu)BLAS

cpp cuda hpc mpi

Last synced: 20 Apr 2026

https://github.com/ssoehdata/cuda_fortran_sci_eng

Working through examples from the Cuda Fortran for Scientists and Engineers 2nd Edition Book

cuda cuda-fortran fortran hpc nvfortran

Last synced: 21 Aug 2025

https://github.com/brosnanyuen/raybnn_graph

Graph Manipulation Library For GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI

cuda gpu graph graph-algorithms neural-network neural-networks opencl raybnn rust

Last synced: 06 Feb 2026

https://github.com/jakubriegel/game_of_life_3d

3D game of life implemented in CUDA

concurency cuda gameoflife nvidia put-poznan

Last synced: 21 Apr 2026

https://github.com/patrickm663/localglmnet.jl

This is a WIP implementation of Richman & Wüthrich (2022) using Julia's Flux.jl + CUDA.jl

cuda deep-learning flux julia neural-networks symbolic-regression xai

Last synced: 22 Apr 2026

https://github.com/duskvirkus/ofxarrayfire

An openFrameworks addon with pre-compiled binaries of ArrayFire.

arrayfire cuda ofxaddon openframeworks openframeworks-addon

Last synced: 09 May 2026

https://github.com/katpercent/raytracing

A foundation for ray tracing using CUDA and parallel computing techniques.

3d cuda engine game parrallel-computing ray raytracing

Last synced: 01 Nov 2025

https://github.com/bokutotu/cudnn_graph_api_example

cudnn graph api example

cuda cudnn cudnn-v8

Last synced: 04 May 2026

https://github.com/hariprashad-ravikumar/accelerated-computing-in-cuda-c

This repo contains my codes for problem sets in NVIDIA Getting Started with Accelerated Computing in CUDA C/C++

c cuda cuda-kernels cuda-toolkit

Last synced: 24 Apr 2026

https://github.com/sarah627/horus_eye_fcih_graduation_project

An AI-powered tourism website using YOLOv7 for real-time landmark detection in images. Built with Flask, PyTorch, and Roboflow for seamless tourist interaction.

computer-vision cuda flask jupyter-notebook kaggle matplotlib object-detection opencv python pytorch roboflow

Last synced: 14 Apr 2026

https://github.com/tensorbfs/cutropicalgemm.jl

The fastest Tropical number matrix multiplication on GPU

cuda gemm tropical-algebra

Last synced: 20 Jan 2026

https://github.com/alegau03/parallel-k-means

Implementation of C programs for the K-Means algorithm for parallel computing.

c c-programming cuda parallel parallel-programming

Last synced: 24 Apr 2026

https://github.com/fblupi/grado_informatica-ppr

Prácticas de la asignatura Programación Paralela de la UGR

cuda mpi openmp parallel-computing

Last synced: 22 Apr 2026

https://github.com/rajarsheya/real-time-traffic-analysis-with-cuda-object-detection

Implemented CUDA-accelerated object detection (YOLO) to analyze a sample image dataset. Performed vehicle counting and simulated speed estimation to demonstrate real-time traffic analysis capabilities.

cpp cuda opencv python yolo

Last synced: 12 Apr 2026