An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/kirubhakaranm/vision-pipeline-cuda

High-performance camera processing pipeline with CUDA GPU acceleration, CPU multithreading, and real-time TCP/IP telemetry monitoring (1,200+ FPS, <1ms latency)

computer-vision cpp17 cuda edge-detection gpu-acceleration image-processing multithreading networking opencv performance-optimization real-time robotics tcp-ip telemetry

Last synced: 12 Apr 2026

https://github.com/sangioai/sph

CUDA and OpenMP versions of the serial SPH (Smoothed Particle Hydrodynamics) algorithm.

cuda openmp

Last synced: 27 Apr 2026

https://github.com/pintamonas4575/rlgan-project-maadm-upm

Neuroevolution to learn the Lunar Lander from Gymnasium and a GAN to learn to color images. Subject from the ML and BD master's degree of UPM.

cifar10 cuda dcgan deep-learning flappy-bird gan genetic-algorithm lunar-lander machine-learning mlp python3 pytorch reinforcement-learning tensorflow wgan-gp

Last synced: 12 Apr 2026

https://github.com/lruizap/testcuda

Guide to installing and using CUDA for programming

cuda cudnn nvidia pytorch

Last synced: 12 May 2026

https://github.com/marcorentap/kokkos-docker-cluster

Deploy Docker containers with Kokkos, OpenMP, OpenMPI and CUDA as a Docker swarm.

cuda docker hpc kokkos

Last synced: 10 Mar 2025

https://github.com/amitkumarj441/deep-learning-on-your-finger

A rich collection of Dockerfiles for installing deep learning dependencies on your way :rocket:

cuda cudnn gcp

Last synced: 18 Apr 2026

https://github.com/debanjan06/spatial-streamio

An optimized, out-of-core asynchronous data streaming pipeline for high-throughput 3D point cloud training loops. Features low-level numpy.memmap zero-copy reads and multi-threaded ring prefetching to eliminate I/O bottlenecks, delivering a 33.33% throughput efficiency gain on PyTorch CUDA workloads.

asynchronous-programming cuda data-engineering deep-learning-pipelines io-optimization memory-mapping point-cloud pytorch

Last synced: 11 Jun 2026

https://github.com/matteopolak/stock-predict

Stock prediction with LSTM using TensorFlow and TypeScript.

ai artificial-intelligence cuda lstm machine-learning stock tensorflow typescript

Last synced: 09 May 2026

https://github.com/boohohoo/shamining

Shamining is a cloud mining service that allows users to mine cryptocurrencies without the need for personal hardware. By renting computing power from eco-friendly data centers, users can mine efficiently. The platform offers an easy-to-use interface, flexible contracts, and daily payouts.

cryptocurrency cryptomining cuda gpu-mining mining mining-software open-source opencl

Last synced: 04 Jul 2025

https://github.com/xstupi00/N-Body-CUDA

PCG - Parallel Computations on GPU - Project - N-Body-CUDA

cuda gpu-acceleration gpu-computing nbody-simulation optimization parallel-computing pcg vut vut-fit

Last synced: 11 Mar 2025

https://github.com/prdai/mnist-digit-recognition

A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.

cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb

Last synced: 12 Apr 2026

https://github.com/occisor2/fluidsimulation

Second project of my parallel algorithms course

cuda high-performance-computing

Last synced: 28 Feb 2025

https://github.com/brendanm12345/simple_renderer_cs149

Simple CUDA renderer implementation. 19th most efficient out of 150+ submissions

cpp cuda

Last synced: 18 May 2026

https://github.com/rajshrestha86/kmeans-clusterize-cuda

Implementation of K-Means algorithm from scratch using CUDA.

c cuda kmeans-clustering

Last synced: 18 May 2026

https://github.com/boned-fruitwood759/whisperx-asr-with-fastapi

🎤 Enable real-time speech recognition with WhisperX using FastAPI for efficient, scalable audio processing.

asr ctranslate2 cuda fastapi openai python speech-recognition torch transformers whisper whisperx

Last synced: 12 Apr 2026

https://github.com/amruthapatil/nyu-cudaconvolution

Implementing convolution operations on an image using CUDA, exploiting different methodologies - basic, tiled, and cuDNN

cuda high-performance

Last synced: 13 Mar 2025

https://github.com/jiriklepl/bits-knn-jpdc2024

Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search

bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k

Last synced: 21 Mar 2025

https://github.com/edcalderin/huggingface_ragflow

This project implements a classic Retrieval-Augmented Generation (RAG) system using HuggingFace models with quantization techniques. The system processes PDF documents, extracts their content, and enables interactive question-answering through a Streamlit web application.

bitsandbytes cuda huggingface huggingface-embeddings langchain langchain-community large-language-models llm nf4 python qdrant quantization rag retrieval-augmented-generation ruff streamlit text-generation

Last synced: 15 Jul 2025

https://github.com/fmigneault/dockers

Collection of Docker setups with common libraries for image processing and machine learning.

boost cuda docker image-processing opencv python

Last synced: 12 Apr 2026

https://github.com/aayes89/pyllm

Train your own LLM from scratch

cpu cuda llm llm-training pip python3

Last synced: 18 May 2026

https://github.com/edisonslightbulbs/viewer

Exploring real-time 3D point cloud rendering using CUDA and OpenGL

cuda cxx11 opengl pangolin submodule

Last synced: 02 May 2026

https://github.com/avarga1/vllm-hb

vLLM-compatible inference runtime in pure Rust. Zero Python. Zero libtorch. CUDA via candle.

candle cuda inference llm openai-api rust tokio vllm

Last synced: 07 Apr 2026

https://github.com/loveboyme/yolov5-tensorrt-accelerator

High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization

cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5

Last synced: 29 Mar 2025

https://github.com/emanuelemessina/gigacheck

ABFT Matrix Multiplication of any size in CUDA

abft cuda matrix-multiplication

Last synced: 28 Feb 2025

https://github.com/akira4o4/cuda-yolo-processing

CUDA YOLO Processing

cuda yolo

Last synced: 12 Jul 2025

https://github.com/karusb/2dca-cuda

2 Dimensional Cellular Automata Visualisation (Game of Life)

algorithm-flowchart cellular-automata cuda game game-of-life glut visual-studio

Last synced: 12 Apr 2026

https://github.com/wiktor2718/matrix_flow

Matrix Flow is a simple machine learning library written in Rust and CUDA. It was created as a portfolio project to deepen my understanding of machine learning, GPU programming, and Rust. It provides an API for matrix manipulation and includes specially optimized neural networks.

adam-optimizer benchmarking cuda deep-learning gpu-computing machine-learning matrix-operations neural-networks portfolio-project rust

Last synced: 18 May 2026

https://github.com/cppshizoids/cuda

These are my basic CUDA lessons

cuda cuda-demo cuda-programming

Last synced: 15 Jul 2025

https://github.com/bjornmelin/cuda-core-projects

🎯 Essential CUDA programming patterns and optimizations. Showcasing parallel computing expertise through matrix operations, memory management, and advanced kernel implementations. 💻

cpp cuda cuda-kernels gpu-computing high-performance-computing nvidia optimization parallel-computing

Last synced: 12 Apr 2026

https://github.com/baro-00/cpp-cuda-lab

Experimental C++ projects using NVIDIA CUDA for parallel computing. Learning & testing GPU kernels

cpp cuda

Last synced: 04 May 2026

https://github.com/tfogal/gemm-db

For creating a cacheable GEMM cost model.

cuda rust

Last synced: 18 May 2026

https://github.com/demetriantitus/machine-vision---yolov8

This project provides a comprehensive guide to object detection in cluttered environments using YOLOv8. It demonstrates how to identify and classify objects in both still images and video streams

computer-vision cuda dataset image-classification machine-learning nvidia-gpu object-detection surveillance traffic-monitoring video-analysis yolov8

Last synced: 18 May 2026

https://github.com/lionpsiuc/cflow

A computational model for heat propagation in a cylindrical radiator using both CPU and GPU parallel processing. The simulation uses finite difference methods to model the directional flow of heat through a cylindrical pipe system with specific boundary conditions and cyclic connections between pipe segments.

c cuda parallel-programming

Last synced: 29 May 2026

https://github.com/0x778/gaussian_filter_using_cuda

Implementation of a Gaussian filter using CUDA

cuda cuda-kernels cuda-programming image-processing

Last synced: 04 May 2026

https://github.com/obj-wtf/gan-architecture

App for training GAN models on architecture plans

architecture building cuda gan pix2pix-tensorflow plan

Last synced: 18 May 2026

https://github.com/moshiba/fmindex

Ultra-fast parallel FM index generation for DNA reads

cpp cuda fmindex parallel

Last synced: 18 May 2026

https://github.com/0xhilsa/tenop

A lightweight & minimalist tensor computation library with CUDA backend

bash c cuda python3 tensor

Last synced: 13 Apr 2026

https://github.com/timvgl/cuxrft

Performs FFT in xarrays using cuda

cuda cupy fft python xarray

Last synced: 07 Jan 2026

https://github.com/ivanbgd/cuda_quad_c

Calculates a definite integral by using three different rules. Compares sequential to parallel implementations.

cuda integrals parallel-implementations

Last synced: 28 Mar 2025

https://github.com/hrolive/data-analytics-in-the-era-of-large-scale-machine-learning

Slides and other material for the Cyprus NCC training event about "Data analytics in the era of large-scale machine learning".

cuda deep-learning gpu-acceleration gradient-boosting large-language-models machine-learning preprocessing python pytorch

Last synced: 13 Apr 2026

https://github.com/rushirg/cuda-matrix-multiplication

Matrix Multiplication on GPGPU in CUDA

cpu cuda gpu parallel-processing

Last synced: 17 May 2026

https://github.com/tomosatop/docker-lammps

I wanted an easy way to use LAMMPS, so I built a service for it

cuda lammps wsl-ubuntu

Last synced: 28 Mar 2025

https://github.com/puzzlef/vector-max-cuda

Performance of sequential vs CUDA-based vector element max.

basics cuda element experiment max vector

Last synced: 17 May 2026

https://github.com/reuben-sun/pybind-cuda-demo

An example of calling a CUDA C++ interface from Python using pybind11

cpp cuda pybind11 python pytorch

Last synced: 07 Apr 2026

https://github.com/matthewfeickert/report-urssi-fellowship-2025

Report on URSSI 2025 Early-Career Fellowship

cuda pixi urssi

Last synced: 17 Jan 2026

https://github.com/ray-chew/modified_ch

Density functional theory (DFT) and self-consistent field theory (SCFT) simulation of diblock copolymers

cuda density-functional-theory diblock-copolymer numerical-analysis numerical-methods self-consistent-field-theory

Last synced: 11 May 2026

https://github.com/hr-fahim/transformer-model-optimization

Sample GPT Transformer Model from Scratch.

cuda few-shot-learning transfomers

Last synced: 02 May 2026

https://github.com/miferreiro/cdap-cuda

CUDA exercises for the subject "Computación Distribuída e de Altas Prestacións" in the Master's Degree in Computer Engineering at the University of Vigo, 2020

c cuda scan

Last synced: 17 May 2026

https://github.com/alessiobugetti/histogram-equalization

Implements sequential and parallel histogram equalization in C++ and Python, utilizing CUDA for parallel computation on GPU

cuda gpu-acceleration histogram-equalization parallel-computing pycuda

Last synced: 04 May 2026

https://github.com/xza85hrf/flux_pipeline

FluxPipeline is a prototype experimental project that provides a framework for working with the FLUX.1-schnell image generation model. This project is intended for educational and experimental purposes only.

ai cuda docker educational experimental flux1 flux1-schnell flux1ai gradio image-generation model non-commercial python pytorch research transformer-model

Last synced: 05 Jul 2025

https://github.com/doxakis/cosinesimilaritydistancesongpu

Compute cosine similarity distances for all combinations of the dataset on the gpu with CUDA

cuda

Last synced: 13 Apr 2026

https://github.com/eyelor/text-to-image-item-generator

A Python workflow for generating random item images using models from Hugging Face.

ai conda cuda flux-schnell generator huggingface item llama python pytorch text-to-image

Last synced: 13 Apr 2026

https://github.com/tianzonglin/cloud-control-gui

A tool to compute, visualize, analyse and drag points (high-dimensional data)

cuda interaction-design visualization

Last synced: 25 Apr 2026

https://github.com/versi379/optimized-matrix-multiplication

This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.

cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming

Last synced: 17 May 2026

https://github.com/santiagoenriquega/gpu_projects

Various Python GPU accelerated computations and simulations.

cuda cupy numba opencl pyopencl python

Last synced: 17 May 2026

https://github.com/ergus/algorithms

Set of multiple algorithms implemented in multiple paradigms

algorithms cmake concurrency cpp cuda gpgpu inter-language metaprogramming multithreading pthreads stl testing

Last synced: 17 May 2026

https://github.com/ubermorgott/morgottalk

Cross-platform desktop push-to-talk voice transcription. Single binary. GPU accelerated (CUDA/Vulkan/Metal/ROCm/OpenCL). Powered by whisper.cpp.

cuda desktop go gpu speech-to-text svelte transcription voice wails whisper

Last synced: 07 Apr 2026

https://github.com/efecaliskannn/pneumonia-detection-with-cnn--vgg16--and-resnet50-deep-learning-models

This project aims to detect pneumonia using deep learning, a subset of artificial intelligence. It examines the performance of deep learning models, including CNN, VGG16, and ResNet50, in detecting pneumonia.

artificial-intelligence convolutional-neural-networks cuda deep-learning keras-tensorflow nvidia-cuda pyhton transfer-learning

Last synced: 13 Jun 2025

https://github.com/programmergnome/kutyai

This is a graphical dog breed recognizer written in Python, with 420 breeds and 42,000 images.

cuda deep-learning image-classification python3 qt5-gui tensorflow transfer-learning

Last synced: 11 May 2026

https://github.com/tiktokfnf33/rayleigh-taylor-instability-simulation

This repository features a high-performance simulation of the Rayleigh-Taylor instability using CUDA, Python, and C. Explore the implementation and results to understand fluid dynamics in a parallel computing context. 🖥️🚀

c computational-fluid-dynamics cuda euler-method finite-difference gpu-computing hpc numerical-simulation parallel-computing physics-simulation python rayleigh-taylor-instability runge-kutta

Last synced: 04 May 2026

https://github.com/drilonaliu/bachelor-thesis

Parallel Programming Fractals

cuda fractals gpu parallel-programming

Last synced: 15 May 2026

https://github.com/mrgkanev/tensorflow-gpu-docker-setup

A Docker environment for TensorFlow GPU development with optimized configurations for WSL2, troubleshooting guides, and common error fixes

cuda cuda-toolkit deep-learning dev-environment development-tools docker gpu-acceleration machine-learning nvidia-docker nvidia-docker-support python tensorflow

Last synced: 13 Apr 2026

https://github.com/hrshl212/custom-cuda-kernels-with-neural-network-implementation

The repository contains custom CUDA kernels for linear layers, softmax, and ReLU, integrated with Python to build a neural network

cuda neural-network python pytorch

Last synced: 08 May 2026

https://github.com/phantom7knight/cuda-fusion

This project is for learning CUDA to better understand how the GPU works.

cuda cuda-programming gpgpu gpu

Last synced: 17 May 2026

https://github.com/parxd/cuda-optim

optimizing CUDA kernels

cuda machine-learning

Last synced: 26 Mar 2025

https://github.com/lord-turmoil/cudacmakedemo

A demo for building a CUDA program with CMake

cuda tutorial

Last synced: 16 Mar 2025

https://github.com/delusionary/histoptimizer

Solves a minimum-variance-cost version of the partition problem.

cuda numba python

Last synced: 14 Jan 2026

https://github.com/dgcnz/nvtx-vscode

Create NVIDIA NVTX ranges directly in VS Code, then profile with Nsight Systems without modifying source code.

cuda nvtx pytorch vscode

Last synced: 13 Apr 2026

https://github.com/ran-2012/cuda-practice

CUDA practice code for the NVIDIA programming guide

cuda

Last synced: 27 Feb 2025

https://github.com/avicted/hip_fm_synthesis

This project demonstrates FM Synthesis (Frequency Modulation) using HIP (Heterogeneous-computing Interface for Portability), enabling high-performance sound generation on both AMD and NVIDIA GPUs.

amd audio-processing cuda fm-synthesis hip nvidia rocm

Last synced: 16 Mar 2025

https://github.com/nel-s/vein-cracker

Recovers which internal generator states could have generated a provided set of Minecraft Java b1.6-1.12.2 veins. Those can then be used to recover 3/4ths of any worldseeds that could have generated them.

cuda minecraft seedcracking veins

Last synced: 16 Mar 2025

https://github.com/chensongpoixs/cmedia_transcode

GPU (CUDA) transcoding version of a media service, supporting H264 and H265 transcoding

cuda gpu h264 h265 media transcode-media

Last synced: 19 May 2026

https://github.com/cripterhack/business-address-scrapper

Python+Scrapy - Distributed scraping system with cache for business information extraction.

cuda ollama postgresql python redis scraper scraping scrapy tesseract

Last synced: 14 Jun 2025

https://github.com/kratugautam99/logiclink-project

LogicLink is a conversational AI chatbot developed by Kratu Gautam (AIML Engineer). Powered by the TinyLlama-1.1B-Chat-v1.0 model, it provides an interactive interface for engaging conversations, query resolution, and task assistance. Version 5 features streaming responses, conversation management, and a sleek GUI.

antd-design chatbot-application conversational-ai cuda gradio graphical-user-interface huggingface-spaces huggingface-transformers jupyter-notebooks keras large-language-models mlops model-service-controller modelscope-studio natural-language-generation natural-language-processing pytorch reasoning-agent tensorflow

Last synced: 07 Apr 2026

https://github.com/deep-1704/coa_lab_repo_grp01

COA Lab assignments

cuda gpgpu-sim

Last synced: 24 Dec 2025

https://github.com/zyn10/cuda_code

CUDA practice

cuda cuda-programming

Last synced: 22 Jun 2025

https://github.com/kar-dim/cas-2d

Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA/OpenCL, for sharpening static images.

cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen

Last synced: 22 Jun 2025

https://github.com/rugleb/cuda

A simple example of a program that uses parallel GPU computing on an NVIDIA graphics card using CUDA technology

cuda gpu nvidia

Last synced: 10 Apr 2025

https://github.com/kanchishimono/python-images

Ubuntu-based Python container images, including CUDA images

container-image cuda docker dockerfile machine-learning python python3

Last synced: 30 Apr 2026

https://github.com/rkarahul/person-detector-faceverifier

Person-Detector-FaceVerifier is a sophisticated system for detecting and verifying faces in images. Ideal for applications like passport control and security, it combines advanced face detection with precise verification techniques.

bootstrap5 css3 cuda django html5 javascipt opencv-python os python pytorch yolov8

Last synced: 07 Apr 2026

https://github.com/ribin-baby/cuda_cudnn_installation_on_ubuntu20.04

Installation of CUDA 11.8 with cuDNN 8.7 on an Ubuntu 20.04 server with an A30 GPU, plus an ONNX GPU installation guide

cuda gpu linux onnxruntime server

Last synced: 16 May 2026

https://github.com/zalo/matmul_cuda

A simple learning example for CUDA

cuda

Last synced: 07 Jul 2025