An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/eshibusawa/cupy-cuda

Learn CUDA programming essentials with CuPy, from basic kernels to advanced memory patterns

cooperative-thread-array cub cuda cupy gpu parallel-computing python

Last synced: 15 Jun 2025

https://github.com/haleelrah/Vision-pro-MAX

A Raspberry Pi-based object detection system for assisting visually impaired individuals. This project uses YOLO object detection and a Hailo-8L AI accelerator to identify obstacles such as manholes, potholes, and bumps, providing real-time audio feedback to aid navigation.

bash computer-vision cuda fine-tuning jupyter-notebook object-detection opencv python pytorch raspberry-pi rpi-camera ssh text-to-speech ultralytics yolo yolov8

Last synced: 30 Dec 2025

https://github.com/erosiv/silt

simple immediate lightweight tensors

cmake cuda simulation tensor

Last synced: 31 Oct 2025

https://github.com/enkerewpo/talaria

AI Voice Assistant for Dialogue and IoT Control Powered by GPT-4o

cuda gpt-4 python3 pytorch stt tts

Last synced: 16 Apr 2026

https://github.com/bjornmelin/ml-vision-lab

👁️ Production-grade computer vision implementations. Real-world applications in image processing, object detection, and video analytics with GPU acceleration. 📸

computer-vision cuda deep-learning image-processing object-detection opencv pytorch video-analytics

Last synced: 04 Apr 2026

https://github.com/emmanuelmess/firstcollisiontimesteprarefiedgassimulator

This simulator computes all possible intersections within a very small timestep for a particle model.

cpp20 cuda simulator

Last synced: 17 Apr 2026

https://github.com/antonioberna/nn-gpu-logic-gates

Neural network implementation on the GPU using CUDA C++ to learn logic gate operations

cpp cuda gpu logic-gates neural-networks nvidia

Last synced: 01 May 2026

https://github.com/kenmalik/cuda-dr-bcg

CUDA C++ implementation of the DR-BCG algorithm for numerically solving linear systems.

cpp cuda hpc numerical-methods

Last synced: 19 Apr 2026

https://github.com/edcalderin/huggingface_ragflow

This project implements a classic Retrieval-Augmented Generation (RAG) system using HuggingFace models with quantization techniques. The system processes PDF documents, extracts their content, and enables interactive question-answering through a Streamlit web application.

bitsandbytes cuda huggingface huggingface-embeddings langchain langchain-community large-language-models llm nf4 python qdrant quantization rag retrieval-augmented-generation ruff streamlit text-generation

Last synced: 15 Jul 2025

https://github.com/xstupi00/N-Body-CUDA

PCG - Parallel Computations on GPU - Project - N-Body-CUDA

cuda gpu-acceleration gpu-computing nbody-simulation optimization parallel-computing pcg vut vut-fit

Last synced: 11 Mar 2025

https://github.com/ragu-manjegowda/parallel-programming

Assignments and projects from Udacity's Introduction to Parallel Programming course

cuda gpu-programming nvidia-cuda nvidia-gpu udacity-parallel-programming

Last synced: 25 May 2026

https://github.com/unknownnuts/meshsdk

Mesh processing library with extra `C/C#/JS/TS/PYTHON` bindings.

cuda dicom electron emscripten mesh modelling pybind11 stl stomatology threejs wasm

Last synced: 10 Apr 2026

https://github.com/aayes89/pyllm

Train your own LLM from scratch

cpu cuda llm llm-training pip python3

Last synced: 18 May 2026

https://github.com/edisonslightbulbs/viewer

Exploring real-time 3D point cloud rendering using CUDA and OpenGL

cuda cxx11 opengl pangolin submodule

Last synced: 02 May 2026

https://github.com/matteopolak/stock-predict

Stock prediction with LSTM using TensorFlow and TypeScript.

ai artificial-intelligence cuda lstm machine-learning stock tensorflow typescript

Last synced: 09 May 2026

https://github.com/avarga1/vllm-hb

vLLM-compatible inference runtime in pure Rust. Zero Python. Zero libtorch. CUDA via candle.

candle cuda inference llm openai-api rust tokio vllm

Last synced: 07 Apr 2026

https://github.com/jpodivin/gputomata

Cellular automata running on CUDA-capable GPUs

cellular-automata cellular-automaton cuda

Last synced: 07 Nov 2025

https://github.com/debanjan06/spatial-streamio

An optimized, out-of-core asynchronous data streaming pipeline for high-throughput 3D point cloud training loops. Features low-level numpy.memmap zero-copy reads and multi-threaded ring prefetching to eliminate I/O bottlenecks, delivering a 33.33% throughput efficiency gain on PyTorch CUDA workloads.

asynchronous-programming cuda data-engineering deep-learning-pipelines io-optimization memory-mapping point-cloud pytorch

Last synced: 11 Jun 2026

https://github.com/loveboyme/yolov5-tensorrt-accelerator

High-performance YOLOv5 inference framework accelerated by TensorRT, with dynamic optimization

cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5

Last synced: 29 Mar 2025

https://github.com/ngoma1713/rushirb2001

🤖 Explore advanced AI and machine learning solutions for protein modeling and medical applications, developed by a dedicated data science graduate student.

computer-vision-opencv cuda data-science-portfolio deep-learning generative-ai machine-learning medical-ai protein-modeling published-researcher pytorch quantum-ml rag-chatbot tensorflow

Last synced: 02 May 2026

https://github.com/bjornmelin/llm-gpu-optimization

🚄 Advanced LLM optimization techniques using CUDA. Features efficient attention mechanisms, custom CUDA kernels for transformers, and memory-efficient training strategies. ⚡

cuda deep-learning gpu-acceleration llm-optimization machine-learning memory-optimization parallel-computing transformers

Last synced: 18 Mar 2025

https://github.com/amitkumarj441/deep-learning-on-your-finger

A rich collection of Dockerfiles for installing deep learning dependencies on your way :rocket:

cuda cudnn gcp

Last synced: 18 Apr 2026

https://github.com/m-torhan/advent-of-code

🎄 Solutions for the Advent of Code

advent-of-code advent-of-code-2024 cuda

Last synced: 07 Apr 2025

https://github.com/deepschneider/tinygrad-universal

Universal version of Tinygrad with CUDA and OpenCL support

autograd automatic-differentiation cuda pycuda pyopencl tinygrad tinygrad-cuda

Last synced: 06 Mar 2025

https://github.com/akira4o4/cuda-yolo-processing

CUDA YOLO Processing

cuda yolo

Last synced: 12 Jul 2025

https://github.com/kylesayrs/pttp

PyTorch Tensor Profiler with fully-supported memory timelines and events

cuda memory profiling pytorch

Last synced: 07 May 2026

https://github.com/wiktor2718/matrix_flow

Matrix Flow is a simple machine learning library written in Rust and CUDA. It was created as a portfolio project to deepen my understanding of machine learning, GPU programming, and Rust. It provides an API for matrix manipulation and includes specially optimized neural networks.

adam-optimizer benchmarking cuda deep-learning gpu-computing machine-learning matrix-operations neural-networks portfolio-project rust

Last synced: 18 May 2026

https://github.com/ghusta/jcuda-demo

JCUDA demo

cuda java nvidia

Last synced: 14 May 2026

https://github.com/mjun0812/setup-cuda

Set up a specific version of NVIDIA CUDA in GitHub Actions on Linux x86_64 and arm64 (Debian- and Fedora-based distributions) and on Windows

action cuda cuda-toolkit github-actions

Last synced: 13 Jan 2026

https://github.com/cppshizoids/cuda

These are my basic CUDA lessons.

cuda cuda-demo cuda-programming

Last synced: 15 Jul 2025

https://github.com/kmock930/texture-image-comparison

This project aims to build a model that classifies the type of an unseen image as accurately as possible by implementing, evaluating, and comparing two different multi-layer perceptron neural networks.

computer-vision conda confusion-matrix convolutional-neural-networks cuda image-preprocessing keras keras-tensorflow learning-curve-analysis matplotlib multi-layer-perceptron neural-network pickle-file python3 skimage

Last synced: 12 Apr 2026

https://github.com/lruizap/testcuda

Guide to installing and using CUDA for programming

cuda cudnn nvidia pytorch

Last synced: 12 May 2026

https://github.com/flavienbwk/tensorflow2-cuda-10.2-docker

Tensorflow 2.3, CUDA 10.2, Docker compatible image

cuda docker python3 tensorflow ubuntu1804

Last synced: 11 Apr 2026

https://github.com/tfogal/gemm-db

For creating a cacheable GEMM cost model.

cuda rust

Last synced: 18 May 2026

https://github.com/demetriantitus/machine-vision---yolov8

This project provides a comprehensive guide to object detection in cluttered environments using YOLOv8. It demonstrates how to identify and classify objects in both still images and video streams.

computer-vision cuda dataset image-classification machine-learning nvidia-gpu object-detection surveillance traffic-monitoring video-analysis yolov8

Last synced: 18 May 2026

https://github.com/promptromp/aws-bootstrap-g4dn

Fast and easy bootstrapping of AWS EC2 instances for CUDA development. Use it as a CLI, as a programmatic SDK, or as an Agent Skill!

aws cuda ec2 jupyter-notebook machine-learning mlops python

Last synced: 21 Feb 2026

https://github.com/williamzhang20/cuda-practice

Exercises in CUDA

cuda n-body-problem

Last synced: 23 Mar 2025

https://github.com/obj-wtf/gan-architecture

App for training GAN models on architectural plans

architecture building cuda gan pix2pix-tensorflow plan

Last synced: 18 May 2026

https://github.com/moshiba/fmindex

Ultra-fast parallel FM-index generation for DNA reads

cpp cuda fmindex parallel

Last synced: 18 May 2026

https://github.com/sangioai/sph

CUDA and OpenMP versions of the serial SPH (Smoothed Particle Hydrodynamics) algorithm.

cuda openmp

Last synced: 27 Apr 2026

https://github.com/sonhm3029/setup-experience

This project stores my setup experience and the errors I met and solved while developing end-to-end AI and software projects.

ai computer-vision cuda deep-learning software

Last synced: 10 Jun 2026

https://github.com/jesuscopado/parallel-programming

My solutions for the course Programming Parallel Computers at Aalto University (http://ppc.cs.aalto.fi/). Grade: 5/5

cpp cuda image-segmentation median-filter sorting-algorithms

Last synced: 19 Apr 2026

https://github.com/kis-balazs/cuda-research

CUDA Research & Code. Course-style structured. Inspiration from @Infatoshi.

cuda

Last synced: 14 May 2025

https://github.com/alan-cooney/python-cuda-starter-template

Python CUDA Starter Template

cuda deep-learning

Last synced: 30 Mar 2025

https://github.com/bjornmelin/ai-system-design

🎨 Large-scale AI system architectures and implementations. Features distributed training systems, multi-GPU pipelines, and efficient resource management. 🏗️

architecture cuda distributed-systems engineering gpu-computing production scalability system-design

Last synced: 23 Jul 2025

https://github.com/ivanbgd/cuda_quad_c

Calculates a definite integral by using three different rules. Compares sequential to parallel implementations.

cuda integrals parallel-implementations

Last synced: 28 Mar 2025

https://github.com/rushirg/cuda-matrix-multiplication

Matrix Multiplication on GPGPU in CUDA

cpu cuda gpu parallel-processing

Last synced: 17 May 2026

https://github.com/tomosatop/docker-lammps

I wanted an easy way to use LAMMPS, so I built this service.

cuda lammps wsl-ubuntu

Last synced: 28 Mar 2025

https://github.com/derek-palmer/dvr-scan-file-organizer

DVR-Scan-Organizer is a Dockerized extension for DVR-Scan, designed to process multiple video files and organize output in a structured format.

cuda dvr dvr-scan multimedia opencv opencv-python python video video-processing

Last synced: 01 May 2026

https://github.com/gaaniruddha/mphil-gpu-imager

This repository contains code for project #1 of the MPhil: a test version of a GPU imager for the single-time-step, single-channel and single-time-step, multi-channel cases.

astronomy benchmarks cuda cufft google-sheets gpu-imager imaging-astronomy interferometry radio-astronomy

Last synced: 11 Jun 2026

https://github.com/elymsyr/auv_ws

An open-source simulation and control workspace for an Autonomous Underwater Vehicle (AUV) built on ROS 2 Humble and Gazebo. It features a high-fidelity dynamics model and an advanced AI-based motion controller (FossenNet) that uses a pre-trained LibTorch model to imitate a NL-MPC for real-time, high-performance manoeuvring.

autonomous-vehicles auv control-systems cpp cuda deep-learning gazebo imitation-learning libtorch mpc python robotics ros2 simulation

Last synced: 15 Apr 2026

https://github.com/awaldis/cuda-experiments

A place to explore the capabilities and limits of CUDA parallel processing.

cuda cuda-kernels cuda-programming

Last synced: 27 Aug 2025

https://github.com/puzzlef/vector-max-cuda

Performance of sequential vs CUDA-based vector element max.

basics cuda element experiment max vector

Last synced: 17 May 2026

https://github.com/hit07/ml-dl-torch

This repository builds a comprehensive understanding of machine learning and deep learning using PyTorch.

computer-vision convolutional-neural-networks cuda neural-networks pytorch

Last synced: 28 Feb 2025

https://github.com/quik-fe/node-nvidia-smi

Node wrapper around nvidia-smi.

cuda gpu nodejs nvidia nvidia-smi typescript

Last synced: 19 Feb 2026

https://github.com/lordofhyphens/gpu-path-delay-coverage

CUDA-based Path Delay Fault Coverage

cpp cuda gpgpu moderngpu

Last synced: 04 May 2026

https://github.com/reuben-sun/pybind-cuda-demo

An example of calling CUDA C++ interfaces from Python via pybind11

cpp cuda pybind11 python pytorch

Last synced: 07 Apr 2026

https://github.com/usman619/pdc

Parallel and Distributed Computing

cuda distributed-computing distributed-systems nextcloud

Last synced: 11 Apr 2026

https://github.com/miferreiro/cdap-cuda

CUDA exercises for the subject of "Computación Distribuída e de Altas Prestacións" in the Master Degree of Computer Engineering of the University of Vigo in 2020

c cuda scan

Last synced: 17 May 2026

https://github.com/maneeshsit/pcie

Modifies the code of Run:ai and other FOSS projects for use with PCIe card-based AI accelerators, for both inference and training

cuda cxl cxl-mem distro exo k3s k8s kestra llamacpp llm-d mpi4py mpio onnxoptimizer opentelemetry-ebpf-profiler paxos-cluster pcie photonics-computing runai visualize vllm

Last synced: 24 Aug 2025

https://github.com/timxor/c_code

Some of my C code

c cuda m4 parallel-programming

Last synced: 03 May 2026

https://github.com/mvishiu11/kmeans-clustering

K-Means Clustering with both GPU (CUDA) and CPU implementations

cuda kmeans-clustering

Last synced: 15 Mar 2025

https://github.com/abhiram-kandiyana/cuda-blast-2024

Reimplementation of NCBI BLAST with CUDA backend for faster retrieval

blast cuda gpu-acceleration parallel-processing

Last synced: 15 Mar 2025

https://github.com/rajkamalsah/flow-hpc-shocktrack

GPU-accelerated, fault-tolerant Schlieren/PIV shock tracking with interactive ROI, 1-px edges, and resumable training.

ai-ml computer-vision cuda fluid-dynamics hpc mlsystem opencv piv pytorch schlieren scientific-ml smalldata transformer

Last synced: 03 May 2026

https://github.com/tianzonglin/cloud-control-gui

A tool to compute, visualize, analyse and drag points (high-dimensional data)

cuda interaction-design visualization

Last synced: 25 Apr 2026

https://github.com/sshoecraft/shepherd

An interactive multi-backend LLM runtime with intelligent cache eviction and persistent retrieval-augmented memory.

anthropic cli cpp cuda gemini grok inference kv-cache llama-cpp llm mcp ollama openai openai-server rag smart-evictions tensorrt tool-calling ulimited-context

Last synced: 10 Apr 2026

https://github.com/zhaocc1106/cuxx-programing

Examples for several CUDA libraries: CUDA, cuBLAS, cuBLASLt, cuSPARSE...

cublas cublaslt cuda cusparse

Last synced: 23 Mar 2025

https://github.com/zhaocc1106/cuda-programming

Learning CUDA programming

cuda nvidia

Last synced: 23 Mar 2025

https://github.com/versi379/optimized-matrix-multiplication

This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.

cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming

Last synced: 17 May 2026

https://github.com/proafxin/cuda-docker

High-performance computing images with PyCUDA and TensorRT preinstalled

cuda docker dockerfile libcudnn nvidia-tensorrt pycuda python tensorrt

Last synced: 11 Apr 2026

https://github.com/BardiFarsi/ThreadPoolManager

ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.

cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe

Last synced: 15 May 2025

https://github.com/kirubhakaranm/vision-pipeline-cuda

High-performance camera processing pipeline with CUDA GPU acceleration, CPU multithreading, and real-time TCP/IP telemetry monitoring (1,200+ FPS, <1ms latency)

computer-vision cpp17 cuda edge-detection gpu-acceleration image-processing multithreading networking opencv performance-optimization real-time robotics tcp-ip telemetry

Last synced: 12 Apr 2026

https://github.com/camille-004/cusprec

🏁 Sparse signal recovery library written in PyCUDA.

cuda ml python signal-processing sparse-recovery

Last synced: 18 Jan 2026

https://github.com/vladd12/libexecstd

Modern C++ library for using an execution context of computer devices

cpp cpp17 cuda gpu-acceleration gpu-computing

Last synced: 06 May 2026

https://github.com/santiagoenriquega/gpu_projects

Various Python GPU accelerated computations and simulations.

cuda cupy numba opencl pyopencl python

Last synced: 17 May 2026

https://github.com/mxm-tr/docker-darknet-opencv

Accelerated object detection on streams and files, using a Docker darknet YOLO container

cuda docker docker-compose object-recognition opencv-python python3 yolo

Last synced: 10 Apr 2026

https://github.com/ergus/algorithms

Set of multiple algorithms implemented in multiple paradigms

algorithms cmake concurrency cpp cuda gpgpu inter-language metaprogramming multithreading pthreads stl testing

Last synced: 17 May 2026

https://github.com/ubermorgott/morgottalk

Cross-platform desktop push-to-talk voice transcription. Single binary. GPU accelerated (CUDA/Vulkan/Metal/ROCm/OpenCL). Powered by whisper.cpp.

cuda desktop go gpu speech-to-text svelte transcription voice wails whisper

Last synced: 07 Apr 2026

https://github.com/programmergnome/kutyai

A Python GUI application for dog breed recognition, covering 420 breeds and 42,000 images.

cuda deep-learning image-classification python3 qt5-gui tensorflow transfer-learning

Last synced: 11 May 2026

https://github.com/drilonaliu/bachelor-thesis

Parallel Programming Fractals

cuda fractals gpu parallel-programming

Last synced: 15 May 2026

https://github.com/gammahazard/locate-anything

Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.

bounding-boxes computer-vision cuda docker fastapi gpu grounding locate-anything machine-learning nvidia object-detection ocr open-vocabulary-detection react self-hosted tailwindcss typescript vision-language-model web-ui

Last synced: 28 May 2026