An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/trahay/mpi-wattmeter

MPI-Wattmeter measures the power consumption of MPI programs

carbon-emissions cuda energy-consumption energy-monitor gpu hpc mpi

Last synced: 17 May 2026

https://github.com/infotrend-inc/ctpo-demo_projects

Jupyter Notebook examples using CTPO as their source container.

cuda opencv pytroch tensorflow2

Last synced: 14 Apr 2026

https://github.com/andreabak/whispersubs

Generate subtitles for your video or audio files using the power of AI

ai cuda deep-learning gpu-acceleration machine-learning srt subtitles transcribe transcription translate whisper

Last synced: 15 Feb 2026

https://github.com/hadv/vaneth

GPU-accelerated CREATE2 vanity address miner for Ethereum

create2-contract-deployment cuda ethereum gpu gpu-acceleration gpu-programming open-cl vanity-address

Last synced: 21 Jan 2026

https://github.com/kishore-narendran/eecs221-highperformancecomputing

Assignments done during the graduate course EECS 221 - Introduction to HPC that I took in the Spring Quarter of 2016 at University of California, Irvine. Involves assignments that use OpenMP, MPI and CUDA.

cuda hpc mpi openmp

Last synced: 17 May 2026

https://github.com/markdtw/parallel-programming

Basic Pthread, OpenMP, CUDA examples

cuda openmp parallel-programming pthreads

Last synced: 20 Apr 2026

https://github.com/B1-663R/docker-mining

Dockerfiles to build docker images to start mining with an NVIDIA Docker architecture

cryptocurrency cuda docker-image docker-nvidia mining

Last synced: 28 Mar 2025

https://github.com/droduit/multiprocessor-architecture

Introduction to Multiprocessor Architecture @ EPFL

cuda multiprocessor multithreading openmp-parallelization

Last synced: 17 Apr 2026

https://github.com/huwzpf/parallel-processing-cpu-and-gpu-env-and-lib-with-powercap

(2024/2025) A library and environment for parallel processing in a power-limited CPU+GPU cluster environment.

c cpu cuda gpu mpi openmp parallel powercap

Last synced: 11 Apr 2025

https://github.com/pelayo-felgueroso/tensorflow-gpu-setup

Step-by-step guide to installing TensorFlow with GPU support on Conda.

artificial-intelligence cuda deep-learning gpu machine-learning nvidia nvidia-gpu setup-guide tensorflow

Last synced: 17 Feb 2026

https://github.com/bogdanminko/laperf

La Perf is a framework for AI performance benchmarking — covering LLMs, VLMs, embeddings, with power-metrics collection.

ai-benchmark ai-performance apple-silicon cuda lmstudio ml-benchmark mlx mps nvidia-gpu ollama open-source-benchmark

Last synced: 15 May 2026

https://github.com/superlinear-ai/scipy-notebook-gpu

jupyter/scipy-notebook with CUDA Toolkit, cuDNN, NCCL, and TensorRT

cuda cudnn docker nccl scipy-notebook tensorflow tensorrt

Last synced: 01 May 2026

https://github.com/peri044/cuda

GPU implementations of algorithms

cuda gauss-jordan parallel-programming

Last synced: 14 Jul 2025

https://github.com/tyler-romero/aegae

Learning Triton / CUDA

cuda triton

Last synced: 11 Apr 2026

https://github.com/maliknaik16/parallel-computing

CUDA programming in C++ for high-performance computing using NVIDIA GPUs, optimized for tasks like machine learning or image processing

cores cpp cuda gpu makefile matrix nvcc optimization

Last synced: 10 Jun 2025

https://github.com/programmer-rd-ai/detectx

A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.

coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet

Last synced: 10 Jun 2025

https://github.com/yosh-matsuda/gpu-ptr

Cross-platform GPU smart pointer with C++20 range support

cpp cpp20 cuda gpu header-only hip

Last synced: 17 Jan 2026

https://github.com/xiongsp/pytorch-docker

Pure PyTorch Docker images. Supports almost all combinations of PyTorch, Python, Ubuntu, CentOS, and CUDA versions.

centos cuda docker docker-image python3 pytorch ubuntu

Last synced: 17 Apr 2026

https://github.com/bdwhst/fluora

A CUDA PBR path tracer

cpp cuda pathtracing pbr rendering

Last synced: 13 Feb 2026

https://github.com/shreyansh26/mlsys-experiments

A collection of scripts for experimenting with and implementing MLSys-related stuff

cuda cuda-kernel gpu gpu-programming llm-inference profiling pytorch triton

Last synced: 30 Aug 2025

https://github.com/tvanfossen/entropic

Local-first agentic inference engine in C/C++. Multi-tier model routing, grammar-constrained output, MCP tool servers. Embeddable via C ABI.

agentic-ai agentic-framework cpp cpp20 cuda edge-ai embedded-ai gbnf gguf grammar-constrained-decoding inference-engine llama-cpp llm local-llm mcp on-device-ai privacy-first tool-calling

Last synced: 30 May 2026

https://github.com/kilamper/matrix-multiplication

AC - Matrix multiplication using OpenMP, MPI and CUDA

cuda ms-mpi openmp

Last synced: 16 May 2026

https://github.com/xkevio/cuda-raytracer

A simple ray tracer written with CUDA that saves its output in a .ppm file, CPU version included for reference.

cpu cuda cuda-raytracer gpu

Last synced: 25 Aug 2025

https://github.com/orlandopalmeira/trabalho-cp-2023-2024

Repository for the practical assignment of the Parallel Computing (Computação Paralela, CP) course unit - Master's in Informatics Engineering (MEI/MIEI) - Universidade do Minho (UMinho)

computacao-paralela cp cuda cuda-programming mei miei nvidia nvidia-cuda openmp optimization optimization-problem parallelism performance uminho uminho-mei uminho-miei

Last synced: 18 May 2026

https://github.com/wendylabsinc/tensorrt-swift

TensorRT Swift 6.2 Bindings for Linux

cuda nvidia swift tensor tensorrt

Last synced: 01 Feb 2026

https://github.com/slesniew/parallel-processing-cpu-and-gpu-env-and-lib-with-powercap

(2024/2025) A library and environment for parallel processing in a power-limited CPU+GPU cluster environment.

c cpu cuda gpu mpi openmp parallel powercap

Last synced: 30 Mar 2025

https://github.com/debowin/gpu-parallel-recommender-system

GPGPU Parallel User-User Collaborative Filtering System in CUDA C

collaborative-filtering cuda gpu-programming movielens-dataset recommender-system

Last synced: 24 Apr 2026

https://github.com/puzzlef/pagerank-cuda-dynamic

Design of CUDA-based Parallel Dynamic PageRank algorithm for measuring importance.

algorithm cuda gpu graph pagerank static temporal

Last synced: 21 Feb 2026

https://github.com/tank3-tk3/pi-calculation-cpu-gpu

PI calculation with CPU and GPU

c cpp cuda parallel-computing pi

Last synced: 13 Apr 2026

https://github.com/babak2/optimizedsum

Optimized Parallel Sum program demonstrating CPU vs GPU performance

cuda cuda-programming gpu-acceleration gpu-computing gpu-parallelism visual-studio

Last synced: 27 Mar 2025

https://github.com/alpinebuster/arkime-docker-compose

Deploy Arkime with GPU-accelerated Rust/Python parsers and custom plugins using Docker Compose.

arkime c cuda deep-neural-networks docker docker-compose llm machine-learning networking pcap pcapng python rust traffic-analysis

Last synced: 16 Apr 2026

https://github.com/lchsk/ney

A header-only parallel functions library for Intel Xeon/Xeon Phi/GPUs

cuda gpu linux parallel phi scientific xeon xeonphi

Last synced: 07 May 2026

https://github.com/szymon423/tsp-cpu-vs-gpu

Simple brute force approach to solve travelling salesman problem with CPU and GPU

cuda tsp

Last synced: 11 Mar 2025

https://github.com/elftausend/sliced

Array operations with automatic differentiation on CPU and GPU

autograd automatic-differentiation cuda custos matrix opencl

Last synced: 31 Jan 2026

https://github.com/openspeedshop/cbtf-argonavis-gui

Baseline for the next-generation Open|SpeedShop Graphical User Interface (GUI). The primary focus of this GUI will be the processing and display of CUDA collector performance data. However, there will be refactoring phases to adapt the GUI to support the processing and display of any collector performance data.

cuda performance profiler profiling

Last synced: 18 Apr 2026

https://github.com/kar-dim/watermarking-gpu

Code for my Diploma thesis at Information and Communication Systems Engineering (University of the Aegean, School of Engineering) with title "Efficient implementation of watermark and watermark detection algorithms for image and video using the graphics processing unit". Part 2 / GPU

arrayfire cpp cuda ffmpeg gpu image-processing opencl parallel-computing video-processing watermark-image watermarking

Last synced: 09 Apr 2025

https://github.com/yingding/applyllm

A Python package for applying LLMs with LangChain and Hugging Face on a local CUDA/MPS host

accelerator batch cuda framework inference kubeflow langchain llm mps pipeline slurm transformers

Last synced: 24 Aug 2025

https://github.com/tmrob2/cuda2rust_sandpit

Minimal examples to get CUDA linear algebra programs working with Rust using CC & FFI.

cc clang cublas cuda cusparse rust

Last synced: 14 May 2025

https://github.com/teodutu/asc

Computer Systems Architecture (Arhitectura Sistemelor de Calcul) - UPB 2020

cache-optimization cuda parallel-programming profiling python-threading

Last synced: 24 Apr 2026

https://github.com/dujonwalker/nixos-config-x86_64-cuda

This repository contains my NixOS configuration optimized for 64-bit x86 systems with NVIDIA CUDA support, featuring a Plasma 6 desktop environment and a variety of essential applications for development, multimedia, and productivity. It serves as a backup for easy restoration and setup on new installations.

cuda flatpak nix nixos nixos-configuration ollama

Last synced: 17 Jan 2026

https://github.com/lzyrapx/llm-grandmaster-notes

🎓The path to LLM mastery is paved with broken embeddings and resurrected gradients.

cuda deep-learning llm reinforcement-learning

Last synced: 14 May 2025

https://github.com/csvancea/gpu-hashtable

GPU-backed linear-probing hash table implemented in CUDA. Supports batch operations such as insert and retrieval.

cuda hashtable

Last synced: 24 Apr 2026

https://github.com/bonj4/wiki

This repository contains documentation and installation scripts for various tools and libraries.

cuda pangolin pybind11 sfm tensorrt

Last synced: 17 Jan 2026

https://github.com/agalue/sherpa-voice-assistant

Local AI-based voice assistant implemented using Sherpa, Whisper, Kokoro, and Ollama

coreml cuda golang kokoro-tts linux macos ollama onnx-runtime rust sherpa whisper-ai

Last synced: 04 Apr 2026

https://github.com/pnocera/cembedd

Embeddings rust API serving intfloat/multilingual-e5-large using huggingface/candle with CUDA enabled

bert cuda huggingface

Last synced: 12 Jan 2026

https://github.com/mrglaster/cuda-acfcalc

Calculation of the smallest ACF for signals of length N using CUDA technology.

acf c calculations cpp cuda google-colaboratory google-colaboratory-notebooks isu

Last synced: 06 May 2026

https://github.com/tky823/bitlinear158compression

Compare compression models for inference by BitLinear158

cuda pytorch quantization

Last synced: 12 Jun 2026

https://github.com/dereklstinson/nccl

golang wrapper for nccl

cuda deep-learning go nccl parallel-computing

Last synced: 14 May 2026

https://github.com/pd2871/high-performance-computing

This repo contains the logs of the final assignment for the High Performance Computing module

blurred-images c cuda gaussian-blur matrix-multiplication multi-threading parallel-computing pthreads pthreads-api

Last synced: 10 May 2026

https://github.com/tank3-tk3/parallel-processing-cuda

Parallel processing with CUDA C / C++

c cpp cuda parallel-computing parallel-programming

Last synced: 09 May 2026

https://github.com/ezamagni/knapsack-simd

A genetic 01-Knapsack problem solver in CUDA

cuda knapsack-problem knapsack01

Last synced: 09 May 2026

https://github.com/skillfulelectro/integral-solver

Simple integral solver

c cpp cuda math mathematics

Last synced: 08 May 2026

https://github.com/dhruvsrikanth/fastconv

Distributed and serial implementations of the 2D convolution operation in C++ and CUDA.

convolution-filters cpp cuda gpu-programming high-performance-computing hpc image-editor image-processing nvidia parallel-programming

Last synced: 04 May 2026

https://github.com/abhans/archdev

Container that is built with Arch Linux with NVIDIA Driver & CUDA support, PyTorch and TensorFlow built in.

archlinux container cuda docker

Last synced: 07 May 2026

https://github.com/kibotu/llm-windows-server

Turn your Windows GPU into a private, low-latency LLM server. Docker-based, OpenAI-compatible API.

agentic cuda docker gguf llma-cpp local-llm nvidia-gpu openai-api opencode qwen self-hosted windows

Last synced: 10 Jun 2026

https://github.com/willigarneau/object-detection-cuda

🕺 Put my knowledge of OpenCV and CUDA into practice to create an object detection system. 💻

camera cplusplus cuda detector filter opencv

Last synced: 08 May 2026

https://github.com/poodarchu/vision-lab

Computer Vision Experiments in all.

computer-vision cuda object-detection

Last synced: 07 May 2026

https://github.com/daaboulex/unsloth-nix

Unsloth (git main) packaged for NixOS — CPU/CUDA/ROCm LoRA fine-tuning envs

cuda fine-tuning flake lora machine-learning nix nixos nixos-module pytorch rocm unsloth

Last synced: 10 Jun 2026

https://github.com/pedro-avalos/cuda-samples-snap

Unofficial snap for CUDA Samples

cuda gpu gpu-test linux nvidia package snap snapcraft

Last synced: 08 May 2026

https://github.com/sun-zhenxing/fast-neural-style

Fast style transfer deployment

cuda cv2 fast-neural-style opencv

Last synced: 05 May 2026

https://github.com/kayuii/ironfish-miner

Docker image of hpool-dev/ironfish-miner for NVIDIA/AMD GPUs

amdgpu cuda docker gpu nvidia rocm

Last synced: 07 May 2026

https://github.com/xebastex/sfw-python

Python package designed to provide the essential tools for off-the-grid inverse problems. This is the bedrock for a future GUI implementation.

blasso cuda frank-wolfe pytorch

Last synced: 09 May 2026

https://github.com/speedcell4/torchdevice

Setup CUDA_VISIBLE_DEVICES

cuda deep-learning gpu machine-learning pytorch

Last synced: 07 May 2026

https://github.com/alextmjugador/rust-cuda-quickstart

Bring the Rust-CUDA project back to life under modern Linux environments.

cuda cuda-programming cuda-rust cuda-support docker rust

Last synced: 06 May 2026

https://github.com/uefi-code/msra_thepracticespaceproject_pytorchcuda

My repo for taking part in the MSRA Practice Space Project 2022: CUDA implementation and optimization

ann cuda pytorch

Last synced: 06 May 2026

https://github.com/garciparedes/cuda-examples

CUDA examples that I developed to learn GPU-based HPC

c c-plus-plus cuda examples gpgpu gpu hpc

Last synced: 09 May 2026

https://github.com/igorcosta/deep-docker

Docker image for Deep Learning on AWS Cloud

cuda deep-learning docker docker-image tensorflow

Last synced: 05 May 2026

https://github.com/gmfatcat/ai-photoviewer

AI helps you sort your old photos

ai cuda local-first photo

Last synced: 16 Jun 2026

https://github.com/seieric/gst-dsobjectsmask

📀NVIDIA DeepStream integrated GStreamer Plugin. Mask objects with cuda cores on Jetson boards. Fast and smooth since everything is done on NVMM.🏎

cuda cuda-programming deepstream gpu gstreamer gstreamer-plugins instance-segmentation jetson-agx-orin jetson-agx-xavier jetson-tx1 jetson-tx2 jetson-xavier maskrcnn nvidia-jetson nvidia-jetson-nano opencv opencv4 resnet resnet50

Last synced: 06 May 2026

https://github.com/poyea/lollipop

🍭 Sweet GPU compute kernels in CUDA, wrapped via CuPy

cuda cuda-kernel cuda-kernels cuda-programming gpu-kernels gpu-programming python

Last synced: 17 Jun 2026

https://github.com/abdulfatir/subkmeans

Numpy and pyCUDA implementation of subKmeans

clustering cuda kdd kmeans numpy pycuda python subspace-clustering

Last synced: 09 May 2026

https://github.com/manishklach/gpu-resident-inference-lab

Research lab for GPU-resident LLM inference loops: persistent kernels, sparse KV selection, tiered residency, speculative decode, and trace-driven scheduling.

cuda gpu-systems kv-cache llm-inference mega-kernel model-systems persistent-kernel runtime speculative-decoding

Last synced: 19 Jun 2026

https://github.com/jayemscript/llm-systems-from-scratch

A hands-on learning project for building the core systems behind Large Language Models using C++, Rust, and optional Python/JavaScript bindings. Includes tensor operations, autograd, neural networks, tokenization, and a minimal transformer pipeline.

ai-systems autograd c-language cpp cuda educational-project high-performance-computing inference-engine machine-learning neural-networks-from-scratch pybind11 tensor-library tokenization transformers wasm

Last synced: 19 Jun 2026

https://github.com/pharmcat/metidacu.jl

CUDA solver for Metida.jl

cuda julia-language metida mixed-models

Last synced: 27 Apr 2026

https://github.com/codingrule/cuda-mbrot

Just another Mandelbrot with CUDA

cuda cuda-toolkit cupy fractal mandelbrot mathematics nvidia

Last synced: 27 Apr 2026

https://github.com/axel-ex/seame-ads-autonomous-lane-detection-24-25

🚗 Real-time lane detection and autonomous steering for JetRacer, powered by ROS2 and GPU-accelerated CV on Jetson Nano.

cuda jetson-nano ros2 tensorrt

Last synced: 27 Apr 2026

https://github.com/andrewboessen/bitonic-merge-sort

Bitonic Merge Sort algorithm optimized for GPU execution

bitonic-merge-sort cuda sorting-network

Last synced: 16 May 2026

https://github.com/bl33h/productoftwovectors

This code uses CUDA for parallel vector multiplication on a GPU, demonstrating the GPU's acceleration capabilities.

cuda gpu kernel paralelism parallel-programming product vector

Last synced: 16 May 2026

https://github.com/ehsanmok/cs-521

UBC CS 521: Parallel Computing and Architectures

cuda erlang parallel-algorithm parallel-computing

Last synced: 16 May 2026

https://github.com/maelstrom6/mandelpy

A Mandelbrot and Buddhabrot viewer with GPU acceleration

buddhabrot cuda gpu mandelbrot python3

Last synced: 27 Apr 2026

https://github.com/xusworld/tars

Tars is a cool deep learning framework.

avx2 avx512 cuda deep-learning

Last synced: 27 Apr 2026

https://github.com/erosiv/silt

simple immediate lightweight tensors

cmake cuda simulation tensor

Last synced: 31 Oct 2025

https://github.com/thunder-compute/thunder-compute-documentation

Documentation for Thunder Compute, a cloud platform creating technology to virtualize GPUs over TCP

ai artificial-intelligence cloud cloud-computing cuda gpu llm machine-learning nvidia pytorch tensorflow thunder-compute virtualization

Last synced: 15 Oct 2025

https://github.com/ashwani-rathee/imagesgpu.jl

Image Processing on GPU in Julia

cuda gpu image image-processing julia

Last synced: 11 Jul 2025

https://github.com/pvdberg1998/cufft_rust

A safe Rust wrapper around a subset of cuFFT.

cuda cufft fft rust

Last synced: 19 Apr 2025

https://github.com/dolongbien/cuda

CUDA and Caffe/Caffe2 installation on Ubuntu 16.04

c3d-intel-caffe caffe caffe2 cuda cudnn deep-learning ubuntu

Last synced: 28 Apr 2026