An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/romankoblov/rust-nvrtc

NVRTC bindings for Rust

cuda nvrtc rust

Last synced: 05 May 2025

https://github.com/redhat-na-ssa/gpu-workshop

Using GPUs on Red Hat Platforms

cuda gpu nvidia opencl

Last synced: 30 Jul 2025

https://github.com/definetlynotai/llm_data

Source code of many well-known repositories, gathered in this repo as plain Python localdocs for training code AI

c code-examples cpp cuda data data-dum jupyter-notebook llm llm-code llm-datasets programming-data programming-data-sets python3

Last synced: 08 Oct 2025

https://github.com/nolmoonen/cuda-sdf

CUDA-accelerated path traced Menger sponge using ray marching.

cuda menger path-tracer ray-marching sdf

Last synced: 12 Feb 2026

https://github.com/zhihu/ZhiLight

A highly optimized inference acceleration engine for Llama and its variants.

cuda gpt inference-engine llama llm llm-serving pytorch

Last synced: 12 Aug 2025

https://github.com/yosh-matsuda/gpu-array

Maximum GPU performance with Modern C++ syntax. RAII and Range-based abstraction to GPU memory management and data layouts, enabling code safety and performance optimization with zero overhead.

cpp cpp20 cuda gpu header-only hip

Last synced: 08 Jun 2026

https://github.com/tristanpenman/cuda-examples

A collection of CUDA example code

cuda

Last synced: 10 Apr 2025

https://github.com/shmishtopher/cudnn-versions

A scoop bucket for installing NVIDIA cuDNN versions.

cuda cudnn scoop scoop-apps scoop-bucket

Last synced: 01 May 2026

https://github.com/yunzhu-li/recognizer

An object recognizer mobile app based on deep convolutional neural networks

cnn cuda cudnn gpu ios python swift tensorflow

Last synced: 20 Apr 2026

https://github.com/shalithasuranga/cudaperformance

Compare the performance of matrix multiplication among GPU shared memory, GPU global memory and CPU

cuda cuda-demo matrix-multiplication nvidia

Last synced: 21 Jan 2026

https://github.com/3zrv/raytracerincpp

A ray tracer that renders in 16-color VGA palette at 640x480 resolution.

cpp cuda nvidia

Last synced: 18 May 2026

https://github.com/lzyrapx/llm-grandmaster-notes

🎓The path to LLM mastery is paved with broken embeddings and resurrected gradients.

cuda deep-learning llm reinforcement-learning

Last synced: 14 May 2025

https://github.com/kishore-narendran/eecs221-highperformancecomputing

Assignments done during the graduate course EECS 221 - Introduction to HPC that I took in the Spring Quarter of 2016 at University of California, Irvine. Involves assignments that use OpenMP, MPI and CUDA.

cuda hpc mpi openmp

Last synced: 17 May 2026

https://github.com/copperfr/blendervxkex

Windows 7 CUDA & OptiX support for Blender 4.x

blender cuda cycles-renderer optix vxkex windows-7

Last synced: 20 Jan 2026

https://github.com/pnocera/cembedd

Embeddings Rust API serving intfloat/multilingual-e5-large using huggingface/candle with CUDA enabled

bert cuda huggingface

Last synced: 12 Jan 2026

https://github.com/murrellgroup/conflux.jl

Single-node data parallelism in Julia with CUDA

cuda data-parallelism flux julia nccl

Last synced: 22 May 2026

https://github.com/yosh-matsuda/gpu-ptr

Cross-platform GPU smart pointer with C++20 range support

cpp cpp20 cuda gpu header-only hip

Last synced: 17 Jan 2026

https://github.com/scarfy-sysu/rtx5060-pytorch-cuda129

Run PyTorch with CUDA 12.9 on RTX 50 series (e.g. RTX 5060)

cuda deep-learning pytorch rtx5060

Last synced: 20 Jul 2025

https://github.com/dujonwalker/nixos-config-x86_64-cuda

This repository contains my NixOS configuration optimized for 64-bit x86 systems with NVIDIA CUDA support, featuring a Plasma 6 desktop environment and a variety of essential applications for development, multimedia, and productivity. It serves as a backup for easy restoration and setup on new installations.

cuda flatpak nix nixos nixos-configuration ollama

Last synced: 17 Jan 2026

https://github.com/boltzmannentropy/vllm-5090

vLLM-5090: Docker Container for RTX 5090 on WSL2/Windows

5090 cuda docker vllm

Last synced: 08 Oct 2025

https://github.com/peri044/cuda

GPU implementations of algorithms

cuda gauss-jordan parallel-programming

Last synced: 14 Jul 2025

https://github.com/toxy4ny/artaxerxes

Artaxerxes - Adaptive High-Performance Stress Tester v.1.0. A rebuild of the old Xerxes DDoS tool. Supports GPU+io_uring, DPDK, eBPF/XDP with intelligent fallbacks. Educational tool for advanced cybersecurity labs

cuda cuda-programming cybersecurity cybersecurity-education cybersecurity-tools dpdk ebpf educational high-performance network-security network-security-tool penetration-testing penetration-testing-framework penetration-testing-tools security-tools stress-testing

Last synced: 08 Oct 2025

https://github.com/capelliexp/sc2-im-pf-pathfinding-thesis

Master of Science thesis project. Using CUDA to utilize a system's GPU to create pathfinding data (IM+PF), usable by multiple agents in the same environment.

ai cplusplus cuda gpgpu pathfinding starcraft2

Last synced: 15 May 2026

https://github.com/l30nardosv/reproduce-parcosi-moleculardocking

Reproducing paper: "Benchmarking the Performance of Irregular Computations in AutoDock-GPU Molecular Docking"

autodock-gpu cpu cuda gpu molecular-docking molecular-docking-scripts opencl paper reproducible-research

Last synced: 16 Feb 2026

https://github.com/terrylindev/image-to-ASCII

🖼️ A command-line tool for converting images to ASCII art

ascii ascii-art cli command-line cpp cuda docker image-processing image-to-ascii mpi opencv terminal

Last synced: 12 Jul 2025

https://github.com/brocbyte/realtime-deformations

Snow simulation (Material Point Method)

cuda glm material-point-method opengl

Last synced: 10 Aug 2025

https://github.com/hadv/vaneth

GPU-accelerated CREATE2 vanity address miner for Ethereum

create2-contract-deployment cuda ethereum gpu gpu-acceleration gpu-programming open-cl vanity-address

Last synced: 21 Jan 2026

https://github.com/kim-hwiwon/T-espresso

A CUDA Library for Low-overhead Host-to-Device Transmission of Patterned Profile Data

cuda profiler

Last synced: 10 Apr 2025

https://github.com/lintenn/cudaaddvectors-explicit-vs-unified-memory

Performance comparison of two different forms of memory management in CUDA

c cuda explicit memory memory-management performance unified-memory

Last synced: 17 May 2026

https://github.com/tyler-romero/aegae

Learning Triton / CUDA

cuda triton

Last synced: 11 Apr 2026

https://github.com/kohulan/tensorflow-2.0-installation-with-cuda-support

A detailed step-by-step guide to installing Tensorflow-2.0-gpu with CUDA drivers on Ubuntu Server/Desktop LTS

cuda gpu nvidia ubuntu

Last synced: 07 May 2025

https://github.com/shreyansh26/mlsys-experiments

A collection of scripts on experimenting and implementing MLSys-related stuff

cuda cuda-kernel gpu gpu-programming llm-inference profiling pytorch triton

Last synced: 30 Aug 2025

https://github.com/kilamper/matrix-multiplication

AC - Matrix multiplication using OpenMP, MPI and CUDA

cuda ms-mpi openmp

Last synced: 16 May 2026

https://github.com/bdwhst/fluora

A CUDA PBR path tracer

cpp cuda pathtracing pbr rendering

Last synced: 13 Feb 2026

https://github.com/sthysel/jtx2-tools

NVIDIA jtx/Xavier GPU monitor tool

cuda nvidia txt2 xavier

Last synced: 19 May 2026

https://github.com/tmrob2/cuda2rust_sandpit

Minimal examples to get CUDA linear algebra programs working with Rust using CC & FFI.

cc clang cublas cuda cusparse rust

Last synced: 14 May 2025

https://github.com/geekysuavo/gpufield

A CUDA-accelerated electromagnetostatics solver

cuda magnetic-fields magnetostatics

Last synced: 24 Dec 2025

https://github.com/tortillazhawaii/rr_sort

Various sorting implementations using distributed and parallel methods

bazel cpp cuda java openmp spark threads

Last synced: 14 Apr 2026

https://github.com/amruthapatil/nyu-cudamatrixoperations

Optimizing CUDA programs for vector addition and matrix multiplication

cuda high-performance-computing

Last synced: 21 May 2026

https://github.com/tawssie/zmpy3d_pt

Python implementation of 3D Zernike moments with PyTorch

3d-zernike cuda gpu protein-structure python pytorch structural-bioinformatics superposition zernike-moments

Last synced: 24 Oct 2025

https://github.com/trahay/mpi-wattmeter

MPI-Wattmeter measures the power consumption of MPI programs

carbon-emissions cuda energy-consumption energy-monitor gpu hpc mpi

Last synced: 17 May 2026

https://github.com/infotrend-inc/ctpo-demo_projects

Jupyter Notebook examples using CTPO as their source container.

cuda opencv pytroch tensorflow2

Last synced: 14 Apr 2026

https://github.com/bonj4/wiki

This repository contains documentation and installation scripts for various tools and libraries.

cuda pangolin pybind11 sfm tensorrt

Last synced: 17 Jan 2026

https://github.com/kagof/julia-image-processing

Image processing programs written in Julia

cuda image-processing julia

Last synced: 18 May 2026

https://github.com/lukasboettcher/msc-code

This is the repo for my master's thesis on GPU-accelerated Andersen analysis.

andersen-analysis clang cuda llvm static-analysis

Last synced: 16 Jan 2026

https://github.com/demoriarty/doksparse

Sparse DOK tensors on GPU, PyTorch

cuda pytorch sparse

Last synced: 23 Feb 2025

https://github.com/betarixm/cuecc

POSTECH: Heterogeneous Parallel Computing (Fall 2023)

cryptography ctypes cuda ecc postech secp256k1

Last synced: 12 May 2025

https://github.com/mazharuddin-mohammed/semidgfem

High-performance TCAD Simulator Using Discontinuous Galerkin FEM

cuda discontinuous-galerkin-method tcad tcad-device-simulator

Last synced: 15 Jun 2025

https://github.com/muhac/jupyter-pytorch-docker

JupyterLab for AI in Docker! Anaconda and PyTorch GPU supported.

conda-environment cuda docker jupyterlab pytorch

Last synced: 01 Oct 2025

https://github.com/seungjaelim/cuda.tutorial

References content from the OLCF CUDA Training Series. (https://github.com/olcf/cuda-training-series)

cuda gpu-programming nsight-compute nsight-systems

Last synced: 07 Feb 2026

https://github.com/elftausend/sliced

Array operations with automatic differentiation on CPU and GPU

autograd automatic-differentiation cuda custos matrix opencl

Last synced: 31 Jan 2026

https://github.com/gjbex/gpu-programming

Material for a training on portable GPU programming

cuda gpu kokkos openmp openmp-off stl thrust

Last synced: 08 Feb 2026

https://github.com/dpbm/qml-course

Mini-course on quantum machine learning

cuda cuda-q cuquantum docker ml python qml quantum quantum-computing tensorflow

Last synced: 31 Jan 2026

https://github.com/acrlakshman/gradient-augmented-levelset-cuda

Implementation of Gradient Augmented Levelset method for CPU and GPU

cfd cuda levelset

Last synced: 17 Feb 2026

https://github.com/wendylabsinc/tensorrt-swift

TensorRT Swift 6.2 Bindings for Linux

cuda nvidia swift tensor tensorrt

Last synced: 01 Feb 2026

https://github.com/fattorib/thunderkittens-simple-gemm

Simple Tensorcore GEMM in ThunderKittens

cuda gemm gpu thunderkittens

Last synced: 09 Feb 2026

https://github.com/hanzhi713/bitonic-sort

In-place GPU sort with bitonic sort

bitonic-sort cuda gpu in-place sorting

Last synced: 09 Feb 2026

https://github.com/xkevio/cuda-raytracer

A simple ray tracer written with CUDA that saves its output in a .ppm file, CPU version included for reference.

cpu cuda cuda-raytracer gpu

Last synced: 25 Aug 2025

https://github.com/andreasholt/cusmc

A CUDA-accelerated Statistical Model Checker for Stochastic Timed Automata

cuda smc

Last synced: 11 Feb 2026

https://github.com/tthebc01/cudaconda3

Lightweight container environment with CUDA, Miniconda3, and Jupyter Lab.

cuda docker gpu jupyterlab marimo-notebook miniconda3 reverse-proxy-application

Last synced: 11 Feb 2026

https://github.com/yingding/applyllm

A Python package for applying LLMs with LangChain and Hugging Face on a local CUDA/MPS host

accelerator batch cuda framework inference kubeflow langchain llm mps pipeline slurm transformers

Last synced: 24 Aug 2025

https://github.com/andreabak/whispersubs

Generate subtitles for your video or audio files using the power of AI

ai cuda deep-learning gpu-acceleration machine-learning srt subtitles transcribe transcription translate whisper

Last synced: 15 Feb 2026

https://github.com/alpinebuster/arkime-docker-compose

Deploy Arkime with GPU-accelerated Rust/Python parsers and custom plugins using Docker Compose.

arkime c cuda deep-neural-networks docker docker-compose llm machine-learning networking pcap pcapng python rust traffic-analysis

Last synced: 16 Apr 2026

https://github.com/lchsk/ney

A header-only parallel functions library for Intel Xeon/Xeon Phi/GPUs

cuda gpu linux parallel phi scientific xeon xeonphi

Last synced: 07 May 2026

https://github.com/zeloe/juce_cuda_convolution

GPU acceleration for efficient, high-quality audio processing.

audio audio-processing convolution cuda dsp juce

Last synced: 03 Mar 2026

https://github.com/digimortl/libguess

Patches that give Bitcoin Core an ability of CUDA mining

bitcoin c-plus-plus cryptocurrency cuda

Last synced: 16 Apr 2026

https://github.com/orlandopalmeira/trabalho-cp-2023-2024

Repository for the practical assignment of the Parallel Computing (Computação Paralela, CP) course unit - Master's in Informatics Engineering (MEI/MIEI) - Universidade do Minho (UMinho)

computacao-paralela cp cuda cuda-programming mei miei nvidia nvidia-cuda openmp optimization optimization-problem parallelism performance uminho uminho-mei uminho-miei

Last synced: 18 May 2026

https://github.com/alexjmercer/fractal-art

Generating Fractals in C++ using SFML. For the ultimate visual stimulation and in-depth code!

cmake cmakelists cpp20 cuda cuda-programming fractal-rendering graphics mandelbrot multithreading sfml2

Last synced: 05 Mar 2026

https://github.com/openspeedshop/cbtf-argonavis-gui

Baseline for the next-generation Open|SpeedShop Graphical User Interface (GUI). The primary focus of this GUI will be the processing and display of CUDA collector performance data. However, there will be refactoring phases to adapt the GUI to support the processing and display of any collector performance data.

cuda performance profiler profiling

Last synced: 18 Apr 2026

https://github.com/cfries/javagpuexperiments

Repository used to demo OpenCL, JOCL, JCuda.

cuda

Last synced: 25 Apr 2026

https://github.com/droduit/multiprocessor-architecture

Introduction to Multiprocessor Architecture @ EPFL

cuda multiprocessor multithreading openmp-parallelization

Last synced: 17 Apr 2026

https://github.com/matthewfeickert/cuda-tf-torch

An Ubuntu 18.04 NVIDIA Docker image with CUDA 10.1 CuDNN 7 with TensorFlow and PyTorch

cuda cuda-101 cudnn cudnn-v7 docker docker-image gpu nvidia-docker nvidia-gpu pytorch tensorflow torch

Last synced: 07 Jan 2026

https://github.com/xiongsp/pytorch-docker

Pure PyTorch Docker images. Supports almost all combinations of PyTorch, Python, Ubuntu, CentOS, and CUDA versions.

centos cuda docker docker-image python3 pytorch ubuntu

Last synced: 17 Apr 2026

https://github.com/babak2/optimizedsum

Optimized Parallel Sum program demonstrating CPU vs GPU performance

cuda cuda-programming gpu-acceleration gpu-computing gpu-parallelism visual-studio

Last synced: 27 Mar 2025

https://github.com/agalue/sherpa-voice-assistant

Local AI-based voice assistant implemented using Sherpa, Whisper, Kokoro, and Ollama

coreml cuda golang kokoro-tts linux macos ollama onnx-runtime rust sherpa whisper-ai

Last synced: 04 Apr 2026

https://github.com/szaghi/adam

Multi-physics AMR SDK and apps for High Performance Computing — from laptop to exascale device-accelerated superpc

amr cfd cuda fluid-dynamics fortran gas-dynamics hpc hydro-dynamics mpi openacc openmp plasma-dynamics

Last synced: 04 Apr 2026

https://github.com/artain-ai/ignite-ms

Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.

batch-inference batch-processing cuda embeddings gpu high-performance huggingface machine-learning multi-gpu nlp rag rust self-hosted semantic-search tensorrt text-embeddings vector-search

Last synced: 04 Jun 2026

https://github.com/juntyr/necsim-rust

Spatially explicit biodiversity simulations using a parallel library written in Rust

biodiversity cuda mpi necsim rust simulation

Last synced: 22 Mar 2025