Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/kareimgazer/mat-transpose-cuda

A series of experiments in optimizing matrix transpose with CUDA

cuda hpc matrix parallel-computing simd

Last synced: 03 Feb 2025

https://github.com/enfiskutensykkel/cuda-rdma-bench

NVIDIA GPU direct RDMA using SISCI API

cuda dma gpudirect-rdma pcie rdma sisci

Last synced: 01 Nov 2024

https://github.com/kuroko1t/gocuda

Go bindings for the CUDA Driver API

cuda go golang

Last synced: 22 Dec 2024

https://github.com/ai-dock/pytorch

PyTorch docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.

ai cuda docker jupyter machine-learning python pytorch rocm runpod syncthing vast

Last synced: 18 Nov 2024

https://github.com/cascadingradium/air-traffic-distribution

A GPU-Accelerated Multi-Objective Genetic Algorithm for Air Traffic Management

air-traffic-control air-traffic-management c cuda genetic-algorithm gpu-acceleration

Last synced: 19 Nov 2024

https://github.com/jackeylea/cuda_linux

CUDA/Qt tutorial for Linux

cpp cuda cudnn qt5

Last synced: 23 Jan 2025

https://github.com/pmeier/tox-ltt

Install PyTorch distributions with light-the-torch

cuda install light-the-torch pip plugin pytorch tox

Last synced: 22 Dec 2024

https://github.com/bencardoen/singularity_slurm_cuda

Example of how to get started with Singularity and CUDA on a SLURM cluster

cuda nvidia singularity-container slurm-cluster tensorflow

Last synced: 23 Oct 2024

https://github.com/sbaldu/neural_network_hep

Implementation of a neural network framework from scratch in C++ applied to particle physics

cpp cuda high-energy-physics neural-networks

Last synced: 30 Oct 2024

https://github.com/enp1s0/culip

Library for profiling the execution time of CUDA official library functions

cublas cuda profiling

Last synced: 06 Nov 2024

https://github.com/rfsantacruz/mycudasamples

This is a series of CUDA C++ programming samples developed to study CUDA technology and its parallel programming model.

cpp cuda gpgpu

Last synced: 07 Nov 2024

https://github.com/fynv/curandrtc

CURandRTC is a GPU random number generation module based on ThrustRTC.

cuda nvrtc random-number-generators thrust

Last synced: 23 Oct 2024

https://github.com/aespinosadev/opengl-renderer

OpenGL renderer showcasing all basic functionality to render 3D scenes.

computer-graphics cuda gpgpu graphics-engine graphics-programming opengl rendering rendering-3d-graphics shaders video-game

Last synced: 23 Jan 2025

https://github.com/rogerallen/qtmandelbrotr

Qt CUDA Mandelbrot explorer

cuda cuda-opengl mandelbrot-viewer qt5

Last synced: 25 Nov 2024

https://github.com/neoblizz/cudagl

CUDA based Graphics Library for NVIDIA's GPUs.

cuda graphics-library graphics-programming opengl

Last synced: 30 Oct 2024

https://github.com/jaxony/pynvidia

⚙️ NVIDIA GPU utilities for Python 🔧

cuda deep-learning nvidia-gpu pip python utility

Last synced: 13 Dec 2024

https://github.com/andydevs/cudafractal

Fractal generator using NVIDIA's CUDA framework

cplusplus cuda nvidia

Last synced: 08 Nov 2024

https://github.com/yuvix25/py2cuda

Convert Python 3 code to CUDA code.

converter cuda gpu gpu-acceleration python python3

Last synced: 05 Jan 2025

https://github.com/mre/cudampi

Large hybrid CPU/GPU sorting network using CUDA and MPI

algorithms bucket bucketsort cuda filesystem gpu hybrid-cpu mpi parallel sorting-network

Last synced: 06 Feb 2025

https://github.com/cloudmercato/python-fpb

Python Floating Point Benchmark

benchmark cuda floating-point numpy pandas python

Last synced: 22 Jan 2025

https://github.com/xiaohaoo/yolo_tensorrt

Deploy the YOLOv8 model for inference using OpenCV and TensorRT in C/C++.

c cuda opencv tensorrt yolov8

Last synced: 24 Dec 2024

https://github.com/coderonion/moblas

BLAS (Basic Linear Algebra Subprograms) library written in mojo programming language.

blas blis cublas cuda eigen fortran gemm gonum hpc lapack linear-algebra math mkl mojo numpy openblas pytorch scientific-computing simd tensor

Last synced: 16 Jan 2025

https://github.com/rupeshs/anomalydetection

Anomaly Detection Using Anomalib and OpenVINO – Step-by-Step Guide

anomalib anomaly anomalydetection computer-vision cpu cuda gpu intel onnx opencv pytorch

Last synced: 21 Jan 2025

https://github.com/neoheartbeats/neoheartbeats-kernel

An architecture for LLMs' continual-learning and long-term memories

cuda fine-tuning llama-factory llm

Last synced: 23 Oct 2024

https://github.com/shunk031/nvinfo-go

A Go rewrite of ikr7/nvinfo, a simple utility for monitoring your CUDA-enabled GPUs

cli cuda go golang gpu nvidia nvidia-smi

Last synced: 15 Dec 2024

https://github.com/648trindade/sbac-pad-marathon-problems

Repository containing problems of the SBAC-PAD Marathon of Parallel Programming and some parallel solutions to them.

cuda high-performance-computing mpi openmp parallel-computing

Last synced: 10 Jan 2025

https://github.com/elinliu0/studentbehaviordetection

Shenyang University: student behavior detection code repository (based on YOLOv8 + CV-CUDA + TensorRT)

cuda cv-cuda python tensorrt yolov8

Last synced: 11 Nov 2024

https://github.com/neural-bits/ai-programming-hub

Learn and experiment with new techniques and programming languages with a focus on ML

cpp cuda cython openai-triton python rust

Last synced: 05 Feb 2025

https://github.com/matthewhaynesonline/ai-server-setup

Set up an AWS EC2 instance from scratch with NVIDIA CUDA, Docker, and Packer for AI/ML.

ai ami aws cuda devops docker ml mlops packer

Last synced: 05 Feb 2025

https://github.com/pkestene/cuda_mpi_autotools_proj_template

A template project for CUDA+MPI with autotools build system

automake autotools cuda cuda-mpi mpi

Last synced: 18 Dec 2024

https://github.com/amirhoseinmasoumi/onnx-cuda-inference

A C++ project for running CUDA-accelerated ONNX model inference, using ONNX Runtime and OpenCV for image segmentation tasks.

cpp cuda inference onnxruntime onnxruntime-gpu opencv segmentation

Last synced: 05 Feb 2025

https://github.com/romankoblov/rust-nvrtc

NVRTC bindings for Rust

cuda nvrtc rust

Last synced: 23 Oct 2024

https://github.com/chiang-yuan/culsm

CUDA C++ code implementing GPU-accelerated Lattice Spring Model (CuLSM) simulations.

cuda gpu parallel-computing particles

Last synced: 18 Dec 2024

https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-c-cpp

Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.

cpp cuda cuda-kernels cuda-programming nsight nvidia profilling

Last synced: 04 Jan 2025

https://github.com/fabryprog/java-gpu

Support for offloading parallel-for loops in Java to NVIDIA CUDA compatible cards.

cuda gpu java nvidia parallel-computing

Last synced: 19 Dec 2024

https://github.com/phrb/nvidia-workshop-autotuning

Resources for autotuning CUDA compiler parameters

autotuning compilers cuda gpu julia nodal nvcc

Last synced: 02 Feb 2025

https://github.com/brosnanyuen/raybnn_diffeq

Differential Equation Solver using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI

arrayfire cuda differential differential-equations gpu gpu-computing opencl parallel parallel-computing parallel-programming raybnn rust

Last synced: 13 Nov 2024

https://github.com/brosnanyuen/raybnn_raytrace

Ray tracing library using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI

arrayfire cuda gpu gpu-computing opencl parallel parallel-computing ray ray-tracing raybnn raylib raytracer raytracing rust

Last synced: 13 Nov 2024

https://github.com/jtriley/gpucrate

Creates hard-linked GPU driver (currently just NVIDIA) volumes for use with docker, singularity, etc.

container cuda docker gpu singularity

Last synced: 07 Nov 2024

https://github.com/maximedebarbat/dolphin

Dolphin is a Python toolkit meant to speed up TensorRT inference by providing CUDA-accelerated processing.

cuda python tensorrt-inference

Last synced: 01 Nov 2024

https://github.com/harrydobbs/torch_ransac3d

A high-performance implementation of the 3D RANSAC (Random Sample Consensus) algorithm using PyTorch and CUDA.

3d cloud cubiod cuda cylinder plane plane-detection point point-cloud ransac segmentation

Last synced: 22 Nov 2024

https://github.com/yinguobing/yolov5-trt

YOLO v5 inference with TensorRT (C++)

cpp cuda nvidia opencv tensorrt yolov5

Last synced: 14 Nov 2024

https://github.com/ran-2012/inversion

Solve geophysics problems using CUDA and TensorFlow

cpp cuda geophysics inversion-method python

Last synced: 10 Jan 2025

https://github.com/alejandrogallo/atrip

High Performance library for the CCSD(T) algorithm in quantum chemistry

asynchronous-programming coupled-cluster cuda literate-programming mpi quantum-chemistry

Last synced: 26 Nov 2024

https://github.com/guilt/rocm-programming-masterclass

Udemy's CUDA Programming Masterclass with examples in ROCm/HIP.

cuda hip rocm

Last synced: 15 Nov 2024

https://github.com/3zrv/raytracerincpp

A ray tracer that renders in 16-color VGA palette at 640x480 resolution.

cpp cuda nvidia

Last synced: 06 Jan 2025

https://github.com/meetps/me-766

Assignment Solutions to course ME766 High Performance Scientific Computing.

cuda gpu-computing opencl openmp parallel-computing

Last synced: 04 Jan 2025

https://github.com/tristanpenman/cuda-examples

A collection of CUDA example code

cuda

Last synced: 08 Dec 2024

https://github.com/deftruth/ptx-isa-8.2-zh

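🎉 Continuously updated: study notes on CUDA 12.2 PTX ISA 8.2, with partial Chinese translation, personal commentary, and inline assembly examples explaining the CUDA 12.2 PTX ISA 8.2 assembly instructions; work in progress.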

asm cpp cuda ptx

Last synced: 17 Dec 2024

https://github.com/definetlynotai/llm_data

Source code from a number of well-known repositories, in Python, collected in this repo as plain local documents for training code AI

c code-examples cpp cuda data data-dum jupyter-notebook llm llm-code llm-datasets programming-data programming-data-sets python3

Last synced: 26 Jan 2025

https://github.com/redhat-na-ssa/gpu-workshop

Using GPUs on Red Hat Platforms

cuda gpu nvidia opencl

Last synced: 04 Dec 2024

https://github.com/simmsb/p4haskell

P4 backend in Haskell

compiler cuda gpu p4 p4c p4language

Last synced: 07 Jan 2025

https://github.com/egororachyov/spbench

Benchmark for sparse linear algebra libraries for CPU and GPU platforms.

benchmark cpp cpu cuda gpu-computing graphblas opencl sparse-matrices

Last synced: 19 Nov 2024

https://github.com/ydrmaster/cuda-driver

A CUDA runtime environment based on the CUDA Driver API

cuda nvidia

Last synced: 20 Nov 2024

https://github.com/zhihu/ZhiLight

A highly optimized inference acceleration engine for Llama and its variants.

cuda gpt inference-engine llama llm llm-serving pytorch

Last synced: 14 Dec 2024

https://github.com/kdeps/kdeps

Kdeps reduces the complexity of building self-hosted RAG AI Agents and APIs powered by open-source LLMs

agents ai-agents api artificial-intelligence cuda docker dockerized fine-tuning huggingface llama llm llm-agent llmops mistral mlops model-inference multimodal nvidia opensource vicuna

Last synced: 02 Feb 2025

https://github.com/gurbaaz27/cs433a-design-exercises

Solutions of design exercises in CS433A: Parallel Programming, Spring Semester 2021-22

barriers cuda gpu-programming locks openmp parallel-programming posix-threads semaphores

Last synced: 14 Nov 2024

https://github.com/tensorush/my-dev-containers

🐳 My development environments wrapped into VS Code Dev Containers (15.02.2022).

containers cuda cuda-programming dev-container development docker docker-container jax mamba micromamba python python3 vscode vscode-devcontainer

Last synced: 25 Dec 2024

https://github.com/bruce-lee-ly/cuda_back2back_hgemm

Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with the MMA PTX instruction.

back2back-gemm back2back-hgemm cublas cuda fused-gemm fused-hgemm gemm gpu hgemm matrix-multiply nvidia tensor-core

Last synced: 15 Nov 2024

https://github.com/silviopaganini/darknet-docker-nvidia

Docker image to run Darknet on NVIDIA GPUs with CUDA 9.0 and OpenCV 3.4.0

cuda darknet docker nvidia-docker opencv

Last synced: 27 Nov 2024

https://github.com/fatlipp/cuda-tree

CUDA-based Tree builder

algorithms cpp cuda octree quadtree tree

Last synced: 21 Jan 2025

https://github.com/jedbrooke/cuda_bwt

CUDA-accelerated Burrows-Wheeler transform

bioinformatics burrows-wheeler-transform bwt compression cuda

Last synced: 19 Jan 2025

https://github.com/franneck94/cuda-aes

AES Implementation (Counter Mode) in C++, OpenMP and CUDA.

aes c-plus-plus counter cuda encryption openmp parallel

Last synced: 31 Oct 2024

https://github.com/ragibson/cuda-k-means

An implementation of Lloyd's algorithm for data clustering on GPUs and computational accelerators.

clustering cuda gpu k-means unsupervised-clustering

Last synced: 05 Jan 2025

https://github.com/eggy115/cuda

CUDA

cuda

Last synced: 05 Jan 2025

https://github.com/nodef/nvgraph.sh

CLI for nvGraph, which is a GPU-based graph analytics library written by NVIDIA, using CUDA.

analytics cli console cuda gpu graph nvgraph nvidia pagerank terminal

Last synced: 24 Oct 2024

https://github.com/sean-bradley/cudalookupripemd60

RIPEMD-160 lookup using parallel processing on an NVIDIA CUDA graphics card

cuda parallel-processing ripemd160

Last synced: 13 Nov 2024

https://github.com/lu-zero/nvidia-video-codec

Redistributable headers to build cuvid and nvenc

cuda cuvid nvenc nvidia nvidia-video-codec

Last synced: 16 Nov 2024

https://github.com/harrism/nsys_easy

Easier, quicker command-line CUDA profiling

cuda nsight-systems profiling

Last synced: 13 Oct 2024

https://github.com/thomasjo/cudalicious

C++ header library intended to reduce CUDA boilerplate code

boilerplate cpp cuda header-only

Last synced: 21 Jan 2025

https://github.com/iitii/useless

Backup of Doubi scripts, some personal configuration files, and some personal scripts

aria2 bash-script cuda docker doubi ffmpeg frpc frps oh-my-zsh powerlevel10k

Last synced: 20 Jan 2025

https://github.com/hahnjo/cgxx

Object-Oriented Implementation of the Conjugate Gradients Method

cg cuda hpc openacc opencl openmp

Last synced: 29 Jan 2025

https://github.com/ammaryasirnaich/deeplearning_playland

This repository contains Docker image files that support the common frameworks required for deep learning. The images support both the latest GPUs (NVIDIA CUDA) and CPUs.

cuda cuda11 cudnn cudnn8 deep-learning docker docker-image dockerfile gpu kersa opencv pytorch pytorch-cnn scikit-learn tensorflow2

Last synced: 15 Dec 2024

https://github.com/ai-dock/python

Python docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.

ai cuda docker machine-learning python rocm runpod vast

Last synced: 18 Nov 2024

https://github.com/akira4o4/trtplus

tensorrt-plus framework

cpp cpu cuda gpu tensorrt yolo

Last synced: 27 Nov 2024

https://github.com/antoniopelusi/lu-solver

Assignments for the High Performance Computing exam at Unimore, Modena, IT.

cuda lu-decomposition openmp

Last synced: 21 Jan 2025

https://github.com/pnnl/cuvite

Multi-GPU Graph Community Detection using CUDA

community-detection cuda graph-clustering mpi

Last synced: 25 Nov 2024

https://github.com/shalithasuranga/cudaperformance

Compare the performance of matrix multiplication among GPU shared memory, GPU global memory and CPU

cuda cuda-demo matrix-multiplication nvidia

Last synced: 19 Dec 2024