Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/ashermancinelli/cxbqn

BQN virtual machine

apl bqn cplusplus cpp20 cuda

Last synced: 09 Nov 2024

https://github.com/baderlab/ecuda

STL-like containers (array, vector, matrix, cube) useable in device code.

c-plus-plus cuda

Last synced: 05 Nov 2024

https://github.com/gjbex/python-for-hpc

Repository for participants of the "Python for HPC" training

cuda cython dask gpu hpc mpi numba python python-training scientific-computing swig training

Last synced: 22 Nov 2024

https://github.com/seonglae/llama2gptq

Chat with LLaMa 2 and get responses backed by reference documents retrieved from a vector database. Uses a locally available model with GPTQ 4-bit quantization.

chatai chatbot chatgpt cuda gpt langchain llama-2 llama2 model-quantization quantization question-answering rye streamlit-chat transformers

Last synced: 12 Dec 2024

https://github.com/ahdhn/cudatemplate

Template for starting a CUDA/C++ project using CMake, with GitHub Actions for CI

cmake cuda template

Last synced: 08 Nov 2024

https://github.com/niftypet/nipet

High-throughput PET image reconstruction with high quantitative accuracy and precision

analysis cuda gpu image-reconstruction medical-imaging mlem pet processing python

Last synced: 22 Jan 2025

https://github.com/marianhlavac/fft-cuda

Fast Fourier Transform implementation, computable on CUDA platform. Seminar project for MI-PRC course at FIT CTU.

c-plus-plus coursework cuda fast-fourier-transform fit-ctu nvidia python

Last synced: 08 Nov 2024

https://github.com/LatticeQCD/SIMULATeQCD

SIMULATeQCD is a multi-GPU Lattice QCD framework that makes it easy for physicists to implement lattice QCD formulas while still providing competitive performance.

cuda gpu hip hpc lattice lattice-qcd mpi parallel physics

Last synced: 30 Oct 2024

https://github.com/kwea123/python-ray-tracing-with-cuda-example

An example of cuda ray tracing in pure python syntax.

cuda numba python ray-tracing

Last synced: 30 Oct 2024

https://github.com/williamvenner/squad-mortar-helper

💣 SMH – a computer vision project for automatic, precision mortar strike calculations in Squad

computer-vision cuda cv gpu mortar mortars rust smh squad squad-game squadgame

Last synced: 02 Nov 2024

https://github.com/llnl/fpchecker

A dynamic analysis tool to detect floating-point errors in HPC applications.

cancellation clang cuda exceptions floating-point floating-point-arithmetic infinity llvm overflow overflow-detection underflow-detection

Last synced: 11 Nov 2024

https://github.com/juliagpu/nvtx.jl

Julia bindings for NVTX, for instrumenting with the Nvidia Nsight Systems profiler

cuda julia nsys nvtx profiling

Last synced: 12 Nov 2024

https://github.com/slyautomation/osrs_yolov5

Yolov5 Object Detection In OSRS using Python code, Detecting Cows - Botting

botting cuda machine-learning mlbot osrs pycharm python pytorch runescape yolov5

Last synced: 09 Oct 2024

https://github.com/deftruth/cuffpa-py

📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)🎉GPU SRAM complexity for headdim > 256, ~1.5x🎉faster than SDPA EA.

attention cuda flash-attention mlsys sdpa tensor-cores

Last synced: 09 Jan 2025

https://github.com/rodrgo/OpenPH

Parallel reduction of boundary matrices for Persistent Homology with CUDA

cuda gpu-computing numerical-computation parallel-computing persistent-homology topological-data-analysis

Last synced: 12 Nov 2024

https://github.com/DefTruth/cuffpa-py

📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)🎉GPU SRAM complexity for headdim > 256, ~1.5x🎉faster than SDPA EA.

attention cuda flash-attention mlsys sdpa tensor-cores

Last synced: 08 Jan 2025

https://github.com/idealab-isu/GPView

GPU Accelerated Voxelization Framework for 3D CAD models.

cpp cuda gpu voxelization

Last synced: 27 Oct 2024

https://github.com/prg-titech/dynasoar

CUDA Dynamic Memory Allocator for SOA Data Layout

cuda memory-allocation simd smmo

Last synced: 18 Nov 2024

https://github.com/SafeAILab/zkDL

zkDL, an open source toolkit for zero-knowledge proofs of deep learning powered by CUDA

cuda deep-neural-networks gpu-acceleration privacy-enhancing-technologies zero-knowledge-proof

Last synced: 12 Nov 2024

https://github.com/anibali/docker-torch

A Docker image for Lua Torch

cuda docker docker-image lua torch

Last synced: 11 Oct 2024

https://github.com/dereklstinson/gocudnn

Go bindings for cuDNN and other CUDA packages.

convolutional-neural-networks cuda cudnn go golang machine-learning neural-network

Last synced: 24 Jan 2025

https://github.com/coderonion/cuda-beginner-course-cpp-version

Companion code for the bilibili video series "CUDA 12.x Parallel Programming for Beginners (C++ Edition)"

cpp cublas cuda cuda-programming cudnn gpu gpu-programming nvcc nvidia parallel-programming python rust

Last synced: 19 Nov 2024

https://github.com/pdziepak/ranges-gpu

Experimental ranges for CUDA

c-plus-plus cuda range

Last synced: 07 Nov 2024

https://github.com/fclc/multi-plexer

Goal: Low power cluster capable of serving 24+ streams of 4KHDR60 source transcodes while consuming no more than 100W at peak and idling at less than 10W

arm64 clustering cuda decoding encoding ffmpeg hardware hdr jetson jetson-nano jetson-xavier-nx opencl plex pocl raspberry-pi-4 rockpro64 transcode zfs zfsonlinux

Last synced: 11 Oct 2024

https://github.com/blurgyy/jaxngp

JAX implementation of instant-ngp (NeRF part)

cuda hashgrid instant-ngp jax nerf neural-radiance-field nix python wsl

Last synced: 13 Nov 2024

https://github.com/gvaliente/pcps

CPU and GPU point cloud plane segmentation

cpp cpp11 cuda opencl pcl thrust

Last synced: 21 Nov 2024

https://github.com/enot-autodl/onnx-runtime-with-tensorrt-and-openvino

Docker scripts for building ONNX Runtime with TensorRT and OpenVINO in manylinux environment

cuda nvidia onnx onnxruntime openvino tensorrt

Last synced: 07 Nov 2024

https://github.com/ENOT-AutoDL/ONNX-Runtime-with-TensorRT-and-OpenVINO

Docker scripts for building ONNX Runtime with TensorRT and OpenVINO in manylinux environment

cuda nvidia onnx onnxruntime openvino tensorrt

Last synced: 28 Oct 2024

https://github.com/johnh2o2/cuvarbase

Python library for fast time-series analysis on CUDA GPUs

cuda fourier-methods gpu gpu-computing lomb-scargle-periodogram nfft python python-3 time-series

Last synced: 02 Nov 2024

https://github.com/juliagpu/nccl.jl

A Julia wrapper for the NVIDIA Collective Communications Library.

cuda gpu julia nccl

Last synced: 12 Nov 2024

https://github.com/m-pilia/disptools

Generate displacement fields with known volume changes

cuda image-processing jacobian python3

Last synced: 12 Nov 2024

https://github.com/whizzzkid/opencv-complete-build-cuda

Full build script for OpenCV with or without CUDA and Bumblebee support

build build-automation cuda cudnn install installer opencv opencv-library opencv-python opencv3 opencv3-python

Last synced: 26 Dec 2024

https://github.com/jimouris/parallel-convolution

🖼️ Parallel Image Convolution, applying a blur filter to images. Written in C, optimized in three different ways: MPI, MPI & OpenMP and CUDA.

blur-filter cuda image-convolution image-processing mpi mpi-library parallel-processing

Last synced: 06 Nov 2024

https://github.com/sandialabs/lgrtk

Tool Kit for Lagrangian Grid Reconnection

cuda gpu hpc physics sandia-national-laboratories scr-2300 snl-applications

Last synced: 12 Nov 2024

https://github.com/illuhad/hipCPU

Implementation of AMD HIP for CPUs

cuda gpgpu hip hpc openmp openmp-parallelization

Last synced: 09 Nov 2024

https://github.com/ktaletsk/NCCV

Short course on computer vision and image processing using Numba+CUDA+OpenCV

computer-vision cuda jupyter-notebook numba

Last synced: 15 Nov 2024

https://github.com/illuhad/hipcpu

Implementation of AMD HIP for CPUs

cuda gpgpu hip hpc openmp openmp-parallelization

Last synced: 28 Nov 2024

https://github.com/kostyaev/sentence2vec

Deep sentence embedding using Sequence to Sequence learning

cuda sentence2vec seq2seq torch

Last synced: 28 Oct 2024

https://github.com/ktaletsk/nccv

Short course on computer vision and image processing using Numba+CUDA+OpenCV

computer-vision cuda jupyter-notebook numba

Last synced: 31 Dec 2024

https://github.com/heavyai/heavyai.jl

Julia client for OmniSci GPU-accelerated SQL engine and analytics platform

cuda data-science database gpu julia-language julia-package julialang sql

Last synced: 31 Oct 2024

https://github.com/tjyuyao/cutex

PyCUDA based PyTorch Extension Made Easy

cuda customized extension operator pycuda pytorch

Last synced: 06 Jan 2025

https://github.com/microsoft/svirl

Svirl is a GPU-accelerated solver of complex Ginzburg-Landau equations for superconductivity. It consists of a time-dependent solver to describe vortex dynamics and a free energy minimizer to accurately find static configurations.

cuda ginzburg-landau gpu python scientific-computing superconductivity vortex

Last synced: 04 Dec 2024

https://github.com/sevagh/zen

optimized realtime harmonic/percussive source separation using the GPU (NVIDIA CUDA) and CPU (Intel IPP)

audio cuda digital-signal-processing dsp real-time source-separation thrust

Last synced: 23 Dec 2024

https://github.com/mxpv/nvml-go

golang wrapper for NVIDIA Management Library (NVML)

cuda golang golang-wrapper gpu nvidia nvidia-smi nvml

Last synced: 11 Oct 2024

https://github.com/prg-titech/ikra-cpp

C++ Library for Object-oriented Programming with Structure of Arrays Layout

cpp cuda data-layout simd

Last synced: 18 Nov 2024

https://github.com/ShahriarRezghi/Spyker

High-performance Spiking Neural Networks Library Written From Scratch with C++ and Python Interfaces.

computational-neuroscience cuda cudnn cxx high-performance neuroscience onednn python r-stdp snn stdp

Last synced: 05 Nov 2024

https://github.com/minhhn2910/cuda-half2

Convert CUDA programs from float data type to half or half2 with SIMDization

clang cuda half-precision

Last synced: 26 Dec 2024

https://github.com/pkestene/ms-hpc-ai-gpu

Resources for the introductory GPU programming course of the HPC-AI specialized master's program

cuda deep-learning gpu gpu-computing machine-learning physics-informed-neural-networks pinn pinns

Last synced: 18 Dec 2024

https://github.com/bruce-lee-ly/flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

cuda cutlass flash-attention flash-attention-2 gpu inference large-language-model llm mha multi-head-attention nvidia tensor-core

Last synced: 15 Nov 2024

https://github.com/TristanBilot/mlx-GCN

MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2 Ultra, M3 Max).

apple cuda deep-learning gnn mlx pytorch

Last synced: 30 Oct 2024

https://github.com/pinto0309/20220228_intel_deeplearning_day_hitnet_demo

Demo and presentation materials for the special presentation at Intel IoT Planet 2021 DeepLearning Day https://www.intel.co.jp/content/www/jp/ja/now/iot-planet/deep-learning-day.html

cuda docker intel onnx openvino

Last synced: 23 Oct 2024

https://github.com/yhtang/graphdot

GPU-accelerated Marginalized Graph Kernel with customizable node and edge features; Gaussian process regression.

cheminformatics cuda gpu graph-algorithms machine-learning python

Last synced: 08 Nov 2024

https://github.com/kyegomez/neva

The open source implementation of "NeVA: NeMo Vision and Language Assistant"

artificial-intelligence cuda gpt4 multi-modal multi-modal-learning multithreading neva nvidia robotics

Last synced: 09 Nov 2024

https://github.com/bdhu/gpuinfo

A minimal command-line utility written in Rust for querying GPU status

command-line-tool cuda gpu nvidia nvidia-smi nvml rust rust-lang

Last synced: 07 Nov 2024

https://github.com/tree-sitter-grammars/tree-sitter-cuda

CUDA grammar for tree-sitter

cuda parser tree-sitter

Last synced: 30 Oct 2024

https://github.com/tylerjthomas9/rapids.jl

An unofficial Julia wrapper for the RAPIDS.ai ecosystem using PythonCall.jl

cuda gpu-acceleration julia

Last synced: 14 Jan 2025

https://github.com/bwohlberg/sporco-cuda

CUDA extension for the SPORCO project

convolutional-sparse-coding cuda gpu

Last synced: 09 Dec 2024

https://github.com/niftypet/nimpa

NiftyPET: Neuro-Image Manipulation, Processing and Analysis

analysis cuda gpu medical-imaging mr pet processing python

Last synced: 13 Jan 2025

https://github.com/parker-int64/yolov5-RGBD

Qt QML based yolov5 + RGBD camera program

cuda cudnn depth opencv openvino qml-applications qt rgbd tensorrt yolov5

Last synced: 09 Nov 2024

https://github.com/ptaxom/pnn

pnn is Darknet compatible neural nets inference engine implemented in Rust.

cuda cudnn darknet rust tensorrt yolo

Last synced: 09 Nov 2024

https://github.com/iowar/kecmatch-gpu

Finds matching solidity function signatures using GPU

cuda keccak256 solidity

Last synced: 23 Nov 2024

https://github.com/dbklim/docker_image_with_cuda10_cudnn7

Dockerfiles and manual for easy build of docker image with CUDA10.X and cuDNN7.6 to run TensorFlow/PyTorch on the nvidia GPU in docker-container.

cuda cudnn docker docker-gpu docker-image docker-nvidia dockerfile gpu gpu-docker nvidia nvidia-docker pytorch pytorch-gpu tensorflow tensorflow-examples tensorflow-gpu tensorflow-gpu-docker torch

Last synced: 10 Oct 2024

https://github.com/denzp/rust-inline-cuda-tutorial

Let's jump into CUDA development with Rust

cuda rust

Last synced: 28 Oct 2024

https://github.com/harrism/cuda_event_benchmark

Unit benchmarks of CUDA event APIs.

benchmarks cuda

Last synced: 28 Oct 2024

https://github.com/jundaf2/eigenmha

Forward and backward Attention DNN operators implemented with LibTorch, cuDNN, and Eigen.

backpropagation cuda cudnn cudnn-v8 dnn inference pytorch

Last synced: 15 Nov 2024

https://github.com/moldyn/clustering

Robust and stable clustering of molecular dynamics simulation trajectories.

biophysics clustering cpp cuda molecular-dynamics

Last synced: 11 Dec 2024

https://github.com/anroshka/snake-ai

🐍 A Snake game AI that learns to play through Deep Q-Learning. Built with PyTorch and Pygame, featuring CUDA acceleration and real-time visualization of the learning process.

artificial-intelligence collaborate collaboration cuda deep-learning deep-q-learning dqn game-ai gpu-acceleration machine-learning neural-network pygame python pytorch q-learning reinforcement-learning snake-game

Last synced: 26 Jan 2025

https://github.com/linonetwo/moss-dockerfile

Runs Fudan University's MOSS language model in Docker, with a WebUI provided by GradIO.

ai chatglm chatgpt cuda deeplearning docker gpu moss pytorch

Last synced: 23 Dec 2024

https://github.com/OMEGAMAX10/Face-Mask-Detection-Using-YOLOv4

Because of the COVID-19 pandemic of 2020, more and more people are concerned with protecting themselves using masks, hence the need for software capable of monitoring whether people are wearing masks. That is why I created a Python application using OpenCV (with CUDA support) based on the YOLOv4 algorithm, capable of monitoring the safety level of a space with video surveillance.

computer-vision covid-19 cuda cuda-support face-mask-detection gui gui-application masks monitoring opencv pyqt5 python safety-level video-surveillance wearing-masks yolov4 yolov4-algorithm

Last synced: 09 Nov 2024

https://github.com/matthewfeickert/nvidia-gpu-ml-library-test

Simple tests for JAX, PyTorch, and TensorFlow to test if the installed NVIDIA drivers are being properly picked up

cuda cudnn gpu jax nvidia pytorch setup tensorflow torch

Last synced: 10 Jan 2025

https://github.com/stellar-group/blaze_cuda

WIP · CUDA compatibility for Blaze · https://bitbucket.org/blaze-lib/blaze

blaze cpp cpp14 cuda gpu hpc linear-algebra metaprogramming

Last synced: 12 Nov 2024

https://github.com/ashvardanian/cpp-cuda-python-starter-kit

Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11

cmake cuda cuda-programming hip hpc matrix-multiplication openmp parallel-computing parallel-programming pybind pybind11 python starter-kit starter-template tutorial

Last synced: 28 Oct 2024

https://github.com/yujun-shi/cfmatting_cuda_mkl

A cuda & mkl implementation of closed-form matting

cuda vision

Last synced: 14 Nov 2024

https://github.com/albertstarfield/project-zephyrine

Introducing Project Zephyrine: Elevating Your Interaction Plug and Play, and Employing GPU Acceleration within a Modernized Automata Local Graphical User Interface.

chatgpt cuda electron falcon gemma ggml gguf gpt-3 gui llama llama-2 llama-3 llm metal opencl

Last synced: 30 Nov 2024

https://github.com/mnicely/nvml_examples

Examples showing how to utilize the NVML library for GPU monitoring

cublas cuda nvidia nvml

Last synced: 15 Oct 2024

https://github.com/jtschwar/tomo_tv

C++ library for Regularized 2D and 3D Tomography Reconstructions.

3d-reconstruction cuda inverse-problems regularization tomography

Last synced: 10 Nov 2024

https://github.com/triagemd/tensorflow-builds

Tensorflow binaries and Docker images compiled with GPU support and CPU optimizations.

bazel cuda cudnn docker gpu machine-learning nvidia python tensorflow tensorflow-serving

Last synced: 20 Nov 2024

https://github.com/krassowski/gsea-api

Pandas API for multiple Gene Set Enrichment Analysis implementations in Python (GSEApy, cudaGSEA, GSEA)

bioinformatics cuda enrichment gene-set-enrichment gene-sets gsea pandas pathway-analysis python3 transcriptomics

Last synced: 13 Jan 2025

https://github.com/yashassamaga/convolutionbuildingblocks

GEMM and Winograd based convolutions using CUTLASS

convolution cuda cutlass deep-learning

Last synced: 03 Dec 2024