Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/openmlsys/openmlsys-cuda

Tutorials for writing high-performance GPU operators in AI frameworks.

cuda gpu machine-learning

Last synced: 16 Nov 2024

https://github.com/pinto0309/dmhead

Dual model head pose estimation. Fusion of SOTA models. 360° 6D HeadPose detection. All pre-processing and post-processing are fused together, allowing end-to-end processing in a single inference.

6d cuda head-pose-estimation headpose-detection headpose-estimation models onnx tensorrt

Last synced: 09 Nov 2024

https://github.com/Ar-Ray-code/darknet_ros_fp16

darknet + ROS2 Humble + OpenCV4 + CUDA 11 (cuDNN, Jetson Orin)

cuda cudnn darknet object-detection opencv4 ros ros2-foxy yolo yolo-tiny yolov3 yolov7

Last synced: 27 Oct 2024

https://github.com/sh1ng/arboretum

Gradient Boosting powered by GPU (NVIDIA CUDA)

arboretum cuda gpu gradient-boosting gradient-boosting-machine machine-learning python

Last synced: 16 Nov 2024

https://github.com/ztxtech/Time-Evidence-Fusion-Network

Official implementation of "Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting" (https://arxiv.org/abs/2405.06419)

cuda deep-learning machine-learning macos neural-network neural-networks pytorch time-series time-series-analysis time-series-forecasting time-series-prediction uestc

Last synced: 02 Nov 2024

https://github.com/open-atmos/pysdm

Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package with box, parcel & 1D/2D prescribed-flow examples in Python, Julia and Matlab

atmospheric-modelling atmospheric-physics cuda gpu gpu-computing monte-carlo-simulation numba nvrtc particle-system physics-simulation pint pypi-package python research simulation thrust

Last synced: 29 Jan 2025

https://github.com/elftausend/custos

A minimal OpenCL, CUDA, Vulkan and host CPU array manipulation engine / framework.

array-manipulations autograd automatic-differentiation cpu cuda cuda-support custos framework gpu lazy-evaluation no-std opencl rust vulkan wgsl

Last synced: 10 Jan 2025

https://github.com/saddam213/llamastack

ASP.NET Core Web, WebApi & WPF implementations for LLama.cpp & LLamaSharp

alpaca chatgpt cuda huggingface llama llama2 llamacpp llamasharp llm

Last synced: 20 Jan 2025

https://github.com/jpuigcerver/pytorch-baidu-ctc

PyTorch bindings for Baidu's Warp-CTC

ctc-loss cuda pytorch

Last synced: 25 Nov 2024

https://github.com/fynv/thrustrtc

CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.

cuda nvrtc thrust

Last synced: 06 Nov 2024

https://github.com/goldsborough/k-means

Code accompanying my blog post on k-means in Python, C++ and CUDA

cpp cuda k-means machine-learning parallel python

Last synced: 29 Jan 2025

https://github.com/lukeyeager/cmake-cuda-example

Example of how to use CUDA with CMake >= 3.8

cmake cuda

Last synced: 29 Oct 2024

https://github.com/dakenf/stable-diffusion-nodejs

GPU-accelerated JavaScript runtime for Stable Diffusion. Uses a modified ONNX Runtime to support CUDA and DirectML.

cuda directml nodejs stable-diffusion typescript

Last synced: 08 Nov 2024

https://github.com/NickKarpowicz/LightwaveExplorer

An efficient, user-friendly solver for nonlinear light-matter interaction

c-plus-plus cuda nonlinear-optics oneapi optics-simulation simulation sycl

Last synced: 05 Nov 2024

https://github.com/brickray/gpu-pathtracer

Physically based path tracer on the GPU

cuda gpu pathtracing raytracing tracing

Last synced: 14 Nov 2024

https://github.com/rokibulislaam/colab-ffmpeg-cuda

FFmpeg build with CUDA support for Linux (especially for Google Colab)

colab-notebook cuda ffmpeg ffmpeg-installer h264 h265 hevc-encoder nvenc ubuntu1804

Last synced: 08 Nov 2024

https://github.com/tomrunia/pytorchsteerablepyramid

PyTorch implementation of the Complex Steerable Pyramid

batch computer-vision cuda image-processing mkl pyramid pytorch

Last synced: 13 Nov 2024

https://github.com/loeeeee/immich-in-lxc

Install Immich in LXC with optional CUDA support

bare-metal cuda guide immich install-script lxc machine-learning proxmox-ve ubuntu

Last synced: 20 Jan 2025

https://github.com/DefTruth/ffpa-attn-mma

📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)⚡️GPU SRAM complexity for headdim > 256, 1.8x~3x↑🎉faster vs SDPA EA.

attention cuda flash-attention mlsys sdpa tensor-cores

Last synced: 27 Jan 2025

https://github.com/denzp/rust-ptx-builder

Convenient `build.rs` helper for NVPTX crates

cuda nvptx rust

Last synced: 27 Oct 2024

https://github.com/gunrock/loops

🎃 GPU load-balancing library for regular and irregular computations.

cuda gpu gpu-computing hpc load-balancing parallel

Last synced: 11 Nov 2024

https://github.com/owensgroup/BGHT

BGHT: High-performance static GPU hash tables.

cuckoo cuda gpu hashing hashmap

Last synced: 19 Nov 2024

https://github.com/emptysoal/cuda-image-preprocess

Speed up image preprocessing with CUDA when handling images or running TensorRT inference

cnn cuda cuda-demo cuda-kernels cuda-programming deep-learning image-processing tensorrt

Last synced: 06 Dec 2024

https://github.com/jeng1220/openacc_fortran_examples

Simple OpenACC Fortran Examples

cuda fortran openacc

Last synced: 28 Oct 2024

https://github.com/rbaygildin/learn-gpgpu

Algorithms implemented in CUDA + resources about GPGPU

cublas cuda curand gpgpu gpu gpu-computing image-processing nvidia opencl parallel-computing pycuda

Last synced: 19 Nov 2024

https://github.com/par4all/par4all

Par4All is an automatic parallelizing and optimizing compiler (workbench) for C and Fortran sequential programs

abstract-interpretation automatic-parallelization c99 cuda fortran interprocedural opencl parallelization polyhedral-model

Last synced: 12 Oct 2024

https://github.com/ctuning/ctuning-programs

Collective Knowledge extension with unified and customizable benchmarks (with extensible JSON meta information) to be easily integrated with customizable and portable Collective Knowledge workflows. You can easily compile and run these benchmarks using different compilers, environments, hardware and OS (Linux, MacOS, Windows, Android). More info:

c collaborative-benchmarking collaborative-optimization collective-knowledge common-benchmarks cpp crowd-benchmarking crowd-tuning cuda customizable-benchmarking fortran json-api json-metadata open-benchmarks opencl reproducible-research reproducible-workflows

Last synced: 13 Nov 2024

https://github.com/wizyoung/optical-flow-gpu-docker

Compute dense optical flow using TV-L1 algorithm with NVIDIA GPU acceleration.

cuda gpu optical-flow tvl1

Last synced: 17 Nov 2024

https://github.com/khrylx/dsgpuraytracing

A GPU-based ray tracer using CUDA

cuda gpu raytracer raytracing

Last synced: 21 Nov 2024

https://github.com/bruce-lee-ly/cuda_hgemv

Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.

cublas cuda cuda-core gemm gemv gpu hgemm hgemv matrix-multiply nvidia tensor-core

Last synced: 19 Dec 2024

https://github.com/1ytic/warp-rna

Recurrent Neural Aligner

cuda forward-backward rna rnn-transducer

Last synced: 15 Dec 2024

https://github.com/ingonyama-zk/fast-danksharding

Danksharding Builder with GPU acceleration

cuda danksharding icicle rust

Last synced: 14 Nov 2024

https://github.com/denzp/rust-ptx-linker

The missing puzzle piece for NVPTX experience with Rust

cuda linker llvm nvptx rust

Last synced: 27 Oct 2024

https://github.com/jefflarkin/openacc-interoperability

Interoperability examples for OpenACC.

cuda fortran gpu openacc

Last synced: 05 Dec 2024

https://github.com/kevinzakka/learn-cuda

Learning some parallel programming with CUDA

cuda gpu

Last synced: 28 Oct 2024

https://github.com/goldbattle/libelas-gpu

Implementation of LIBELAS in CUDA.

cpu cuda depth-maps gpu libelas libelas-gpu

Last synced: 06 Nov 2024

https://github.com/abraham-ai/eden

Eden converts your Python function into a hosted endpoint with minimal changes to your existing code

celery cuda fastapi python redis-client task-queue

Last synced: 09 Oct 2024

https://github.com/STEllAR-GROUP/octotiger

Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees

astrophysics cuda cuda-kernels hpx kokkos simd stellar-mergers sycl

Last synced: 05 Nov 2024

https://github.com/gangliao/bazel.cmake

bazel.cmake mimics the behavior of bazel to simplify the usability of CMake

bazel cmake cpp11 cuda golang

Last synced: 02 Dec 2024

https://github.com/enp1s0/ozimmu

FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme

cuda gemm mixed-precision tensorcore tensorcores

Last synced: 06 Nov 2024

https://github.com/govertb/GPUGraphLayout

An experimental GPU accelerated implementation of ForceAtlas2

cuda forceatlas2 gephi graph-algorithms graph-layout social-network-analysis visualization

Last synced: 05 Nov 2024

https://github.com/kibae/pg_onnx

pg_onnx: ONNX Runtime integrated with PostgreSQL. Perform ML inference with data in your database.

ai contributions-welcome cuda deep-learning inference machine-learning onnx onnxruntime postgresql postgresql-extension

Last synced: 21 Nov 2024

https://github.com/lucasdelimanogueira/PyNorch

Recreating PyTorch from scratch (C/C++, CUDA and Python, with GPU support and automatic differentiation!)

c cuda deep-learning neural-network python pytorch

Last synced: 08 Jan 2025

https://github.com/chiehpower/Setup-deeplearning-tools

Set up CI in DL/ cuda/ cudnn/ TensorRT/ onnx2trt/ onnxruntime/ onnxsim/ Pytorch/ Triton-Inference-Server/ Bazel/ Tesseract/ PaddleOCR/ NVIDIA-docker/ minIO/ Supervisord on AGX or PC from scratch.

agx ci cuda cudnn deep-learning docker installation minio nvidia onnx-simplifier onnx2trt onnxruntime paddleocr pytorch supervisord tensorrt tensorrt-inference-server tesseract-ocr triton-inference-server triton-server

Last synced: 28 Oct 2024

https://github.com/lucidrains/autoregressive-linear-attention-cuda

CUDA implementation of autoregressive linear attention, with all the latest research findings

artificial-intelligence attention-mechanisms cuda deep-learning linear-attention

Last synced: 22 Oct 2024

https://github.com/yehengchen/ubuntu-deep-learning-environment-setup

Guide to installing TensorFlow with an NVIDIA GPU and setting up a deep learning environment - NVIDIA drivers/CUDA/cuDNN/tensorflow-gpu/Chinese documentation

cuda cudnn deep-learning nvidia-gpu tensorflow tensorflow-gpu ubuntu

Last synced: 30 Nov 2024

https://github.com/Natsu-Akatsuki/RangeNetTrt8

tensorrt8 && cuda && libtorch implementation of rangenet++

cuda libtorch semantic-segmentation tensorrt

Last synced: 27 Oct 2024

https://github.com/AstroAccelerateOrg/astro-accelerate

AstroAccelerate is a many-core accelerated software package for processing time-domain radio-astronomy data.

cuda gpu radio-astronomy

Last synced: 02 Nov 2024

https://github.com/js1010/cusim

Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)

cuda gensim gpu lda topic-modeling w2v word-embedding

Last synced: 02 Nov 2024

https://github.com/autodesk/neon

Multi-GPU Framework for Voxel Grid Computations

cuda gpu gpu-acceleration grid hpc lbm parallel parallel-computing

Last synced: 19 Dec 2024

https://github.com/luigifcruz/blade

Beamforming & Stuff ™

astronomy cuda dsp gpu

Last synced: 05 Nov 2024

https://github.com/deftruth/cuhgemm-py

⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, achieve peak⚡️ performance

cuda hgemm tensor-cores

Last synced: 09 Jan 2025

https://github.com/ProjectPhysX/PTXprofiler

A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.

cuda gpu gpu-acceleration gpu-computing gpu-programming hpc nvidia nvidia-cuda nvidia-gpu opencl profiler ptx ptx-utils roofline-model sycl

Last synced: 05 Nov 2024

https://github.com/abhisheknair10/llama3.cu

Lightweight Llama 3 8B Inference Engine in CUDA C

cuda llama llm-inference

Last synced: 21 Jan 2025

https://github.com/weft/warp

Continuous-energy Monte Carlo neutron transport in general geometries on GPUs

carlo cuda gpu monte monte-carlo neutron transport

Last synced: 05 Nov 2024

https://github.com/lwYeo/SoliditySHA3Miner

All-in-one mixed multi-GPU (nVidia, AMD, Intel) & CPU miner solves proof of work to mine supported EIP918 tokens in a single instance (with API).

0xbitcoin amdminer cpuminer cuda ethos gpu-miner gpu-mining gpumining hiveos igpu linux miner nvidia-miner opencl solo-mining windows-10

Last synced: 13 Nov 2024

https://github.com/andi611/apriori-and-eclat-frequent-itemset-mining

Python implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions.

apriori apriori-algorithm cuda data-mining data-mining-algorithms eclat eclat-algorithm frequent-itemset-mining frequent-itemsets frequent-pattern-mining gcc gpu gpu-acceleration gpu-programming plot pycuda python transaction transactions

Last synced: 07 Nov 2024

https://github.com/r00tman/eventhands

Real-Time Neural 3D Hand Pose Estimation from an Event Stream [ICCV 2021]

computer-vision cuda dataset deep-learning event-camera hand-pose hand-pose-estimation hand-tracking iccv2021 mano opengl pytorch smpl

Last synced: 09 Dec 2024

https://github.com/deftruth/ffpa-attn-mma

📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)🎉GPU SRAM complexity for headdim > 256, 1.5x~2x🎉faster vs SDPA EA.

attention cuda flash-attention mlsys sdpa tensor-cores

Last synced: 13 Jan 2025

https://github.com/harrism/ranger

Generate simple index ranges in C++ and CUDA C++

cpp cuda loops ranges

Last synced: 28 Oct 2024

https://github.com/sskorol/vosk-api-gpu

Vosk ASR Docker images with GPU for Jetson boards, PCs, M1 laptops and GCP

asr cuda docker gcp gpu jetson jetson-nano jetson-xavier-nx m1 nvidia nvidia-docker vosk vosk-api

Last synced: 28 Oct 2024

https://github.com/termoshtt/link_cuda_kernel

HowTo: Compile CUDA with nvcc, and link to Rust

cuda nvcc rust

Last synced: 10 Nov 2024

https://github.com/mravanelli/pytorch_mlp_for_asr

This code implements a basic MLP for speech recognition. The MLP is trained with pytorch, while feature extraction, alignments, and decoding are performed with Kaldi. The current implementation supports dropout and batch normalization. An example for phoneme recognition using the standard TIMIT dataset is provided.

asr cuda deep-learning deep-neural-networks feedforward-neural-network kaldi kaldi-asr mlp multilayer-perceptron neural-networks python pytorch speech-recognition timit

Last synced: 02 Dec 2024

https://github.com/andravin/spio

Efficient CUDA kernels for training convolutional neural networks with PyTorch.

convolutional-neural-networks cuda pytorch

Last synced: 22 Nov 2024

https://github.com/gangliao/VS-Code-Cuda

Support for CUDA grammars in Visual Studio Code

cuda vs vs-code vscode-extension

Last synced: 23 Oct 2024

https://github.com/quantumbfs/cuyao.jl

CUDA extension for Yao.jl

circuit cuda gpu quantum yao

Last synced: 06 Nov 2024

https://github.com/wdmapp/gtensor

GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development.

cpp cpp14 cuda gpu hacktoberfest rocm sycl

Last synced: 05 Nov 2024

https://github.com/rust-av/nvidia-video-codec-rs

Bindings for the NVIDIA Video Codec SDK

cuda cuvid nvenc nvidia rust rust-av

Last synced: 19 Nov 2024

https://github.com/pwhiddy/pybind11-cuda

Template for GPU accelerated python libraries

cuda gpu numpy pybind11 python

Last synced: 26 Nov 2024

https://github.com/davidalgis/interopunitycuda

Demonstrate interoperability between Unity Engine and CUDA

cpp cuda dx11 gpu gpu-acceleration native-plugin opengl unity unity3d

Last synced: 10 Nov 2024

https://github.com/pkestene/euler_kokkos

Compressible hydro and magneto-hydrodynamics (2nd order Godunov) implemented with MPI+Kokkos

cea cfd cmake cpp cuda finite-volume finite-volume-method fluid-dynamics gpu kokkos magnetohydrodynamics mpi parallel-computing parallelism performance-portability

Last synced: 18 Dec 2024

https://github.com/AFD-Illinois/kharma

Kokkos-based High-Accuracy Relativistic Magnetohydrodynamics with AMR

cuda gpu grmhd hip kokkos mhd openmp sycl

Last synced: 05 Nov 2024

https://github.com/adrianpangithub/houdinipackage

A selection of small parts from my personal, daily-use Houdini accessories

city cuda gpu houdini landscape pcg terrain

Last synced: 31 Oct 2024

https://github.com/pinto0309/facemesh_onnx_tensorrt

Verify that the post-processing merged into FaceMesh works correctly. The object detection model can be anything other than BlazeFace. YOLOv4 and FaceMesh committed to this repository have modified post-processing.

cuda facemesh onnx python tensorrt

Last synced: 22 Oct 2024

https://github.com/sdpython/onnx-extended

New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA

cuda machine-learning onnx onnxruntime python

Last synced: 23 Jan 2025

https://github.com/ashermancinelli/cxbqn

BQN virtual machine

apl bqn cplusplus cpp20 cuda

Last synced: 09 Nov 2024