Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/OpenPPL/ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

caffe cuda deep-learning neural-network onnx open-source pytorch quantization

Last synced: 28 Oct 2024

https://github.com/sniklaus/3d-ken-burns

An implementation of 3D Ken Burns Effect from a Single Image using PyTorch

cuda cupy deep-learning python pytorch

Last synced: 25 Jan 2025

https://github.com/openppl-public/ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

caffe cuda deep-learning neural-network onnx open-source pytorch quantization

Last synced: 05 Oct 2024

https://github.com/kevmo314/scuda

SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.

cublas cuda cudnn gpu mlops networking nvml remote-access

Last synced: 25 Jan 2025

https://github.com/pytorch/ao

PyTorch native quantization and sparsity for training and inference

brrr cuda dtypes float8 inference llama mx offloading optimizer pytorch quantization sparsity training transformer

Last synced: 23 Jan 2025

https://github.com/m4rs-mt/ilgpu

ILGPU JIT Compiler for high-performance .NET GPU programs

amd cil compiler cpu cuda dotnet gpgpu gpgpu-computing gpu ilgpu intel jit kernels msil nvidia opencl parallel ptx

Last synced: 23 Jan 2025

https://github.com/godweiyang/nn-cuda-example

Several simple examples for popular neural network toolkits calling custom CUDA operators.

cpp cuda neural-network python pytorch tensorflow

Last synced: 27 Jan 2025

https://github.com/m4rs-mt/ILGPU

ILGPU JIT Compiler for high-performance .NET GPU programs

amd cil compiler cpu cuda dotnet gpgpu gpgpu-computing gpu ilgpu intel jit kernels msil nvidia opencl parallel ptx

Last synced: 11 Nov 2024

https://github.com/AlexiaJM/Deep-learning-with-cats

Deep learning with cats (^._.^)

cat cuda deep-learning gan picture

Last synced: 27 Nov 2024

https://github.com/alexiajm/deep-learning-with-cats

Deep learning with cats (^._.^)

cat cuda deep-learning gan picture

Last synced: 27 Jan 2025

https://github.com/flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

cuda flash-attention gpu jit large-large-models llm-inference pytorch

Last synced: 23 Jan 2025

https://github.com/koide3/fast_gicp

A collection of GICP-based fast point cloud registration algorithms

cpp cuda gicp gpu icp multithreading pcl point-cloud python registration scan-matching vgicp

Last synced: 23 Jan 2025

https://github.com/FeiYull/TensorRT-Alpha

🔥🔥🔥TensorRT for YOLOv8, YOLOv8-Pose, YOLOv8-Seg, YOLOv8-Cls, YOLOv7, YOLOv6, YOLOv5, YOLONAS... 🚀🚀🚀CUDA IS ALL YOU NEED.🍎🍎🍎

cuda efficientdet libfacedetection rt-detr tensorrt u2net yolonas yolor yolov3 yolov4 yolov5 yolov6 yolov7 yolov8 yolov8-pose yolov8-seg yolox

Last synced: 09 Nov 2024

https://github.com/DefTruth/CUDA-Learn-Notes

🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

block-reduce cuda cuda-programming elementwise flash-attention flash-attention-2 flash-attention-3 gemm gemv layernorm pytorch rmsnorm softmax triton warp-reduce

Last synced: 27 Oct 2024

https://github.com/godweiyang/NN-CUDA-Example

Several simple examples for popular neural network toolkits calling custom CUDA operators.

cpp cuda neural-network python pytorch tensorflow

Last synced: 28 Oct 2024

https://github.com/feiyull/tensorrt-alpha

🔥🔥🔥TensorRT for YOLOv8, YOLOv8-Pose, YOLOv8-Seg, YOLOv8-Cls, YOLOv7, YOLOv6, YOLOv5, YOLONAS... 🚀🚀🚀CUDA IS ALL YOU NEED.🍎🍎🍎

cuda efficientdet libfacedetection rt-detr tensorrt u2net yolonas yolor yolov3 yolov4 yolov5 yolov6 yolov7 yolov8 yolov8-pose yolov8-seg yolox

Last synced: 24 Jan 2025

https://github.com/kwea123/ngp_pl

Instant-ngp in PyTorch + CUDA, trained with pytorch-lightning (high quality at high speed, with only a few lines of legible code)

3d-reconstruction cuda instant-ngp nerf novel-view-synthesis pytorch pytorch-lightning

Last synced: 26 Jan 2025

https://github.com/marian-nmt/marian

Fast Neural Machine Translation in C++

cuda fast gpu neural-machine-translation

Last synced: 25 Jan 2025

https://github.com/deepgraphlearning/graphvite

GraphVite: A General and High-performance Graph Embedding System

cuda data-visualization gpu knowledge-graph machine-learning network-embedding representation-learning

Last synced: 24 Jan 2025

https://github.com/juliagpu/cuda.jl

CUDA programming in Julia.

cuda gpu hacktoberfest julia

Last synced: 22 Jan 2025

https://github.com/DeepGraphLearning/graphvite

GraphVite: A General and High-performance Graph Embedding System

cuda data-visualization gpu knowledge-graph machine-learning network-embedding representation-learning

Last synced: 07 Nov 2024

https://github.com/JuliaGPU/CUDA.jl

CUDA programming in Julia.

cuda gpu hacktoberfest julia

Last synced: 19 Nov 2024

https://github.com/chengzeyi/stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

cuda deeplearnng diffusers inference-engines openai-triton performance-optimizations pytorch stable-diffusion stable-video-diffusion torch

Last synced: 23 Jan 2025

https://github.com/beehive-lab/tornadovm

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages

ai cuda gpu-acceleration gpu-computing gpus graalvm java levelzero multi-core opencl parallel-computing parallel-programming spirv

Last synced: 23 Jan 2025

https://github.com/nvidia/matx

An efficient C++17 GPU numerical computing library with Python-like syntax

cuda gpgpu gpu gpu-computing hpc

Last synced: 24 Jan 2025

https://github.com/NVIDIA/MatX

An efficient C++17 GPU numerical computing library with Python-like syntax

cuda gpgpu gpu gpu-computing hpc

Last synced: 30 Oct 2024

https://github.com/beehive-lab/TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages

ai cuda gpu-acceleration gpu-computing gpus graalvm java levelzero multi-core opencl spirv

Last synced: 05 Nov 2024

https://mratsim.github.io/Arraymancer/

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, CUDA and OpenCL backends

autograd automatic-differentiation cuda cudnn deep-learning gpgpu gpu-computing high-performance-computing iot linear-algebra machine-learning matrix-library multidimensional-arrays ndarray neural-networks nim opencl openmp parallel-computing tensor

Last synced: 14 Nov 2024

https://github.com/mratsim/arraymancer

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, CUDA and OpenCL backends

autograd automatic-differentiation cuda cudnn deep-learning gpgpu gpu-computing high-performance-computing iot linear-algebra machine-learning matrix-library multidimensional-arrays ndarray neural-networks nim opencl openmp parallel-computing tensor

Last synced: 25 Jan 2025

https://github.com/mratsim/Arraymancer

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, CUDA and OpenCL backends

autograd automatic-differentiation cuda cudnn deep-learning gpgpu gpu-computing high-performance-computing iot linear-algebra machine-learning matrix-library multidimensional-arrays ndarray neural-networks nim opencl openmp parallel-computing tensor

Last synced: 08 Nov 2024

https://github.com/withcatai/node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.

ai bindings catai cmake cmake-js cuda embedding function-calling gguf gpu grammar json-schema llama llama-cpp llm metal nodejs prebuilt-binaries self-hosted vulkan

Last synced: 22 Jan 2025

https://github.com/markus-perl/ffmpeg-build-script

The FFmpeg build script provides an easy way to build a static FFmpeg on OSX and Linux with non-free codecs included.

apple-m1-silicon av1 cuda debian fdk-aac ffmpeg ffmpeg-installer ffmpeg-linux ffmpeg-mac h264 h265 mp3 mp3-to-pcm ogg osx theora webm webm-conversion x264 x265

Last synced: 24 Jan 2025

https://github.com/BBuf/how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

cuda llm

Last synced: 27 Oct 2024

https://github.com/sniklaus/sepconv-slomo

An implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch

cuda cupy deep-learning python pytorch

Last synced: 26 Jan 2025

https://github.com/lebedov/scikit-cuda

Python interface to GPU-powered libraries

blas cublas cuda cufft cusolver gpu lapack numerical pycuda python

Last synced: 24 Jan 2025

https://github.com/anibali/docker-pytorch

A Docker image for PyTorch

cuda docker docker-image pytorch

Last synced: 26 Jan 2025

https://github.com/mrnerf/gaussian-splatting-cuda

3D Gaussian Splatting, reimagined: Unleashing unmatched speed with C++ and CUDA from the ground up!

computer-graphics computer-vision cuda gaussian-splatting nerf optimization

Last synced: 24 Jan 2025

https://github.com/mp3guy/kintinuous

Real-time large scale dense visual SLAM system

cuda reconstruction slam

Last synced: 27 Jan 2025

https://github.com/mp3guy/Kintinuous

Real-time large scale dense visual SLAM system

cuda reconstruction slam

Last synced: 07 Nov 2024

https://github.com/acceleratehs/accelerate

Embedded language for high-performance array computations

accelerate cuda gpu gpu-computing hacktoberfest haskell llvm parallel-computing

Last synced: 22 Jan 2025

https://github.com/MrNeRF/gaussian-splatting-cuda

3D Gaussian Splatting, reimagined: Unleashing unmatched speed with C++ and CUDA from the ground up!

computer-graphics computer-vision cuda gaussian-splatting nerf optimization

Last synced: 07 Nov 2024

https://github.com/AccelerateHS/accelerate

Embedded language for high-performance array computations

accelerate cuda gpu gpu-computing hacktoberfest haskell llvm parallel-computing

Last synced: 18 Nov 2024

https://github.com/mind/wheels

Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)

ai avx avx2 cuda fma gpu machine-learning ml optimization sse41 sse42 tensorflow wheel

Last synced: 24 Jan 2025

https://github.com/jgbit/vuda

VUDA is a header-only library based on Vulkan that provides a CUDA Runtime API interface for writing GPU-accelerated applications.

cuda vuda vulkan

Last synced: 27 Jan 2025

https://github.com/cyclenerd/ethereum_nvidia_miner

💰 USB flash drive ISO image for Ethereum, Zcash and Monero mining with NVIDIA graphics cards and Ubuntu GNU/Linux (headless)

cuda ethereum ethereum-mining ethminer graphics-card iso iso-image linux mining monero monero-mining nvidia nvidia-card nvidia-gpu nvidia-gpus nvidia-smi ubuntu ubuntu1604 zcash zcash-mining

Last synced: 19 Jan 2025

https://github.com/babitmf/bmf

Cross-platform, customizable multimedia/video processing framework. With strong GPU acceleration, heterogeneous design, multi-language support, easy to use, multi-framework compatible and high performance, the framework is ideal for transcoding, AI inference, algorithm integration, live video streaming, and more.

ai arm bmf bytedance cpp cross-platform cuda ffmpeg gpu heterogeneous live-video mediacodec multimedia numpy nvidia opencv python tensorrt transcode x86-64

Last synced: 23 Jan 2025

https://github.com/Cyclenerd/ethereum_nvidia_miner

💰 USB flash drive ISO image for Ethereum, Zcash and Monero mining with NVIDIA graphics cards and Ubuntu GNU/Linux (headless)

cuda ethereum ethereum-mining ethminer graphics-card iso iso-image linux mining monero monero-mining nvidia nvidia-card nvidia-gpu nvidia-gpus nvidia-smi ubuntu ubuntu1604 zcash zcash-mining

Last synced: 19 Nov 2024

https://github.com/thu-ml/sageattention

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

attention cuda inference-acceleration llm quantization triton video-generation

Last synced: 24 Jan 2025

https://github.com/e-ago/bitcracker

BitCracker is the first open source password cracking tool for memory units encrypted with BitLocker

attack bitcracker bitlocker cracking cryptography cuda decryption-algorithm gpgpu gpu hash john-the-ripper microsoft opencl password-cracker passwords windows

Last synced: 27 Jan 2025

https://github.com/Celebrandil/CudaSift

A CUDA implementation of SIFT for NVidia GPUs (1.2 ms on a GTX 1060)

cuda gpu nvidia sift vision

Last synced: 13 Nov 2024

https://github.com/tracel-ai/cubecl

Multi-platform high-performance compute language extension for Rust.

cuda gpgpu gpu jit linalg rust webgpu

Last synced: 25 Jan 2025

https://github.com/zhihu/zhilight

A highly optimized LLM inference acceleration engine for Llama and its variants.

cpm cuda gpt inference-engine llama llm llm-serving minicpm pytorch qwen

Last synced: 25 Jan 2025

https://github.com/arrayfire/arrayfire-rust

Rust wrapper for ArrayFire

arrayfire cuda gpgpu gpu hpc opencl rust rust-bindings

Last synced: 22 Jan 2025

https://github.com/src-d/kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

afk-mc2 cuda hacktoberfest kmeans knn-search machine-learning python yinyang

Last synced: 25 Jan 2025

https://github.com/BabitMF/bmf

Cross-platform, customizable multimedia/video processing framework. With strong GPU acceleration, heterogeneous design, multi-language support, easy to use, multi-framework compatible and high performance, the framework is ideal for transcoding, AI inference, algorithm integration, live video streaming, and more.

ai arm bmf bytedance cpp cross-platform cuda ffmpeg gpu heterogeneous live-video mediacodec multimedia numpy nvidia opencv python tensorrt transcode x86-64

Last synced: 07 Nov 2024

https://github.com/deadsix27/waifu2x-converter-cpp

Improved fork of Waifu2X C++ using OpenCL and OpenCV

2x amd cpp cuda cv intel nvidia opencl opencv upscale upscaler w2x waifu waifu2x waifu2x-converter-cpp

Last synced: 18 Jan 2025

https://github.com/DeadSix27/waifu2x-converter-cpp

Improved fork of Waifu2X C++ using OpenCL and OpenCV

2x amd cpp cuda cv intel nvidia opencl opencv upscale upscaler w2x waifu waifu2x waifu2x-converter-cpp

Last synced: 28 Oct 2024

https://github.com/bheisler/rustacuda

Rusty wrapper for the CUDA Driver API

cuda cuda-api gpu rust

Last synced: 24 Jan 2025

https://github.com/bheisler/RustaCUDA

Rusty wrapper for the CUDA Driver API

cuda cuda-api gpu rust

Last synced: 05 Nov 2024

https://github.com/ddemidov/amgcl

C++ library for solving large sparse linear systems with algebraic multigrid method

amg c-plus-plus cpp cuda gpgpu linear-solvers mpi multigrid opencl openmp scientific-computing sparse-linear-systems

Last synced: 25 Jan 2025

https://github.com/rapidsai/raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

anns building-blocks clustering cuda distance gpu information-retrieval linear-algebra llm machine-learning nearest-neighbors neighborhood-methods primitives random-sampling solvers sparse statistics vector-search vector-similarity vector-store

Last synced: 23 Jan 2025

https://github.com/QPT-Family/QPT

[In closed beta] QPT - a Python-to-EXE tool (Python packaging) dedicated to helping open source projects better reach the internet world.

cuda deep-learning dml gpu noavx paddlepaddle pypi python qpt

Last synced: 18 Dec 2024

https://github.com/qpt-family/qpt

[In closed beta] QPT - a Python-to-EXE tool (Python packaging) dedicated to helping open source projects better reach the internet world.

cuda deep-learning dml gpu noavx paddlepaddle pypi python qpt

Last synced: 27 Jan 2025

https://github.com/efeslab/nanoflow

A throughput-oriented high-performance serving framework for LLMs

cuda inference llama2 llm llm-serving model-serving

Last synced: 25 Jan 2025

https://github.com/ddemidov/vexcl

VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP

c-plus-plus cpp11 cuda gpgpu opencl scientific-computing

Last synced: 24 Jan 2025

https://github.com/xtra-computing/thundergbm

ThunderGBM: Fast GBDTs and Random Forests on GPUs

cuda gbdt gpu machine-learning random-forest

Last synced: 22 Jan 2025

https://github.com/mp3guy/icpcuda

Super fast implementation of ICP in CUDA for devices of compute capability 3.5 or higher

cuda icp

Last synced: 25 Jan 2025

https://github.com/Xtra-Computing/thundergbm

ThunderGBM: Fast GBDTs and Random Forests on GPUs

cuda gbdt gpu machine-learning random-forest

Last synced: 07 Nov 2024

https://github.com/shibatch/sleef

SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT

aarch64 android arm avx avx512 cuda elementary-functions fft ios math-library neon powerpc quadruple-precision s390x simd sse2 sve vector-math vectorization vsx

Last synced: 24 Jan 2025

https://github.com/thu-ml/SageAttention

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

attention cuda inference-acceleration llm quantization triton video-generation

Last synced: 15 Dec 2024

https://github.com/maghoumi/pytorch-softdtw-cuda

Fast CUDA implementation of (differentiable) soft dynamic time warping for PyTorch

cuda deep-learning dynamic-time-warping pytorch soft-dtw

Last synced: 26 Jan 2025