Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with cuda

A curated list of projects in awesome lists tagged with cuda.

https://github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda gpt hpu inference inferentia llama llm llm-serving llmops mlops model-serving pytorch rocm tpu trainium transformer xpu

Last synced: 22 Dec 2024

https://github.com/hashcat/hashcat

World's fastest and most advanced password recovery utility

c cracking cuda gpgpu hashcat hashes opencl password

Last synced: 16 Dec 2024

https://github.com/NVIDIA/nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs

cuda docker gpu nvidia-docker

Last synced: 26 Oct 2024

https://github.com/kaldi-asr/kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

c-plus-plus cuda kaldi shell speaker-id speaker-verification speech speech-recognition speech-to-text

Last synced: 16 Dec 2024

https://github.com/numba/numba

NumPy-aware dynamic Python compiler using LLVM

compiler cuda llvm numba numpy parallel python

Last synced: 16 Dec 2024

https://github.com/srush/GPU-Puzzles

Solve puzzles. Learn CUDA.

cuda machine-learning puzzles

Last synced: 30 Oct 2024
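
The core idea such puzzles drill is CUDA's thread-indexing model: each thread derives a global index from its block and thread coordinates and must guard against running past the end of the data. Below is a minimal CPU-side Python simulation of that model, offered as a hedged sketch. It is not GPU-Puzzles' own code, and the helper names `launch_1d` and `add_ten` are hypothetical.

```python
def launch_1d(kernel, grid_dim, block_dim, *args):
    """Run `kernel` once per simulated thread of a 1-D grid, sequentially.

    Real CUDA runs these threads in parallel; this loop only mimics the
    (blockIdx.x, blockDim.x, threadIdx.x) coordinates each thread sees.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def add_ten(block_idx, block_dim, thread_idx, out, a):
    """Hypothetical 'map' kernel: out[i] = a[i] + 10."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(a):  # guard: the grid may contain more threads than elements
        out[i] = a[i] + 10


a = list(range(5))
out = [0] * len(a)
block_dim = 4
grid_dim = (len(a) + block_dim - 1) // block_dim  # ceiling division: 2 blocks
launch_1d(add_ten, grid_dim, block_dim, out, a)
print(out)  # [10, 11, 12, 13, 14]
```

Launching 2 blocks of 4 threads yields 8 threads for 5 elements; the `if i < len(a)` guard is what stops the three surplus threads from writing out of bounds.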

https://github.com/vosen/ZLUDA

CUDA on non-NVIDIA GPUs

cuda rust

Last synced: 01 Nov 2024

https://github.com/catboost/catboost

A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

big-data catboost categorical-features coreml cuda data-mining data-science decision-trees gbdt gbm gpu gpu-computing gradient-boosting kaggle machine-learning python r tutorial

Last synced: 16 Dec 2024

https://github.com/kroma-network/tachyon

Modular ZK(Zero Knowledge) backend accelerated by GPU

blockchain c-plus-plus cpp17 cryptocurrency cryptography cuda kroma tachyon zero-knowledge zk

Last synced: 17 Dec 2024

https://github.com/hybridgroup/gocv

Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO.

computer-vision computervision cuda dnn face-tracking gocv golang image-processing mjpeg mjpeg-stream object-classification object-tracking onnx opencv openvino tensorflow video video-capture yolo

Last synced: 16 Dec 2024

https://github.com/sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

cuda inference llama llama2 llama3 llama3-1 llava llm llm-serving moe pytorch transformer vlm

Last synced: 22 Dec 2024

https://github.com/nvidia/cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

cuda cuda-driver-api cuda-kernels cuda-opengl

Last synced: 17 Dec 2024

https://github.com/oneflow-inc/oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

cuda deep-learning deep-neural-networks distributed machine-learning ml neural-network

Last synced: 17 Dec 2024

https://github.com/chainer/chainer

A flexible framework of neural networks for deep learning

chainer cuda cudnn cupy deep-learning gpu machine-learning neural-network neural-networks numpy python

Last synced: 17 Dec 2024

https://github.com/nvidia/cutlass

CUDA Templates for Linear Algebra Subroutines

cpp cuda deep-learning deep-learning-library gpu nvidia

Last synced: 16 Dec 2024

https://github.com/nvidia/thrust

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

algorithms cpp cpp11 cpp14 cpp17 cpp20 cuda cxx cxx11 cxx14 cxx17 cxx20 gpu gpu-computing nvidia nvidia-hpc-sdk thrust

Last synced: 01 Nov 2024

https://github.com/chrxh/alien

ALIEN is a CUDA-powered artificial life simulation program.

agent-based-simulation artificial-life cuda open-ended-evolution physics-engine

Last synced: 17 Dec 2024

https://github.com/oaid/tengine

Tengine is a lite, high-performance, modular inference engine for embedded devices

acl arm artificial-intelligence cnn container cuda machine-learning mips npu nvdla onnx pytorch riscv supperedge tensorflow tensorrt x86-64

Last synced: 17 Dec 2024

https://github.com/arrayfire/arrayfire

ArrayFire: a general purpose GPU library.

arrayfire c c-plus-plus cpp cuda gpgpu gpu hpc opencl performance scientific-computing

Last synced: 16 Dec 2024

https://github.com/rapidsai/cuml

cuML - RAPIDS Machine Learning Library

cuda gpu machine-learning machine-learning-algorithms nvidia

Last synced: 16 Dec 2024

https://github.com/nvlabs/tiny-cuda-nn

Lightning fast C++/CUDA neural network framework

cuda deep-learning gpu mlp nerf neural-network pytorch real-time rendering

Last synced: 18 Dec 2024

https://github.com/rocm/hip

HIP: C++ Heterogeneous-Compute Interface for Portability

cuda hip hip-kernel-language hip-portability hip-runtime hipify

Last synced: 17 Dec 2024

https://github.com/bytedance/lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

accelerate bart beam-search bert cuda diverse-decoding gpt inference multilingual-nmt sampling training transformer

Last synced: 18 Dec 2024

https://github.com/celtoys/remotery

Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer

c cpu cuda d3d11 d3d12 gpu metal opengl profiler vulkan

Last synced: 18 Dec 2024

https://github.com/jittor/jittor

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

cuda deep-learning gpu jittor python

Last synced: 16 Dec 2024

https://github.com/uber/aresdb

A GPU-powered real-time analytics storage and query engine.

analytics cgo cuda data database golang gpu-programming query real-time storage

Last synced: 18 Dec 2024

https://github.com/iree-org/iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

compiler cuda jax machine-learning mlir pytorch runtime spirv tensorflow vulkan

Last synced: 14 Dec 2024

https://github.com/openxla/iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

compiler cuda jax machine-learning mlir pytorch runtime spirv tensorflow vulkan

Last synced: 09 Dec 2024

https://github.com/NVIDIA/Torch-TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

cuda deep-learning jetson libtorch machine-learning nvidia pytorch tensorrt

Last synced: 14 Dec 2024

https://github.com/pytorch/tensorrt

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

cuda deep-learning jetson libtorch machine-learning nvidia pytorch tensorrt

Last synced: 21 Dec 2024

https://github.com/diku-dk/futhark

💥💻💥 A data-parallel functional programming language

boom compiler cuda futhark gpgpu gpu hacktoberfest hpc language opencl

Last synced: 20 Dec 2024

https://github.com/cvcuda/cv-cuda

CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

bytedance cloud computer-vision cpp cuda cv-cuda gpu image-processing machine-learning nvidia python

Last synced: 19 Dec 2024

https://nvidia.github.io/libcudacxx/

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl

cpp cpp11 cpp14 cpp17 cpp20 cpp23 cuda cxx cxx11 cxx14 cxx17 cxx20 cxx23 gpu libcxx llvm nvidia nvidia-hpc-sdk standard std

Last synced: 01 Nov 2024

https://github.com/nvidia/libcudacxx

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl

cpp cpp11 cpp14 cpp17 cpp20 cpp23 cuda cxx cxx11 cxx14 cxx17 cxx20 cxx23 gpu libcxx llvm nvidia nvidia-hpc-sdk standard std

Last synced: 28 Sep 2024

https://github.com/enpeizhao/cvprojects

computer vision projects | Fun AI projects related to computer vision (Python, C++, embedded system)

computer-vision cpp cuda deep-learning embedded-systems machine-learning python tensorrt

Last synced: 20 Dec 2024

https://github.com/coincheung/pytorch-loss

label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful.

amsoftmax cuda dice-loss ema focal-loss label-smoothing lovasz-softmax mish partial-fc pytorch triplet-loss

Last synced: 19 Dec 2024
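
Label smoothing, the first loss in that list, replaces the one-hot training target with a softened distribution. Below is a hedged, framework-free Python sketch of the common formulation: weight `1 - eps` on the true class plus `eps / K` spread uniformly over all `K` classes. The repository's own PyTorch implementation may differ in details, and `label_smoothing_ce` is a hypothetical name.

```python
import math


def label_smoothing_ce(logits, target, eps=0.1):
    """Cross-entropy against a label-smoothed target for a single example.

    logits: list of raw class scores; target: index of the true class.
    """
    k = len(logits)
    # log-softmax via the log-sum-exp trick for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_z for z in logits]
    # smoothed target distribution; it sums to 1 for any eps in [0, 1]
    q = [eps / k + (1.0 - eps if i == target else 0.0) for i in range(k)]
    return -sum(qi * lp for qi, lp in zip(q, log_probs))
```

With `eps=0` this reduces to ordinary cross-entropy; with uniform logits the loss is `log(K)` for any `eps`, because every log-probability equals `-log(K)`.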

https://github.com/shader-slang/slang

Making it easier to work with shaders

cuda d3d12 glsl hlsl shaders vulkan

Last synced: 18 Dec 2024

https://github.com/nvidia/transformerengine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

cuda deep-learning fp8 gpu jax machine-learning python pytorch

Last synced: 19 Dec 2024

https://github.com/pytorch/torchrec

PyTorch domain library for recommendation systems

cuda deep-learning gpu pytorch recommendation-system recommender-system sharding

Last synced: 21 Dec 2024

https://github.com/nvidia/gpu-operator

NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes

cuda gpu kubernetes nvidia

Last synced: 17 Dec 2024

https://github.com/inducer/pycuda

CUDA integration for Python, plus shiny features

array cuda gpu gpu-computing multidimensional-arrays pycuda python scientific-computing

Last synced: 17 Dec 2024

https://github.com/mp3guy/elasticfusion

Real-time dense visual SLAM system

cuda reconstruction slam

Last synced: 20 Dec 2024

https://github.com/roflcoopter/viseron

Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.

coral cuda darknet edgetpu face-recognition google-coral hacktoberfest hardware-acceleration ip-camera license-plate-recognition motion-detection network-video-capture network-video-recorder nvr object-detection rtsp surveillance tensorflow viseron yolo

Last synced: 19 Dec 2024

https://github.com/pykeen/pykeen

🤖 A Python library for learning and evaluating knowledge graph embeddings

cuda deep-learning knowledge-base-completion knowledge-graph-embeddings knowledge-graphs link-prediction machine-learning pykeen python torch

Last synced: 17 Dec 2024

https://github.com/NVIDIA/cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

algorithms cpp cpp11 cpp14 cpp17 cpp20 cub cuda cxx cxx11 cxx14 cxx17 cxx20 gpu nvidia nvidia-hpc-sdk

Last synced: 27 Oct 2024
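
A block-wide prefix sum (scan) is a typical example of the cooperative primitives CUB provides. As a hedged illustration of the pattern only (not CUB's API, and not necessarily the algorithm CUB selects), here is a Python simulation of the Hillis-Steele inclusive scan that the threads of one block can run in lock-step; `block_inclusive_scan` is a hypothetical name.

```python
def block_inclusive_scan(values):
    """Simulate a Hillis-Steele inclusive scan across one 'thread block'.

    In each pass, every simulated thread i adds the element `offset`
    positions to its left, reading from the previous pass's snapshot.
    """
    data = list(values)
    offset = 1
    while offset < len(data):
        prev = data[:]  # snapshot: stands in for the barrier between passes
        for i in range(len(data)):  # each iteration plays the role of one thread
            if i >= offset:
                data[i] = prev[i] + prev[i - offset]
        offset *= 2
    return data


print(block_inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# [3, 4, 11, 11, 15, 16, 22, 25]
```

The scan finishes in `ceil(log2(n))` passes but performs O(n log n) additions, which is why work-efficient variants are often preferred for larger inputs.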

https://github.com/deftruth/cuda-learn-notes

📚Tensor/CUDA Cores, 📖150+ CUDA Kernels, ⚡️⚡️toy-hgemm library with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).

cuda gemm gemv hgemm

Last synced: 19 Dec 2024
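
Hand-written GEMM kernels like those collected here are built around tiling: the output matrix is split into tiles, and the shared K dimension is walked in chunks small enough to stage in fast on-chip memory. Below is a hedged, CPU-only Python sketch of that loop structure. It does not use WMMA, MMA, or CuTe, and `tiled_matmul` and its `tile` parameter are hypothetical names.

```python
def tiled_matmul(a, b, tile=2):
    """C = A @ B computed tile by tile; a is n x k, b is k x m (lists of lists)."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    c = [[0.0] * m for _ in range(n)]
    # On a GPU, each (i0, j0) output tile would typically map to one thread block.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            # Walk K in chunks; a kernel would stage these A/B sub-tiles in shared memory.
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The result does not depend on `tile`; only the order in which partial products are accumulated changes, which on real hardware determines how much data is reused from fast memory.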

https://github.com/openppl/ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

caffe cuda deep-learning neural-network onnx open-source pytorch quantization

Last synced: 21 Dec 2024
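
Quantization tools such as PPQ map floating-point tensors onto low-bit integers. Below is a hedged sketch of the simplest scheme, per-tensor symmetric int8 quantization with a max-abs scale, in plain Python. PPQ's own calibration strategies and API are not shown, and the function names are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Per-tensor symmetric quantization: q = clamp(round(x / scale), -qmax, qmax)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # maps the largest |x| to qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [qi * scale for qi in q]
```

Max-abs calibration is only one choice: it keeps every value in range, but a single outlier inflates `scale` and wastes resolution for everything else, which is why offline tools typically offer other calibration methods.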

https://github.com/xtra-computing/thundersvm

ThunderSVM: A Fast SVM Library on GPUs and CPUs

c-plus-plus classification cuda gpu libsvm one-class-learning regression

Last synced: 19 Dec 2024

https://github.com/dtolm/vkfft

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library

c2r convolution cuda dct fft hip hpc levelzero metal opencl r2c r2r vulkan

Last synced: 19 Dec 2024
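
The transform at the core of any FFT library is the discrete Fourier transform. As a hedged illustration of what such libraries compute (not VkFFT's code or API), here is a minimal recursive radix-2 Cooley-Tukey FFT in Python, assuming a power-of-two input length, checked against a naive O(n^2) DFT.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # twiddle factor e^(-2*pi*i*k/n) applied to the odd half
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x):
    """Naive O(n^2) DFT, used here only as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

Splitting into even and odd halves reduces the work from O(n^2) to O(n log n); production libraries add further radices, non-power-of-two sizes, and memory-layout tuning on top of this idea.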

https://github.com/bbuf/how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

cuda llm

Last synced: 17 Dec 2024

https://github.com/els-rd/kernl

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

cuda cuda-kernel pytorch transformer triton

Last synced: 21 Dec 2024

https://github.com/marcoslucianops/deepstream-yolo

NVIDIA DeepStream SDK 7.1 / 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models

cuda darknet deepstream mmyolo nvidia nvidia-deepstream-sdk object-detection paddle ppyoloe pytorch rtdetr rtmdet tensorrt ultralytics yolo

Last synced: 19 Dec 2024

https://github.com/deepmodeling/deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics

ase c computational-chemistry cpp cuda deep-learning deepmd ipi jax lammps materials-science molecular-dynamics nodejs paddle potential-energy python pytorch rocm tensorflow

Last synced: 17 Dec 2024