Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with cuda
A curated list of projects in awesome lists tagged with cuda.
https://github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
amd cuda gpt hpu inference inferentia llama llm llm-serving llmops mlops model-serving pytorch rocm tpu trainium transformer xpu
Last synced: 22 Dec 2024
https://github.com/NVIDIA/nvidia-docker
Build and run Docker containers leveraging NVIDIA GPUs
Last synced: 26 Oct 2024
https://github.com/nvidia/nvidia-docker
Build and run Docker containers leveraging NVIDIA GPUs
Last synced: 29 Sep 2024
https://github.com/nvlabs/instant-ngp
Instant neural graphics primitives: lightning fast NeRF and more
3d-reconstruction computer-graphics computer-vision cuda function-approximation machine-learning nerf neural-network real-time real-time-rendering realtime signed-distance-functions
Last synced: 16 Dec 2024
https://github.com/NVlabs/instant-ngp
Instant neural graphics primitives: lightning fast NeRF and more
3d-reconstruction computer-graphics computer-vision cuda function-approximation machine-learning nerf neural-network real-time real-time-rendering realtime signed-distance-functions
Last synced: 29 Oct 2024
https://nvlabs.github.io/instant-ngp/
Instant neural graphics primitives: lightning fast NeRF and more
3d-reconstruction computer-graphics computer-vision cuda function-approximation machine-learning nerf neural-network real-time real-time-rendering realtime signed-distance-functions
Last synced: 28 Oct 2024
https://github.com/kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
c-plus-plus cuda kaldi shell speaker-id speaker-verification speech speech-recognition speech-to-text
Last synced: 16 Dec 2024
https://github.com/isl-org/open3d
Open3D: A Modern Library for 3D Data Processing
3d 3d-perception arm computer-graphics cpp cuda gpu gui machine-learning mesh-processing odometry opengl pointcloud python pytorch reconstruction registration rendering tensorflow visualization
Last synced: 16 Dec 2024
https://github.com/isl-org/Open3D
Open3D: A Modern Library for 3D Data Processing
3d 3d-perception arm computer-graphics cpp cuda gpu gui machine-learning mesh-processing odometry opengl pointcloud python pytorch reconstruction registration rendering tensorflow visualization
Last synced: 27 Oct 2024
https://github.com/rapidsai/cudf
cuDF - GPU DataFrame Library
arrow cpp cuda cudf dask data-analysis data-science dataframe gpu pandas pydata python rapids
Last synced: 16 Dec 2024
https://github.com/replicate/cog
Containers for machine learning
ai containers cuda deep-learning docker machine-learning pytorch tensorflow
Last synced: 16 Dec 2024
https://github.com/catboost/catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
big-data catboost categorical-features coreml cuda data-mining data-science decision-trees gbdt gbm gpu gpu-computing gradient-boosting kaggle machine-learning python r tutorial
Last synced: 16 Dec 2024
https://github.com/kroma-network/tachyon
Modular ZK (Zero Knowledge) backend accelerated by GPU
blockchain c-plus-plus cpp17 cryptocurrency cryptography cuda kroma tachyon zero-knowledge zk
Last synced: 17 Dec 2024
https://github.com/hybridgroup/gocv
Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO.
computer-vision computervision cuda dnn face-tracking gocv golang image-processing mjpeg mjpeg-stream object-classification object-tracking onnx opencv openvino tensorflow video video-capture yolo
Last synced: 16 Dec 2024
https://github.com/sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
cuda inference llama llama2 llama3 llama3-1 llava llm llm-serving moe pytorch transformer vlm
Last synced: 22 Dec 2024
https://github.com/nvidia/cuda-samples
Samples for CUDA developers that demonstrate features of the CUDA Toolkit
cuda cuda-driver-api cuda-kernels cuda-opengl
Last synced: 17 Dec 2024
https://github.com/NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features of the CUDA Toolkit
cuda cuda-driver-api cuda-kernels cuda-opengl
Last synced: 27 Oct 2024
https://github.com/oneflow-inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
cuda deep-learning deep-neural-networks distributed machine-learning ml neural-network
Last synced: 17 Dec 2024
https://github.com/chainer/chainer
A flexible framework of neural networks for deep learning
chainer cuda cudnn cupy deep-learning gpu machine-learning neural-network neural-networks numpy python
Last synced: 17 Dec 2024
https://github.com/nvidia/cutlass
CUDA Templates for Linear Algebra Subroutines
cpp cuda deep-learning deep-learning-library gpu nvidia
Last synced: 16 Dec 2024
https://github.com/Oneflow-Inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
cuda deep-learning deep-neural-networks distributed machine-learning ml neural-network
Last synced: 27 Oct 2024
https://github.com/NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
cpp cuda deep-learning deep-learning-library gpu nvidia
Last synced: 25 Oct 2024
https://github.com/nvidia/thrust
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
algorithms cpp cpp11 cpp14 cpp17 cpp20 cuda cxx cxx11 cxx14 cxx17 cxx20 gpu gpu-computing nvidia nvidia-hpc-sdk thrust
Last synced: 01 Nov 2024
https://github.com/NVIDIA/thrust
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
algorithms cpp cpp11 cpp14 cpp17 cpp20 cuda cxx cxx11 cxx14 cxx17 cxx20 gpu gpu-computing nvidia nvidia-hpc-sdk thrust
Last synced: 26 Oct 2024
https://github.com/chrxh/alien
ALIEN is a CUDA-powered artificial life simulation program.
agent-based-simulation artificial-life cuda open-ended-evolution physics-engine
Last synced: 17 Dec 2024
https://github.com/xuehaipan/nvitop
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
command-line-tool console cuda curses gpu gpu-monitoring grafana grafana-dashboard htop monitoring monitoring-tool nvidia nvidia-smi nvml process-monitoring prometheus prometheus-exporter resource-monitor top
Last synced: 16 Dec 2024
https://github.com/XuehaiPan/nvitop
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
command-line-tool console cuda curses gpu gpu-monitoring grafana grafana-dashboard htop monitoring monitoring-tool nvidia nvidia-smi nvml process-monitoring prometheus prometheus-exporter resource-monitor top
Last synced: 28 Oct 2024
https://github.com/oaid/tengine
Tengine is a lightweight, high-performance, modular inference engine for embedded devices
acl arm artificial-intelligence cnn container cuda machine-learning mips npu nvdla onnx pytorch riscv supperedge tensorflow tensorrt x86-64
Last synced: 17 Dec 2024
https://github.com/OAID/Tengine
Tengine is a lightweight, high-performance, modular inference engine for embedded devices
acl arm artificial-intelligence cnn container cuda machine-learning mips npu nvdla onnx pytorch riscv supperedge tensorflow tensorrt x86-64
Last synced: 27 Oct 2024
https://github.com/arrayfire/arrayfire
ArrayFire: a general purpose GPU library.
arrayfire c c-plus-plus cpp cuda gpgpu gpu hpc opencl performance scientific-computing
Last synced: 16 Dec 2024
https://github.com/nvidiagameworks/kaolin
A PyTorch Library for Accelerating 3D Deep Learning Research
3d-deep-learning artificial-intelligence camera-api cuda differentiable-lighting differentiable-rendering neural-networks pytorch rasterization
Last synced: 18 Dec 2024
https://github.com/NVIDIAGameWorks/kaolin
A PyTorch Library for Accelerating 3D Deep Learning Research
3d-deep-learning artificial-intelligence camera-api cuda differentiable-lighting differentiable-rendering neural-networks pytorch rasterization
Last synced: 28 Oct 2024
https://github.com/rapidsai/cuml
cuML - RAPIDS Machine Learning Library
cuda gpu machine-learning machine-learning-algorithms nvidia
Last synced: 16 Dec 2024
https://github.com/nvlabs/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
cuda deep-learning gpu mlp nerf neural-network pytorch real-time rendering
Last synced: 18 Dec 2024
https://github.com/rocm/hip
HIP: C++ Heterogeneous-Compute Interface for Portability
cuda hip hip-kernel-language hip-portability hip-runtime hipify
Last synced: 17 Dec 2024
https://github.com/NVlabs/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
cuda deep-learning gpu mlp nerf neural-network pytorch real-time rendering
Last synced: 27 Oct 2024
https://github.com/ROCm/HIP
HIP: C++ Heterogeneous-Compute Interface for Portability
cuda hip hip-kernel-language hip-portability hip-runtime hipify
Last synced: 25 Oct 2024
https://github.com/opennmt/ctranslate2
Fast inference engine for Transformer models
avx avx2 cpp cuda deep-learning deep-neural-networks gemm inference intrinsics machine-translation mkl neon neural-machine-translation onednn openmp opennmt parallel-computing quantization thrust transformer-models
Last synced: 16 Dec 2024
https://github.com/OpenNMT/CTranslate2
Fast inference engine for Transformer models
avx avx2 cpp cuda deep-learning deep-neural-networks gemm inference intrinsics machine-translation mkl neon neural-machine-translation onednn openmp opennmt parallel-computing quantization thrust transformer-models
Last synced: 03 Nov 2024
https://github.com/bytedance/lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
accelerate bart beam-search bert cuda diverse-decoding gpt inference multilingual-nmt sampling training transformer
Last synced: 18 Dec 2024
https://github.com/jittor/jittor
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
cuda deep-learning gpu jittor python
Last synced: 16 Dec 2024
https://github.com/Jittor/jittor
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
cuda deep-learning gpu jittor python
Last synced: 28 Oct 2024
https://github.com/heavyai/heavydb
HeavyDB (formerly OmniSciDB)
cuda database gpu heavyai interactive llvm machine-learning mapd olap omnisci real-time sql visualization
Last synced: 16 Dec 2024
https://github.com/iree-org/iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
compiler cuda jax machine-learning mlir pytorch runtime spirv tensorflow vulkan
Last synced: 14 Dec 2024
https://github.com/openxla/iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
compiler cuda jax machine-learning mlir pytorch runtime spirv tensorflow vulkan
Last synced: 09 Dec 2024
https://github.com/NVIDIA/Torch-TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
cuda deep-learning jetson libtorch machine-learning nvidia pytorch tensorrt
Last synced: 14 Dec 2024
https://github.com/pytorch/tensorrt
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
cuda deep-learning jetson libtorch machine-learning nvidia pytorch tensorrt
Last synced: 21 Dec 2024
https://nvidia.github.io/MinkowskiEngine/
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
3d-convolutional-network 3d-vision 4d-convolutional-neural-network auto-differentiation computer-vision convolutional-neural-networks cuda deep-learning high-dimensional-data high-dimensional-inference minkowski-engine neural-network pytorch semantic-segmentation space-time sparse-convolution sparse-tensor-network sparse-tensors spatio-temporal-analysis trilateral-filter
Last synced: 14 Nov 2024
https://github.com/nvidia/minkowskiengine
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
3d-convolutional-network 3d-vision 4d-convolutional-neural-network auto-differentiation computer-vision convolutional-neural-networks cuda deep-learning high-dimensional-data high-dimensional-inference minkowski-engine neural-network pytorch semantic-segmentation space-time sparse-convolution sparse-tensor-network sparse-tensors spatio-temporal-analysis trilateral-filter
Last synced: 19 Dec 2024
https://github.com/cvcuda/cv-cuda
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
bytedance cloud computer-vision cpp cuda cv-cuda gpu image-processing machine-learning nvidia python
Last synced: 19 Dec 2024
https://github.com/pytorch/TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
cuda deep-learning jetson libtorch machine-learning nvidia pytorch tensorrt
Last synced: 06 Nov 2024
https://github.com/CVCUDA/CV-CUDA
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
bytedance cloud computer-vision cpp cuda cv-cuda gpu image-processing machine-learning nvidia python
Last synced: 27 Oct 2024
https://github.com/NVIDIA/MinkowskiEngine
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
3d-convolutional-network 3d-vision 4d-convolutional-neural-network auto-differentiation computer-vision convolutional-neural-networks cuda deep-learning high-dimensional-data high-dimensional-inference minkowski-engine neural-network pytorch semantic-segmentation space-time sparse-convolution sparse-tensor-network sparse-tensors spatio-temporal-analysis trilateral-filter
Last synced: 28 Oct 2024
https://github.com/enpeizhao/cvprojects
computer vision projects | Fun AI projects related to computer vision (Python, C++, embedded systems)
computer-vision cpp cuda deep-learning embedded-systems machine-learning python tensorrt
Last synced: 20 Dec 2024
https://github.com/coincheung/pytorch-loss
label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful
amsoftmax cuda dice-loss ema focal-loss label-smoothing lovasz-softmax mish partial-fc pytorch triplet-loss
Last synced: 19 Dec 2024
https://github.com/CoinCheung/pytorch-loss
label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful
amsoftmax cuda dice-loss ema focal-loss label-smoothing lovasz-softmax mish partial-fc pytorch triplet-loss
Last synced: 15 Nov 2024
https://github.com/enpeizhao/CVprojects
computer vision projects | Fun AI projects related to computer vision (Python, C++, embedded systems)
computer-vision cpp cuda deep-learning embedded-systems machine-learning python tensorrt
Last synced: 27 Oct 2024
https://github.com/nvidia/transformerengine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
cuda deep-learning fp8 gpu jax machine-learning python pytorch
Last synced: 19 Dec 2024
https://github.com/pytorch/torchrec
PyTorch domain library for recommendation systems
cuda deep-learning gpu pytorch recommendation-system recommender-system sharding
Last synced: 21 Dec 2024
https://github.com/nvidia/gpu-operator
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
Last synced: 17 Dec 2024
https://github.com/inducer/pycuda
CUDA integration for Python, plus shiny features
array cuda gpu gpu-computing multidimensional-arrays pycuda python scientific-computing
Last synced: 17 Dec 2024
https://github.com/NVIDIA/gpu-operator
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
Last synced: 09 Nov 2024
https://github.com/cannylab/tsne-cuda
GPU Accelerated t-SNE for CUDA with Python bindings
barnes-hut barnes-hut-tsne cuda data-analysis data-visualization fit-tsne gpu mnist multithreading python tsne tsne-algorithm tsne-cuda
Last synced: 17 Dec 2024
https://github.com/CannyLab/tsne-cuda
GPU Accelerated t-SNE for CUDA with Python bindings
barnes-hut barnes-hut-tsne cuda data-analysis data-visualization fit-tsne gpu mnist multithreading python tsne tsne-algorithm tsne-cuda
Last synced: 26 Oct 2024
https://github.com/roflcoopter/viseron
Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.
coral cuda darknet edgetpu face-recognition google-coral hacktoberfest hardware-acceleration ip-camera license-plate-recognition motion-detection network-video-capture network-video-recorder nvr object-detection rtsp surveillance tensorflow viseron yolo
Last synced: 19 Dec 2024
https://github.com/coreylowman/dfdx
Deep learning in Rust, with shape checked tensors and neural networks
autodiff autodifferentiation autograd backpropagation cuda cuda-kernels cuda-support cuda-toolkit cudnn deep-learning deep-neural-networks gpu gpu-acceleration gpu-computing machine-learning neural-network rust rust-lang tensor
Last synced: 17 Dec 2024
https://github.com/siliconflow/onediff
OneDiff: An out-of-the-box acceleration library for diffusion models.
aigc-serving comfyui comfyui-workflow cuda diffusers diffusion-models inference-engine lcm lcm-lora lora performance-optimization pytorch sd-webui sdxl sdxl-turbo stable-diffusion stable-video-diffusion
Last synced: 17 Dec 2024
https://github.com/rapidsai/cugraph
cuGraph - RAPIDS Graph Analytics Library
complex-networks cuda gpu graph graph-algorithms graph-analysis graph-framework graphml nvidia rapids
Last synced: 18 Dec 2024
https://github.com/pykeen/pykeen
🤖 A Python library for learning and evaluating knowledge graph embeddings
cuda deep-learning knowledge-base-completion knowledge-graph-embeddings knowledge-graphs link-prediction machine-learning pykeen python torch
Last synced: 17 Dec 2024
https://github.com/dmlc/nnvm
computation-graph cuda deep-learning deployment metal nnvm opencl optimization rocm tvm
Last synced: 13 Nov 2024
https://github.com/deftruth/cuda-learn-notes
📚Tensor/CUDA Cores, 📖150+ CUDA Kernels, ⚡️⚡️toy-hgemm library with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
Last synced: 19 Dec 2024
https://github.com/openppl/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
caffe cuda deep-learning neural-network onnx open-source pytorch quantization
Last synced: 21 Dec 2024
https://github.com/xtra-computing/thundersvm
ThunderSVM: A Fast SVM Library on GPUs and CPUs
c-plus-plus classification cuda gpu libsvm one-class-learning regression
Last synced: 19 Dec 2024
https://github.com/Xtra-Computing/thundersvm
ThunderSVM: A Fast SVM Library on GPUs and CPUs
c-plus-plus classification cuda gpu libsvm one-class-learning regression
Last synced: 07 Nov 2024
https://github.com/bbuf/how-to-optim-algorithm-in-cuda
How to optimize algorithms in CUDA.
Last synced: 17 Dec 2024
https://github.com/els-rd/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
cuda cuda-kernel pytorch transformer triton
Last synced: 21 Dec 2024
https://github.com/ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
cuda cuda-kernel pytorch transformer triton
Last synced: 12 Nov 2024
https://github.com/OpenPPL/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
caffe cuda deep-learning neural-network onnx open-source pytorch quantization
Last synced: 28 Oct 2024
https://github.com/marcoslucianops/deepstream-yolo
NVIDIA DeepStream SDK 7.1 / 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
cuda darknet deepstream mmyolo nvidia nvidia-deepstream-sdk object-detection paddle ppyoloe pytorch rtdetr rtmdet tensorrt ultralytics yolo
Last synced: 19 Dec 2024
https://github.com/deepmodeling/deepmd-kit
A deep learning package for many-body potential energy representation and molecular dynamics
ase c computational-chemistry cpp cuda deep-learning deepmd ipi jax lammps materials-science molecular-dynamics nodejs paddle potential-energy python pytorch rocm tensorflow
Last synced: 17 Dec 2024