CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-15 00:07:19 UTC
https://github.com/jasmcaus/caer
High-performance Vision library in Python. Scale your research, not boilerplate.
ai artificial-intelligence augmentation caer computer-vision cuda data-science deep-learning gpu image-classification image-processing image-segmentation machine-learning neural-network opencv python segmentation type-checking video-processing vision
Last synced: 15 May 2025
https://github.com/bheisler/RustaCUDA
Rusty wrapper for the CUDA Driver API
Last synced: 06 Apr 2025
https://github.com/bheisler/rustacuda
Rusty wrapper for the CUDA Driver API
Last synced: 14 Apr 2025
https://github.com/QPT-Family/QPT
[In closed beta] QPT - a Python-to-EXE tool (Python packaging) dedicated to helping open-source projects reach the wider internet.
cuda deep-learning dml gpu noavx paddlepaddle pypi python qpt
Last synced: 18 Aug 2025
https://github.com/andyzeng/tsdf-fusion
Fuse multiple depth frames into a TSDF voxel volume.
3d 3d-deep-learning 3d-reconstruction artificial-intelligence cuda depth-camera kinect-fusion rgbd tsdf vision volumetric-data
Last synced: 05 Apr 2025
https://github.com/qpt-family/qpt
[In closed beta] QPT - a Python-to-EXE tool (Python packaging) dedicated to helping open-source projects reach the wider internet.
cuda deep-learning dml gpu noavx paddlepaddle pypi python qpt
Last synced: 12 Apr 2025
https://github.com/xmrig/xmrig-nvidia
Monero (XMR) NVIDIA miner
aeon cryptonight cuda electroneum gpu-mining monero nvidia-miner sumokoin xmr xmrig
Last synced: 13 Apr 2025
https://github.com/LuisaGroup/LuisaCompute
High-Performance Rendering Framework on Stream Architectures
cpu cross-platform cuda directx dsl dxr gpu graphics high-performance ispc llvm metal optix raytracing rendering rtx siggraph-asia-2022
Last synced: 09 Jul 2025
https://github.com/mp3guy/icpcuda
Super fast implementation of ICP in CUDA for devices of compute capability 3.5 or higher
Last synced: 04 Apr 2025
https://github.com/ddemidov/vexcl
VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
c-plus-plus cpp11 cuda gpgpu opencl scientific-computing
Last synced: 16 May 2025
https://github.com/shibatch/sleef
SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
aarch64 avx2 avx512 cuda elementary-functions fft fourier-transform fourier-transform-library math-library powerpc quadruple-precision s390x simd sse2 vector-math vectorization vsx
Last synced: 13 Apr 2025
https://github.com/cresset-template/cresset
Template repository to build PyTorch projects from source on any version of PyTorch/CUDA/cuDNN.
build cuda deep-learning deep-learning-tutorial docker docker-compose machine-learning makefile mlops mlops-template python pytorch source source-python template template-repository wheel
Last synced: 04 Apr 2025
https://github.com/Devsh-Graphics-Programming/Nabla
Vulkan, OptiX and CUDA Interoperation Modular Rendering Library and Framework for PC/Linux/Android
computer-graphics cuda cuda-opengl gpgpu gpu graphics-engine graphics-library hlsl optix optix-denoiser path-tracing pathtracing pbr raytracing rendering shaders spir-v vulkan
Last synced: 10 Jun 2026
https://github.com/Xtra-Computing/thundergbm
ThunderGBM: Fast GBDTs and Random Forests on GPUs
cuda gbdt gpu machine-learning random-forest
Last synced: 12 Apr 2025
https://github.com/xtra-computing/thundergbm
ThunderGBM: Fast GBDTs and Random Forests on GPUs
cuda gbdt gpu machine-learning random-forest
Last synced: 14 Dec 2025
https://github.com/gprMax/gprMax
gprMax is open source software that simulates electromagnetic wave propagation using the Finite-Difference Time-Domain (FDTD) method for numerical modelling of Ground Penetrating Radar (GPR)
antenna cuda electromagnetic fdtd gpr gpu modelling nvidia python simulation soil
Last synced: 10 May 2025
https://github.com/santosh-gupta/speedtorch
Library for faster pinned CPU <-> GPU transfer in Pytorch
cpu-gpu-transfer cpu-pinned-tensors cuda cuda-tensors cuda-variables cupy data-transfer embeddings embeddings-trained gpu gpu-transfer machine-learning natural-language-processing nlp pinned-cpu-tensors pytorch pytorch-tensors pytorch-variables sparse sparse-modeling
Last synced: 12 Apr 2025
https://github.com/Santosh-Gupta/SpeedTorch
Library for faster pinned CPU <-> GPU transfer in Pytorch
cpu-gpu-transfer cpu-pinned-tensors cuda cuda-tensors cuda-variables cupy data-transfer embeddings embeddings-trained gpu gpu-transfer machine-learning natural-language-processing nlp pinned-cpu-tensors pytorch pytorch-tensors pytorch-variables sparse sparse-modeling
Last synced: 08 May 2025
https://github.com/thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
attention cuda inference-acceleration llm quantization triton video-generation
Last synced: 15 Aug 2025
https://github.com/uxlfoundation/onemath
oneAPI Math Library (oneMath)
api blas cpu cuda dpcpp gpu hpc intel math-libraries oneapi onemkl parallel-computing parallel-programming performance rng
Last synced: 15 May 2025
https://github.com/maghoumi/pytorch-softdtw-cuda
Fast CUDA implementation of (differentiable) soft dynamic time warping for PyTorch
cuda deep-learning dynamic-time-warping pytorch soft-dtw
Last synced: 04 Apr 2025
https://github.com/cern/tigre
TIGRE: Tomographic Iterative GPU-based Reconstruction Toolbox
cuda gpus image-reconstruction matlab python tigre tomography toolbox x-ray
Last synced: 15 May 2025
https://github.com/hedronvision/bazel-compile-commands-extractor
Goal: Enable awesome tooling for Bazel users of the C language family.
bazel bazel-build c ccls clang clang-tidy clang-tooling clangd contributions-welcome cpp cross-platform cuda hacktoberfest objective-c objective-c-plus-plus tools
Last synced: 23 Mar 2025
https://github.com/insight-platform/Savant
Python Computer Vision & Video Analytics Framework With Batteries Included
computer-vision cuda deep-learning deepstream edge-computing inference-engine instance-segmentation machine-learning nvidia nvidia-deepstream-sdk object-detection opencv peoplenet tensorrt video yolo yolov5-face yolov8 yolov8-face
Last synced: 21 Apr 2025
https://github.com/MarioSieg/magnetron
(WIP) A small but powerful, homemade PyTorch from scratch.
artificial-intelligence cpp cuda high-performance-computing machine-learning neuronal-network python pytorch research-project tensorflow tiny
Last synced: 15 Sep 2025
https://github.com/NVIDIA/nvbench
CUDA Kernel Benchmarking Library
benchmark cuda cuda-kernels gpu kernel-benchmark nvidia performance
Last synced: 16 May 2025
https://github.com/nvidia/nvbench
CUDA Kernel Benchmarking Library
benchmark cuda cuda-kernels gpu kernel-benchmark nvidia performance
Last synced: 14 Apr 2025
https://github.com/cudamat/cudamat
Python module for performing basic dense linear algebra computations on the GPU using CUDA.
Last synced: 10 Mar 2026
https://github.com/inducer/loopy
A code generator for array-based code on CPUs and GPUs
array code-generation code-generator code-optimization code-transformation cuda ispc loop-optimization multidimensional-arrays opencl performance performance-analysis prefix-sum python reduction scan scientific-computing
Last synced: 14 May 2025
https://github.com/hpcaitech/fastfold
Optimizing AlphaFold Training and Inference on GPU Clusters
alphafold2 cuda evoformer gpu habana-gaudi parallelism protein-folding protein-structure pytorch
Last synced: 13 Apr 2025
https://github.com/hpcaitech/FastFold
Optimizing AlphaFold Training and Inference on GPU Clusters
alphafold2 cuda evoformer gpu habana-gaudi parallelism protein-folding protein-structure pytorch
Last synced: 01 May 2025
https://github.com/tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
arm64 armv7 cuda cuda-kernel gemm-optimization int4 ptx vulkan
Last synced: 04 Apr 2025
https://github.com/rapidsai/rmm
RAPIDS Memory Manager
cuda memory-allocation memory-management rapids
Last synced: 14 May 2025
https://github.com/BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
cuda deep-learning machine-learning openai openai-triton pytorch triton
Last synced: 23 Aug 2025
https://github.com/gprmax/gprmax
gprMax is open source software that simulates electromagnetic wave propagation using the Finite-Difference Time-Domain (FDTD) method for numerical modelling of Ground Penetrating Radar (GPR)
antenna cuda electromagnetic fdtd gpr gpu modelling nvidia python simulation soil
Last synced: 15 May 2025
https://github.com/sergio0694/neuralnetwork.net
A TensorFlow-inspired neural network library built from scratch in C# 7.3 for .NET Standard 2.0, with GPU support through cuDNN
ai backpropagation-algorithm classification-algorithims cnn convolutional-neural-networks csharp cuda gpu-acceleration gradient-descent machine-learning net-framework netstandard neural-network supervised-learning visual-studio
Last synced: 16 May 2025
https://github.com/Sergio0694/NeuralNetwork.NET
A TensorFlow-inspired neural network library built from scratch in C# 7.3 for .NET Standard 2.0, with GPU support through cuDNN
ai backpropagation-algorithm classification-algorithims cnn convolutional-neural-networks csharp cuda gpu-acceleration gradient-descent machine-learning net-framework netstandard neural-network supervised-learning visual-studio
Last synced: 02 Apr 2025
https://github.com/kwea123/gaussian_splatting_notes
A detailed explanation of the formulae behind Gaussian splatting
Last synced: 05 Apr 2025
https://github.com/stochasticai/x-stable-diffusion
Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community: https://discord.com/invite/TgHXuSJEk6
aitemplate automl cuda docker inference notebook nvfuser onnx onnxruntime pytorch stable-diffusion tensorrt
Last synced: 04 Apr 2025
https://github.com/laugh12321/TensorRT-YOLO
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.
cuda cuda-graph cuda-kernels cuda-programming detection onnx ppyoloe tensorrt yolov10 yolov3 yolov5 yolov6 yolov7 yolov8 yolov9
Last synced: 18 Mar 2025
https://github.com/laugh12321/tensorrt-yolo
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.
cuda cuda-graph cuda-kernels cuda-programming detection onnx ppyoloe tensorrt yolov10 yolov3 yolov5 yolov6 yolov7 yolov8 yolov9
Last synced: 14 May 2025
https://github.com/tencent/forward
A library for high performance deep learning inference on NVIDIA GPUs.
cuda deep-learning forward gpu inference inference-engine keras neural-network onnx pytorch tensorflow tensorrt
Last synced: 05 Apr 2025
https://github.com/brucefan1983/GPUMD
Graphics Processing Units Molecular Dynamics
cuda gpu gpumd heat-transport high-performance-computing machine-learning machine-learning-potential molecular-dynamics molecular-dynamics-simulation natural-evolution-strategies neural-network neuroevolution phonon physics-simulation simulation
Last synced: 04 May 2025
https://github.com/Tencent/Forward
A library for high performance deep learning inference on NVIDIA GPUs.
cuda deep-learning forward gpu inference inference-engine keras neural-network onnx pytorch tensorflow tensorrt
Last synced: 18 Apr 2025
https://github.com/cvxgrp/pymde
Minimum-distortion embedding with PyTorch
cuda dimensionality-reduction embedding feature-vectors gpu graph-embedding machine-learning pytorch visualization
Last synced: 16 May 2025
https://github.com/mariosieg/magnetron
(WIP) A small but powerful, homemade PyTorch from scratch.
artificial-intelligence cpp cuda high-performance-computing machine-learning neuronal-network python pytorch research-project tensorflow tiny
Last synced: 08 Apr 2025
https://github.com/zhihu/cubert
Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
bert cuda deep-learning inference mkl predict tensorflow transformer
Last synced: 05 Apr 2025
https://github.com/zeux/calm
CUDA/Metal accelerated language model inference
Last synced: 10 Apr 2025
https://github.com/nvidia/cucollections
cpp cpp17 cuda datastructures gpu hashmap hashset hashtable
Last synced: 15 May 2025
https://github.com/nvidia/jitify
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
cpp cuda jit-compilation nvrtc runtime-compilation single-header
Last synced: 15 May 2025
https://github.com/luisagroup/luisarender
High-Performance Cross-Platform Monte Carlo Renderer Based on LuisaCompute
cpp cuda gpu high-performance ispc metal optix path-tracing ray-tracing renderer rendering siggraph-asia-2022
Last synced: 12 Apr 2025
https://github.com/openhackathons-org/gpubootcamp
This repository consists of GPU bootcamp material for HPC and AI
ai4hpc cuda data-science deep-learning deepstream gpu hpc machine-learning mpi openacc openmp rapidsai
Last synced: 27 Mar 2025
https://github.com/spcl/dace
DaCe - Data Centric Parallel Programming
cuda fpga high-level-synthesis high-performance-computing programming-language vivado-hls
Last synced: 14 May 2025
https://github.com/zhihu/cuBERT
Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
bert cuda deep-learning inference mkl predict tensorflow transformer
Last synced: 02 Apr 2025
https://github.com/huggingface/llm_training_handbook
An open collection of methodologies to help with successful training of large language models.
cuda large-language-models llm nccl nlp performance python pytorch scalability troubleshooting
Last synced: 14 Oct 2025
https://github.com/gridhead/nvidia-auto-installer-for-fedora-linux
A CLI tool which lets you install proprietary NVIDIA drivers and much more easily on Fedora Linux (32 or above and Rawhide)
cuda fedora hacktoberfest nvidia optimus rpmfusion
Last synced: 15 May 2025
https://github.com/rnd-team-dev/plotoptix
Data visualisation and ray tracing in Python based on OptiX 8.1 framework.
3d-graphics animation cuda generative-art gpu nvidia optix path-tracing pathtracing plot ray-tracing raytracer raytracing real-time rtx visualization
Last synced: 15 May 2025
https://github.com/Kaixhin/dockerfiles
Compilation of Dockerfiles with automated builds enabled on the Docker Registry
cuda deep-learning docker dockerfiles machine-learning vnc
Last synced: 20 Mar 2025
https://github.com/radarsimx/radarsimpy
Radar Simulator built with Python and C++
cuda radar raytracing simulation
Last synced: 04 Apr 2026
https://github.com/ashvardanian/less_slow.cpp
Learning how to write "Less Slow" code in C++ 20, C 99, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO
assembly assembly-language avx512 benchmark coroutines cpp cpp-programming cpp17 cpp20 cuda gcc google-benchmark hpc io-uring linux-kernel llvm ptx ranges tutorial tutorials
Last synced: 08 Apr 2025
https://github.com/gorgonia/cu
package cu provides an idiomatic interface to the CUDA Driver API.
cuda cuda-driver-api go golang
Last synced: 04 Apr 2025
https://github.com/alternbits/awesome-cuda-books
A curated list of the best CUDA programming books
cpp cuda cuda-basics cuda-book cuda-cpp cuda-programming cuda-tutorial gpu gpu-computing gpu-optimization gpu-programming nvidia
Last synced: 03 Jun 2026
https://github.com/shi-labs/natten
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
cuda neighborhood-attention pytorch
Last synced: 08 Feb 2026
https://github.com/salesforce/warp-drive
Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)
cuda deep-learning gpu high-throughput multiagent-reinforcement-learning numba pytorch reinforcement-learning
Last synced: 16 May 2025
https://github.com/huggingface/large_language_model_training_playbook
An open collection of implementation tips, tricks and resources for training large language models
cuda large-language-models llm nccl nlp performance python pytorch scalability troubleshooting
Last synced: 14 Oct 2025
https://github.com/ccsb-scripps/AutoDock-GPU
AutoDock for GPUs and other accelerators
autodock4 cuda gpu-computing molecular-docking multicore-cpu opencl
Last synced: 21 Nov 2025
https://github.com/nvidia/cuquantum
Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples
cuda cuquantum custatevec cutensornet nvidia quantum-computing
Last synced: 01 May 2026
https://github.com/cloudcores/cuassembler
An unofficial cuda assembler, for all generations of SASS, hopefully :)
Last synced: 05 Apr 2025
https://zielon.github.io/insta/
INSTA - Instant Volumetric Head Avatars [CVPR2023]
3dmm avatars cuda flame instant-ngp nerf neural-network volumetric-rendering
Last synced: 26 Mar 2025
https://github.com/megviirobot/megba
MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment
bundleadjustment cuda distributed gpu-acceleration graph-optimization high-performance
Last synced: 24 Jun 2025
https://github.com/h2oai/h2o4gpu
H2Oai GPU Edition
c-plus-plus cpu cuda elastic-net glm gpu lasso machine-learning pca python r rstats svd
Last synced: 14 May 2025
https://github.com/sinkingsugar/nimtorch
PyTorch - Python + Nim
artificial-intelligence artificial-neural-networks cuda machine-learning nim pytorch wasm
Last synced: 13 Aug 2025
https://github.com/ginkgo-project/ginkgo
Numerical linear algebra software package
cuda dpcpp gpu-computing hip hpc krylov-methods linear-algebra oneapi openmp preconditioning sparse-linear-systems spmv
Last synced: 15 May 2025
https://github.com/cloudcores/CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully :)
Last synced: 20 Mar 2025
https://github.com/ServerlessLLM/ServerlessLLM
Serverless LLM Serving for Everyone.
cuda huggingface-transformers large-language-models model-as-a-service model-serving pytorch serverless-inference
Last synced: 07 May 2025
https://github.com/mumax/3
GPU-accelerated micromagnetic simulator
cuda finite-difference-time-domain go micromagnetics scientific-computing
Last synced: 23 Jun 2025
https://github.com/ccsb-scripps/autodock-gpu
AutoDock for GPUs and other accelerators
autodock4 cuda gpu-computing molecular-docking multicore-cpu opencl
Last synced: 15 May 2025
https://github.com/cnstark/pytorch-docker
Pure Pytorch Docker Images.
centos cuda deep-learning docker nvidia pytorch ubuntu
Last synced: 06 Oct 2025
https://github.com/termoshtt/accel
(Mirror of GitLab) GPGPU Framework for Rust
Last synced: 10 Jan 2026
https://github.com/petercunha/pine
:evergreen_tree: Aimbot powered by real-time object detection with neural networks, GPU accelerated with Nvidia. Optimized for use with CS:GO.
aimbot csgo cuda darknet detection fortnite fps game-hacking hacking neural-network neural-networks nvidia object-detection opencl opencv overwatch pine python yolo yolov3
Last synced: 17 Mar 2026
https://github.com/petercunha/Pine
:evergreen_tree: Aimbot powered by real-time object detection with neural networks, GPU accelerated with Nvidia. Optimized for use with CS:GO.
aimbot csgo cuda darknet detection fortnite fps game-hacking hacking neural-network neural-networks nvidia object-detection opencl opencv overwatch pine python yolo yolov3
Last synced: 17 Apr 2025
https://github.com/MegviiRobot/MegBA
MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment
bundleadjustment cuda distributed gpu-acceleration graph-optimization high-performance
Last synced: 07 May 2025
https://github.com/patwie/tensorflow-cmake
TensorFlow examples in C, C++, Go and Python without bazel but with cmake and FindTensorFlow.cmake
c cmake cpp cuda deep-learning golang inference opencv tensorflow tensorflow-cc tensorflow-cmake tensorflow-examples tensorflow-gpu
Last synced: 06 Apr 2025
https://github.com/vectorch-ai/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
cuda efficiency gpu inference llama llama3 llm llm-inference model performance production serving speculative transformer
Last synced: 09 May 2025
https://github.com/tlkh/ai-lab
All-in-one AI container for rapid prototyping
cuda data-science deep-learning docker jupyter nvidia pytorch tensorflow
Last synced: 05 Apr 2025
https://github.com/ibm/aihwkit
IBM Analog Hardware Acceleration Kit
ai analog-devices cuda neural-networks pytorch
Last synced: 08 Oct 2025
https://github.com/colin97/msn-point-cloud-completion
Morphing and Sampling Network for Dense Point Cloud Completion (AAAI2020)
3d-reconstruction auction-algorithm cuda earth-mover-distance earth-movers-distance minimum-spanning-tree point-cloud point-cloud-completion point-cloud-processing shape-completion
Last synced: 07 Apr 2025
https://github.com/DerryHub/BEVFormer_tensorrt
BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
bevformer cuda int8-inference pytorch quantization tensorrt-plugins
Last synced: 20 Mar 2025
https://github.com/toverainc/willow-inference-server
Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS
cuda deep-learning llama llm privacy speech-recognition speech-to-text text-to-speech vicuna webrtc whisper willow
Last synced: 05 Apr 2025
https://github.com/vectorch-ai/scalellm
A high-performance inference system for large language models, designed for production environments.
cuda efficiency gpu inference llama llama3 llm llm-inference model performance production serving speculative transformer
Last synced: 14 Apr 2025
https://github.com/lambdalabsml/distributed-training-guide
Best practices & guides on how to write distributed pytorch training code
cluster cuda deepspeed distributed-training fsdp gpu gpu-cluster kuberentes lambdalabs mpi nccl pytorch sharding slurm
Last synced: 16 May 2025
https://github.com/arrayfire/arrayfire-python
Python bindings for ArrayFire: A general purpose GPU library.
arrayfire cuda gpgpu gpu hpc opencl python python-bindings
Last synced: 02 Apr 2025
https://github.com/uncomplicate/deep-diamond
A fast Clojure Tensor & Deep Learning library
clojure cuda deep-learning deep-neural-networks dnnl gpu java nvidia
Last synced: 12 Apr 2025
https://github.com/alicevision/popsift
PopSift is an implementation of the SIFT algorithm in CUDA.
computer-vision cuda feature-extraction gpu image-processing sift
Last synced: 05 Apr 2025
https://github.com/ingonyama-zk/icicle
A hardware acceleration library for compute intensive cryptography :ice_cube:
cpu cryptography cuda golang msm ntt rust zero-knowledge
Last synced: 14 May 2025
https://github.com/serverlessllm/serverlessllm
Serverless LLM Serving for Everyone.
cuda huggingface-transformers large-language-models model-as-a-service model-serving pytorch serverless-inference
Last synced: 15 May 2025
https://github.com/xmrig/xmrig-cuda
NVIDIA CUDA plugin for XMRig miner
cryptonight cuda randomx xmrig
Last synced: 15 May 2025
https://github.com/rapidsai/cuvs
cuVS - a library for vector search and clustering on the GPU
anns clustering cuda distance gpu information-retrieval llm machine-learning nearest-neighbors neighborhood-methods similarity-search sparse statistics vector-search vector-similarity vector-store
Last synced: 08 Apr 2026