Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-01-27 00:06:46 UTC
https://github.com/OpenPPL/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
caffe cuda deep-learning neural-network onnx open-source pytorch quantization
Last synced: 28 Oct 2024
https://github.com/sniklaus/3d-ken-burns
An implementation of 3D Ken Burns Effect from a Single Image using PyTorch
cuda cupy deep-learning python pytorch
Last synced: 25 Jan 2025
https://github.com/openppl-public/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
caffe cuda deep-learning neural-network onnx open-source pytorch quantization
Last synced: 05 Oct 2024
https://github.com/kevmo314/scuda
SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.
cublas cuda cudnn gpu mlops networking nvml remote-access
Last synced: 25 Jan 2025
https://github.com/wangzhaode/mnn-llm
LLM deployment project based on MNN.
baichuan2-7b chatglm-6b chatglm2-6b codegeex2-6b cpp cuda mnn opencl qwen-7b
Last synced: 23 Jan 2025
https://github.com/pytorch/ao
PyTorch native quantization and sparsity for training and inference
brrr cuda dtypes float8 inference llama mx offloading optimizer pytorch quantization sparsity training transformer
Last synced: 23 Jan 2025
https://github.com/nvidia/cccl
CUDA Core Compute Libraries
accelerated-computing cpp cpp-programming cuda cuda-cpp cuda-kernels cuda-library cuda-programming gpu gpu-acceleration gpu-computing gpu-programming hpc modern-cpp nvidia nvidia-gpu parallel-algorithm parallel-computing parallel-programming
Last synced: 24 Jan 2025
https://github.com/godweiyang/nn-cuda-example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
cpp cuda neural-network python pytorch tensorflow
Last synced: 27 Jan 2025
https://github.com/AlexiaJM/Deep-learning-with-cats
Deep learning with cats (^._.^)
cat cuda deep-learning gan picture
Last synced: 27 Nov 2024
https://github.com/alexiajm/deep-learning-with-cats
Deep learning with cats (^._.^)
cat cuda deep-learning gan picture
Last synced: 27 Jan 2025
https://github.com/flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
cuda flash-attention gpu jit large-large-models llm-inference pytorch
Last synced: 23 Jan 2025
https://github.com/koide3/fast_gicp
A collection of GICP-based fast point cloud registration algorithms
cpp cuda gicp gpu icp multithreading pcl point-cloud python registration scan-matching vgicp
Last synced: 23 Jan 2025
https://github.com/FeiYull/TensorRT-Alpha
🔥🔥🔥TensorRT for YOLOv8, YOLOv8-Pose, YOLOv8-Seg, YOLOv8-Cls, YOLOv7, YOLOv6, YOLOv5, YOLONAS...🚀🚀🚀CUDA IS ALL YOU NEED.🍎🍎🍎
cuda efficientdet libfacedetection rt-detr tensorrt u2net yolonas yolor yolov3 yolov4 yolov5 yolov6 yolov7 yolov8 yolov8-pose yolov8-seg yolox
Last synced: 09 Nov 2024
https://github.com/DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
block-reduce cuda cuda-programming elementwise flash-attention flash-attention-2 flash-attention-3 gemm gemv layernorm pytorch rmsnorm softmax triton warp-reduce
Last synced: 27 Oct 2024
https://github.com/godweiyang/NN-CUDA-Example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
cpp cuda neural-network python pytorch tensorflow
Last synced: 28 Oct 2024
https://github.com/feiyull/tensorrt-alpha
🔥🔥🔥TensorRT for YOLOv8, YOLOv8-Pose, YOLOv8-Seg, YOLOv8-Cls, YOLOv7, YOLOv6, YOLOv5, YOLONAS...🚀🚀🚀CUDA IS ALL YOU NEED.🍎🍎🍎
cuda efficientdet libfacedetection rt-detr tensorrt u2net yolonas yolor yolov3 yolov4 yolov5 yolov6 yolov7 yolov8 yolov8-pose yolov8-seg yolox
Last synced: 24 Jan 2025
https://github.com/kwea123/ngp_pl
Instant-ngp in PyTorch+CUDA trained with pytorch-lightning (high quality with high speed, with only a few lines of legible code)
3d-reconstruction cuda instant-ngp nerf novel-view-synthesis pytorch pytorch-lightning
Last synced: 26 Jan 2025
https://github.com/andyzeng/tsdf-fusion-python
Python code to fuse multiple RGB-D images into a TSDF voxel volume.
3d 3d-deep-learning 3d-reconstruction artificial-intelligence cuda depth-camera kinect-fusion rgbd tsdf vision volumetric-data
Last synced: 26 Jan 2025
https://github.com/NVIDIA/cccl
CUDA Core Compute Libraries
accelerated-computing cpp cpp-programming cuda cuda-cpp cuda-kernels cuda-library cuda-programming gpu gpu-acceleration gpu-computing gpu-programming hpc modern-cpp nvidia nvidia-gpu parallel-algorithm parallel-computing parallel-programming
Last synced: 19 Nov 2024
https://github.com/marian-nmt/marian
Fast Neural Machine Translation in C++
cuda fast gpu neural-machine-translation
Last synced: 25 Jan 2025
https://github.com/deepgraphlearning/graphvite
GraphVite: A General and High-performance Graph Embedding System
cuda data-visualization gpu knowledge-graph machine-learning network-embedding representation-learning
Last synced: 24 Jan 2025
https://github.com/aphrodite-engine/aphrodite-engine
Large-scale LLM inference engine
api-rest cuda inference-engine inferentia intel lora machine-learning rocm speculative-decoding tpu
Last synced: 22 Jan 2025
https://github.com/DeepGraphLearning/graphvite
GraphVite: A General and High-performance Graph Embedding System
cuda data-visualization gpu knowledge-graph machine-learning network-embedding representation-learning
Last synced: 07 Nov 2024
https://github.com/chengzeyi/stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
cuda deeplearnng diffusers inference-engines openai-triton performance-optimizations pytorch stable-diffusion stable-video-diffusion torch
Last synced: 23 Jan 2025
https://github.com/beehive-lab/tornadovm
TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
ai cuda gpu-acceleration gpu-computing gpus graalvm java levelzero multi-core opencl parallel-computing parallel-programming spirv
Last synced: 23 Jan 2025
https://github.com/nvidia/matx
An efficient C++17 GPU numerical computing library with Python-like syntax
cuda gpgpu gpu gpu-computing hpc
Last synced: 24 Jan 2025
https://github.com/pygmalionai/aphrodite-engine
Large-scale LLM inference engine
api-rest cuda inference-engine inferentia intel lora machine-learning rocm speculative-decoding tpu
Last synced: 02 Jan 2025
https://github.com/NVIDIA/MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
cuda gpgpu gpu gpu-computing hpc
Last synced: 30 Oct 2024
https://github.com/beehive-lab/TornadoVM
TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
ai cuda gpu-acceleration gpu-computing gpus graalvm java levelzero multi-core opencl spirv
Last synced: 05 Nov 2024
https://mratsim.github.io/Arraymancer/
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
autograd automatic-differentiation cuda cudnn deep-learning gpgpu gpu-computing high-performance-computing iot linear-algebra machine-learning matrix-library multidimensional-arrays ndarray neural-networks nim opencl openmp parallel-computing tensor
Last synced: 14 Nov 2024
https://github.com/mratsim/arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
autograd automatic-differentiation cuda cudnn deep-learning gpgpu gpu-computing high-performance-computing iot linear-algebra machine-learning matrix-library multidimensional-arrays ndarray neural-networks nim opencl openmp parallel-computing tensor
Last synced: 25 Jan 2025
https://github.com/stotko/stdgpu
stdgpu: Efficient STL-like Data Structures on the GPU
cpp cpp17 cpp20 cuda data-structures gpgpu gpu gpu-acceleration gpu-computing hip modern-cpp openmp rocm stl stl-containers stl-like
Last synced: 24 Jan 2025
https://github.com/mratsim/Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
autograd automatic-differentiation cuda cudnn deep-learning gpgpu gpu-computing high-performance-computing iot linear-algebra machine-learning matrix-library multidimensional-arrays ndarray neural-networks nim opencl openmp parallel-computing tensor
Last synced: 08 Nov 2024
https://github.com/luxcorerender/luxcore
LuxCore source repository
3d-graphics bidirectional-path-tracing cuda gpu-computing luxcorerender luxrender opencl optix path-tracing pathtracer ray ray-tracer ray-tracing raytracer raytracing rtx visualization
Last synced: 23 Jan 2025
https://github.com/withcatai/node-llama-cpp
Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level
ai bindings catai cmake cmake-js cuda embedding function-calling gguf gpu grammar json-schema llama llama-cpp llm metal nodejs prebuilt-binaries self-hosted vulkan
Last synced: 22 Jan 2025
https://github.com/fff-rs/juice
The Hacker's Machine Learning Engine
agnostic coaster cuda extinsible framework hacktoberfest juice machine-learning opencl rust
Last synced: 23 Jan 2025
https://github.com/PygmalionAI/aphrodite-engine
Large-scale LLM inference engine
api-rest cuda inference-engine inferentia intel lora machine-learning rocm speculative-decoding tpu
Last synced: 03 Nov 2024
https://github.com/uncomplicate/neanderthal
Fast Clojure Matrix Library
api clojure clojure-library cuda gpgpu gpu gpu-computing high-performance-computing java matrix matrix-calculations matrix-factorization matrix-functions matrix-multiplication opencl vectorization
Last synced: 22 Jan 2025
https://github.com/inducer/pyopencl
OpenCL integration for Python, plus shiny features
amd array cuda gpu heterogeneous-parallel-programming multidimensional-arrays nvidia opencl opengl parallel-algorithm parallel-computing performance prefix-sum pyopencl python reduction scientific-computing shared-memory sorting
Last synced: 21 Jan 2025
https://github.com/markus-perl/ffmpeg-build-script
The FFmpeg build script provides an easy way to build a static FFmpeg on OSX and Linux with non-free codecs included.
apple-m1-silicon av1 cuda debian fdk-aac ffmpeg ffmpeg-installer ffmpeg-linux ffmpeg-mac h264 h265 mp3 mp3-to-pcm ogg osx theora webm webm-conversion x264 x265
Last synced: 24 Jan 2025
https://github.com/BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
Last synced: 27 Oct 2024
https://github.com/sniklaus/sepconv-slomo
An implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch
cuda cupy deep-learning python pytorch
Last synced: 26 Jan 2025
https://github.com/gunrock/gunrock
Programmable CUDA/C++ GPU Graph Analytics
algorithm algorithms cpp cuda cxx essentials gnn gpu graph graph-algorithms graph-analytics graph-engine graph-neural-networks graph-primitives graph-processing gunrock hpc parallel-computing sparse-matrix
Last synced: 24 Jan 2025
https://github.com/anibali/docker-pytorch
A Docker image for PyTorch
cuda docker docker-image pytorch
Last synced: 26 Jan 2025
https://github.com/mrnerf/gaussian-splatting-cuda
3D Gaussian Splatting, reimagined: Unleashing unmatched speed with C++ and CUDA from the ground up!
computer-graphics computer-vision cuda gaussian-splatting nerf optimization
Last synced: 24 Jan 2025
https://github.com/neka-nat/cupoch
Robotics with GPU computing
collision-detection cuda distance-transform gpgpu gpu jetson occupancy-grid-map odometry pathfinding point-cloud pybind11 python registration robotics ros triangle-mesh visual-odometry voxel
Last synced: 23 Jan 2025
https://github.com/mp3guy/kintinuous
Real-time large scale dense visual SLAM system
Last synced: 27 Jan 2025
https://github.com/mp3guy/Kintinuous
Real-time large scale dense visual SLAM system
Last synced: 07 Nov 2024
https://github.com/acceleratehs/accelerate
Embedded language for high-performance array computations
accelerate cuda gpu gpu-computing hacktoberfest haskell llvm parallel-computing
Last synced: 22 Jan 2025
https://github.com/MrNeRF/gaussian-splatting-cuda
3D Gaussian Splatting, reimagined: Unleashing unmatched speed with C++ and CUDA from the ground up!
computer-graphics computer-vision cuda gaussian-splatting nerf optimization
Last synced: 07 Nov 2024
https://github.com/AccelerateHS/accelerate
Embedded language for high-performance array computations
accelerate cuda gpu gpu-computing hacktoberfest haskell llvm parallel-computing
Last synced: 18 Nov 2024
https://github.com/mind/wheels
Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
ai avx avx2 cuda fma gpu machine-learning ml optimization sse41 sse42 tensorflow wheel
Last synced: 24 Jan 2025
https://github.com/jgbit/vuda
VUDA is a header-only library based on Vulkan that provides a CUDA Runtime API interface for writing GPU-accelerated applications.
Last synced: 27 Jan 2025
https://github.com/cyclenerd/ethereum_nvidia_miner
💰 USB flash drive ISO image for Ethereum, Zcash and Monero mining with NVIDIA graphics cards and Ubuntu GNU/Linux (headless)
cuda ethereum ethereum-mining ethminer graphics-card iso iso-image linux mining monero monero-mining nvidia nvidia-card nvidia-gpu nvidia-gpus nvidia-smi ubuntu ubuntu1604 zcash zcash-mining
Last synced: 19 Jan 2025
https://github.com/babitmf/bmf
Cross-platform, customizable multimedia/video processing framework. With strong GPU acceleration, heterogeneous design, multi-language support, easy to use, multi-framework compatible and high performance, the framework is ideal for transcoding, AI inference, algorithm integration, live video streaming, and more.
ai arm bmf bytedance cpp cross-platform cuda ffmpeg gpu heterogeneous live-video mediacodec multimedia numpy nvidia opencv python tensorrt transcode x86-64
Last synced: 23 Jan 2025
https://github.com/Cyclenerd/ethereum_nvidia_miner
💰 USB flash drive ISO image for Ethereum, Zcash and Monero mining with NVIDIA graphics cards and Ubuntu GNU/Linux (headless)
cuda ethereum ethereum-mining ethminer graphics-card iso iso-image linux mining monero monero-mining nvidia nvidia-card nvidia-gpu nvidia-gpus nvidia-smi ubuntu ubuntu1604 zcash zcash-mining
Last synced: 19 Nov 2024
https://github.com/thu-ml/sageattention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
attention cuda inference-acceleration llm quantization triton video-generation
Last synced: 24 Jan 2025
https://github.com/e-ago/bitcracker
BitCracker is the first open source password cracking tool for memory units encrypted with BitLocker
attack bitcracker bitlocker cracking cryptography cuda decryption-algorithm gpgpu gpu hash john-the-ripper microsoft opencl password-cracker passwords windows
Last synced: 27 Jan 2025
https://github.com/zhihu/zhilight
A highly optimized LLM inference acceleration engine for Llama and its variants.
cpm cuda gpt inference-engine llama llm llm-serving minicpm pytorch qwen
Last synced: 25 Jan 2025
https://github.com/arrayfire/arrayfire-rust
Rust wrapper for ArrayFire
arrayfire cuda gpgpu gpu hpc opencl rust rust-bindings
Last synced: 22 Jan 2025
https://github.com/src-d/kmcuda
Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA
afk-mc2 cuda hacktoberfest kmeans knn-search machine-learning python yinyang
Last synced: 25 Jan 2025
https://github.com/luisagroup/luisacompute
High-Performance Rendering Framework on Stream Architectures
cpu cross-platform cuda directx dsl dxr gpu graphics high-performance ispc llvm metal optix raytracing rendering rtx siggraph-asia-2022
Last synced: 23 Jan 2025
https://github.com/BabitMF/bmf
Cross-platform, customizable multimedia/video processing framework. With strong GPU acceleration, heterogeneous design, multi-language support, easy to use, multi-framework compatible and high performance, the framework is ideal for transcoding, AI inference, algorithm integration, live video streaming, and more.
ai arm bmf bytedance cpp cross-platform cuda ffmpeg gpu heterogeneous live-video mediacodec multimedia numpy nvidia opencv python tensorrt transcode x86-64
Last synced: 07 Nov 2024
https://github.com/eyalroz/cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
api-wrapper cuda cuda-api-wrappers cuda-device cuda-driver cuda-driver-api cuda-programming cuda-runtime-api cuda-toolkit gpgpu gpgpu-computing gpu gpu-computing gpu-memory modern-cpp
Last synced: 09 Nov 2024
https://github.com/jasmcaus/caer
High-performance Vision library in Python. Scale your research, not boilerplate.
ai artificial-intelligence augmentation caer computer-vision cuda data-science deep-learning gpu image-classification image-processing image-segmentation machine-learning neural-network opencv python segmentation type-checking video-processing vision
Last synced: 25 Jan 2025
https://github.com/bheisler/rustacuda
Rusty wrapper for the CUDA Driver API
Last synced: 24 Jan 2025
https://github.com/bheisler/RustaCUDA
Rusty wrapper for the CUDA Driver API
Last synced: 05 Nov 2024
https://github.com/qengineering/jetson-nano-ubuntu-20-image
Jetson Nano with Ubuntu 20.04 image
cuda deep-learning jetson-nano opencv pytorch sd-card-image team-viewer tensorflow tensorrt torch torchvision ubuntu2004
Last synced: 24 Jan 2025
https://github.com/ddemidov/amgcl
C++ library for solving large sparse linear systems with the algebraic multigrid method
amg c-plus-plus cpp cuda gpgpu linear-solvers mpi multigrid opencl openmp scientific-computing sparse-linear-systems
Last synced: 25 Jan 2025
https://github.com/rapidsai/raft
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
anns building-blocks clustering cuda distance gpu information-retrieval linear-algebra llm machine-learning nearest-neighbors neighborhood-methods primitives random-sampling solvers sparse statistics vector-search vector-similarity vector-store
Last synced: 23 Jan 2025
https://github.com/andyzeng/tsdf-fusion
Fuse multiple depth frames into a TSDF voxel volume.
3d 3d-deep-learning 3d-reconstruction artificial-intelligence cuda depth-camera kinect-fusion rgbd tsdf vision volumetric-data
Last synced: 26 Jan 2025
https://github.com/QPT-Family/QPT
[Closed beta] QPT - a Python-to-EXE tool (Python packaging) dedicated to helping open source projects better reach the wider internet.
cuda deep-learning dml gpu noavx paddlepaddle pypi python qpt
Last synced: 18 Dec 2024
https://github.com/qpt-family/qpt
[Closed beta] QPT - a Python-to-EXE tool (Python packaging) dedicated to helping open source projects better reach the wider internet.
cuda deep-learning dml gpu noavx paddlepaddle pypi python qpt
Last synced: 27 Jan 2025
https://github.com/mryab/efficient-dl-systems
Efficient Deep Learning Systems course materials (HSE, YSDA)
cuda deep-learning distributed-training efficient-deep-learning machine-learning ml-infrastructure mlops pytorch
Last synced: 27 Jan 2025
https://github.com/LuisaGroup/LuisaCompute
High-Performance Rendering Framework on Stream Architectures
cpu cross-platform cuda directx dsl dxr gpu graphics high-performance ispc llvm metal optix raytracing rendering rtx siggraph-asia-2022
Last synced: 20 Nov 2024
https://github.com/efeslab/nanoflow
A throughput-oriented high-performance serving framework for LLMs
cuda inference llama2 llm llm-serving model-serving
Last synced: 25 Jan 2025
https://github.com/xmrig/xmrig-nvidia
Monero (XMR) NVIDIA miner
aeon cryptonight cuda electroneum gpu-mining monero nvidia-miner sumokoin xmr xmrig
Last synced: 22 Jan 2025
https://github.com/ddemidov/vexcl
VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
c-plus-plus cpp11 cuda gpgpu opencl scientific-computing
Last synced: 24 Jan 2025
https://github.com/cresset-template/cresset
Template repository to build PyTorch projects from source on any version of PyTorch/CUDA/cuDNN.
build cuda deep-learning deep-learning-tutorial docker docker-compose machine-learning makefile mlops mlops-template python pytorch source source-python template template-repository wheel
Last synced: 05 Nov 2024
https://github.com/xtra-computing/thundergbm
ThunderGBM: Fast GBDTs and Random Forests on GPUs
cuda gbdt gpu machine-learning random-forest
Last synced: 22 Jan 2025
https://github.com/mp3guy/icpcuda
Super fast implementation of ICP in CUDA for devices of compute capability 3.5 or higher
Last synced: 25 Jan 2025
https://github.com/Xtra-Computing/thundergbm
ThunderGBM: Fast GBDTs and Random Forests on GPUs
cuda gbdt gpu machine-learning random-forest
Last synced: 07 Nov 2024
https://github.com/santosh-gupta/speedtorch
Library for faster pinned CPU <-> GPU transfer in PyTorch
cpu-gpu-transfer cpu-pinned-tensors cuda cuda-tensors cuda-variables cupy data-transfer embeddings embeddings-trained gpu gpu-transfer machine-learning natural-language-processing nlp pinned-cpu-tensors pytorch pytorch-tensors pytorch-variables sparse sparse-modeling
Last synced: 22 Jan 2025
https://github.com/coreylowman/cudarc
Safe Rust wrapper around the CUDA toolkit
cublas cuda cuda-kernels cuda-programming cuda-toolkit cudnn curand gpu gpu-acceleration nccl nvrtc rust
Last synced: 23 Jan 2025
https://github.com/shibatch/sleef
SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
aarch64 android arm avx avx512 cuda elementary-functions fft ios math-library neon powerpc quadruple-precision s390x simd sse2 sve vector-math vectorization vsx
Last synced: 24 Jan 2025
https://github.com/thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
attention cuda inference-acceleration llm quantization triton video-generation
Last synced: 15 Dec 2024
https://github.com/Santosh-Gupta/SpeedTorch
Library for faster pinned CPU <-> GPU transfer in PyTorch
cpu-gpu-transfer cpu-pinned-tensors cuda cuda-tensors cuda-variables cupy data-transfer embeddings embeddings-trained gpu gpu-transfer machine-learning natural-language-processing nlp pinned-cpu-tensors pytorch pytorch-tensors pytorch-variables sparse sparse-modeling
Last synced: 15 Nov 2024
https://github.com/maghoumi/pytorch-softdtw-cuda
Fast CUDA implementation of (differentiable) soft dynamic time warping for PyTorch
cuda deep-learning dynamic-time-warping pytorch soft-dtw
Last synced: 26 Jan 2025
https://github.com/hedronvision/bazel-compile-commands-extractor
Goal: Enable awesome tooling for Bazel users of the C language family.
bazel bazel-build c ccls clang clang-tidy clang-tooling clangd contributions-welcome cpp cross-platform cuda hacktoberfest objective-c objective-c-plus-plus tools
Last synced: 28 Oct 2024