Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/NVIDIA-Genomics-Research/GenomeWorks

SDK for GPU-accelerated genome assembly and analysis

alignment cuda genomics gpu mapping nvidia partial-order-alignment poa python-api

Last synced: 15 Nov 2024

https://github.com/GoodAI/BrainSimulator

Brain Simulator is a platform for visual prototyping of artificial intelligence architectures.

ai brain-simulator cuda machine-learning

Last synced: 20 Nov 2024

https://github.com/gezp/docker-ubuntu-desktop

Docker image for Ubuntu Desktop that supports hardware GPU-accelerated GUI apps. You can access the container over SSH or remote desktop, just like a cloud VM.

cuda docker kasmvnc nomachine nvidia-gpu opengl remote-desktop ubuntu virtualgl

Last synced: 07 Nov 2024

https://github.com/JuliaGPU/CuArrays.jl

A Curious Cumulation of CUDA Cuisine

cuda gpu-programming julia

Last synced: 29 Nov 2024

https://github.com/bytedance/flux

A fast communication-overlapping library for tensor parallelism on GPUs.

cuda cutlass gpu pytorch

Last synced: 25 Jan 2025

https://github.com/rentainhe/pytorch-distributed-training

Simple tutorials on PyTorch DDP training

apex cuda ddp-training deep-learning pytorch

Last synced: 22 Jan 2025

https://github.com/ashvardanian/less_slow.cpp

Learning how to write "Less Slow" code in C++20, C99, and Assembly, from numerics and SIMD to coroutines, ranges, exception handling, networking, and user-space I/O

assembly assembly-language avx512 benchmark coroutines cpp cpp-programming cpp17 cpp20 cuda gcc google-benchmark hpc io-uring linux-kernel llvm ranges tutorial tutorials

Last synced: 27 Jan 2025

https://github.com/nvidia/cuda-checkpoint

CUDA checkpoint and restore utility

checkpoint cuda

Last synced: 22 Jan 2025

https://github.com/zjhellofss/kuiperllama

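A strong project for campus recruiting (autumn and spring hiring rounds) and internships: build, from scratch, an LLM inference framework that supports LLama2/3 and Qwen2.5.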

cpp cuda inference-engine llama2 llama3 llm llm-inference qwen qwen2

Last synced: 27 Jan 2025

https://github.com/llnl/blt

A streamlined CMake build system foundation for developing HPC software

blt build-system build-tools cmake cpp cuda hpc radiuss testing

Last synced: 25 Jan 2025

https://github.com/pcb9382/FaceAlgorithm

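Face detection and face recognition, including face detection (retinaface, yolov5face, yolov7face, yolov8face), face detection and tracking (ByteTracker), face angle calculation (Face_Angle), face alignment correction (Face_Aligner), face recognition (Arcface), mask detection (MaskRecognitiion), age and gender detection (Gender_age), silent liveness detection (Silent_Face_Anti_Spoofing), and FaceAlignment (106keypoints)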

cuda face-alignment face-detection face-recognition tensorrt yolov5face yolov7face yolov8face

Last synced: 27 Oct 2024

https://github.com/LLNL/blt

A streamlined CMake build system foundation for developing HPC software

blt build-system build-tools cmake cpp cuda hpc radiuss testing

Last synced: 09 Nov 2024

https://github.com/trinkle23897/fast-poisson-image-editing

A fast Poisson image editing implementation that can use a multi-core CPU or a GPU to handle high-resolution image input.

cpp cuda high-performance-computing image-processing jacobi-iteration jacobi-method mpi numpy openmp parallel-computing poisson-image-editing pybind11 python

Last synced: 26 Jan 2025

https://github.com/marian-nmt/marian-dev

Fast Neural Machine Translation in C++ - development repository

cpp11 cuda fast gpu-acceleration neural-machine-translation

Last synced: 25 Jan 2025

https://github.com/zjhellofss/KuiperLLama

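A strong project for campus recruiting (autumn and spring hiring rounds) and internships: build, from scratch, an LLM inference framework that supports LLama2/3 and Qwen2.5.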

cpp cuda inference-engine llama2 llama3 llm llm-inference qwen qwen2

Last synced: 03 Jan 2025

https://github.com/Trinkle23897/Fast-Poisson-Image-Editing

A fast Poisson image editing implementation that can use a multi-core CPU or a GPU to handle high-resolution image input.

cpp cuda high-performance-computing image-processing jacobi-iteration jacobi-method mpi numpy openmp parallel-computing poisson-image-editing pybind11 python

Last synced: 03 Nov 2024

https://github.com/jcuda/jcuda

JCuda - Java bindings for CUDA

cuda gpu java

Last synced: 26 Jan 2025

https://github.com/AmusementClub/vs-mlrt

Efficient CPU/GPU/Vulkan ML Runtimes for VapourSynth (with built-in support for waifu2x, DPIR, RealESRGANv2/v3, Real-CUGAN, RIFE, SCUNet and more!)

artificial-intelligence cuda deep-learning directml dpir gpu migraphx ncnn neural-network onnx onnxruntime openvino real-cugan real-esrgan rife tensorrt vapoursynth vulkan waifu2x

Last synced: 29 Oct 2024

https://github.com/koide3/gtsam_points

A collection of GTSAM factors and optimizers for point cloud SLAM

bundle-adjustment continuous-time cuda factor-graph gpu gtsam kdtree localization mapping point-cloud registration slam voxelmap

Last synced: 25 Jan 2025

https://github.com/ritchieng/dlami

A Deep Learning Amazon Web Services (AWS) AMI that is open, free, and works. Runs in less than 5 minutes. Includes TensorFlow, Keras, PyTorch, Theano, MXNet, CNTK, Caffe, and all dependencies.

ami aws cuda cudnn5 keras python tensorflow ubuntu

Last synced: 26 Jan 2025

https://github.com/shapelets/khiva

An open-source library of algorithms for analysing time series on GPU and CPU.

clustering cpp cuda data-series discords distances gpu khiva kshape matrix-profile motifs multicore opencl shapelets snippets time-series timeseries

Last synced: 27 Dec 2024

https://github.com/pmeier/light-the-torch

Install PyTorch distributions with computation backend auto-detection

cuda install pip pytorch

Last synced: 25 Jan 2025

https://github.com/opendilab/di-hpc

OpenDILab RL HPC OP Lib, including CUDA and Triton kernels

cuda hpc lstm pytorch reinforcement-learning triton

Last synced: 21 Jan 2025

https://github.com/marnovo/macos-egpu-cuda-guide

Set up CUDA for machine learning (and gaming) on macOS using an NVIDIA eGPU

apple cuda deep-learning egpu gaming gpu guide hacktoberfest mac machine-learning macos nvidia

Last synced: 19 Dec 2024

https://github.com/marnovo/macOS-eGPU-CUDA-guide

Set up CUDA for machine learning (and gaming) on macOS using an NVIDIA eGPU

apple cuda deep-learning egpu gaming gpu guide hacktoberfest mac machine-learning macos nvidia

Last synced: 22 Nov 2024

https://github.com/Hellisotherpeople/CX_DB8

A contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (BERT, Universal Sentence Encoder, Flair)

contextual-summarization cuda debate-evidence embeddings extractive-summarization flair python semantic-search semantic-summarization summarization summarizer token-level-summarization universal-sentence-encoder

Last synced: 22 Nov 2024

https://github.com/bh107/bohrium

Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX

cuda gpu gpu-acceleration multi-core numpy opencl parallel-computing

Last synced: 12 Nov 2024

https://github.com/bytedance/abq-llm

An acceleration library that supports arbitrary bit-width combinatorial quantization operations

cuda llm-inference mlsys quantized-networks research

Last synced: 28 Jan 2025

https://github.com/openucx/ucc

Unified Collective Communication Library

collectives cuda deep-learning hpc infiniband mpi openshmem pgas pytorch roce sharp

Last synced: 24 Jan 2025

https://github.com/DeMoriarty/TorchPQ

Approximate nearest neighbor search with product quantization on GPU in PyTorch and CUDA

cuda nearest-neighbor-search pytorch

Last synced: 02 Nov 2024

https://github.com/andrewkchan/yalm

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

cpp cuda inference-engine llama llamacpp llm llm-inference machine-learning mistral

Last synced: 25 Jan 2025

https://github.com/modelscope/dash-infer

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.

cpu cuda guided-decoding llm llm-inference native-engine

Last synced: 26 Jan 2025

https://github.com/demoriarty/torchpq

Approximate nearest neighbor search with product quantization on GPU in PyTorch and CUDA

cuda nearest-neighbor-search pytorch

Last synced: 26 Jan 2025

https://github.com/helmut-hoffer-von-ankershoffen/jetson

Helmut Hoffer von Ankershoffen experimenting with arm64 based NVIDIA Jetson (Nano and AGX Xavier) edge devices running Kubernetes (K8s) for machine learning (ML) including Jupyter Notebooks, TensorFlow Training and TensorFlow Serving using CUDA for smart IoT.

ansible archiconda cuda docker edge-devices hoffer-von-ankershoffen jupyter k8s kubeflow kubernetes kustomize machine-learning ml nvidia-jetson-nano nvidia-jetson-xavier skaffold smart-iot software-engineering tensorflow-serving virtualbox

Last synced: 07 Jan 2025

https://github.com/turlucode/ros-docker-gui

ROS Docker Containers with X11 (GUI) support [Linux]

cuda docker gui nvidia ros ros2

Last synced: 26 Jan 2025

https://github.com/CEED/libCEED

CEED Library: Code for Efficient Extensible Discretizations

api ceed cuda ecp exascale-computing gpu high-order high-performance-computing hpc julia linear-algebra

Last synced: 14 Nov 2024

https://github.com/ceed/libceed

CEED Library: Code for Efficient Extensible Discretizations

api ceed cuda ecp exascale-computing gpu high-order high-performance-computing hpc julia linear-algebra

Last synced: 24 Jan 2025

https://github.com/QINZHAOYU/CudaSteps

A CUDA learning path based on the book "CUDA Programming: Basics and Practice" (《cuda编程-基础与实践》) by Fan Zheyong (樊哲勇).

cuda gpu nvidia

Last synced: 19 Nov 2024

https://github.com/dividiti/ck-caffe

Collective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software, and data sets (compilers, libraries, tools, models, inputs).

accuracy android caffe collaborative-optimization collective-knowledge costs cuda customizable-workflows dnn-as-a-service dnn-optimization json-api linux opencl performance-portability portable-package-manager reproducible-experiments resources windows

Last synced: 13 Nov 2024

https://github.com/nobuyuki83/delfem2

Research prototyping framework for physics simulation written in C++

cuda fem-simulation finite-element-methods geometry-processing opengl physics-simulation simulation

Last synced: 26 Jan 2025

https://github.com/mkeeter/mpr

Reference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)

cad cuda gpu implicit-surfaces rendering

Last synced: 27 Oct 2024

https://github.com/rapidsai/node

GPU-accelerated data science and visualization in node

cuda data-science data-visualization gpgpu gpu nodejs

Last synced: 27 Jan 2025

https://github.com/LambdaLabsML/distributed-training-guide

Best practices and guides on how to write distributed PyTorch training code

cluster cuda deepspeed distributed-training fsdp gpu gpu-cluster kuberentes lambdalabs mpi nccl pytorch sharding slurm

Last synced: 21 Oct 2024

https://github.com/wangzyon/NVIDIA_SGEMM_PRACTICE

Step-by-step optimization of CUDA SGEMM

cuda sgemm

Last synced: 05 Nov 2024

https://github.com/zhongkaifu/seq2seqsharp

Seq2SeqSharp is a fast, flexible, tensor-based deep neural network framework written in .NET (C#). Its features include automatic differentiation, several network types (Transformer, LSTM, BiLSTM, and so on), multi-GPU support, cross-platform support (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.

attention-model cuda deep-learning encoder-decoder gpu image lstm machine-translation neural-network seq2seq sequence-to-sequence tensor text transformer transformer-architecture transformer-encoder translation vision-transformer

Last synced: 25 Jan 2025

https://github.com/lxxue/frnn

Fixed Radius Nearest Neighbor Search on GPU

cuda nearest-neighbor-search pytorch

Last synced: 24 Jan 2025

https://github.com/toruniina/lbvh

An implementation of parallel linear BVH (LBVH) on GPU

bvh cuda gpu nearest-neighbor-search parallel thrust

Last synced: 20 Dec 2024

https://github.com/nvidia/gmat

A toolkit showing GPU's all-round capability in video processing

codec cpp cuda deep-learning ffmpeg gpu image-processing nvidia video video-codec

Last synced: 03 Jan 2025

https://github.com/xmrminer/xmrminer

🐜 A CUDA-based miner for Monero

cuda gpu monero nvidia xmr

Last synced: 10 Jan 2025

https://github.com/xmrMiner/xmrMiner

🐜 A CUDA-based miner for Monero

cuda gpu monero nvidia xmr

Last synced: 18 Nov 2024

https://github.com/NVIDIA/GMAT

A toolkit showing GPU's all-round capability in video processing

codec cpp cuda deep-learning ffmpeg gpu image-processing nvidia video video-codec

Last synced: 05 Nov 2024

https://github.com/pykeio/diffusers

A modular Rust library for super fast Stable Diffusion inference - 45% faster than PyTorch 🔮

cuda diffusion-models onnx onnxruntime onnxruntime-gpu rust stable-diffusion stable-diffusion-v2

Last synced: 31 Oct 2024

https://github.com/hmunachi/cuda-repo

From zero to hero CUDA for accelerating maths and machine learning on GPU.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 26 Jan 2025

https://github.com/zpzim/scamp

The fastest way to compute matrix profiles on CPU and GPU!

cuda gpu matrix-profile python time-series time-series-analysis

Last synced: 27 Jan 2025

https://github.com/yilingqiao/dmrf

Dynamic Mesh-Aware Radiance Fields (ICCV 2023): ray-traced rendering and interactive simulation of meshes with NeRF

cuda nerf raytracing simulation

Last synced: 23 Jan 2025

https://github.com/cuMF/cumf_als

CUDA Matrix Factorization Library with Alternating Least Squares (ALS)

als cuda gpu machine machine-learning matrix-factorization

Last synced: 13 Nov 2024

https://github.com/cnugteren/cltune

CLTune: An automatic OpenCL & CUDA kernel tuner

auto-tuning cuda opencl tuner

Last synced: 19 Dec 2024

https://github.com/HMUNACHI/cuda-repo

From zero to hero CUDA for accelerating maths and machine learning on GPU.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 12 Nov 2024

https://github.com/librapid/librapid

A highly optimised C++ library for mathematical applications and neural networks.

array cpp cpp20 cpp23 cuda gpu high-performance-computing library matrix multidimensional-arrays multithreading parallel-programming pypy pypy3 python python3 simd

Last synced: 24 Jan 2025

https://github.com/LibRapid/librapid

A highly optimised C++ library for mathematical applications and neural networks.

array cpp cpp20 cpp23 cuda gpu high-performance-computing library matrix multidimensional-arrays multithreading parallel-programming pypy pypy3 python python3 simd

Last synced: 06 Dec 2024

https://github.com/nvidia/dl4agx

Deep Learning tools and applications for NVIDIA AGX platforms.

autonomous-driving computer-vision cuda deep-learning drive-agx embedded

Last synced: 25 Jan 2025

https://github.com/rocm/gpufort

GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify

cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm

Last synced: 19 Dec 2024

https://github.com/rocm/rocprim

ROCm Parallel Primitives

amd cuda gpu hip parallel primitive rocm

Last synced: 26 Jan 2025

https://github.com/ROCm/gpufort

GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify

cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm

Last synced: 23 Oct 2024

https://github.com/jimver/cuda-toolkit

GitHub Action to install CUDA

action cuda cuda-toolkit github-actions nvidia nvidia-cuda

Last synced: 24 Jan 2025

https://github.com/qengineering/install-opencv-jetson-nano

OpenCV installation script with CUDA and cuDNN support

cuda cudnn jetson-nano jetson-xavier opencv opencv4

Last synced: 28 Jan 2025

https://github.com/pythonlessons/tensorflow-object-detection-tutorial

This tutorial shows how to install and prepare the TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch.

classifier cuda cudnn detection detection-api detection-classifier detection-tutorial gpu grabscreen labels object-detection pil python-mss tensorflow tensorflow-cpu tensorflow-gpu tensorflow-models tutorial

Last synced: 09 Oct 2024

https://github.com/dvlab-research/SparseTransformer

A fast and memory-efficient library for sparse transformers with varying token numbers (e.g., 3D point clouds).

3d-point-cloud cuda sparse-transformer transformer

Last synced: 28 Oct 2024

https://github.com/proger/accelerated-scan

Accelerated First Order Parallel Associative Scan

cuda cumulative-sum recurrent-neural-networks state-space-model torch

Last synced: 04 Nov 2024

https://github.com/hijkzzz/cuda-neural-network

Convolutional Neural Network with CUDA (MNIST 99.23%)

cnn cpp cuda mnist neural-network

Last synced: 12 Nov 2024

https://github.com/sjtu-ipads/phoenixos

Fast OS-level support for GPU checkpoint and restore

checkpoint-restore criu cuda gpu

Last synced: 24 Jan 2025

https://github.com/coderonion/awesome-cuda-and-hpc

🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.

awesome blas cublas cuda cudnn fortran gemm gpu hpc lapack llama llm mojo numpy openblas parallel-computing pytorch scipy tensorrt yolo

Last synced: 05 Oct 2024

https://github.com/patwie/cuda-design-patterns

Some CUDA design patterns and a bit of template magic for CUDA

bazel cpp11 cuda cuda-development cuda-device cuda-kernels cuda-utils gpu template-metaprogramming

Last synced: 01 Nov 2024

https://github.com/CHIP-SPV/chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.

cuda hip hpc level0 llvm opencl spir-v

Last synced: 05 Nov 2024

https://github.com/chenhunghan/ialacol

🪶 Lightweight OpenAI drop-in replacement for Kubernetes

ai cloudnative cuda ggml gptq gpu helm kubernetes langchain llamacpp llm llm-inference llm-serving openai python

Last synced: 20 Jan 2025

https://github.com/bobmcdear/attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

cuda deep-learning machine-learning openai openai-triton pytorch triton

Last synced: 20 Dec 2024

https://github.com/kibae/onnxruntime-server

ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.

ai contributions-welcome cuda deep-learning inference-server machine-learning nueral-networks onnx onnxruntime

Last synced: 27 Jan 2025

https://github.com/BobMcDear/attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

cuda deep-learning machine-learning openai openai-triton pytorch triton

Last synced: 21 Dec 2024

https://github.com/libmir/dcompute

DCompute: Native execution of D on GPUs and other Accelerators

cuda d fpga gpgpu gpu ldc opencl

Last synced: 19 Dec 2024

https://github.com/Dr-Noob/gpufetch

Simple yet fancy GPU architecture fetching tool

cuda gpu igpu intel nvidia

Last synced: 02 Nov 2024

https://github.com/1461521844lijin/trt_yolo_video_pipeline

A multi-stream, multi-GPU, multi-instance parallel video analysis and processing example for TensorRT and the YOLO series

cuda ffmpeg opencv video-processing yolo yolov8

Last synced: 26 Nov 2024