Projects in Awesome Lists by xlite-dev
A curated list of projects in awesome lists by xlite-dev.
https://github.com/xlite-dev/leetcuda
📚LeetCUDA: 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.
cuda cuda-12 cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-toolkit flash-attention hgemm learn-cuda leet-cuda
Last synced: 13 Feb 2026
https://github.com/deftruth/cuda-learn-notes
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥
cuda cuda-12 cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-toolkit flash-attention hgemm learn-cuda leet-cuda
Last synced: 14 May 2025
https://github.com/xlite-dev/lite.ai.toolkit
🛠 A lite C++ AI toolkit: 100+🎉 models (Stable-Diffusion, Face-Fusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TRT.
facefusion mnn mnn-model ncnn onnx onnxruntime robustvideomatting stable-diffusion tensorrt tnn yolov5 yolov6 yolov8 yolox
Last synced: 13 May 2025
https://github.com/xlite-dev/cuda-learn-notes
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm
Last synced: 15 Apr 2025
https://github.com/xlite-dev/CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm
Last synced: 26 Mar 2025
https://github.com/xlite-dev/statistic-learning-r-note
📒Statistical Learning Methods by Li Hang (统计学习方法, 李航): notes from principles to implementation. Book, 200-page PDF with detailed explanations of various math formulas, implemented in R.🎉
lihang ml r statistic-notes statistics statistics-learning
Last synced: 16 May 2025
https://github.com/xlite-dev/torchlm
💎A high-level Python library for face landmark detection: training, eval, export, inference (Python/C++) and 100+ data augmentations.
albumentations data-augmentation face-landmarks heatmap mobilenet pip pipnet regression shufflenet torchvision yolov5 yolov6 yolov7 yolox
Last synced: 13 Dec 2025
https://github.com/xlite-dev/ffpa-attn
📚FFPA(Split-D): Extend FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.
attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores
Last synced: 11 Jun 2025
https://github.com/deftruth/ffpa-attn-mma
📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for large headdim (D > 256), ~2x↑🎉vs SDPA EA.
attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores
Last synced: 06 Apr 2025
https://github.com/xlite-dev/ffpa-attn-mma
📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.
attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores
Last synced: 30 Mar 2025
https://github.com/xlite-dev/RVM-Inference
🔥Robust Video Matting C++ inference toolkit with ONNXRuntime, MNN, NCNN and TNN, via lite.ai.toolkit.
cpp matting mnn ncnn onnx onnxruntime robustvideomatting tnn
Last synced: 09 Oct 2025
https://github.com/xlite-dev/hgemm
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
Last synced: 11 Jun 2025
https://github.com/xlite-dev/yolov5face-toolkit
🍅 YOLO5Face 2021 with MNN/NCNN/TNN/ONNXRuntime
cpp yolo5-face yolo5face yolov5-face yolov5-face-landmark yolov5face yolov7-face yolov8-face
Last synced: 19 Oct 2025
https://github.com/xlite-dev/scrfd-toolkit
Super fast, accurate face detector! SCRFD (CVPR 2021) with MNN/TNN/NCNN/ONNXRuntime C++.
mnn ncnn onnxruntime scrfd tnn
Last synced: 04 Oct 2025
https://github.com/xlite-dev/fsanet-toolkit
🍅🍅FSANet: 1 Mb!! Head Pose Estimation with MNN, TNN and ONNXRuntime C++. (https://github.com/DefTruth/lite.ai.toolkit)
Last synced: 07 Mar 2026
https://github.com/xlite-dev/netron-vscode-extension
☕️ A VS Code extension for Netron; supports *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.
deeplearning mnn ncnn netron netron-vscode netron-vscode-extension onnx onnxruntime paddle tnn vscode-netron yolov5 yolov8
Last synced: 26 Mar 2025
https://github.com/xlite-dev/yolop-toolkit
YOLOP with ONNXRuntime C++/MNN/TNN/NCNN
mnn ncnn onnxruntime tnn yolop
Last synced: 25 Sep 2025
https://github.com/xlite-dev/mgmatting-toolkit
🍅MGMatting with MNN/TNN/ONNXRuntime C++, GPU/CPU, supports dynamic shapes. (https://github.com/DefTruth/lite.ai.toolkit)
cpp matting mgmatting mnn onnxruntime tnn
Last synced: 12 Jul 2025
https://github.com/xlite-dev/ssrnet-toolkit
🍅🍅 SSRNet: 190 Kb!! Super fast Age Estimation with MNN/TNN/ONNXRuntime C++.
Last synced: 12 Jun 2025