An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by xlite-dev

A curated list of projects in awesome lists by xlite-dev.

https://github.com/deftruth/cuda-learn-notes

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

cuda cuda-12 cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-toolkit flash-attention hgemm learn-cuda leet-cuda

Last synced: 14 May 2025

https://github.com/xlite-dev/lite.ai.toolkit

🛠 A lite C++ AI toolkit: 100+🎉 models (Stable-Diffusion, Face-Fusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TRT.

facefusion mnn mnn-model ncnn onnx onnxruntime robustvideomatting stable-diffusion tensorrt tnn yolov5 yolov6 yolov8 yolox

Last synced: 13 May 2025

https://github.com/xlite-dev/cuda-learn-notes

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm

Last synced: 15 Apr 2025

https://github.com/xlite-dev/CUDA-Learn-Notes

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm

Last synced: 26 Mar 2025

https://github.com/xlite-dev/statistic-learning-r-note

📒Statistical Learning Methods (Li Hang): notes, from principles to implementation. Book, 200-page PDF with detailed explanations of various math formulas, implemented in R.🎉

lihang ml r statistic-notes statistics statistics-learning

Last synced: 16 May 2025

https://github.com/xlite-dev/torchlm

💎A high-level Python library for face landmark detection: training, eval, export, inference (Python/C++) and 100+ data augmentations.

albumentations data-augmentation face-landmarks heatmap mobilenet pip pipnet regression shufflenet torchvision yolov5 yolov6 yolov7 yolox

Last synced: 13 Dec 2025

https://github.com/xlite-dev/ffpa-attn

📚FFPA(Split-D): Extend FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 11 Jun 2025

https://github.com/deftruth/ffpa-attn-mma

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for large headdim (D > 256), ~2x↑🎉 vs SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 06 Apr 2025

https://github.com/xlite-dev/ffpa-attn-mma

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉 vs SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 30 Mar 2025

https://github.com/xlite-dev/RVM-Inference

🔥Robust Video Matting C++ inference toolkit with ONNXRuntime, MNN, NCNN and TNN, via lite.ai.toolkit.

cpp matting mnn ncnn onnx onnxruntime robustvideomatting tnn

Last synced: 09 Oct 2025

https://github.com/xlite-dev/hgemm

⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA and CuTe APIs, achieving peak⚡️ performance.

cuda hgemm tensor-cores

Last synced: 11 Jun 2025

https://github.com/xlite-dev/scrfd-toolkit

Super fast, accurate face detector! SCRFD (CVPR 2021) with MNN/TNN/NCNN/ONNXRuntime C++.

mnn ncnn onnxruntime scrfd tnn

Last synced: 04 Oct 2025

https://github.com/xlite-dev/fsanet-toolkit

🍅🍅FSANet: 1 Mb!! Head Pose Estimation with MNN, TNN and ONNXRuntime C++. (https://github.com/DefTruth/lite.ai.toolkit)

fsanet

Last synced: 07 Mar 2026

https://github.com/xlite-dev/netron-vscode-extension

☕️ A VS Code extension for Netron; supports *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.

deeplearning mnn ncnn netron netron-vscode netron-vscode-extension onnx onnxruntime paddle tnn vscode-netron yolov5 yolov8

Last synced: 26 Mar 2025

https://github.com/xlite-dev/yolop-toolkit

YOLOP with ONNXRuntime C++/MNN/TNN/NCNN

mnn ncnn onnxruntime tnn yolop

Last synced: 25 Sep 2025

https://github.com/xlite-dev/mgmatting-toolkit

🍅MGMatting with MNN/TNN/ONNXRuntime C++, GPU/CPU, supports dynamic shapes. (https://github.com/DefTruth/lite.ai.toolkit)

cpp matting mgmatting mnn onnxruntime tnn

Last synced: 12 Jul 2025

https://github.com/xlite-dev/ssrnet-toolkit

🍅🍅 SSRNet: 190 Kb!! Super fast Age Estimation with MNN/TNN/ONNXRuntime C++.

ssrnet

Last synced: 12 Jun 2025

https://github.com/xlite-dev/xlite-cli

The CLI version of lite.ai.toolkit.

Last synced: 11 Jun 2025

https://github.com/xlite-dev/.github

Last synced: 08 Feb 2026