Projects in Awesome Lists tagged with mlsys
A curated list of projects in awesome lists tagged with mlsys.
https://github.com/nunchaku-ai/nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
comfyui diffusion-models flux genai iclr iclr2025 lora mlsys quantization
Last synced: 13 Feb 2026
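SVDQuant's core idea, per the description above, is to absorb weight outliers into a high-precision low-rank branch so that the remaining residual has a small dynamic range and survives 4-bit quantization. A minimal pure-Python sketch of that intuition on a single weight row (the vectors and the `quantize4` helper are illustrative, not nunchaku's API):

```python
def quantize4(xs):
    """Symmetric 4-bit quantization: snap each value to one of 15 signed levels."""
    scale = max(abs(x) for x in xs) / 7 or 1.0
    return [round(x / scale) * scale for x in xs]

# A weight row with one outlier that dominates the quantization scale.
w = [0.10, 0.20, -0.15, 8.00]

# Direct 4-bit quantization: the outlier forces a coarse scale,
# so the small weights collapse to zero.
err_direct = max(abs(a - b) for a, b in zip(w, quantize4(w)))

# SVDQuant-style: a high-precision (low-rank) branch absorbs the outlier,
# and only the small-range residual is quantized to 4 bits.
lowrank  = [0.0, 0.0, 0.0, 8.0]                 # kept in high precision
residual = [a - b for a, b in zip(w, lowrank)]
recon    = [a + b for a, b in zip(lowrank, quantize4(residual))]
err_svdq = max(abs(a - b) for a, b in zip(w, recon))

print(err_direct, err_svdq)  # the residual path is far more accurate
```

With the outlier absorbed, the 4-bit scale shrinks by roughly 40x in this toy example, so the small weights are no longer rounded to zero.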
https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
ai-infra genai large-language-models llmsys mlsys model-serving model-training
Last synced: 09 Apr 2025
https://github.com/nunchaku-tech/ComfyUI-nunchaku
ComfyUI Plugin of Nunchaku
comfyui diffusion flux genai mlsys quantization
Last synced: 02 Sep 2025
https://github.com/mit-han-lab/nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
comfyui diffusion-models flux genai iclr iclr2025 lora mlsys quantization
Last synced: 13 May 2025
https://github.com/thu-ml/sageattention
Quantized attention that achieves 2-3x speedup over FlashAttention and 3-5x over xformers, without losing end-to-end accuracy across language, image, and video models.
attention cuda efficient-attention inference-acceleration llm llm-infra mlsys quantization triton video-generate video-generation vit
Last synced: 14 May 2025
https://github.com/SymbioticLab/FedScale
FedScale is a scalable and extensible open-source federated learning (FL) platform.
benchmark dataset deep-learning deployment distributed federated-learning icml machine-learning mlsys osdi pytorch tensorflow
Last synced: 02 May 2025
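FedScale is the platform layer; the aggregation step such a platform orchestrates is, in the simplest case, FedAvg: a data-size-weighted average of client model updates. A plain-Python sketch of that step (function and variable names are illustrative, not FedScale's API):

```python
def fedavg(client_weights, client_sizes):
    """Average client model parameters, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Three clients with 2-parameter models and unequal data sizes.
updates = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
sizes   = [10, 30, 60]
print(fedavg(updates, sizes))  # → [2.2, 3.0]
```

Clients with more data pull the global model harder, which is exactly the heterogeneity that platforms like FedScale are built to benchmark at scale.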
https://github.com/bytedance/byteir
A model compilation solution for various hardware
llm llvm mlir mlsys onnx pytorch tensorflow
Last synced: 05 Apr 2025
https://github.com/sbu-fsl/kernel-ml
Machine Learning Framework for Operating Systems - Brings ML to Linux kernel
auto-tuning kernel-module linux-kernel machine-learning mlsys operating-systems
Last synced: 16 Jul 2025
https://github.com/ml-energy/zeus
Deep Learning Energy Measurement and Optimization
Last synced: 31 Mar 2025
https://github.com/bytedance/abq-llm
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
cuda llm-inference mlsys quantized-networks research
Last synced: 04 Apr 2025
https://github.com/MLSysOps/alaas
A scalable & efficient active learning/data selection system for everyone.
active-learning automl deep-learning machine-learning mlops mlsys pytorch
Last synced: 02 May 2025
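The data-selection loop a system like this runs is, at its simplest, uncertainty sampling: send the examples the current model is least sure about out for labeling. A plain-Python sketch of that selection rule (illustrative, not alaas's API):

```python
def select_most_uncertain(probs, k):
    """Uncertainty sampling: pick the k unlabeled points whose predicted
    positive-class probability is closest to 0.5 (maximum uncertainty)."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return sorted(ranked[:k])

# Model confidence on 6 unlabeled examples.
probs = [0.99, 0.52, 0.10, 0.47, 0.85, 0.30]
print(select_most_uncertain(probs, 2))  # → [1, 3]
```

A serving layer wraps this loop behind an API so the labeling budget is spent where the model's decision boundary is fuzziest.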
https://github.com/HuaizhengZhang/Active-Learning-as-a-Service
A scalable & efficient active learning/data selection system for everyone.
active-learning automl deep-learning machine-learning mlops mlsys pytorch
Last synced: 08 May 2025
https://github.com/xlite-dev/ffpa-attn
📚FFPA(Split-D): Extend FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.
attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores
Last synced: 11 Jun 2025
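The Split-D trick named above tiles the head dimension so that only one D-tile of Q and K is resident in SRAM at a time; the result is exact because an attention logit is a plain sum over D and can be accumulated tile by tile. A pure-Python sketch of the accumulation pattern (the real kernel does this with tensor-core MMAs, not Python loops):

```python
def qk_split_d(q, k, tile):
    """Compute the attention logit q·k by streaming over head-dim tiles,
    so only `tile` elements of q and k need to be resident at once."""
    acc = 0.0
    for d0 in range(0, len(q), tile):
        acc += sum(a * b for a, b in zip(q[d0:d0 + tile], k[d0:d0 + tile]))
    return acc

# A headdim well past 256, the regime the repo targets.
q = [0.1 * i for i in range(512)]
k = [0.01 * (i % 7) for i in range(512)]
full = sum(a * b for a, b in zip(q, k))

print(abs(qk_split_d(q, k, 64) - full) < 1e-9)  # tiled result matches
```

Because SRAM usage depends only on the tile size, not on D, the kernel's on-chip footprint is O(1) in the head dimension, which is what the entry's complexity claim refers to.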
https://github.com/deftruth/ffpa-attn-mma
📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for large headdim (D > 256), ~2x↑🎉 vs SDPA EA.
attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores
Last synced: 06 Apr 2025
https://github.com/xlite-dev/ffpa-attn-mma
📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.
attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores
Last synced: 30 Mar 2025
https://github.com/xiyanghu/OSDT
Optimal Sparse Decision Trees
accelerate acceleration-model algorithm algorithm-optimization data-mining data-science interpretable-ml machine-learning ml-system mlsys neurips python python3
Last synced: 27 Mar 2025
https://github.com/guanhuawang/sensai
sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data
cifar-10 cifar-100 cifar10 cifar100 cnn-classification deep-learning deep-neural-networks distributed-deep-learning distributed-machine-learning distributed-systems imagenet imagenet1k machine-learning mlsys mobilenet-v2 resnet shufflenet-v2 sysml vgg
Last synced: 15 Apr 2025
https://github.com/tanyuqian/redco
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
differential-privacy diffusion-models distributed-training fedavg federated-learning flan-t5-xxl gemma image-captioning jax large-language-models llama maml meta-learning mixed-precision mlsys model-parallelism ppo reinforcement-learning seq2seq stable-diffusion
Last synced: 06 Apr 2025
https://github.com/DefTruth/ffpa-attn-mma
📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)⚡️GPU SRAM complexity for headdim > 256, 1.8x~3x↑🎉 faster vs SDPA EA.
attention cuda flash-attention mlsys sdpa tensor-cores
Last synced: 09 Oct 2025
https://github.com/actypedef/mixedgemm
A mixed-precision GEMM with quantization and reorder kernels.
cuda inference-acceleration llm mlsys quantization
Last synced: 15 Jun 2025
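The quantize-then-multiply pattern that description refers to can be sketched in plain Python: quantize A per-row and B per-column to INT8, accumulate in integers, and dequantize each output with the product of the two scales (the repo's reorder step for outlier channels is omitted; all names here are illustrative):

```python
def quantize_rows(M):
    """Per-row symmetric INT8 quantization: integer matrix plus one scale per row."""
    qm, scales = [], []
    for row in M:
        s = max(abs(x) for x in row) / 127 or 1.0
        qm.append([round(x / s) for x in row])
        scales.append(s)
    return qm, scales

def int8_gemm(A, B):
    """Quantize A per-row and B per-column (rows of B transposed), accumulate
    in integers, then rescale each output by the product of the two scales."""
    Bt = list(map(list, zip(*B)))
    qa, sa = quantize_rows(A)
    qb, sb = quantize_rows(Bt)
    return [[sum(x * y for x, y in zip(ra, rb)) * si * sj
             for rb, sj in zip(qb, sb)]
            for ra, si in zip(qa, sa)]

A = [[0.5, -1.0], [2.0, 0.25]]
B = [[1.0, 0.0], [0.5, -2.0]]
exact  = [[0.0, 2.0], [2.125, -0.5]]          # float reference result
approx = int8_gemm(A, B)
err = max(abs(a - b) for ra, rb in zip(exact, approx) for a, b in zip(ra, rb))
print(err)  # small quantization error
```

On GPU the integer accumulation runs in INT32 on tensor cores; the per-channel scales keep the dequantized outputs close to the float result.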
https://github.com/rraghavkaushik/nlp-learning-resources
List of latest papers and blogs for NLP
llm-papers llms mechanistic-interpretability mlsys natural-language-processing nlp-learning-resources nlp-papers reinforcement-learning rlhf scaling-laws transformers
Last synced: 08 Oct 2025
https://github.com/iamncj/yuangpt
GPT-like Large Language Model Pretrained on Inspur's Yuan Dataset
gpt gpt-2 large-language-models llm mlsys pytorch
Last synced: 22 Aug 2025
https://github.com/shreyansh26/accelerating-cross-encoder-inference
Leveraging torch.compile to accelerate cross-encoder inference
cross-encoder inference-optimization jina mlsys torch-compile
Last synced: 11 Mar 2025