An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with mlsys

A curated list of projects in awesome lists tagged with mlsys.

https://github.com/Infrasys-AI/AISystem

AISystem refers to AI systems: the full-stack low-level technologies for AI, including AI chips, AI compilers, and AI inference and training frameworks.

ai aiinfra aisys dlsys mlsys

Last synced: 23 Jun 2025

https://github.com/chenzomi12/aisystem

AISystem refers to AI systems: the full-stack low-level technologies for AI, including AI chips, AI compilers, and AI inference and training frameworks.

ai aiinfra aisys dlsys mlsys

Last synced: 14 May 2025

https://github.com/chenzomi12/AISystem

AISystem refers to AI systems: the full-stack low-level technologies for AI, including AI chips, AI compilers, and AI inference and training frameworks.

ai aiinfra aisys dlsys mlsys

Last synced: 20 Mar 2025

https://github.com/nunchaku-ai/nunchaku

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

comfyui diffusion-models flux genai iclr iclr2025 lora mlsys quantization

Last synced: 13 Feb 2026

https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

ai-infra genai large-language-models llmsys mlsys model-serving model-training

Last synced: 09 Apr 2025

https://github.com/mit-han-lab/nunchaku

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

comfyui diffusion-models flux genai iclr iclr2025 lora mlsys quantization

Last synced: 13 May 2025

https://github.com/thu-ml/sageattention

Quantized Attention achieves 2-3x and 3-5x speedups over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.

attention cuda efficient-attention inference-acceleration llm llm-infra mlsys quantization triton video-generate video-generation vit

Last synced: 14 May 2025

https://github.com/SymbioticLab/FedScale

FedScale is a scalable and extensible open-source federated learning (FL) platform.

benchmark dataset deep-learning deployment distributed federated-learning icml machine-learning mlsys osdi pytorch tensorflow

Last synced: 02 May 2025

https://github.com/bytedance/byteir

A model compilation solution for various hardware

llm llvm mlir mlsys onnx pytorch tensorflow

Last synced: 05 Apr 2025

https://github.com/sbu-fsl/kernel-ml

Machine Learning Framework for Operating Systems - Brings ML to Linux kernel

auto-tuning kernel-module linux-kernel machine-learning mlsys operating-systems

Last synced: 16 Jul 2025

https://github.com/ml-energy/zeus

Deep Learning Energy Measurement and Optimization

deep-learning energy mlsys

Last synced: 31 Mar 2025

https://github.com/bytedance/abq-llm

An acceleration library that supports arbitrary bit-width combinatorial quantization operations

cuda llm-inference mlsys quantized-networks research

Last synced: 04 Apr 2025

https://github.com/MLSysOps/alaas

A scalable & efficient active learning/data selection system for everyone.

active-learning automl deep-learning machine-learning mlops mlsys pytorch

Last synced: 02 May 2025
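The core loop behind an active-learning data-selection system like this is uncertainty sampling: rank unlabeled examples by the model's predictive uncertainty and send the top-k for labeling. A minimal generic sketch (this is not the alaas API; the function names here are illustrative):

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(predictions, k):
    """Return indices of the k examples whose predictions are most uncertain."""
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:k]

preds = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # near-uniform: most uncertain
    [0.70, 0.20, 0.10],
]
picked = select_most_uncertain(preds, 1)  # → [1]
```

A production system like alaas wraps this kind of scoring behind a service so clients can submit pools of unlabeled data and receive selections, but the ranking step is conceptually this simple.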

https://github.com/huaizhengzhang/active-learning-as-a-service

A scalable & efficient active learning/data selection system for everyone.

active-learning automl deep-learning machine-learning mlops mlsys pytorch

Last synced: 09 Apr 2025

https://github.com/HuaizhengZhang/Active-Learning-as-a-Service

A scalable & efficient active learning/data selection system for everyone.

active-learning automl deep-learning machine-learning mlops mlsys pytorch

Last synced: 08 May 2025

https://github.com/xlite-dev/ffpa-attn

📚FFPA(Split-D): Extends FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 11 Jun 2025

https://github.com/deftruth/ffpa-attn-mma

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity large headdim (D > 256), ~2x↑🎉vs SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 06 Apr 2025

https://github.com/xlite-dev/ffpa-attn-mma

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 30 Mar 2025

https://github.com/DefTruth/ffpa-attn-mma

📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)⚡️GPU SRAM complexity for headdim > 256, 1.8x~3x↑🎉faster vs SDPA EA.

attention cuda flash-attention mlsys sdpa tensor-cores

Last synced: 09 Oct 2025

https://github.com/actypedef/mixedgemm

A mixed-precision GEMM with quantize and reorder kernels.

cuda inference-acceleration llm mlsys quantization

Last synced: 15 Jun 2025
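The mixed-precision pattern this entry describes (quantize inputs to a narrow integer format, compute in integers, dequantize the result) can be sketched in plain Python. This is a generic symmetric int8 scheme for illustration, not the repository's actual CUDA kernel:

```python
def quantize_int8(xs):
    """Symmetric int8 quantization: map floats into [-127, 127] with one scale."""
    scale = max(abs(x) for x in xs) / 127 or 1.0
    return [round(x / scale) for x in xs], scale

def int8_dot(qa, sa, qb, sb):
    """Integer dot product, dequantized back to float via the two scales."""
    return sum(x * y for x, y in zip(qa, qb)) * sa * sb

a = [0.5, -1.0, 2.0]
b = [1.0, 0.25, -0.5]
qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)
approx = int8_dot(qa, sa, qb, sb)
exact = sum(x * y for x, y in zip(a, b))  # -0.75
```

A real kernel fuses the quantize/reorder steps with the GEMM on tensor cores, but the numerics are the same: the low-precision product approximates the float result to within the quantization error.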

https://github.com/iamncj/yuangpt

GPT-like Large Language Model Pretrained on Inspur's Yuan Dataset

gpt gpt-2 large-language-models llm mlsys pytorch

Last synced: 22 Aug 2025

https://github.com/jason-cs18/mlsys_code

code samples for MLSys blogs

mlsys

Last synced: 25 Jan 2026

https://github.com/shreyansh26/accelerating-cross-encoder-inference

Leveraging torch.compile to accelerate cross-encoder inference

cross-encoder inference-optimization jina mlsys torch-compile

Last synced: 11 Mar 2025