An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with mlsys

A curated list of projects in awesome lists tagged with mlsys.

https://github.com/Infrasys-AI/AISystem

AISystem refers to AI systems: the full-stack low-level technologies for AI, including AI chips, AI compilers, and AI inference and training frameworks.

ai aiinfra aisys dlsys mlsys

Last synced: 23 Jun 2025

https://github.com/chenzomi12/aisystem

AISystem refers to AI systems: the full-stack low-level technologies for AI, including AI chips, AI compilers, and AI inference and training frameworks.

ai aiinfra aisys dlsys mlsys

Last synced: 14 May 2025

https://github.com/chenzomi12/AISystem

AISystem refers to AI systems: the full-stack low-level technologies for AI, including AI chips, AI compilers, and AI inference and training frameworks.

ai aiinfra aisys dlsys mlsys

Last synced: 20 Mar 2025

https://github.com/nunchaku-ai/nunchaku

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

comfyui diffusion-models flux genai iclr iclr2025 lora mlsys quantization

Last synced: 13 Feb 2026

https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

ai-infra genai large-language-models llmsys mlsys model-serving model-training

Last synced: 09 Apr 2025

https://github.com/mit-han-lab/nunchaku

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

comfyui diffusion-models flux genai iclr iclr2025 lora mlsys quantization

Last synced: 13 May 2025

https://github.com/thu-ml/sageattention

Quantized Attention achieves 2-3x and 3-5x speedups over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.

attention cuda efficient-attention inference-acceleration llm llm-infra mlsys quantization triton video-generate video-generation vit

Last synced: 14 May 2025

https://github.com/SymbioticLab/FedScale

FedScale is a scalable and extensible open-source federated learning (FL) platform.

benchmark dataset deep-learning deployment distributed federated-learning icml machine-learning mlsys osdi pytorch tensorflow

Last synced: 02 May 2025

https://github.com/bytedance/byteir

A model compilation solution for various hardware

llm llvm mlir mlsys onnx pytorch tensorflow

Last synced: 05 Apr 2025

https://github.com/sbu-fsl/kernel-ml

Machine Learning Framework for Operating Systems - Brings ML to Linux kernel

auto-tuning kernel-module linux-kernel machine-learning mlsys operating-systems

Last synced: 16 Jul 2025

https://github.com/ml-energy/zeus

Deep Learning Energy Measurement and Optimization

deep-learning energy mlsys

Last synced: 31 Mar 2025

https://github.com/bytedance/abq-llm

An acceleration library that supports arbitrary bit-width combinatorial quantization operations

cuda llm-inference mlsys quantized-networks research

Last synced: 04 Apr 2025

https://github.com/MLSysOps/alaas

A scalable & efficient active learning/data selection system for everyone.

active-learning automl deep-learning machine-learning mlops mlsys pytorch

Last synced: 02 May 2025
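The core loop behind an active-learning data-selection system like this is uncertainty sampling: rank unlabeled examples by the model's predictive uncertainty and send the top-k for labeling. A minimal generic sketch (this is not the alaas API; the function names here are illustrative):

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(predictions, k):
    """Return indices of the k examples whose predictions are most uncertain."""
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:k]

preds = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # near-uniform: most uncertain
    [0.70, 0.20, 0.10],
]
picked = select_most_uncertain(preds, 1)  # → [1]
```

A production system like alaas wraps this kind of scoring behind a service so clients can submit pools of unlabeled data and receive selections, but the ranking step is conceptually this simple.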

https://github.com/huaizhengzhang/active-learning-as-a-service

A scalable & efficient active learning/data selection system for everyone.

active-learning automl deep-learning machine-learning mlops mlsys pytorch

Last synced: 09 Apr 2025

https://github.com/HuaizhengZhang/Active-Learning-as-a-Service

A scalable & efficient active learning/data selection system for everyone.

active-learning automl deep-learning machine-learning mlops mlsys pytorch

Last synced: 08 May 2025

https://github.com/xlite-dev/ffpa-attn

📚FFPA(Split-D): Extends FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 11 Jun 2025

https://github.com/deftruth/ffpa-attn-mma

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity large headdim (D > 256), ~2x↑🎉vs SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 06 Apr 2025

https://github.com/xlite-dev/ffpa-attn-mma

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 30 Mar 2025

https://github.com/DefTruth/ffpa-attn-mma

📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)⚡️GPU SRAM complexity for headdim > 256, 1.8x~3x↑🎉faster vs SDPA EA.

attention cuda flash-attention mlsys sdpa tensor-cores

Last synced: 09 Oct 2025

https://github.com/actypedef/mixedgemm

A mixed-precision GEMM with quantize and reorder kernels.

cuda inference-acceleration llm mlsys quantization

Last synced: 15 Jun 2025
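The mixed-precision pattern this entry describes (quantize inputs to a narrow integer format, compute in integers, dequantize the result) can be sketched in plain Python. This is a generic symmetric int8 scheme for illustration, not the repository's actual CUDA kernel:

```python
def quantize_int8(xs):
    """Symmetric int8 quantization: map floats into [-127, 127] with one scale."""
    scale = max(abs(x) for x in xs) / 127 or 1.0
    return [round(x / scale) for x in xs], scale

def int8_dot(qa, sa, qb, sb):
    """Integer dot product, dequantized back to float via the two scales."""
    return sum(x * y for x, y in zip(qa, qb)) * sa * sb

a = [0.5, -1.0, 2.0]
b = [1.0, 0.25, -0.5]
qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)
approx = int8_dot(qa, sa, qb, sb)
exact = sum(x * y for x, y in zip(a, b))  # -0.75
```

A real kernel fuses the quantize/reorder steps with the GEMM on tensor cores, but the numerics are the same: the low-precision product approximates the float result to within the quantization error.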

https://github.com/iamncj/yuangpt

GPT-like Large Language Model Pretrained on Inspur's Yuan Dataset

gpt gpt-2 large-language-models llm mlsys pytorch

Last synced: 22 Aug 2025

https://github.com/jason-cs18/mlsys_code

code samples for MLSys blogs

mlsys

Last synced: 25 Jan 2026

https://github.com/shreyansh26/accelerating-cross-encoder-inference

Leveraging torch.compile to accelerate cross-encoder inference

cross-encoder inference-optimization jina mlsys torch-compile

Last synced: 11 Mar 2025