https://github.com/sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.
https://github.com/sgl-project/sglang

cuda deepseek deepseek-llm deepseek-r1 deepseek-r1-zero deepseek-v3 inference llama llama3 llama3-1 llava llm llm-serving moe pytorch transformer vlm

Last synced: 6 months ago
JSON representation

SGLang is a fast serving framework for large language models and vision language models.

Host: GitHub
URL: https://github.com/sgl-project/sglang
Owner: sgl-project
License: apache-2.0
Created: 2024-01-08T04:15:52.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-04-28T09:43:43.000Z (6 months ago)
Last Synced: 2025-04-28T10:15:36.113Z (6 months ago)
Topics: cuda, deepseek, deepseek-llm, deepseek-r1, deepseek-r1-zero, deepseek-v3, inference, llama, llama3, llama3-1, llava, llm, llm-serving, moe, pytorch, transformer, vlm
Language: Python
Homepage: https://docs.sglang.ai/
Size: 15.5 MB
Stars: 13,619
Watchers: 99
Forks: 1,605
Open Issues: 787
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
- Support: docs/supported_models/embedding_models.md

Awesome Lists containing this project

awesome-llm - SGLang - 高效的LLM和视觉语言模型推理框架。 (LLM部署 / LLM 评估工具)
awesome-llm-json - SGLang - 2.0) allows specifying JSON schemas using regular expressions or Pydantic models for constrained decoding. Its high-performance runtime accelerates JSON decoding. (Python Libraries)
awesome-ai-repositories - sglang - project/sglang><img src="https://img.shields.io/github/stars/sgl-project/sglang?style=social" width=100/></a> | (Structured Generation)
alan_awesome_llm - SGLang
jimsghstars - sgl-project/sglang - SGLang is a fast serving framework for large language models and vision language models. (Python)
StarryDivineSky - sgl-project/sglang - mistral），易于扩展以集成新模型。活跃的社区：SGLang 是开源的，并由一个活跃的社区提供支持，并得到行业采用。与 TensorRT LLM 和 vLLM 相比，SGLang Runtime 在在线和离线场景中始终如一地提供卓越或有竞争力的性能，使用 FP8 和 FP16 处理从 Llama-8B 到 Llama-405B 的模型，以及在 A100 和 H100 GPU 上。SGLang 的性能始终优于 vLLM，在 Llama-70B 上的通量提高了 3.1 倍。它也经常匹配或有时优于 TensorRT LLM 。更重要的是，SGLang 是完全开源的，用纯 Python 编写，核心调度器在不到 4K 行的代码中实现。 (A01_文本生成_文本对话 / 大语言对话模型及数据)
awesome-llm-and-aigc - SGLang - project/sglang?style=social"/> : SGLang is a fast serving framework for large language models and vision language models. [docs.sglang.ai/](https://docs.sglang.ai/) (Summary)
Awesome-LLM - SGLang - SGLang is a fast serving framework for large language models and vision language models. (LLM Deployment)
awesome-cuda-and-hpc - SGLang - project/sglang?style=social"/> : SGLang is a fast serving framework for large language models and vision language models. [docs.sglang.ai/](https://docs.sglang.ai/) (Frameworks)
awesome-production-machine-learning - SGLang - project/sglang.svg?style=social) - SGLang is a fast serving framework for large language models and vision language models. (Deployment and Serving)
awesome-LLM-resources - SGLang (`🔥`)
awesome-local-ai - SGLang - 3-5x higher throughput than vLLM (Control flow, RadixAttention, KV cache reuse) | Safetensor / AWQ / GPTQ | GPU | ❌ | Python | Text-Gen | (Inference Engine)
awesome-private-ai - sglang - Fast, privacy-first LLM inference and programming language for building composable, local AI workflows. (Agents & Orchestration)
awesome-private-ai - sglang - Fast, privacy-first LLM inference and programming language for building composable, local AI workflows. (Agents & Orchestration)
Awesome-LLMOps - SGLang - project/sglang.svg?style=flat&color=green) ![Contributors](https://img.shields.io/github/contributors/sgl-project/sglang?color=green) ![LastCommit](https://img.shields.io/github/last-commit/sgl-project/sglang?color=green) (Inference / Inference Engine)
awesome-ChatGPT-repositories - sglang - SGLang is a fast serving framework for large language models and vision language models. (Langchain)
Awesome-LLMOps - SGLang - project/sglang.svg?style=flat&color=green) ![Contributors](https://img.shields.io/github/contributors/sgl-project/sglang?color=green) ![LastCommit](https://img.shields.io/github/last-commit/sgl-project/sglang?color=green) (Inference / Inference Engine)
awesome - sgl-project/sglang - SGLang is a fast serving framework for large language models and vision language models. (Python)
awesome-hacking-lists - sgl-project/sglang - SGLang is a fast serving framework for large language models and vision language models. (Python)
awesome-llm - SGLang - 高效的LLM和视觉语言模型推理框架。 (LLM部署 / LLM 评估工具)
alan_awesome_llm - SGLang
AiTreasureBox - sgl-project/sglang - 10-21_19062_0](https://img.shields.io/github/stars/sgl-project/sglang.svg)|SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable.| (Repos)
awesome-production-genai - SGLang - project/sglang.svg?style=social) - SGLang is a fast serving framework for large language models and vision language models. (Model Deployment)
awesome - sgl-project/sglang - SGLang is a fast serving framework for large language models and vision language models. (Python)

README

          




[![PyPI](https://img.shields.io/pypi/v/sglang)](https://pypi.org/project/sglang)

![PyPI - Downloads](https://img.shields.io/pypi/dm/sglang)

[![license](https://img.shields.io/github/license/sgl-project/sglang.svg)](https://github.com/sgl-project/sglang/tree/main/LICENSE)

[![issue resolution](https://img.shields.io/github/issues-closed-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)

[![open issues](https://img.shields.io/github/issues-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)

[![](https://img.shields.io/badge/Gurubase-(experimental)-006BFF)](https://gurubase.io/g/sglang)



--------------------------------------------------------------------------------

| [**Blog**](https://lmsys.org/blog/2024-07-25-sglang-llama3/)

| [**Documentation**](https://docs.sglang.ai/)

| [**Join Slack**](https://slack.sglang.ai/)

| [**Join Bi-Weekly Development Meeting**](https://meeting.sglang.ai/)

| [**Roadmap**](https://github.com/sgl-project/sglang/issues/4042)

| [**Slides**](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#slides) |

## News

- [2025/05] 🔥 Deploying DeepSeek with PD Disaggregation and Large-scale Expert Parallelism on 96 H100 GPUs ([blog](https://lmsys.org/blog/2025-05-05-large-scale-ep/)).

- [2025/03] Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X ([AMD blog](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1-Part2/README.html))

- [2025/03] SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine ([PyTorch blog](https://pytorch.org/blog/sglang-joins-pytorch/))

- [2025/01] 🔥 SGLang provides day one support for DeepSeek V3/R1 models on NVIDIA and AMD GPUs with DeepSeek-specific optimizations. ([instructions](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3), [AMD blog](https://www.amd.com/en/developer/resources/technical-articles/amd-instinct-gpus-power-deepseek-v3-revolutionizing-ai-development-with-sglang.html), [10+ other companies](https://x.com/lmsysorg/status/1887262321636221412))

- [2024/12] 🔥 v0.4 Release: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs ([blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/)).

- [2024/07] v0.2 Release: Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) ([blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/)).

More

- [2025/02] Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU ([AMD blog](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1_Perf/README.html))

- [2024/10] The First SGLang Online Meetup ([slides](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#the-first-sglang-online-meetup)).

- [2024/09] v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision ([blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/)).

- [2024/02] SGLang enables **3x faster JSON decoding** with compressed finite state machine ([blog](https://lmsys.org/blog/2024-02-05-compressed-fsm/)).

- [2024/01] SGLang provides up to **5x faster inference** with RadixAttention ([blog](https://lmsys.org/blog/2024-01-17-sglang/)).

- [2024/01] SGLang powers the serving of the official **LLaVA v1.6** release demo ([usage](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#demo)).

## About

SGLang is a fast serving framework for large language models and vision language models.

It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.

The core features include:

- **Fast Backend Runtime**: Provides efficient serving with RadixAttention for prefix caching, zero-overhead CPU scheduler, continuous batching, token attention (paged attention), speculative decoding, tensor parallelism, chunked prefill, structured outputs, quantization (FP8/INT4/AWQ/GPTQ), and multi-lora batching.

- **Flexible Frontend Language**: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.

- **Extensive Model Support**: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse) and reward models (Skywork), with easy extensibility for integrating new models.

- **Active Community**: SGLang is open-source and backed by an active community with industry adoption.

## Getting Started

- [Install SGLang](https://docs.sglang.ai/start/install.html)

- [Quick Start](https://docs.sglang.ai/backend/send_request.html)

- [Backend Tutorial](https://docs.sglang.ai/backend/openai_api_completions.html)

- [Frontend Tutorial](https://docs.sglang.ai/frontend/frontend.html)

- [Contribution Guide](https://docs.sglang.ai/references/contribution_guide.html)

## Benchmark and Performance

Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/)

## Roadmap

[Development Roadmap (2025 H1)](https://github.com/sgl-project/sglang/issues/4042)

## Adoption and Sponsorship

The project has been deployed to large-scale production, generating trillions of tokens every day.

It is supported by the following institutions: AMD, Atlas Cloud, Baseten, Cursor, DataCrunch, Etched, Google Cloud, Hyperbolic, Iflytek, InnoMatrix, Jam & Tea Studios, LinkedIn, LMSYS, Meituan, Nebius, Novita AI, NVIDIA, Oracle, RunPod, Stanford, UC Berkeley, UCLA, xAI, and 01.AI.



## Contact Us

For enterprises interested in adopting or deploying SGLang at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at contact@sglang.ai.

## Acknowledgment

We learned the design and reused code from the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), and [LMQL](https://github.com/eth-sri/lmql).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/sgl-project/sglang

Awesome Lists containing this project

README