Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
- Host: GitHub
- URL: https://github.com/sgl-project/sglang
- Owner: sgl-project
- License: apache-2.0
- Created: 2024-01-08T04:15:52.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-02T09:22:27.000Z (9 days ago)
- Last Synced: 2025-01-02T09:45:37.222Z (9 days ago)
- Topics: cuda, deepseek, deepseek-llm, deepseek-v3, inference, llama, llama2, llama3, llama3-1, llava, llm, llm-serving, moe, pytorch, transformer, vlm
- Language: Python
- Homepage: https://sgl-project.github.io/
- Size: 8.38 MB
- Stars: 6,978
- Watchers: 62
- Forks: 642
- Open Issues: 186
- Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
- awesome-llm - SGLang - An efficient inference framework for LLMs and vision language models. (LLM Deployment / LLM Evaluation Tools)
- awesome-LLM-resourses - SGLang
- awesome-llm-json - SGLang - allows specifying JSON schemas using regular expressions or Pydantic models for constrained decoding. Its high-performance runtime accelerates JSON decoding. (Python Libraries)
- jimsghstars - sgl-project/sglang - SGLang is a fast serving framework for large language models and vision language models. (Python)
- StarryDivineSky - sgl-project/sglang - …mistral), and is easy to extend with new model integrations. Active community: SGLang is open source, backed by an active community, and sees industry adoption. Compared with TensorRT-LLM and vLLM, SGLang Runtime consistently delivers superior or competitive performance in both online and offline scenarios, on models from Llama-8B to Llama-405B, with FP8 and FP16, on A100 and H100 GPUs. SGLang consistently outperforms vLLM, with up to 3.1x higher throughput on Llama-70B, and often matches or sometimes beats TensorRT-LLM. Moreover, SGLang is fully open source, written in pure Python, with the core scheduler implemented in under 4K lines of code. (A01_Text Generation_Text Dialogue / Large language dialogue models and data)
- Awesome-LLM - SGLang - SGLang is a fast serving framework for large language models and vision language models. (LLM Deployment)
- awesome-ai-repositories - sgl-project/sglang (Structured Generation)
- awesome-local-ai - SGLang - 3-5x higher throughput than vLLM (Control flow, RadixAttention, KV cache reuse) | Safetensor / AWQ / GPTQ | GPU | ❌ | Python | Text-Gen | (Inference Engine)
- alan_awesome_llm - SGLang
- AiTreasureBox - sgl-project/sglang - SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable. (Repos)
- Awesome-LLMOps - SGLang - SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable. (Inference)
README
[![PyPI](https://img.shields.io/pypi/v/sglang)](https://pypi.org/project/sglang)
![PyPI - Downloads](https://img.shields.io/pypi/dm/sglang)
[![license](https://img.shields.io/github/license/sgl-project/sglang.svg)](https://github.com/sgl-project/sglang/tree/main/LICENSE)
[![issue resolution](https://img.shields.io/github/issues-closed-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)
[![open issues](https://img.shields.io/github/issues-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)
[![](https://img.shields.io/badge/Gurubase-(experimental)-006BFF)](https://gurubase.io/g/sglang)

--------------------------------------------------------------------------------
| [**Blog**](https://lmsys.org/blog/2024-07-25-sglang-llama3/)
| [**Documentation**](https://sgl-project.github.io/)
| [**Join Slack**](https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2um0ad92q-LkU19KQTxCGzlCgRiOiQEw)
| [**Join Bi-Weekly Development Meeting**](https://docs.google.com/document/d/1xEow4eIM152xNcRxqZz9VEcOiTQo8-CEuuQ5qTmkt-E/edit?usp=sharing)
| [**Slides**](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#slides) |

## News
- [2024/12] 🔥 SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs ([blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/)).
- [2024/10] 🔥 The First SGLang Online Meetup ([slides](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#the-first-sglang-online-meetup)).
- [2024/09] SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision ([blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/)).
- [2024/07] Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) ([blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/)).

More
- [2024/04] SGLang is used by the official **LLaVA-NeXT (video)** release ([blog](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/)).
- [2024/02] SGLang enables **3x faster JSON decoding** with a compressed finite state machine ([blog](https://lmsys.org/blog/2024-02-05-compressed-fsm/)).
- [2024/01] SGLang provides up to **5x faster inference** with RadixAttention ([blog](https://lmsys.org/blog/2024-01-17-sglang/)).
- [2024/01] SGLang powers the serving of the official **LLaVA v1.6** release demo ([usage](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#demo)).

## About
SGLang is a fast serving framework for large language models and vision language models.
It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.
The core features include:

- **Fast Backend Runtime**: Provides efficient serving with RadixAttention for prefix caching, jump-forward constrained decoding, overhead-free CPU scheduler, continuous batching, token attention (paged attention), tensor parallelism, FlashInfer kernels, chunked prefill, and quantization (FP8/INT4/AWQ/GPTQ).
- **Flexible Frontend Language**: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
- **Extensive Model Support**: Supports a wide range of generative models (Llama, Gemma, Mistral, QWen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse) and reward models (Skywork), with easy extensibility for integrating new models.
- **Active Community**: SGLang is open-source and backed by an active community with industry adoption.

## Getting Started
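Getting started usually means launching a local SGLang server and sending it an OpenAI-compatible request. The sketch below only constructs such a request with the standard library; the endpoint, port, and model name are assumptions, and actually sending it requires a running server (see the install and quick-start links):

```python
import json
from urllib.request import Request

# Assumed local endpoint for SGLang's OpenAI-compatible API; the port
# and model id below are placeholders, not guaranteed defaults.
ENDPOINT = "http://localhost:30000/v1/chat/completions"

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    "messages": [{"role": "user", "content": "What is SGLang?"}],
    "max_tokens": 64,
    "temperature": 0.0,
}

# Build the request object without sending it (sending needs a live server;
# use urllib.request.urlopen(req) once the server is up).
req = Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.full_url, req.get_method())
```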
- [Install SGLang](https://sgl-project.github.io/start/install.html)
- [Quick Start](https://sgl-project.github.io/start/send_request.html)
- [Backend Tutorial](https://sgl-project.github.io/backend/openai_api_completions.html)
- [Frontend Tutorial](https://sgl-project.github.io/frontend/frontend.html)
- [Contribution Guide](https://sgl-project.github.io/references/contribution_guide.html)

## Benchmark and Performance
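A large share of the reported gains comes from reusing KV cache across requests that share a prompt prefix. The toy sketch below, stdlib only and not SGLang's actual data structure, illustrates the longest-shared-prefix lookup idea behind RadixAttention:

```python
# Toy longest-prefix matcher over token sequences, illustrating the idea
# behind RadixAttention's prefix cache (not SGLang's real implementation).

class PrefixCache:
    def __init__(self):
        self.root = {}  # token -> child node

    def insert(self, tokens):
        """Record the token sequence of a request that was already served."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def match_len(self, tokens):
        """Length of the longest cached prefix; KV cache for those
        positions could be reused instead of recomputed."""
        node, n = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node, n = node[t], n + 1
        return n


cache = PrefixCache()
cache.insert([1, 2, 3, 4])           # e.g. system prompt + first question
hit = cache.match_len([1, 2, 3, 9])  # new request shares a 3-token prefix
print(hit)                           # -> 3
```

In a real serving engine the cached entries hold KV tensors and are evicted under memory pressure; the matching logic above only shows why batches of requests with a common system prompt get cheaper.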
Learn more in our release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/).

## Roadmap
[Development Roadmap (2024 Q4)](https://github.com/sgl-project/sglang/issues/1487)

## Adoption and Sponsorship
The project is supported by (alphabetically): AMD, Baseten, DataCrunch, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, LMSYS.org, Meituan, NVIDIA, RunPod, Stanford, UC Berkeley, UCLA, xAI, 01.AI.

## Acknowledgment and Citation
We learned the design and reused code from the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), and [LMQL](https://github.com/eth-sri/lmql).
Please cite the paper, [SGLang: Efficient Execution of Structured Language Model Programs](https://arxiv.org/abs/2312.07104), if you find the project useful.