README.md 7.18 KB
Newer Older
Lianmin Zheng's avatar
Lianmin Zheng committed
1
<div align="center" id="sglangtop">
Kushal Agrawal's avatar
Kushal Agrawal committed
2
<img src="https://raw.githubusercontent.com/sgl-project/sglang/main/assets/logo.png" alt="logo" width="400" margin="10px"></img>
Lianmin Zheng's avatar
Lianmin Zheng committed
3

Yineng Zhang's avatar
Yineng Zhang committed
4
5
6
7
8
[![PyPI](https://img.shields.io/pypi/v/sglang)](https://pypi.org/project/sglang)
![PyPI - Downloads](https://img.shields.io/pypi/dm/sglang)
[![license](https://img.shields.io/github/license/sgl-project/sglang.svg)](https://github.com/sgl-project/sglang/tree/main/LICENSE)
[![issue resolution](https://img.shields.io/github/issues-closed-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)
[![open issues](https://img.shields.io/github/issues-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)
9
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/sgl-project/sglang)
Yineng Zhang's avatar
Yineng Zhang committed
10

Yineng Zhang's avatar
Yineng Zhang committed
11
12
</div>

Lianmin Zheng's avatar
Lianmin Zheng committed
13
14
--------------------------------------------------------------------------------

Lianmin Zheng's avatar
Lianmin Zheng committed
15
| [**Blog**](https://lmsys.org/blog/2025-05-05-large-scale-ep/)
Yineng Zhang's avatar
Yineng Zhang committed
16
17
18
| [**Documentation**](https://docs.sglang.ai/)
| [**Join Slack**](https://slack.sglang.ai/)
| [**Join Bi-Weekly Development Meeting**](https://meeting.sglang.ai/)
Lianmin Zheng's avatar
Lianmin Zheng committed
19
| [**Roadmap**](https://github.com/sgl-project/sglang/issues/4042)
Lianmin Zheng's avatar
Lianmin Zheng committed
20
| [**Slides**](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#slides) |
Lianmin Zheng's avatar
Lianmin Zheng committed
21

Lianmin Zheng's avatar
Lianmin Zheng committed
22
## News
23
24
- [2025/06] 🔥 SGLang, the high-performance serving infrastructure powering trillions of tokens daily, has been awarded the third batch of the Open Source AI Grant by a16z ([a16z blog](https://a16z.com/advancing-open-source-ai-through-benchmarks-and-bold-experimentation/)).
- [2025/06] 🔥 Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part I): 2.7x Higher Decoding Throughput ([blog](https://lmsys.org/blog/2025-06-16-gb200-part-1/)).
Yineng Zhang's avatar
Yineng Zhang committed
25
- [2025/05] 🔥 Deploying DeepSeek with PD Disaggregation and Large-scale Expert Parallelism on 96 H100 GPUs ([blog](https://lmsys.org/blog/2025-05-05-large-scale-ep/)).
Yineng Zhang's avatar
Yineng Zhang committed
26
27
- [2025/03] Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X ([AMD blog](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1-Part2/README.html))
- [2025/03] SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine ([PyTorch blog](https://pytorch.org/blog/sglang-joins-pytorch/))
Lianmin Zheng's avatar
Lianmin Zheng committed
28
- [2025/01] 🔥 SGLang provides day one support for DeepSeek V3/R1 models on NVIDIA and AMD GPUs with DeepSeek-specific optimizations. ([instructions](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3), [AMD blog](https://www.amd.com/en/developer/resources/technical-articles/amd-instinct-gpus-power-deepseek-v3-revolutionizing-ai-development-with-sglang.html), [10+ other companies](https://x.com/lmsysorg/status/1887262321636221412))
29
30
- [2024/12] 🔥 v0.4 Release: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs ([blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/)).
- [2024/07] v0.2 Release: Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) ([blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/)).
Ying Sheng's avatar
Ying Sheng committed
31

Ying Sheng's avatar
Ying Sheng committed
32
33
34
<details>
<summary>More</summary>

Lianmin Zheng's avatar
Lianmin Zheng committed
35
- [2025/02] Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU ([AMD blog](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1_Perf/README.html))
36
- [2024/10] The First SGLang Online Meetup ([slides](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#the-first-sglang-online-meetup)).
Lianmin Zheng's avatar
Lianmin Zheng committed
37
- [2024/09] v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision ([blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/)).
38
- [2024/02] SGLang enables **3x faster JSON decoding** with compressed finite state machine ([blog](https://lmsys.org/blog/2024-02-05-compressed-fsm/)).
Ying Sheng's avatar
Ying Sheng committed
39
- [2024/01] SGLang provides up to **5x faster inference** with RadixAttention ([blog](https://lmsys.org/blog/2024-01-17-sglang/)).
Ying Sheng's avatar
Ying Sheng committed
40
41
42
43
- [2024/01] SGLang powers the serving of the official **LLaVA v1.6** release demo ([usage](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#demo)).

</details>

Ying Sheng's avatar
Ying Sheng committed
44
45
46
47
48
## About
SGLang is a fast serving framework for large language models and vision language models.
It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.
The core features include:

Lianmin Zheng's avatar
Lianmin Zheng committed
49
- **Fast Backend Runtime**: Provides efficient serving with RadixAttention for prefix caching, zero-overhead CPU scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor parallelism, pipeline parallelism, expert parallelism, structured outputs, chunked prefill, quantization (FP8/INT4/AWQ/GPTQ), and multi-lora batching.
Ying Sheng's avatar
Ying Sheng committed
50
- **Flexible Frontend Language**: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
江家瑋's avatar
江家瑋 committed
51
- **Extensive Model Support**: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse) and reward models (Skywork), with easy extensibility for integrating new models.
Ying Sheng's avatar
Ying Sheng committed
52
53
- **Active Community**: SGLang is open-source and backed by an active community with industry adoption.

Chayenne's avatar
Chayenne committed
54
## Getting Started
Yineng Zhang's avatar
Yineng Zhang committed
55
- [Install SGLang](https://docs.sglang.ai/start/install.html)
Trayvon Pan's avatar
Trayvon Pan committed
56
- [Quick Start](https://docs.sglang.ai/backend/send_request.html)
Yineng Zhang's avatar
Yineng Zhang committed
57
58
59
- [Backend Tutorial](https://docs.sglang.ai/backend/openai_api_completions.html)
- [Frontend Tutorial](https://docs.sglang.ai/frontend/frontend.html)
- [Contribution Guide](https://docs.sglang.ai/references/contribution_guide.html)
Lianmin Zheng's avatar
Lianmin Zheng committed
60

61
## Benchmark and Performance
Lianmin Zheng's avatar
Lianmin Zheng committed
62
Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/).
Lianmin Zheng's avatar
Lianmin Zheng committed
63

Lianmin Zheng's avatar
Lianmin Zheng committed
64
## Roadmap
Yineng Zhang's avatar
Yineng Zhang committed
65
[Development Roadmap (2025 H1)](https://github.com/sgl-project/sglang/issues/4042)
Lianmin Zheng's avatar
Lianmin Zheng committed
66

Lianmin Zheng's avatar
Lianmin Zheng committed
67
## Adoption and Sponsorship
Lianmin Zheng's avatar
Lianmin Zheng committed
68
SGLang has been deployed at large scale, generating trillions of tokens in production every day. It is trusted and adopted by a broad range of leading enterprises and institutions, including xAI, NVIDIA, AMD, Google Cloud, Oracle Cloud, LinkedIn, Cursor, Voltage Park, Atlas Cloud, DataCrunch, Baseten, Nebius, Novita, InnoMatrix, RunPod, Stanford, UC Berkeley, UCLA, ETCHED, Jam & Tea Studios, Hyperbolic, as well as major technology organizations across North America and Asia. As an open-source LLM inference engine, SGLang has become the de facto standard in the industry, with production deployments running on over 100,000 GPUs worldwide.
Lianmin Zheng's avatar
Lianmin Zheng committed
69

Yineng Zhang's avatar
Yineng Zhang committed
70
<img src="https://raw.githubusercontent.com/sgl-project/sgl-learning-materials/refs/heads/main/slides/adoption.png" alt="logo" width="800" margin="10px"></img>
Lianmin Zheng's avatar
Lianmin Zheng committed
71

72
73
## Contact Us

Yineng Zhang's avatar
Yineng Zhang committed
74
For enterprises interested in adopting or deploying SGLang at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at contact@sglang.ai.
75

76
77
## Acknowledgment
We learned the design and reused code from the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), and [LMQL](https://github.com/eth-sri/lmql).