README.md 8.13 KB
Newer Older
Lianmin Zheng's avatar
Lianmin Zheng committed
1
<div align="center" id="sglangtop">
Kushal Agrawal's avatar
Kushal Agrawal committed
2
<img src="https://raw.githubusercontent.com/sgl-project/sglang/main/assets/logo.png" alt="logo" width="400" margin="10px"></img>
Lianmin Zheng's avatar
Lianmin Zheng committed
3

Yineng Zhang's avatar
Yineng Zhang committed
4
[![PyPI](https://img.shields.io/pypi/v/sglang)](https://pypi.org/project/sglang)
Yineng Zhang's avatar
Yineng Zhang committed
5
![PyPI - Downloads](https://static.pepy.tech/badge/sglang?period=month)
Yineng Zhang's avatar
Yineng Zhang committed
6
7
8
[![license](https://img.shields.io/github/license/sgl-project/sglang.svg)](https://github.com/sgl-project/sglang/tree/main/LICENSE)
[![issue resolution](https://img.shields.io/github/issues-closed-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)
[![open issues](https://img.shields.io/github/issues-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)
9
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/sgl-project/sglang)
Yineng Zhang's avatar
Yineng Zhang committed
10

Yineng Zhang's avatar
Yineng Zhang committed
11
12
</div>

Lianmin Zheng's avatar
Lianmin Zheng committed
13
14
--------------------------------------------------------------------------------

Lianmin Zheng's avatar
Lianmin Zheng committed
15
| [**Blog**](https://lmsys.org/blog/2025-05-05-large-scale-ep/)
Yineng Zhang's avatar
Yineng Zhang committed
16
17
18
| [**Documentation**](https://docs.sglang.ai/)
| [**Join Slack**](https://slack.sglang.ai/)
| [**Join Bi-Weekly Development Meeting**](https://meeting.sglang.ai/)
19
| [**Roadmap**](https://github.com/sgl-project/sglang/issues/7736)
Lianmin Zheng's avatar
Lianmin Zheng committed
20
| [**Slides**](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#slides) |
Lianmin Zheng's avatar
Lianmin Zheng committed
21

Lianmin Zheng's avatar
Lianmin Zheng committed
22
## News
23
- [2025/08] 🔔 SGLang x AMD SF Meetup on 8/22: Hands-on GPU workshop, tech talks by AMD/xAI/SGLang, and networking ([Roadmap](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_sglang_roadmap.pdf), [Large-scale EP](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_sglang_ep.pdf), [Highlights](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_highlights.pdf), [AITER/MoRI](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_aiter_mori.pdf), [Wave](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_wave.pdf)).
24
- [2025/08] 🔥 SGLang provides day-0 support for OpenAI gpt-oss model ([instructions](https://github.com/sgl-project/sglang/issues/8833))
25
26
- [2025/06] 🔥 SGLang, the high-performance serving infrastructure powering trillions of tokens daily, has been awarded the third batch of the Open Source AI Grant by a16z ([a16z blog](https://a16z.com/advancing-open-source-ai-through-benchmarks-and-bold-experimentation/)).
- [2025/06] 🔥 Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part I): 2.7x Higher Decoding Throughput ([blog](https://lmsys.org/blog/2025-06-16-gb200-part-1/)).
Yineng Zhang's avatar
Yineng Zhang committed
27
- [2025/05] 🔥 Deploying DeepSeek with PD Disaggregation and Large-scale Expert Parallelism on 96 H100 GPUs ([blog](https://lmsys.org/blog/2025-05-05-large-scale-ep/)).
Yineng Zhang's avatar
Yineng Zhang committed
28
29
- [2025/03] Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X ([AMD blog](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1-Part2/README.html))
- [2025/03] SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine ([PyTorch blog](https://pytorch.org/blog/sglang-joins-pytorch/))
Lianmin Zheng's avatar
Lianmin Zheng committed
30
- [2024/12] v0.4 Release: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs ([blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/)).
Ying Sheng's avatar
Ying Sheng committed
31

Ying Sheng's avatar
Ying Sheng committed
32
33
34
<details>
<summary>More</summary>

Lianmin Zheng's avatar
Lianmin Zheng committed
35
- [2025/02] Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU ([AMD blog](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1_Perf/README.html))
Lianmin Zheng's avatar
Lianmin Zheng committed
36
- [2025/01] SGLang provides day one support for DeepSeek V3/R1 models on NVIDIA and AMD GPUs with DeepSeek-specific optimizations. ([instructions](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3), [AMD blog](https://www.amd.com/en/developer/resources/technical-articles/amd-instinct-gpus-power-deepseek-v3-revolutionizing-ai-development-with-sglang.html), [10+ other companies](https://x.com/lmsysorg/status/1887262321636221412))
37
- [2024/10] The First SGLang Online Meetup ([slides](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#the-first-sglang-online-meetup)).
Lianmin Zheng's avatar
Lianmin Zheng committed
38
- [2024/09] v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision ([blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/)).
39
- [2024/07] v0.2 Release: Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) ([blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/)).
40
- [2024/02] SGLang enables **3x faster JSON decoding** with compressed finite state machine ([blog](https://lmsys.org/blog/2024-02-05-compressed-fsm/)).
Ying Sheng's avatar
Ying Sheng committed
41
- [2024/01] SGLang provides up to **5x faster inference** with RadixAttention ([blog](https://lmsys.org/blog/2024-01-17-sglang/)).
Ying Sheng's avatar
Ying Sheng committed
42
43
44
45
- [2024/01] SGLang powers the serving of the official **LLaVA v1.6** release demo ([usage](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#demo)).

</details>

Ying Sheng's avatar
Ying Sheng committed
46
47
48
49
50
## About
SGLang is a fast serving framework for large language models and vision language models.
It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.
The core features include:

51
- **Fast Backend Runtime**: Provides efficient serving with RadixAttention for prefix caching, zero-overhead CPU scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ), and multi-lora batching.
Ying Sheng's avatar
Ying Sheng committed
52
- **Flexible Frontend Language**: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
53
54
- **Extensive Model Support**: Supports a wide range of generative models (Llama, Qwen, DeepSeek, Kimi, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse) and reward models (Skywork), with easy extensibility for integrating new models.
- **Active Community**: SGLang is open-source and backed by an active community with wide industry adoption.
Ying Sheng's avatar
Ying Sheng committed
55

Chayenne's avatar
Chayenne committed
56
## Getting Started
57
58
59
60
61
- [Install SGLang](https://docs.sglang.ai/get_started/install.html)
- [Quick Start](https://docs.sglang.ai/basic_usage/send_request.html)
- [Backend Tutorial](https://docs.sglang.ai/basic_usage/openai_api_completions.html)
- [Frontend Tutorial](https://docs.sglang.ai/references/frontend/frontend_tutorial.html)
- [Contribution Guide](https://docs.sglang.ai/developer_guide/contribution_guide.html)
Lianmin Zheng's avatar
Lianmin Zheng committed
62

63
## Benchmark and Performance
Lianmin Zheng's avatar
Lianmin Zheng committed
64
Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/), [Large-scale expert parallelism](https://lmsys.org/blog/2025-05-05-large-scale-ep/).
Lianmin Zheng's avatar
Lianmin Zheng committed
65

Lianmin Zheng's avatar
Lianmin Zheng committed
66
## Roadmap
67
[Development Roadmap (2025 H2)](https://github.com/sgl-project/sglang/issues/7736)
Lianmin Zheng's avatar
Lianmin Zheng committed
68

Lianmin Zheng's avatar
Lianmin Zheng committed
69
## Adoption and Sponsorship
Yineng Zhang's avatar
Yineng Zhang committed
70
SGLang has been deployed at large scale, generating trillions of tokens in production each day. It is trusted and adopted by a wide range of leading enterprises and institutions, including xAI, AMD, NVIDIA, Intel, LinkedIn, Cursor, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, Atlas Cloud, Voltage Park, Nebius, DataCrunch, Novita, InnoMatrix, MIT, UCLA, the University of Washington, Stanford, UC Berkeley, Tsinghua University, Jam & Tea Studios, Baseten, and other major technology organizations across North America and Asia. As an open-source LLM inference engine, SGLang has become the de facto industry standard, with deployments running on over 1,000,000 GPUs worldwide.
Lianmin Zheng's avatar
Lianmin Zheng committed
71

Yineng Zhang's avatar
Yineng Zhang committed
72
<img src="https://raw.githubusercontent.com/sgl-project/sgl-learning-materials/refs/heads/main/slides/adoption.png" alt="logo" width="800" margin="10px"></img>
Lianmin Zheng's avatar
Lianmin Zheng committed
73

74
## Contact Us
Yineng Zhang's avatar
Yineng Zhang committed
75
For enterprises interested in adopting or deploying SGLang at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at contact@sglang.ai.
76

77
78
## Acknowledgment
We learned the design and reused code from the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), and [LMQL](https://github.com/eth-sri/lmql).