<!-- markdownlint-disable MD001 MD041 -->
<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/vllm-project/vllm/main/docs/assets/logos/vllm-logo-text-dark.png">
    <img alt="vLLM" src="https://raw.githubusercontent.com/vllm-project/vllm/main/docs/assets/logos/vllm-logo-text-light.png" width=55%>
  </picture>
</p>

<h3 align="center">
Easy, fast, and cheap LLM serving for everyone
</h3>

<p align="center">
| <a href="https://docs.vllm.ai"><b>Documentation</b></a> | <a href="https://blog.vllm.ai/"><b>Blog</b></a> | <a href="https://arxiv.org/abs/2309.06180"><b>Paper</b></a> | <a href="https://x.com/vllm_project"><b>Twitter/X</b></a> | <a href="https://discuss.vllm.ai"><b>User Forum</b></a> | <a href="https://slack.vllm.ai"><b>Developer Slack</b></a> |
</p>

🔥 We have built a vLLM website to help you get started with vLLM. Please visit [vllm.ai](https://vllm.ai) to learn more.
For upcoming events, please visit [vllm.ai/events](https://vllm.ai/events) and join us.

---

## About

vLLM is a fast and easy-to-use library for LLM inference and serving.

Originally developed in the [Sky Computing Lab](https://sky.cs.berkeley.edu) at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

vLLM is fast with:

- State-of-the-art serving throughput
- Efficient management of attention key and value memory with [**PagedAttention**](https://blog.vllm.ai/2023/06/20/vllm.html)
- Continuous batching of incoming requests
- Fast model execution with CUDA/HIP graphs
- Quantizations: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), [AutoRound](https://arxiv.org/abs/2309.05516), INT4, INT8, and FP8
- Optimized CUDA kernels, including integration with FlashAttention and FlashInfer
- Speculative decoding
- Chunked prefill
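
The memory-management idea behind PagedAttention can be illustrated with a toy block-table allocator. This is only a conceptual sketch, not vLLM's implementation: the class `ToyPagedKVCache`, its fields, and the block size are hypothetical names and values chosen for illustration. It shows key/value storage being handed out in fixed-size blocks on demand and returned to a shared pool when a request finishes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPagedKVCache:
    """Toy block-table allocator (hypothetical; not vLLM's real data structure)."""

    num_blocks: int       # physical blocks available in the shared pool
    block_size: int = 16  # tokens stored per block (illustrative value)
    free_blocks: list = field(init=False, default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> physical block ids
    seq_lens: dict = field(default_factory=dict)      # seq_id -> tokens stored

    def __post_init__(self) -> None:
        # Every physical block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id: str) -> None:
        """Reserve room for one more token, allocating a block only when needed."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length == len(table) * self.block_size:  # no block yet, or last one is full
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

With `block_size=2`, appending three tokens to one sequence occupies two blocks, and calling `free` on that sequence makes both available to other requests again.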

vLLM is flexible and easy to use with:

- Seamless integration with popular Hugging Face models
- High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
- Tensor, pipeline, data and expert parallelism support for distributed inference
- Streaming outputs
- OpenAI-compatible API server
- Support for NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, Arm CPUs, and TPU. Additionally, support for diverse hardware plugins such as Intel Gaudi, IBM Spyre and Huawei Ascend.
- Prefix caching support
- Multi-LoRA support
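
As a rough illustration of the OpenAI-compatible API server, the sketch below builds and sends a completions request using only the Python standard library. It assumes a server is already running locally (for example, started with `vllm serve <model>`) and listening on port 8000; the base URL, the model name, and the helper names `build_completion_request` and `complete` are placeholders, so adjust them to your deployment.

```python
import json
import urllib.request

# Assumptions: a local vLLM server on port 8000, serving the model named below.
BASE_URL = "http://localhost:8000/v1"
MODEL = "facebook/opt-125m"  # placeholder; must match the model the server loaded


def build_completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request in the OpenAI completions format."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 64) -> str:
    """Send the request and return the text of the first completion."""
    with urllib.request.urlopen(build_completion_request(prompt, max_tokens)) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Calling `complete("San Francisco is a")` against a running server returns the generated continuation. Because the request follows the OpenAI format, existing OpenAI client libraries can usually be pointed at the same base URL instead.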

vLLM seamlessly supports most popular open-source models on Hugging Face, including:

- Transformer-like LLMs (e.g., Llama)
- Mixture-of-Experts LLMs (e.g., Mixtral, DeepSeek-V2 and V3)
- Embedding Models (e.g., E5-Mistral)
- Multi-modal LLMs (e.g., LLaVA)

Find the full list of supported models [here](https://docs.vllm.ai/en/latest/models/supported_models.html).

## Getting Started

Install vLLM with `pip` or [from source](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/index.html#build-wheel-from-source):

```bash
pip install vllm
```
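
Once the package is installed, a first offline run might look like the sketch below. The helper names `vllm_version` and `generate_texts` are hypothetical, and the model name and sampling values are placeholders rather than recommendations; the `LLM` and `SamplingParams` classes come from vLLM itself and are imported lazily so the version check also works on a machine where the install failed.

```python
from importlib import metadata
from typing import Optional


def vllm_version() -> Optional[str]:
    """Return the installed vLLM version, or None if the package is missing."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


def generate_texts(prompts: list, model: str = "facebook/opt-125m") -> list:
    """Run offline batched generation; the default model is only a placeholder."""
    # Imported lazily so vllm_version() still works without vLLM installed.
    from vllm import LLM, SamplingParams

    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)
    llm = LLM(model=model)  # fetches weights from Hugging Face on first use
    outputs = llm.generate(prompts, sampling_params)
    # Each result holds one or more completions; keep the first one per prompt.
    return [output.outputs[0].text for output in outputs]
```

Calling `vllm_version()` is a quick way to confirm the install, and `generate_texts(["Hello, my name is"])` then returns one generated string per prompt on supported hardware.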

Visit our [documentation](https://docs.vllm.ai/en/latest/) to learn more.

- [Installation](https://docs.vllm.ai/en/latest/getting_started/installation.html)
- [Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html)
- [List of Supported Models](https://docs.vllm.ai/en/latest/models/supported_models.html)

## Contributing

We welcome and value any contributions and collaborations.
Please check out [Contributing to vLLM](https://docs.vllm.ai/en/latest/contributing/index.html) for how to get involved.

## Citation

If you use vLLM for your research, please cite our [paper](https://arxiv.org/abs/2309.06180):

```bibtex
@inproceedings{kwon2023efficient,
  title={Efficient Memory Management for Large Language Model Serving with PagedAttention},
  author={Woosuk Kwon and Zhuohan Li and Siyuan Zhuang and Ying Sheng and Lianmin Zheng and Cody Hao Yu and Joseph E. Gonzalez and Hao Zhang and Ion Stoica},
  booktitle={Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles},
  year={2023}
}
```

## Contact Us

<!-- --8<-- [start:contact-us] -->
- For technical questions and feature requests, please use GitHub [Issues](https://github.com/vllm-project/vllm/issues)
- For discussing with fellow users, please use the [vLLM Forum](https://discuss.vllm.ai)
- For coordinating contributions and development, please use [Slack](https://slack.vllm.ai)
- For security disclosures, please use GitHub's [Security Advisories](https://github.com/vllm-project/vllm/security/advisories) feature
- For collaborations and partnerships, please contact us at [collaboration@vllm.ai](mailto:collaboration@vllm.ai)
<!-- --8<-- [end:contact-us] -->

## Media Kit

- If you wish to use vLLM's logo, please refer to [our media kit repo](https://github.com/vllm-project/media-kit)