Add logo and polish readme (#156)

Commit a255885f (unverified) · authored Jun 19, 2023 by Zhuohan Li · committed by GitHub Jun 19, 2023
14 changed files
--- a/.gitignore
+++ b/.gitignore
-**/*.pyc
+# Byte-compiled / optimized / DLL files
-**/__pycache__/
+__pycache__/
-*.egg-info/
+*.py[cod]
-*.eggs/
+*$py.class
+# C extensions
 *.so
-*.log
-*.csv
+# Distribution / packaging
+.Python
 build/
-docs/build/
+develop-eggs/
 dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
-*.pkl
+# PyInstaller
-*.png
+#  Usually these files are written by a python script from a template
-**/log.txt
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+# Translations
+*.mo
+*.pot
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+# Flask stuff:
+instance/
+.webassets-cache
+# Scrapy stuff:
+.scrapy
+# Sphinx documentation
+docs/_build/
+# PyBuilder
+.pybuilder/
+target/
+# Jupyter Notebook
+.ipynb_checkpoints
+# IPython
+profile_default/
+ipython_config.py
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+#   in version control.
+#   https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+# SageMath parsed files
+*.sage.py
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# Spyder project settings
+.spyderproject
+.spyproject
+# Rope project settings
+.ropeproject
+# mkdocs documentation
+/site
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+# Pyre type checker
+.pyre/
+# pytype static type analyzer
+.pytype/
+# Cython debug symbols
+cython_debug/
+# PyCharm
+#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#  and can be added to the global gitignore or merged into this file.  For a more nuclear
+#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
+.idea/
+# VSCode
 .vscode/
+# DS Store
+.DS_Store
+# Results
+*.csv
+# Python pickle files
+*.pkl
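
Note on the `.gitignore` change: the new `*.py[cod]` entry covers more than the old `**/*.pyc` rule, since the bracket expression matches `.pyc`, `.pyo`, and `.pyd` files. The sketch below approximates that match with Python's `fnmatch` module. This is only a stand-in for Git's own matcher: the two should agree for simple basename patterns like this one, but not for patterns that contain `/` or `**`. The helper name `is_ignored` is hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern added by this commit (replaces the older "**/*.pyc" rule).
NEW_PATTERN = "*.py[cod]"


def is_ignored(filename: str, pattern: str = NEW_PATTERN) -> bool:
    """Approximate a basename-only gitignore match (assumption: no '/' in pattern)."""
    return fnmatchcase(filename, pattern)


if __name__ == "__main__":
    # Compiled artifacts should match; plain source files should not.
    for name in ("model.pyc", "model.pyo", "ext.pyd", "model.py"):
        print(f"{name}: {'ignored' if is_ignored(name) else 'tracked'}")
```

To check how Git itself treats a path, `git check-ignore -v <path>` reports the rule that matched.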
--- a/README.md
+++ b/README.md
-# vLLM: Easy, Fast, and Cheap LLM Serving for Everyone
+<p align="center">
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/logos/vllm-logo-text-dark.png">
+    <img alt="vLLM" src="./docs/source/assets/logos/vllm-logo-text-light.png" width=55%>
+  </picture>
+</p>
-| [**Documentation**](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/) | [**Blog**]() |
+<h3 align="center">
+Easy, fast, and cheap LLM serving for everyone
+</h3>
-vLLM is a fast and easy-to-use library for LLM inference and serving.
+<p align="center">
+| <a href="https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/"><b>Documentation</b></a> | <a href=""><b>Blog</b></a> |
-## Latest News 🔥
+</p>
- [2023/06] We officially released vLLM! vLLM has powered [LMSYS Vicuna and Chatbot Arena](https://chat.lmsys.org) since mid April. Check out our [blog post]().
+---
-## Getting Started
+*Latest News* 🔥
-Visit our [documentation](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/) to get started.
+- [2023/06] We officially released vLLM! vLLM has powered [LMSYS Vicuna and Chatbot Arena](https://chat.lmsys.org) since mid April. Check out our [blog post]().
- [Installation](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/getting_started/installation.html): `pip install vllm`
- [Quickstart](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/getting_started/quickstart.html)
+---
- [Supported Models](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/models/supported_models.html)
-## Key Features
+vLLM is a fast and easy to use library for LLM inference and serving.
-vLLM comes with many powerful features that include:
+vLLM is fast with:
- State-of-the-art performance in serving throughput
+- State-of-the-art serving throughput
 - Efficient management of attention key and value memory with **PagedAttention**
- Seamless integration with popular HuggingFace models
 - Dynamic batching of incoming requests
 - Optimized CUDA kernels
- High-throughput serving with various decoding algorithms, including *parallel sampling* and *beam search*
+vLLM is flexible and easy to use with:
+- Seamless integration with popular HuggingFace models
+- High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
 - Tensor parallelism support for distributed inference
 - Streaming outputs
 - OpenAI-compatible API server
+Install vLLM with pip or [from source](https://llm-serving-cacheflow.readthedocs-hosted.com/en/latest/getting_started/installation.html#build-from-source):
+```bash
+pip install vllm
+```
+## Getting Started
+Visit our [documentation](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/) to get started.
+- [Installation](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/getting_started/installation.html)
+- [Quickstart](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/getting_started/quickstart.html)
+- [Supported Models](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/models/supported_models.html)
 ## Performance
 vLLM outperforms HuggingFace Transformers (HF) by up to 24x and Text Generation Inference (TGI) by up to 3.5x, in terms of throughput.
 For details, check out our [blog post]().
 <p align="center">
-  <img src="./assets/figures/perf_a10g_n1.png" width="45%">
+  <picture>
-  <img src="./assets/figures/perf_a100_n1.png" width="45%">
+  <source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/figures/perf_a10g_n1_dark.png">
+  <img src="./docs/source/assets/figures/perf_a10g_n1_light.png" width="45%">
+  </picture>
+  <picture>
+  <source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/figures/perf_a100_n1_dark.png">
+  <img src="./docs/source/assets/figures/perf_a100_n1_light.png" width="45%">
+  </picture>
  <br>
  <em> Serving throughput when each request asks for 1 output completion. </em>
 </p>
 <p align="center">
-  <img src="./assets/figures/perf_a10g_n3.png" width="45%">
+  <picture>
-  <img src="./assets/figures/perf_a100_n3.png" width="45%">
+  <source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/figures/perf_a10g_n3_dark.png">
-  <br>
+  <img src="./docs/source/assets/figures/perf_a10g_n3_light.png" width="45%">
+  </picture>
+  <picture>
+  <source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/figures/perf_a100_n3_dark.png">
+  <img src="./docs/source/assets/figures/perf_a100_n3_light.png" width="45%">
+  </picture>  <br>
  <em> Serving throughput when each request asks for 3 output completions. </em>
 </p>
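
Note on the README change: each `<picture>` block now references two image files, a dark variant in `srcset` and a light fallback in `src`, so a moved or renamed asset would break only one colour scheme and could go unnoticed. A minimal sketch of how the referenced paths could be collected for an existence check, using only the standard library; the class and function names are hypothetical, and the snippet is copied from the logo block in this diff.

```python
from html.parser import HTMLParser


class PictureAssetCollector(HTMLParser):
    """Collects image paths from <source srcset=...> and <img src=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.paths: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "source" and "srcset" in attributes:
            self.paths.append(attributes["srcset"])
        elif tag == "img" and "src" in attributes:
            self.paths.append(attributes["src"])


def collect_asset_paths(html: str) -> list[str]:
    """Return every image path referenced in the given HTML, in document order."""
    collector = PictureAssetCollector()
    collector.feed(html)
    return collector.paths


# Logo block as added to README.md in this commit.
SNIPPET = """
<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/logos/vllm-logo-text-dark.png">
    <img alt="vLLM" src="./docs/source/assets/logos/vllm-logo-text-light.png" width=55%>
  </picture>
</p>
"""

if __name__ == "__main__":
    for path in collect_asset_paths(SNIPPET):
        print(path)
```

Each collected path could then be passed to `pathlib.Path(path).exists()` from the repository root to confirm that both variants are checked in.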

--- a/assets/figures/perf_a100_n1_dark.png
+++ b/assets/figures/perf_a100_n1_dark.png
--- a/assets/figures/perf_a100_n1_light.png
+++ b/assets/figures/perf_a100_n1_light.png
--- a/assets/figures/perf_a100_n3_dark.png
+++ b/assets/figures/perf_a100_n3_dark.png
--- a/assets/figures/perf_a100_n3_light.png
+++ b/assets/figures/perf_a100_n3_light.png
--- a/assets/figures/perf_a10g_n1_dark.png
+++ b/assets/figures/perf_a10g_n1_dark.png
--- a/assets/figures/perf_a10g_n1_light.png
+++ b/assets/figures/perf_a10g_n1_light.png
--- a/assets/figures/perf_a10g_n3_dark.png
+++ b/assets/figures/perf_a10g_n3_dark.png
--- a/assets/figures/perf_a10g_n3_light.png
+++ b/assets/figures/perf_a10g_n3_light.png
--- a/docs/source/assets/logos/vllm-logo-only-light.png
+++ b/docs/source/assets/logos/vllm-logo-only-light.png
--- a/docs/source/assets/logos/vllm-logo-text-dark.png
+++ b/docs/source/assets/logos/vllm-logo-text-dark.png
--- a/docs/source/assets/logos/vllm-logo-text-light.png
+++ b/docs/source/assets/logos/vllm-logo-text-light.png
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
 Welcome to vLLM!
 ================
-**vLLM** is a fast and easy-to-use library for LLM inference and serving.
+.. figure:: ./assets/logos/vllm-logo-text-light.png
-Its core features include:
+  :width: 60%
+  :align: center
- State-of-the-art performance in serving throughput
+  :alt: vLLM
- Efficient management of attention key and value memory with **PagedAttention**
+  :class: no-scaled-link
- Seamless integration with popular HuggingFace models
- Dynamic batching of incoming requests
+.. raw:: html
- Optimized CUDA kernels
- High-throughput serving with various decoding algorithms, including *parallel sampling* and *beam search*
+   <p style="text-align:center">
- Tensor parallelism support for distributed inference
+   <strong>Easy, fast, and cheap LLM serving for everyone
- Streaming outputs
+   </strong>
- OpenAI-compatible API server
+   </p>
+   <p style="text-align:center">
+   <a class="github-button" href="https://github.com/WoosukKwon/vllm" data-show-count="true" data-size="large" aria-label="Star WoosukKwon/vllm on GitHub">Star</a>
+   <a class="github-button" href="https://github.com/WoosukKwon/vllm/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch WoosukKwon/vllm on GitHub">Watch</a>
+   <a class="github-button" href="https://github.com/WoosukKwon/vllm/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork WoosukKwon/vllm on GitHub">Fork</a>
+   </p>
+vLLM is a fast and easy to use library for LLM inference and serving.
+vLLM is fast with:
+* State-of-the-art serving throughput
+* Efficient management of attention key and value memory with **PagedAttention**
+* Dynamic batching of incoming requests
+* Optimized CUDA kernels
+vLLM is flexible and easy to use with:
+* Seamless integration with popular HuggingFace models
+* High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
+* Tensor parallelism support for distributed inference
+* Streaming outputs
+* OpenAI-compatible API server
 For more information, please refer to our `blog post <>`_.