Installation
============

vLLM is a Python library that includes some C++ and CUDA code.
vLLM can run on systems that meet the following requirements:

* OS: Linux
* Python: 3.8 or higher
* CUDA: 11.0 -- 11.8
* GPU: compute capability 7.0 or higher (e.g., V100, T4, RTX20xx, A100, etc.)

.. note::
    As of now, vLLM does not support CUDA 12.
    If you are using Hopper or Lovelace GPUs, please use CUDA 11.8.
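
A quick way to check that your machine meets the CUDA and GPU requirements above (assuming the NVIDIA driver and CUDA toolkit are already installed):

.. code-block:: console

    $ # Show the installed CUDA toolkit version (should be 11.0 -- 11.8).
    $ nvcc --version
    $ # Show the NVIDIA driver version and the available GPUs.
    $ nvidia-smi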

.. tip::
    If you have trouble installing vLLM, we recommend using the NVIDIA PyTorch Docker image.

    .. code-block:: console

        $ # Pull and run the Docker image with CUDA 11.8.
        $ docker run --gpus all -it --rm --shm-size=8g nvcr.io/nvidia/pytorch:22.12-py3

    Inside the Docker container, please execute :code:`pip uninstall torch` before installing vLLM.
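
    For example, the steps inside the container might look like the following sketch (it assumes the container has network access for pip):

    .. code-block:: console

        $ # Remove the preinstalled PyTorch first, as noted above.
        $ pip uninstall -y torch
        $ # Then install vLLM.
        $ pip install vllm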

Install with pip
----------------

You can install vLLM using pip:

.. code-block:: console

    $ # (Optional) Create a new conda environment.
    $ conda create -n myenv python=3.8 -y
    $ conda activate myenv

    $ # Install vLLM.
    $ pip install vllm  # This may take 5-10 minutes.
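
After installation, you can run a quick sanity check that the package imports correctly, for example:

.. code-block:: console

    $ python -c "import vllm; print('vLLM is installed')"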


.. _build_from_source:

Build from source
-----------------

You can also build and install vLLM from source:

.. code-block:: console

    $ git clone https://github.com/WoosukKwon/vllm.git
    $ cd vllm
    $ pip install -e .  # This may take 5-10 minutes.
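
If you later modify the C++/CUDA sources, the compiled extensions need to be rebuilt; one straightforward way is to rerun the editable install:

.. code-block:: console

    $ pip install -e .  # Rebuilds the C++/CUDA extensions.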