# Llama Stack

vLLM is also available via [Llama Stack](https://github.com/llamastack/llama-stack).

To install Llama Stack, run:

```bash
pip install llama-stack -q
```
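
To confirm that the install succeeded, one option is to look up the installed distribution's version from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `llama-stack` is taken from the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("llama-stack")
    # Prints the version string, or a notice if the package is missing.
    print(found if found else "llama-stack is not installed in this environment")
```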

## Inference using OpenAI-Compatible API

Then start the Llama Stack server and configure it to point to your vLLM server with the following settings:

```yaml
inference:
  - provider_id: vllm0
    provider_type: remote::vllm
    config:
      url: http://127.0.0.1:8000
```

Please refer to [this guide](https://llama-stack.readthedocs.io/en/latest/providers/inference/remote_vllm.html) for more details on this remote vLLM provider.
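
Before starting Llama Stack, it can help to check that the vLLM server is reachable at the configured `url`. The sketch below assumes the vLLM OpenAI-compatible server is running at the address from the config above, that the base URL carries no `/v1` suffix, and that the server returns the usual OpenAI-style model list from `/v1/models`. The helper names `models_endpoint` and `model_ids` are hypothetical and exist only for illustration.

```python
import json
import urllib.request

# Assumption: same base URL as `url` in the provider config above.
VLLM_URL = "http://127.0.0.1:8000"


def models_endpoint(base_url: str) -> str:
    """Build the model-listing URL of an OpenAI-compatible server."""
    return base_url.rstrip("/") + "/v1/models"


def model_ids(payload: dict) -> list:
    """Extract model IDs from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


if __name__ == "__main__":
    try:
        with urllib.request.urlopen(models_endpoint(VLLM_URL), timeout=5) as resp:
            print(model_ids(json.load(resp)))
    except OSError as exc:
        # URLError and socket timeouts are both subclasses of OSError.
        print(f"vLLM server not reachable at {VLLM_URL}: {exc}")
```

If the request succeeds, the printed list should contain the model that vLLM is serving; a connection error suggests the `url` setting does not match the running server.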

## Inference using Embedded vLLM

An [inline provider](https://github.com/llamastack/llama-stack/tree/main/llama_stack/providers/inline/inference) is also available. The following is a sample configuration using that method:

```yaml
inference:
  - provider_type: vllm
    config:
      model: Llama3.1-8B-Instruct
      tensor_parallel_size: 4
```