vLLM is also available via [Llama Stack](https://github.com/llamastack/llama-stack).
To install Llama Stack, run
...
@@ -8,9 +8,9 @@ To install Llama Stack, run
pip install llama-stack -q
```
## Inference using OpenAI-Compatible API
Then start the Llama Stack server and configure it to point to your vLLM server with the following settings:
```yaml
inference:
...
@@ -20,15 +20,15 @@ inference:
url: http://127.0.0.1:8000
```
Please refer to [this guide](https://llama-stack.readthedocs.io/en/latest/providers/inference/remote_vllm.html) for more details on this remote vLLM provider.
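
Before starting Llama Stack, it can help to confirm that the vLLM server is reachable at the configured address. The following is a minimal sketch, not part of Llama Stack itself: it assumes the vLLM OpenAI-compatible server is running at `http://127.0.0.1:8000` (the URL from the configuration above) and exposes the standard `/v1/models` route. The helper names `models_url`, `parse_model_ids`, and `list_vllm_models` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL for an OpenAI-compatible server."""
    return base_url.rstrip("/") + "/v1/models"


def parse_model_ids(payload: str) -> list:
    """Extract model IDs from an OpenAI-style model list response body."""
    data = json.loads(payload)
    return [model["id"] for model in data.get("data", [])]


def list_vllm_models(base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> list:
    """Query the server and return the IDs of the models it serves.

    Assumption: base_url matches the `url` value in the Llama Stack
    inference configuration. Raises urllib.error.URLError if the
    server cannot be reached.
    """
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running vLLM server; prints the model IDs it reports.
    print(list_vllm_models())
```

If the call fails with a connection error, the `url` value in the configuration likely does not match the address the vLLM server is listening on.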
## Inference using Embedded vLLM
An [inline provider](https://github.com/llamastack/llama-stack/tree/main/llama_stack/providers/inline/inference)
is also available. The following is a sample configuration using that method: