Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
OpenDAS
vllm_cscc
Commits
d14c4ebf
Unverified
Commit
d14c4ebf
authored
Sep 11, 2025
by
Michael Yao
Committed by
GitHub
Sep 11, 2025
Browse files
[Docs] Use 1-2-3 list for deploy steps in deployment/frameworks/ (#24633)
Signed-off-by:
windsonsea
<
haifeng.yao@daocloud.io
>
parent
ba601102
Changes
6
Hide whitespace changes
Inline
Side-by-side
Showing
6 changed files
with
98 additions
and
90 deletions
+98
-90
docs/deployment/frameworks/autogen.md
docs/deployment/frameworks/autogen.md
+7
-9
docs/deployment/frameworks/chatbox.md
docs/deployment/frameworks/chatbox.md
+14
-10
docs/deployment/frameworks/dify.md
docs/deployment/frameworks/dify.md
+28
-22
docs/deployment/frameworks/haystack.md
docs/deployment/frameworks/haystack.md
+6
-6
docs/deployment/frameworks/litellm.md
docs/deployment/frameworks/litellm.md
+11
-11
docs/deployment/frameworks/retrieval_augmented_generation.md
docs/deployment/frameworks/retrieval_augmented_generation.md
+32
-32
No files found.
docs/deployment/frameworks/autogen.md
View file @
d14c4ebf
...
@@ -4,9 +4,7 @@
...
@@ -4,9 +4,7 @@
## Prerequisites
## Prerequisites
-
Setup vLLM environment
Set up the vLLM and
[
AutoGen
](
https://microsoft.github.io/autogen/0.2/docs/installation/
)
environment:
-
Setup
[
AutoGen
](
https://microsoft.github.io/autogen/0.2/docs/installation/
)
environment
```
bash
```
bash
pip
install
vllm
pip
install
vllm
...
@@ -18,14 +16,14 @@ pip install -U "autogen-agentchat" "autogen-ext[openai]"
...
@@ -18,14 +16,14 @@ pip install -U "autogen-agentchat" "autogen-ext[openai]"
## Deploy
## Deploy
-
Start the vLLM server with the supported chat completion model, e.g.
1.
Start the vLLM server with the supported chat completion model, e.g.
```
bash
```bash
python
-m
vllm.entrypoints.openai.api_server
\
python -m vllm.entrypoints.openai.api_server \
--model
mistralai/Mistral-7B-Instruct-v0.2
--model mistralai/Mistral-7B-Instruct-v0.2
```
```
-
Call it with AutoGen:
1.
Call it with AutoGen:
??? code
??? code
...
...
docs/deployment/frameworks/chatbox.md
View file @
d14c4ebf
...
@@ -6,27 +6,31 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
...
@@ -6,27 +6,31 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
## Prerequisites
## Prerequisites
-
Setup vLLM environment
Set up the vLLM environment:
```
bash
pip
install
vllm
```
## Deploy
## Deploy
-
Start the vLLM server with the supported chat completion model, e.g.
1.
Start the vLLM server with the supported chat completion model, e.g.
```
bash
```bash
vllm serve qwen/Qwen1.5-0.5B-Chat
vllm serve qwen/Qwen1.5-0.5B-Chat
```
```
-
Download and install
[
Chatbox desktop
](
https://chatboxai.app/en#download
)
.
1.
Download and install
[
Chatbox desktop
](
https://chatboxai.app/en#download
)
.
-
On the bottom left of settings, Add Custom Provider
1.
On the bottom left of settings, Add Custom Provider
-
API Mode:
`OpenAI API Compatible`
-
API Mode:
`OpenAI API Compatible`
-
Name: vllm
-
Name: vllm
-
API Host:
`http://{vllm server host}:{vllm server port}/v1`
-
API Host:
`http://{vllm server host}:{vllm server port}/v1`
-
API Path:
`/chat/completions`
-
API Path:
`/chat/completions`
-
Model:
`qwen/Qwen1.5-0.5B-Chat`
-
Model:
`qwen/Qwen1.5-0.5B-Chat`


-
Go to
`Just chat`
, and start to chat:
1.
Go to
`Just chat`
, and start to chat:


docs/deployment/frameworks/dify.md
View file @
d14c4ebf
...
@@ -8,44 +8,50 @@ This guide walks you through deploying Dify using a vLLM backend.
...
@@ -8,44 +8,50 @@ This guide walks you through deploying Dify using a vLLM backend.
## Prerequisites
## Prerequisites
-
Setup vLLM environment
Set up the vLLM environment:
-
Install
[
Docker
](
https://docs.docker.com/engine/install/
)
and
[
Docker Compose
](
https://docs.docker.com/compose/install/
)
```
bash
pip
install
vllm
```
And install
[
Docker
](
https://docs.docker.com/engine/install/
)
and
[
Docker Compose
](
https://docs.docker.com/compose/install/
)
.
## Deploy
## Deploy
-
Start the vLLM server with the supported chat completion model, e.g.
1.
Start the vLLM server with the supported chat completion model, e.g.
```
bash
```bash
vllm serve Qwen/Qwen1.5-7B-Chat
vllm serve Qwen/Qwen1.5-7B-Chat
```
```
-
Start the Dify server with docker compose (
[
details
](
https://github.com/langgenius/dify?tab=readme-ov-file#quick-start
)
):
1.
Start the Dify server with docker compose (
[
details
](
https://github.com/langgenius/dify?tab=readme-ov-file#quick-start
)
):
```
bash
```bash
git clone https://github.com/langgenius/dify.git
git clone https://github.com/langgenius/dify.git
cd
dify
cd dify
cd
docker
cd docker
cp
.env.example .env
cp .env.example .env
docker compose up
-d
docker compose up -d
```
```
1.
Open the browser to access
`http://localhost/install`
, config the basic login information and login.
-
Ope
n the
browser to access
`http://localhost/install`
, config the basic login information and login
.
1.
I
n the
top-right user menu (under the profile icon), go to Settings, then click
`Model Provider`
, and locate the
`vLLM`
provider to install it
.
-
In the top-right user menu (under the profile icon), go to Settings, then click
`M
odel
P
rovider
`
, and locate the
`vLLM`
provider to install it.
1.
Fill in the m
odel
p
rovider
details as follows:
-
Fill in the model provider details as follows:
- **Model Type**: `LLM`
- **Model Type**: `LLM`
- **Model Name**: `Qwen/Qwen1.5-7B-Chat`
- **Model Name**: `Qwen/Qwen1.5-7B-Chat`
- **API Endpoint URL**: `http://{vllm_server_host}:{vllm_server_port}/v1`
- **API Endpoint URL**: `http://{vllm_server_host}:{vllm_server_port}/v1`
- **Model Name for API Endpoint**: `Qwen/Qwen1.5-7B-Chat`
- **Model Name for API Endpoint**: `Qwen/Qwen1.5-7B-Chat`
- **Completion Mode**: `Completion`
- **Completion Mode**: `Completion`


-
To create a test chatbot, go to
`Studio → Chatbot → Create from Blank`
, then select Chatbot as the type:
1.
To create a test chatbot, go to
`Studio → Chatbot → Create from Blank`
, then select Chatbot as the type:


-
Click the chatbot you just created to open the chat interface and start interacting with the model:
1.
Click the chatbot you just created to open the chat interface and start interacting with the model:


docs/deployment/frameworks/haystack.md
View file @
d14c4ebf
...
@@ -6,7 +6,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
...
@@ -6,7 +6,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
## Prerequisites
## Prerequisites
-
Setup vLLM and Haystack environment
Set
up
the
vLLM and Haystack environment
:
```
bash
```
bash
pip
install
vllm haystack-ai
pip
install
vllm haystack-ai
...
@@ -14,13 +14,13 @@ pip install vllm haystack-ai
...
@@ -14,13 +14,13 @@ pip install vllm haystack-ai
## Deploy
## Deploy
-
Start the vLLM server with the supported chat completion model, e.g.
1.
Start the vLLM server with the supported chat completion model, e.g.
```
bash
```bash
vllm serve mistralai/Mistral-7B-Instruct-v0.1
vllm serve mistralai/Mistral-7B-Instruct-v0.1
```
```
-
Use the
`OpenAIGenerator`
and
`OpenAIChatGenerator`
components in Haystack to query the vLLM server.
1.
Use the
`OpenAIGenerator`
and
`OpenAIChatGenerator`
components in Haystack to query the vLLM server.
??? code
??? code
...
...
docs/deployment/frameworks/litellm.md
View file @
d14c4ebf
...
@@ -13,7 +13,7 @@ And LiteLLM supports all models on VLLM.
...
@@ -13,7 +13,7 @@ And LiteLLM supports all models on VLLM.
## Prerequisites
## Prerequisites
-
Setup vLLM and litellm environment
Set
up
the
vLLM and litellm environment
:
```
bash
```
bash
pip
install
vllm litellm
pip
install
vllm litellm
...
@@ -23,13 +23,13 @@ pip install vllm litellm
...
@@ -23,13 +23,13 @@ pip install vllm litellm
### Chat completion
### Chat completion
-
Start the vLLM server with the supported chat completion model, e.g.
1.
Start the vLLM server with the supported chat completion model, e.g.
```
bash
```bash
vllm serve qwen/Qwen1.5-0.5B-Chat
vllm serve qwen/Qwen1.5-0.5B-Chat
```
```
-
Call it with litellm:
1.
Call it with litellm:
??? code
??? code
...
@@ -51,13 +51,13 @@ vllm serve qwen/Qwen1.5-0.5B-Chat
...
@@ -51,13 +51,13 @@ vllm serve qwen/Qwen1.5-0.5B-Chat
### Embeddings
### Embeddings
-
Start the vLLM server with the supported embedding model, e.g.
1.
Start the vLLM server with the supported embedding model, e.g.
```
bash
```bash
vllm serve BAAI/bge-base-en-v1.5
vllm serve BAAI/bge-base-en-v1.5
```
```
-
Call it with litellm:
1.
Call it with litellm:
```
python
```
python
from
litellm
import
embedding
from
litellm
import
embedding
...
...
docs/deployment/frameworks/retrieval_augmented_generation.md
View file @
d14c4ebf
...
@@ -11,7 +11,7 @@ Here are the integrations:
...
@@ -11,7 +11,7 @@ Here are the integrations:
### Prerequisites
### Prerequisites
-
Setup vLLM and langchain environment
Set
up
the
vLLM and langchain environment
:
```
bash
```
bash
pip
install
-U
vllm
\
pip
install
-U
vllm
\
...
@@ -22,33 +22,33 @@ pip install -U vllm \
...
@@ -22,33 +22,33 @@ pip install -U vllm \
### Deploy
### Deploy
-
Start the vLLM server with the supported embedding model, e.g.
1.
Start the vLLM server with the supported embedding model, e.g.
```
bash
```bash
# Start embedding service (port 8000)
# Start embedding service (port 8000)
vllm serve ssmits/Qwen2-7B-Instruct-embed-base
vllm serve ssmits/Qwen2-7B-Instruct-embed-base
```
```
-
Start the vLLM server with the supported chat completion model, e.g.
1.
Start the vLLM server with the supported chat completion model, e.g.
```
bash
```bash
# Start chat service (port 8001)
# Start chat service (port 8001)
vllm serve qwen/Qwen1.5-0.5B-Chat
--port
8001
vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
```
```
-
Use the script:
<gh-file:examples
/
online_serving
/
retrieval_augmented_generation_with_langchain.py
>
1.
Use the script:
<gh-file:examples
/
online_serving
/
retrieval_augmented_generation_with_langchain.py
>
-
Run the script
1.
Run the script
```
python
```python
python
retrieval_augmented_generation_with_langchain
.
py
python retrieval_augmented_generation_with_langchain.py
```
```
## vLLM + llamaindex
## vLLM + llamaindex
### Prerequisites
### Prerequisites
-
Setup vLLM and llamaindex environment
Set
up
the
vLLM and llamaindex environment
:
```
bash
```
bash
pip
install
vllm
\
pip
install
vllm
\
...
@@ -60,24 +60,24 @@ pip install vllm \
...
@@ -60,24 +60,24 @@ pip install vllm \
### Deploy
### Deploy
-
Start the vLLM server with the supported embedding model, e.g.
1.
Start the vLLM server with the supported embedding model, e.g.
```
bash
```bash
# Start embedding service (port 8000)
# Start embedding service (port 8000)
vllm serve ssmits/Qwen2-7B-Instruct-embed-base
vllm serve ssmits/Qwen2-7B-Instruct-embed-base
```
```
-
Start the vLLM server with the supported chat completion model, e.g.
1.
Start the vLLM server with the supported chat completion model, e.g.
```
bash
```bash
# Start chat service (port 8001)
# Start chat service (port 8001)
vllm serve qwen/Qwen1.5-0.5B-Chat
--port
8001
vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
```
```
-
Use the script:
<gh-file:examples
/
online_serving
/
retrieval_augmented_generation_with_llamaindex.py
>
1.
Use the script:
<gh-file:examples
/
online_serving
/
retrieval_augmented_generation_with_llamaindex.py
>
-
Run the script
1.
Run the script
:
```
python
```python
python
retrieval_augmented_generation_with_llamaindex
.
py
python retrieval_augmented_generation_with_llamaindex.py
```
```
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment