Unverified commit d14c4ebf, authored by Michael Yao, committed by GitHub

[Docs] Use 1-2-3 list for deploy steps in deployment/frameworks/ (#24633)


Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
parent ba601102
@@ -4,9 +4,7 @@
 ## Prerequisites
-- Setup vLLM environment
-- Setup [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment
+Set up the vLLM and [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment:
 ```bash
 pip install vllm
@@ -18,14 +16,14 @@ pip install -U "autogen-agentchat" "autogen-ext[openai]"
 ## Deploy
-- Start the vLLM server with the supported chat completion model, e.g.
+1. Start the vLLM server with the supported chat completion model, e.g.
 ```bash
 python -m vllm.entrypoints.openai.api_server \
 --model mistralai/Mistral-7B-Instruct-v0.2
 ```
-- Call it with AutoGen:
+1. Call it with AutoGen:
 ??? code
......
@@ -6,27 +6,31 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
 ## Prerequisites
-- Setup vLLM environment
+Set up the vLLM environment:
+```bash
+pip install vllm
+```
 ## Deploy
-- Start the vLLM server with the supported chat completion model, e.g.
+1. Start the vLLM server with the supported chat completion model, e.g.
 ```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```
-- Download and install [Chatbox desktop](https://chatboxai.app/en#download).
+1. Download and install [Chatbox desktop](https://chatboxai.app/en#download).
-- On the bottom left of settings, Add Custom Provider
+1. On the bottom left of settings, Add Custom Provider
 - API Mode: `OpenAI API Compatible`
 - Name: vllm
 - API Host: `http://{vllm server host}:{vllm server port}/v1`
 - API Path: `/chat/completions`
 - Model: `qwen/Qwen1.5-0.5B-Chat`
 ![](../../assets/deployment/chatbox-settings.png)
-- Go to `Just chat`, and start to chat:
+1. Go to `Just chat`, and start to chat:
 ![](../../assets/deployment/chatbox-chat.png)
@@ -8,44 +8,50 @@ This guide walks you through deploying Dify using a vLLM backend.
 ## Prerequisites
-- Setup vLLM environment
-- Install [Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/)
+Set up the vLLM environment:
+```bash
+pip install vllm
+```
+And install [Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/).
 ## Deploy
-- Start the vLLM server with the supported chat completion model, e.g.
+1. Start the vLLM server with the supported chat completion model, e.g.
 ```bash
 vllm serve Qwen/Qwen1.5-7B-Chat
 ```
-- Start the Dify server with docker compose ([details](https://github.com/langgenius/dify?tab=readme-ov-file#quick-start)):
+1. Start the Dify server with docker compose ([details](https://github.com/langgenius/dify?tab=readme-ov-file#quick-start)):
 ```bash
 git clone https://github.com/langgenius/dify.git
 cd dify
 cd docker
 cp .env.example .env
 docker compose up -d
 ```
-- Open the browser to access `http://localhost/install`, config the basic login information and login.
+1. Open the browser to access `http://localhost/install`, config the basic login information and login.
-- In the top-right user menu (under the profile icon), go to Settings, then click `Model Provider`, and locate the `vLLM` provider to install it.
+1. In the top-right user menu (under the profile icon), go to Settings, then click `Model Provider`, and locate the `vLLM` provider to install it.
-- Fill in the model provider details as follows:
+1. Fill in the model provider details as follows:
 - **Model Type**: `LLM`
 - **Model Name**: `Qwen/Qwen1.5-7B-Chat`
 - **API Endpoint URL**: `http://{vllm_server_host}:{vllm_server_port}/v1`
 - **Model Name for API Endpoint**: `Qwen/Qwen1.5-7B-Chat`
 - **Completion Mode**: `Completion`
 ![](../../assets/deployment/dify-settings.png)
-- To create a test chatbot, go to `Studio → Chatbot → Create from Blank`, then select Chatbot as the type:
+1. To create a test chatbot, go to `Studio → Chatbot → Create from Blank`, then select Chatbot as the type:
 ![](../../assets/deployment/dify-create-chatbot.png)
-- Click the chatbot you just created to open the chat interface and start interacting with the model:
+1. Click the chatbot you just created to open the chat interface and start interacting with the model:
 ![](../../assets/deployment/dify-chat.png)
@@ -6,7 +6,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
 ## Prerequisites
-- Setup vLLM and Haystack environment
+Set up the vLLM and Haystack environment:
 ```bash
 pip install vllm haystack-ai
@@ -14,13 +14,13 @@ pip install vllm haystack-ai
 ## Deploy
-- Start the vLLM server with the supported chat completion model, e.g.
+1. Start the vLLM server with the supported chat completion model, e.g.
 ```bash
 vllm serve mistralai/Mistral-7B-Instruct-v0.1
 ```
-- Use the `OpenAIGenerator` and `OpenAIChatGenerator` components in Haystack to query the vLLM server.
+1. Use the `OpenAIGenerator` and `OpenAIChatGenerator` components in Haystack to query the vLLM server.
 ??? code
......
@@ -13,7 +13,7 @@ And LiteLLM supports all models on VLLM.
 ## Prerequisites
-- Setup vLLM and litellm environment
+Set up the vLLM and litellm environment:
 ```bash
 pip install vllm litellm
@@ -23,13 +23,13 @@ pip install vllm litellm
 ### Chat completion
-- Start the vLLM server with the supported chat completion model, e.g.
+1. Start the vLLM server with the supported chat completion model, e.g.
 ```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```
-- Call it with litellm:
+1. Call it with litellm:
 ??? code
@@ -51,13 +51,13 @@ vllm serve qwen/Qwen1.5-0.5B-Chat
 ### Embeddings
-- Start the vLLM server with the supported embedding model, e.g.
+1. Start the vLLM server with the supported embedding model, e.g.
 ```bash
 vllm serve BAAI/bge-base-en-v1.5
 ```
-- Call it with litellm:
+1. Call it with litellm:
 ```python
 from litellm import embedding
......
@@ -11,7 +11,7 @@ Here are the integrations:
 ### Prerequisites
-- Setup vLLM and langchain environment
+Set up the vLLM and langchain environment:
 ```bash
 pip install -U vllm \
@@ -22,33 +22,33 @@ pip install -U vllm \
 ### Deploy
-- Start the vLLM server with the supported embedding model, e.g.
+1. Start the vLLM server with the supported embedding model, e.g.
 ```bash
 # Start embedding service (port 8000)
 vllm serve ssmits/Qwen2-7B-Instruct-embed-base
 ```
-- Start the vLLM server with the supported chat completion model, e.g.
+1. Start the vLLM server with the supported chat completion model, e.g.
 ```bash
 # Start chat service (port 8001)
 vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
 ```
-- Use the script: <gh-file:examples/online_serving/retrieval_augmented_generation_with_langchain.py>
+1. Use the script: <gh-file:examples/online_serving/retrieval_augmented_generation_with_langchain.py>
-- Run the script
+1. Run the script
 ```python
 python retrieval_augmented_generation_with_langchain.py
 ```
 ## vLLM + llamaindex
 ### Prerequisites
-- Setup vLLM and llamaindex environment
+Set up the vLLM and llamaindex environment:
 ```bash
 pip install vllm \
@@ -60,24 +60,24 @@ pip install vllm \
 ### Deploy
-- Start the vLLM server with the supported embedding model, e.g.
+1. Start the vLLM server with the supported embedding model, e.g.
 ```bash
 # Start embedding service (port 8000)
 vllm serve ssmits/Qwen2-7B-Instruct-embed-base
 ```
-- Start the vLLM server with the supported chat completion model, e.g.
+1. Start the vLLM server with the supported chat completion model, e.g.
 ```bash
 # Start chat service (port 8001)
 vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
 ```
-- Use the script: <gh-file:examples/online_serving/retrieval_augmented_generation_with_llamaindex.py>
+1. Use the script: <gh-file:examples/online_serving/retrieval_augmented_generation_with_llamaindex.py>
-- Run the script
+1. Run the script:
 ```python
 python retrieval_augmented_generation_with_llamaindex.py
 ```
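The two `vllm serve` commands in the Deploy steps above run the embedding model on port 8000 and the chat model on port 8001, so a client needs a separate endpoint for each. A minimal sketch of that mapping follows. The `VllmService` class and the `localhost` host are assumptions for illustration and are not part of this commit; `/embeddings` and `/chat/completions` are the OpenAI-compatible routes under `/v1`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VllmService:
    """One `vllm serve` process and the port it listens on (hypothetical helper)."""

    model: str
    port: int
    api_path: str  # OpenAI-compatible route under /v1

    def endpoint(self, host: str = "localhost") -> str:
        # `localhost` is an assumption; use the host where the server actually runs.
        return f"http://{host}:{self.port}/v1{self.api_path}"


# Ports follow the comments in the serve commands: 8000 for embeddings, 8001 for chat.
embedding_service = VllmService("ssmits/Qwen2-7B-Instruct-embed-base", 8000, "/embeddings")
chat_service = VllmService("qwen/Qwen1.5-0.5B-Chat", 8001, "/chat/completions")

print(embedding_service.endpoint())
print(chat_service.endpoint())
```

Keeping the port next to the model name makes it harder to point a chat client at the embedding server by mistake.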