Commit 70266ec8 authored by ptarasiewiczNV, committed by GitHub

docs: Add disaggregated architecture mermaid diagram (#190)


Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
parent aca25898
@@ -33,6 +33,29 @@ Single-instance deployment where both prefill and decode are done by the same wo
### Disaggregated
Distributed deployment where prefill and decode are done by separate workers that can scale independently.
```mermaid
sequenceDiagram
participant D as VllmWorker
participant Q as PrefillQueue
participant P as PrefillWorker
Note over D: Request is routed to decode
D->>D: Decide if prefill should be done locally or remotely
D->>D: Allocate KV blocks
D->>Q: Put RemotePrefillRequest on the queue
P->>Q: Pull request from the queue
P-->>D: Read cached KVs from Decode
D->>D: Decode other requests
P->>P: Run prefill
P-->>D: Write prefilled KVs into allocated blocks
P->>D: Send completion notification
Note over D: Notification received when prefill is done
D->>D: Schedule decoding
```
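The handshake in the diagram can be approximated with a small in-process simulation. This is a minimal sketch, not the project's implementation. `RemotePrefillRequest` reuses the name from the diagram, but its fields are assumptions. `DecodeWorker`, `prefill_worker`, the length-based local/remote rule, and the toy KV values are all hypothetical.

The sketch also makes three simplifications:
- An `asyncio.Queue` stands in for `PrefillQueue`.
- An `asyncio.Event` stands in for the completion notification.
- The prefill worker writes directly into a shared list, whereas the real system would transfer KV blocks between processes.

The "read cached KVs" step is omitted.

```python
import asyncio
from contextlib import suppress
from dataclasses import dataclass, field


# Hypothetical shape of the RemotePrefillRequest named in the diagram.
@dataclass
class RemotePrefillRequest:
    request_id: str
    prompt_tokens: list
    block_ids: list  # KV blocks pre-allocated by the decode worker
    done: asyncio.Event = field(default_factory=asyncio.Event)


class DecodeWorker:
    """Hypothetical decode worker (the VllmWorker role in the diagram)."""

    def __init__(self, queue, num_blocks=8, local_prefill_threshold=4):
        self.queue = queue
        self.kv_blocks = [None] * num_blocks  # toy KV cache, one value per block
        self.free_blocks = list(range(num_blocks))
        self.threshold = local_prefill_threshold  # assumed local/remote rule
        self.log = []

    async def handle(self, request_id, prompt_tokens):
        # Decide if prefill should be done locally or remotely.
        if len(prompt_tokens) <= self.threshold:
            self.log.append((request_id, "local prefill"))
            return "local"
        # Allocate KV blocks (one per token, for simplicity).
        blocks = [self.free_blocks.pop(0) for _ in prompt_tokens]
        req = RemotePrefillRequest(request_id, prompt_tokens, blocks)
        # Put the RemotePrefillRequest on the queue.
        await self.queue.put(req)
        self.log.append((request_id, "queued remote prefill"))
        # While waiting, the event loop is free to decode other requests.
        await req.done.wait()
        # Notification received: schedule decoding using the filled blocks.
        self.log.append((request_id, "schedule decoding"))
        return [self.kv_blocks[b] for b in blocks]


async def prefill_worker(queue, decode):
    """Hypothetical prefill worker (the PrefillWorker role)."""
    while True:
        req = await queue.get()  # pull request from the queue
        # "Run prefill" and write results into the pre-allocated blocks.
        for block_id, token in zip(req.block_ids, req.prompt_tokens):
            decode.kv_blocks[block_id] = token * 10  # toy KV value
        req.done.set()  # send completion notification
        queue.task_done()


async def main():
    queue = asyncio.Queue()
    decode = DecodeWorker(queue)
    worker = asyncio.create_task(prefill_worker(queue, decode))
    short, long = await asyncio.gather(
        decode.handle("r1", [1, 2]),           # short prompt: prefilled locally
        decode.handle("r2", [1, 2, 3, 4, 5]),  # long prompt: prefilled remotely
    )
    worker.cancel()
    with suppress(asyncio.CancelledError):
        await worker
    return short, long, decode.log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The key design point the sketch mirrors is that the decode worker allocates KV blocks *before* enqueueing. This lets the prefill worker write results straight into their final location, so the decode side only has to wait for a notification.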
## Getting Started
1. Choose a deployment architecture based on your requirements
......