Unverified Commit 0a2a820b authored by Anant Sharma's avatar Anant Sharma Committed by GitHub
Browse files

docs: move all md files from components to docs (#3440)


Signed-off-by: default avatarAnant Sharma <anants@nvidia.com>
Co-authored-by: default avatarAnish <80174047+athreesh@users.noreply.github.com>
parent b640f283
...@@ -88,7 +88,7 @@ Install Dynamo with [SGLang](https://docs.sglang.ai/) support: ...@@ -88,7 +88,7 @@ Install Dynamo with [SGLang](https://docs.sglang.ai/) support:
pip install ai-dynamo[sglang] pip install ai-dynamo[sglang]
``` ```
For more information about the SGLang backend and its integration with Dynamo, see the [SGLang Backend Documentation](../../../components/backends/sglang/README.md). For more information about the SGLang backend and its integration with Dynamo, see the [SGLang Backend Documentation](../../../docs/backends/sglang/README.md).
### 3. Network Requirements ### 3. Network Requirements
......
...@@ -18,7 +18,7 @@ docker compose -f deploy/docker-compose.yml up -d ...@@ -18,7 +18,7 @@ docker compose -f deploy/docker-compose.yml up -d
## Components ## Components
- [Frontend](/components/src/dynamo/frontend/README.md) - A built-in component that launches an OpenAI compliant HTTP server, a pre-processor, and a router in a single process - [Frontend](/components/src/dynamo/frontend/README.md) - A built-in component that launches an OpenAI compliant HTTP server, a pre-processor, and a router in a single process
- [vLLM Backend](/components/backends/vllm/README.md) - A built-in component that runs vLLM within the Dynamo runtime - [vLLM Backend](/docs/backends/vllm/README.md) - A built-in component that runs vLLM within the Dynamo runtime
```mermaid ```mermaid
--- ---
......
...@@ -44,7 +44,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1)) ...@@ -44,7 +44,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
In this workflow, we have two workers, [VllmEncodeWorker](components/encode_worker.py) and [VllmPDWorker](components/worker.py). In this workflow, we have two workers, [VllmEncodeWorker](components/encode_worker.py) and [VllmPDWorker](components/worker.py).
The VllmEncodeWorker is responsible for encoding the image and passing the embeddings to the VllmPDWorker via a combination of NATS and RDMA. The VllmEncodeWorker is responsible for encoding the image and passing the embeddings to the VllmPDWorker via a combination of NATS and RDMA.
The work complete event is sent via NATS, while the embeddings tensor is transferred via RDMA through the NIXL interface. The work complete event is sent via NATS, while the embeddings tensor is transferred via RDMA through the NIXL interface.
Its VllmPDWorker then prefills and decodes the prompt, just like the [LLM aggregated serving](../../components/backends/vllm/README.md) example. Its VllmPDWorker then prefills and decodes the prompt, just like the [LLM aggregated serving](../../docs/backends/vllm/README.md) example.
By separating the encode from the prefill and decode stages, we can have a more flexible deployment and scale the By separating the encode from the prefill and decode stages, we can have a more flexible deployment and scale the
VllmEncodeWorker independently from the prefill and decode workers if needed. VllmEncodeWorker independently from the prefill and decode workers if needed.
...@@ -122,7 +122,7 @@ For the Llava model, embeddings are only required during the prefill stage. As s ...@@ -122,7 +122,7 @@ For the Llava model, embeddings are only required during the prefill stage. As s
The VllmEncodeWorker is responsible for encoding the image and passing the embeddings to the prefill worker via a combination of NATS and RDMA. The VllmEncodeWorker is responsible for encoding the image and passing the embeddings to the prefill worker via a combination of NATS and RDMA.
Its work complete event is sent via NATS, while the embeddings tensor is transferred via RDMA through the NIXL interface. Its work complete event is sent via NATS, while the embeddings tensor is transferred via RDMA through the NIXL interface.
The prefill worker performs the prefilling step and forwards the KV cache to the decode worker for decoding. The prefill worker performs the prefilling step and forwards the KV cache to the decode worker for decoding.
For more details on the roles of the prefill and decode workers, refer to the [LLM disaggregated serving](../../components/backends/vllm/README.md) example. For more details on the roles of the prefill and decode workers, refer to the [LLM disaggregated serving](../../docs/backends/vllm/README.md) example.
This figure illustrates the workflow: This figure illustrates the workflow:
```mermaid ```mermaid
...@@ -203,7 +203,7 @@ of the model per node. ...@@ -203,7 +203,7 @@ of the model per node.
#### Workflow #### Workflow
In this workflow, we have [VllmPDWorker](components/worker.py) which will encode the image, prefill and decode the prompt, just like the [LLM aggregated serving](/components/backends/vllm/README.md) example. In this workflow, we have [VllmPDWorker](components/worker.py) which will encode the image, prefill and decode the prompt, just like the [LLM aggregated serving](/docs/backends/vllm/README.md) example.
This figure illustrates the workflow: This figure illustrates the workflow:
```mermaid ```mermaid
...@@ -267,7 +267,7 @@ You should see a response similar to this: ...@@ -267,7 +267,7 @@ You should see a response similar to this:
In this workflow, we have two workers, [VllmDecodeWorker](components/worker.py), and [VllmPDWorker](components/worker.py). In this workflow, we have two workers, [VllmDecodeWorker](components/worker.py), and [VllmPDWorker](components/worker.py).
The prefill worker performs the encoding and prefilling steps and forwards the KV cache to the decode worker for decoding. The prefill worker performs the encoding and prefilling steps and forwards the KV cache to the decode worker for decoding.
For more details on the roles of the prefill and decode workers, refer to the [LLM disaggregated serving](/components/backends/vllm/README.md) example. For more details on the roles of the prefill and decode workers, refer to the [LLM disaggregated serving](/docs/backends/vllm/README.md) example.
This figure illustrates the workflow: This figure illustrates the workflow:
```mermaid ```mermaid
...@@ -342,7 +342,7 @@ This example demonstrates deploying an aggregated multimodal model that can proc ...@@ -342,7 +342,7 @@ This example demonstrates deploying an aggregated multimodal model that can proc
In this workflow, we have two workers, [VideoEncodeWorker](components/video_encode_worker.py) and [VllmPDWorker](components/worker.py). In this workflow, we have two workers, [VideoEncodeWorker](components/video_encode_worker.py) and [VllmPDWorker](components/worker.py).
The VideoEncodeWorker is responsible for decoding the video into a series of frames. Unlike the image pipeline which generates embeddings, The VideoEncodeWorker is responsible for decoding the video into a series of frames. Unlike the image pipeline which generates embeddings,
this pipeline passes the raw frames directly to the VllmPDWorker via a combination of NATS and RDMA. this pipeline passes the raw frames directly to the VllmPDWorker via a combination of NATS and RDMA.
Its VllmPDWorker then prefills and decodes the prompt, just like the [LLM aggregated serving](/components/backends/vllm/README.md) example. Its VllmPDWorker then prefills and decodes the prompt, just like the [LLM aggregated serving](/docs/backends/vllm/README.md) example.
By separating the video processing from the prefill and decode stages, we can have a more flexible deployment and scale the By separating the video processing from the prefill and decode stages, we can have a more flexible deployment and scale the
VideoEncodeWorker independently from the prefill and decode workers if needed. VideoEncodeWorker independently from the prefill and decode workers if needed.
...@@ -431,7 +431,7 @@ In this workflow, we have three workers, [VideoEncodeWorker](components/video_en ...@@ -431,7 +431,7 @@ In this workflow, we have three workers, [VideoEncodeWorker](components/video_en
For the LLaVA-NeXT-Video-7B model, frames are only required during the prefill stage. As such, the VideoEncodeWorker is connected directly to the prefill worker. For the LLaVA-NeXT-Video-7B model, frames are only required during the prefill stage. As such, the VideoEncodeWorker is connected directly to the prefill worker.
The VideoEncodeWorker is responsible for decoding the video into a series of frames and passing them to the prefill worker via RDMA. The VideoEncodeWorker is responsible for decoding the video into a series of frames and passing them to the prefill worker via RDMA.
The prefill worker performs the prefilling step and forwards the KV cache to the decode worker for decoding. The prefill worker performs the prefilling step and forwards the KV cache to the decode worker for decoding.
For more details on the roles of the prefill and decode workers, refer to the [LLM disaggregated serving](/components/backends/vllm/README.md) example. For more details on the roles of the prefill and decode workers, refer to the [LLM disaggregated serving](/docs/backends/vllm/README.md) example.
This figure illustrates the workflow: This figure illustrates the workflow:
```mermaid ```mermaid
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment