Commit c14e460c authored by ishandhanani, committed by GitHub

docs: hello world and vllm process docs (#525)

parent 4b6cfc1b
@@ -15,8 +15,17 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
# Hello World Example
## Overview
This example demonstrates the basic concepts of Dynamo by creating a simple multi-service pipeline. It shows how to:
1. Create and connect multiple Dynamo services
2. Pass data between services using Dynamo's runtime
3. Set up a simple HTTP API endpoint
4. Deploy and interact with a Dynamo service graph
Pipeline Architecture:
```
@@ -38,16 +47,35 @@ Users/Clients (HTTP)
└─────────────┘
```
## Component Descriptions
### Frontend Service
- Serves as the entry point for external HTTP requests
- Exposes a `/generate` HTTP API endpoint that clients can call
- Processes incoming text and passes it to the Middle service
### Middle Service
- Acts as an intermediary service in the pipeline
- Receives requests from the Frontend
- Appends "-mid" to the text and forwards it to the Backend
### Backend Service
- Functions as the final service in the pipeline
- Processes requests from the Middle service
- Appends "-back" to the text and yields tokens
## Running the Example
1. Launch all three services using a single command:
```bash
cd /workspace/examples/hello_world
dynamo serve hello_world:Frontend
```
The `dynamo serve` command deploys the entire service graph, automatically handling the dependencies between the Frontend, Middle, and Backend services.

2. Send a request to the Frontend using curl:
```bash
curl -X 'POST' \
@@ -58,3 +86,16 @@ curl -X 'POST' \
"text": "test"
}'
```
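
If you prefer Python over curl, a roughly equivalent request can be sent with the `requests` library. The URL below is only an assumption (the host and port sit in the collapsed portion of the curl command above); adjust it to wherever `dynamo serve` exposes the Frontend on your machine.

```python
# Rough Python equivalent of the curl call above. The URL is an assumption;
# replace it with the address your `dynamo serve` deployment actually exposes.
import requests

response = requests.post(
    "http://localhost:8000/generate",  # hypothetical host/port/path
    json={"text": "test"},
    stream=True,  # the pipeline streams tokens back
)
response.raise_for_status()
for line in response.iter_lines():
    if line:
        print(line.decode())
```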
## Expected Output
When you send the request with "test" as input, the response will show how the text flows through each service:
```
Frontend: Middle: Backend: test-mid-back
```
This demonstrates how:
1. The Frontend receives "test"
2. The Middle service adds "-mid" to create "test-mid"
3. The Backend service adds "-back" to create "test-mid-back"
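
The nesting of prefixes falls directly out of the three async generators in `hello_world.py`. A minimal, dependency-free simulation of that flow (plain synchronous generators, not Dynamo services) reproduces the same string:

```python
# Plain-Python simulation of the pipeline's string handling (not Dynamo code).
def backend(text):
    text = f"{text}-back"
    for token in text.split():  # "test-mid-back" is a single token
        yield f"Backend: {token}"

def middle(text):
    for response in backend(f"{text}-mid"):
        yield f"Middle: {response}"

def frontend(text):
    for response in middle(text):
        yield f"Frontend: {response}"

print(next(frontend("test")))  # Frontend: Middle: Backend: test-mid-back
```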
@@ -13,10 +13,14 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from pydantic import BaseModel
from dynamo.sdk import DYNAMO_IMAGE, api, depends, dynamo_endpoint, service
logger = logging.getLogger(__name__)
""" """
Pipeline Architecture: Pipeline Architecture:
...@@ -48,32 +52,27 @@ class ResponseType(BaseModel): ...@@ -48,32 +52,27 @@ class ResponseType(BaseModel):
@service(
    dynamo={
        "enabled": True,
        "namespace": "inference",
    },
    image=DYNAMO_IMAGE,
)
class Backend:
    def __init__(self) -> None:
        logger.info("Starting backend")

    @dynamo_endpoint()
    async def generate(self, req: RequestType):
        """Generate tokens."""
        req_text = req.text
        logger.info(f"Backend received: {req_text}")
        text = f"{req_text}-back"
        for token in text.split():
            yield f"Backend: {token}"
@service(
    dynamo={"enabled": True, "namespace": "inference"},
    image=DYNAMO_IMAGE,
)
@@ -81,23 +80,21 @@ class Middle:
    backend = depends(Backend)

    def __init__(self) -> None:
        logger.info("Starting middle")

    @dynamo_endpoint()
    async def generate(self, req: RequestType):
        """Forward requests to backend."""
        req_text = req.text
        logger.info(f"Middle received: {req_text}")
        text = f"{req_text}-mid"
        next_request = RequestType(text=text).model_dump_json()
        async for response in self.backend.generate(next_request):
            logger.info(f"Middle received response: {response}")
            yield f"Middle: {response}"
@service(
    image=DYNAMO_IMAGE,
)  # Regular HTTP API
class Frontend:
...
@@ -157,9 +157,8 @@ See [multinode-examples.md](multinode-examples.md) for more details.
### Close deployment

> [!IMPORTANT]
> We are aware of an issue where vLLM subprocesses might not be killed when `ctrl-c` is pressed.
> We are working on addressing this. Relevant vLLM issues can be found [here](https://github.com/vllm-project/vllm/pull/8492) and [here](https://github.com/vllm-project/vllm/issues/6219#issuecomment-2439257824).
To stop serving, press `ctrl-c`, which kills the individual components. To clean up any remaining vLLM subprocesses, find them with `nvidia-smi` and `kill -9` them, or run `pkill python3` from inside the container.