Commit c14e460c authored by ishandhanani, committed by GitHub

docs: hello world and vllm process docs (#525)

parent 4b6cfc1b
@@ -15,8 +15,17 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
# Hello World Example
## Overview
This example demonstrates the basic concepts of Dynamo by creating a simple multi-service pipeline. It shows how to:
1. Create and connect multiple Dynamo services
2. Pass data between services using Dynamo's runtime
3. Set up a simple HTTP API endpoint
4. Deploy and interact with a Dynamo service graph
Pipeline Architecture:
```
@@ -38,16 +47,35 @@ Users/Clients (HTTP)
└─────────────┘
```
## Component Descriptions
### Frontend Service
- Serves as the entry point for external HTTP requests
- Exposes a `/generate` HTTP API endpoint that clients can call
- Processes incoming text and passes it to the Middle service
### Middle Service
- Acts as an intermediary service in the pipeline
- Receives requests from the Frontend
- Appends "-mid" to the text and forwards it to the Backend
### Backend Service
- Functions as the final service in the pipeline
- Processes requests from the Middle service
- Appends "-back" to the text and yields tokens
## Running the Example
1. Launch all three services using a single command:
```bash
cd /workspace/examples/hello_world
dynamo serve hello_world:Frontend
```
The `dynamo serve` command deploys the entire service graph, automatically handling the dependencies between the Frontend, Middle, and Backend services.

2. Send a request to the Frontend using curl:
```bash
curl -X 'POST' \
@@ -58,3 +86,16 @@ curl -X 'POST' \
"text": "test"
}'
```
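
If you prefer Python over curl, a roughly equivalent request can be sent with the `requests` library. The URL below is only an assumption (the host and port sit in the collapsed portion of the curl command above); adjust it to wherever `dynamo serve` exposes the Frontend on your machine.

```python
# Rough Python equivalent of the curl call above. The URL is an assumption;
# replace it with the address your `dynamo serve` deployment actually exposes.
import requests

response = requests.post(
    "http://localhost:8000/generate",  # hypothetical host/port/path
    json={"text": "test"},
    stream=True,  # the pipeline streams tokens back
)
response.raise_for_status()
for line in response.iter_lines():
    if line:
        print(line.decode())
```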
## Expected Output
When you send the request with "test" as input, the response will show how the text flows through each service:
```
Frontend: Middle: Backend: test-mid-back
```
This demonstrates how:
1. The Frontend receives "test"
2. The Middle service adds "-mid" to create "test-mid"
3. The Backend service adds "-back" to create "test-mid-back"
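
The nesting of prefixes falls directly out of the three async generators in `hello_world.py`. A minimal, dependency-free simulation of that flow (plain synchronous generators, not Dynamo services) reproduces the same string:

```python
# Plain-Python simulation of the pipeline's string handling (not Dynamo code).
def backend(text):
    text = f"{text}-back"
    for token in text.split():  # "test-mid-back" is a single token
        yield f"Backend: {token}"

def middle(text):
    for response in backend(f"{text}-mid"):
        yield f"Middle: {response}"

def frontend(text):
    for response in middle(text):
        yield f"Frontend: {response}"

print(next(frontend("test")))  # Frontend: Middle: Backend: test-mid-back
```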
@@ -13,10 +13,14 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from pydantic import BaseModel
from dynamo.sdk import DYNAMO_IMAGE, api, depends, dynamo_endpoint, service
logger = logging.getLogger(__name__)
""" """
Pipeline Architecture: Pipeline Architecture:
...@@ -48,32 +52,27 @@ class ResponseType(BaseModel): ...@@ -48,32 +52,27 @@ class ResponseType(BaseModel):
@service(
    dynamo={
        "enabled": True,
        "namespace": "inference",
    },
    image=DYNAMO_IMAGE,
)
class Backend:
    def __init__(self) -> None:
        logger.info("Starting backend")

    @dynamo_endpoint()
    async def generate(self, req: RequestType):
        """Generate tokens."""
        req_text = req.text
        logger.info(f"Backend received: {req_text}")
        text = f"{req_text}-back"
        for token in text.split():
            yield f"Backend: {token}"
@service(
    dynamo={"enabled": True, "namespace": "inference"},
    image=DYNAMO_IMAGE,
)
@@ -81,23 +80,21 @@ class Middle:
    backend = depends(Backend)

    def __init__(self) -> None:
        logger.info("Starting middle")

    @dynamo_endpoint()
    async def generate(self, req: RequestType):
        """Forward requests to backend."""
        req_text = req.text
        logger.info(f"Middle received: {req_text}")
        text = f"{req_text}-mid"
        next_request = RequestType(text=text).model_dump_json()
        async for response in self.backend.generate(next_request):
            logger.info(f"Middle received response: {response}")
            yield f"Middle: {response}"
@service(
    image=DYNAMO_IMAGE,
)  # Regular HTTP API
class Frontend:
...
@@ -157,9 +157,8 @@ See [multinode-examples.md](multinode-examples.md) for more details.
### Close deployment

> [!IMPORTANT]
> We are aware of an issue where vLLM subprocesses might not be killed when `ctrl-c` is pressed.
> We are working on addressing this. Relevant vLLM issues can be found [here](https://github.com/vllm-project/vllm/pull/8492) and [here](https://github.com/vllm-project/vllm/issues/6219#issuecomment-2439257824).
To stop serving, press `ctrl-c`, which kills the individual components. To clean up any remaining vLLM subprocesses, find them with `nvidia-smi` and `kill -9` them, or run `pkill python3` from inside the container.