# KServe gRPC Example

This directory contains a minimal Dynamo worker that serves a KServe-compatible
gRPC endpoint (`server.py`) and a Python client (`test_client.py`) that exercises
the endpoint using the Triton `tritonclient.grpc` API.

## Prerequisites

- The Dynamo Python bindings installed
- Client dependencies:
  - `numpy`
  - `tritonclient[grpc]`

You can install the Python dependencies into your active environment with:

```bash
uv pip install numpy "tritonclient[grpc]"
```
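
Before starting the server, you may want to confirm that the client
dependencies are importable. This is a minimal stdlib-only sketch; the helper
name `missing_modules` is hypothetical and is not part of this example's files.

```python
import importlib.util

# Modules the example client needs, per the Prerequisites list.
REQUIRED = ["numpy", "tritonclient.grpc"]


def missing_modules(names):
    """Return the subset of module names that cannot be found on sys.path."""
    missing = []
    for name in names:
        try:
            # For dotted names, find_spec imports the parent package first and
            # raises ModuleNotFoundError if that parent is absent.
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED)
    print("missing: " + ", ".join(absent) if absent else "all client dependencies found")
```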

## Running the mock server

1. From the repository root, set `PYTHONPATH` so Python can locate the local
   Dynamo package:

   ```bash
   export PYTHONPATH="$(pwd)"
   ```

2. Start the worker:

   ```bash
   python lib/bindings/python/examples/kserve_grpc_service/server.py
   ```

   The server registers a mock completions model named `mock_model` and listens
   on `0.0.0.0:8787`. Leave this process running while you test the endpoint.
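
Because the server binds to `0.0.0.0:8787`, a client on the same machine can
reach it at `localhost:8787`. If you script these steps, one simple way to wait
for the server is to poll the TCP port until it accepts connections. The helper
`wait_for_port` below is a hypothetical sketch: it only confirms that something
is listening on the port, not that the gRPC service or `mock_model` is ready.

```python
import socket
import time


def wait_for_port(host="localhost", port=8787, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener is bound to the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8787)` returns `True` once `server.py`
is listening, or `False` after 30 seconds.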

## Sending a request with the Triton client

With the server running, invoke the example client from a separate terminal:

```bash
python lib/bindings/python/examples/kserve_grpc_service/test_client.py \
  --model mock_model \
  --prompt "Hello from Dynamo!"
```

You can override the `--model`, `--host`, `--port`, and `--prompt` options as
needed. The script sends an inference request over gRPC using
`InferenceServerClient` and prints the decoded `ModelInferResponse` payload. The
server terminal should show that it received and printed the prompt
`Hello from Dynamo!`.
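
The core of a client like `test_client.py` can be sketched with the
`tritonclient.grpc` API. This is a minimal sketch, not the actual contents of
`test_client.py`: it assumes `mock_model` takes a single `BYTES` input named
`text_input` with shape `[1]` and returns a `BYTES` output named `text_output`.
Those tensor names and the shape are assumptions, so check `test_client.py` for
the real ones. The `numpy` and `tritonclient` imports are deferred into
`send_prompt` so the decoding helper can be used without those packages.

```python
def decode_bytes_tensor(values):
    """Decode a BYTES tensor (numpy object array or list of bytes) into str."""
    # as_numpy() returns BYTES outputs as a numpy object array; flatten it first.
    flat = values.ravel().tolist() if hasattr(values, "ravel") else list(values)
    return [v.decode("utf-8") if isinstance(v, bytes) else str(v) for v in flat]


def send_prompt(prompt, model="mock_model", url="localhost:8787",
                input_name="text_input", output_name="text_output"):
    """Send one unary ModelInfer request; return (decoded text, raw response).

    input_name and output_name are ASSUMED tensor names, not confirmed here.
    """
    # Imported here so decode_bytes_tensor works without these packages.
    import numpy as np
    import tritonclient.grpc as grpcclient

    client = grpcclient.InferenceServerClient(url=url)
    text_input = grpcclient.InferInput(input_name, [1], "BYTES")
    # BYTES tensors are passed as numpy object arrays of bytes.
    text_input.set_data_from_numpy(np.array([prompt.encode("utf-8")], dtype=np.object_))
    result = client.infer(model_name=model, inputs=[text_input])

    output = result.as_numpy(output_name)
    if output is None:
        raise KeyError(f"response has no output named {output_name!r}")
    # get_response() returns the underlying ModelInferResponse protobuf.
    return decode_bytes_tensor(output), result.get_response()
```

With the server running, `send_prompt("Hello from Dynamo!")` would return the
decoded output strings together with the raw `ModelInferResponse`, provided the
assumed tensor names match the model.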

## Alternative tooling

For debugging, you can also call the endpoint directly with
[`grpcurl`](https://github.com/fullstorydev/grpcurl) by running the
`grpcurl.sh` script in this directory.
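
To discover a model's input and output tensor names while debugging, the KServe
v2 protocol defines a `ModelMetadata` RPC, which `tritonclient.grpc` exposes as
`get_model_metadata`. The sketch below assumes the Dynamo server implements that
RPC and `ModelReady`, which this README does not state; the helper names
`format_tensor_specs` and `describe_model` are hypothetical.

```python
def format_tensor_specs(tensors):
    """Render tensor metadata dicts as 'name DATATYPE [shape]' strings."""
    # In the JSON form, int64 shape entries may be serialized as strings,
    # so coerce each dimension with int().
    return [
        f"{t['name']} {t['datatype']} {[int(d) for d in t.get('shape', [])]}"
        for t in tensors
    ]


def describe_model(model="mock_model", url="localhost:8787"):
    """Print readiness and input/output tensor specs for a model."""
    # Imported here so format_tensor_specs works without tritonclient installed.
    import tritonclient.grpc as grpcclient

    client = grpcclient.InferenceServerClient(url=url)
    print("model ready:", client.is_model_ready(model))
    # as_json=True returns the ModelMetadataResponse as a plain dict.
    metadata = client.get_model_metadata(model, as_json=True)
    for label in ("inputs", "outputs"):
        for spec in format_tensor_specs(metadata.get(label, [])):
            print(f"{label[:-1]}: {spec}")
```

If the RPCs are not implemented, the client calls raise
`InferenceServerException`, and `grpcurl.sh` remains the fallback.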