Commit cdcdce96 authored by Ryan Olson, committed by GitHub

docs: readme + examples (#116)


Co-authored-by: aflowers <aflowers@nvidia.com>
parent c9efcce6
@@ -14,36 +14,103 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Triton Distributed Runtime

<h4>A Datacenter Scale Distributed Inference Serving Framework</h4>

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

This repository contains the core components of a distributed GenAI inference framework written in Rust.

Features:

- A high-level API defined in [component.rs](src/component.rs) for building distributed applications.
- A [distributed runtime](src/distributed.rs) to manage the distributed execution of the inference graph.
- [NATS](src/transports/nats.rs) for component communication and [etcd](src/transports/etcd.rs) for service discovery, allowing engines to be distributed across multiple nodes while maintaining a unified processing graph.
- A modular design that makes it easy to build inference pipelines by composing reusable components.

## 🛠️ Prerequisites

### Install Rust and Cargo using [rustup](https://rustup.rs/):

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

See the [Rust Installation Guide](https://www.rust-lang.org/tools/install) for more options.
## Build and Test

```bash
cargo build
cargo test
```
## Running

### Start Dependencies

#### Docker Compose

The simplest way to deploy the prerequisite services is with
[docker-compose](https://docs.docker.com/compose/install/linux/),
using the [docker-compose.yml](docker-compose.yml) in the project's root:

```bash
docker-compose up -d
```
This deploys a [NATS.io](https://nats.io/) server and an [etcd](https://etcd.io/)
server, which components use at runtime to communicate with and discover each other.
#### Local (alternative)

To run the prerequisite services locally instead of using `docker-compose`, you can launch each manually:
- [NATS.io](https://docs.nats.io/running-a-nats-service/introduction/installation) server with [Jetstream](https://docs.nats.io/nats-concepts/jetstream)
- example: `nats-server -js --trace`
- [etcd](https://etcd.io) server
  - follow the instructions in [etcd installation](https://etcd.io/docs/v3.5/install/) to start an `etcd-server` locally
### Run Examples

When developing or running examples, any process or user that shares your core services (`etcd` and `nats.io`) will
be operating within your distributed runtime.
The current examples use a hard-coded `namespace`; `namespace` collisions will be addressed in this
[issue](https://github.com/triton-inference-server/triton_distributed/issues/114).

All examples require the `etcd` and `nats.io` prerequisites to be running and available.
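Until that issue lands, one hypothetical workaround (not part of the current examples) is to read the namespace from an environment variable instead of the hard-coded constant, falling back to the examples' `DEFAULT_NAMESPACE` value of `"triton-init"`. The variable name `TRITON_NAMESPACE` here is an assumption, not an existing convention:

```rust
use std::env;

// Hypothetical helper (not in the examples today): pick the namespace from
// an environment variable so concurrent users can avoid collisions.
// "triton-init" mirrors the examples' DEFAULT_NAMESPACE constant.
fn namespace() -> String {
    env::var("TRITON_NAMESPACE").unwrap_or_else(|_| "triton-init".to_string())
}

fn main() {
    println!("using namespace: {}", namespace());
}
```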
#### Rust `hello_world`
With two terminals open, run the server in the first:

```bash
cd examples/hello_world
cargo run --bin server
```
In the second terminal, run the client:

```bash
cd examples/hello_world
cargo run --bin client
```
which should produce output similar to:
```
Finished `dev` profile [unoptimized + debuginfo] target(s) in 6.25s
Running `target/debug/client`
Annotated { data: Some("h"), id: None, event: None, comment: None }
Annotated { data: Some("e"), id: None, event: None, comment: None }
Annotated { data: Some("l"), id: None, event: None, comment: None }
Annotated { data: Some("l"), id: None, event: None, comment: None }
Annotated { data: Some("o"), id: None, event: None, comment: None }
Annotated { data: Some(" "), id: None, event: None, comment: None }
Annotated { data: Some("w"), id: None, event: None, comment: None }
Annotated { data: Some("o"), id: None, event: None, comment: None }
Annotated { data: Some("r"), id: None, event: None, comment: None }
Annotated { data: Some("l"), id: None, event: None, comment: None }
Annotated { data: Some("d"), id: None, event: None, comment: None }
```
#### Python

See the [README.md](./python-wheel/README.md) for details.
The Python and Rust `hello_world` client and server examples are interchangeable,
so you can start the Python `server.py` and talk to it from the Rust `client`.
# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
services:
nats-server:
image: nats
command: [ "-js", "--trace" ]
ports:
- 4222:4222
- 6222:6222
- 8222:8222
etcd-server:
image: bitnami/etcd
environment:
- ALLOW_NONE_AUTHENTICATION=yes
ports:
- 2379:2379
- 2380:2380
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
[package]
name = "hello_world"
version = "0.1.0"
edition = "2021"
[dependencies]
triton-distributed = { path = "../../" }
# third-party
env_logger = "0.11.6"
// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use hello_world::DEFAULT_NAMESPACE;
use triton_distributed::{
protocols::annotated::Annotated, stream::StreamExt, DistributedRuntime, Result, Runtime, Worker,
};
fn main() -> Result<()> {
env_logger::init();
let worker = Worker::from_settings()?;
worker.execute(app)
}
async fn app(runtime: Runtime) -> Result<()> {
let distributed = DistributedRuntime::from_settings(runtime.clone()).await?;
let client = distributed
.namespace(DEFAULT_NAMESPACE)?
.component("backend")?
.endpoint("generate")
.client::<String, Annotated<String>>()
.await?;
client.wait_for_endpoints().await?;
let mut stream = client.random("hello world".to_string().into()).await?;
while let Some(resp) = stream.next().await {
println!("{:?}", resp);
}
runtime.shutdown();
Ok(())
}
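Each streamed item the client prints wraps one character in its `data` field, so concatenating the fields reconstructs the input string. A self-contained sketch of that consumption step, using a mock struct rather than the library's `Annotated` type:

```rust
// Mock stand-in for the library's Annotated<String> (illustration only).
#[derive(Debug)]
struct MockAnnotated {
    data: Option<String>,
}

// Concatenate the data fields, as a client might after draining the stream.
fn reassemble(responses: &[MockAnnotated]) -> String {
    responses.iter().filter_map(|r| r.data.clone()).collect()
}

fn main() {
    // Simulate the per-character responses the server streams back.
    let responses: Vec<MockAnnotated> = "hello world"
        .chars()
        .map(|c| MockAnnotated { data: Some(c.to_string()) })
        .collect();
    assert_eq!(reassemble(&responses), "hello world");
}
```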
// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use hello_world::DEFAULT_NAMESPACE;
use std::sync::Arc;
use triton_distributed::{
pipeline::{
async_trait, network::Ingress, AsyncEngine, AsyncEngineContextProvider, Error, ManyOut,
ResponseStream, SingleIn,
},
protocols::annotated::Annotated,
stream, DistributedRuntime, Result, Runtime, Worker,
};
fn main() -> Result<()> {
env_logger::init();
let worker = Worker::from_settings()?;
worker.execute(app)
}
async fn app(runtime: Runtime) -> Result<()> {
let distributed = DistributedRuntime::from_settings(runtime.clone()).await?;
backend(distributed).await
}
struct RequestHandler {}
impl RequestHandler {
fn new() -> Arc<Self> {
Arc::new(Self {})
}
}
#[async_trait]
impl AsyncEngine<SingleIn<String>, ManyOut<Annotated<String>>, Error> for RequestHandler {
async fn generate(&self, input: SingleIn<String>) -> Result<ManyOut<Annotated<String>>> {
let (data, ctx) = input.into_parts();
let chars = data
.chars()
.map(|c| Annotated::from_data(c.to_string()))
.collect::<Vec<_>>();
let stream = stream::iter(chars);
Ok(ResponseStream::new(Box::pin(stream), ctx.context()))
}
}
async fn backend(runtime: DistributedRuntime) -> Result<()> {
// attach an ingress to an engine
let ingress = Ingress::for_engine(RequestHandler::new())?;
// make the ingress discoverable via a component service
// we must first create a service, then we can attach one or more endpoints
runtime
.namespace(DEFAULT_NAMESPACE)?
.component("backend")?
.service_builder()
.create()
.await?
.endpoint("generate")
.endpoint_builder()
.handler(ingress)
.start()
.await
}
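The core of the handler above is the character split: `generate` turns one input string into a stream with one `Annotated` item per character. That transformation in isolation, as plain Rust with no library types:

```rust
// Plain-Rust illustration of the transformation inside generate():
// split the request string into one item per character.
fn split_chars(input: &str) -> Vec<String> {
    input.chars().map(|c| c.to_string()).collect()
}

fn main() {
    assert_eq!(split_chars("hi"), vec!["h".to_string(), "i".to_string()]);
}
```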
// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
pub const DEFAULT_NAMESPACE: &str = "triton-init";
<!--
SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Triton Distributed Python Bindings
Python bindings for the Triton distributed runtime system, enabling distributed computing capabilities for machine learning workloads.
## 🚀 Quick Start
1. Install `uv`: https://docs.astral.sh/uv/#getting-started
```
curl -LsSf https://astral.sh/uv/install.sh | sh
```
2. Install the `protoc` protobuf compiler: https://grpc.io/docs/protoc-installation/.
For example, on an Ubuntu/Debian system:
```
apt install protobuf-compiler
```
3. Set up a virtualenv
```
cd python-wheels/triton-distributed
uv venv
source .venv/bin/activate
uv pip install maturin
```
4. Build and install the triton_distributed wheel
```
maturin develop --uv
```
# Run Examples

## Prerequisites

See [README.md](../README.md).
## Hello World Example
1. Start 3 separate shells, and activate the virtual environment in each:
```
cd python-wheels/triton-distributed
source .venv/bin/activate
```
2. In one shell (shell 1), run server instance 1 of the example:
```
python3 ./examples/hello_world/server.py
```
3. (Optional) In another shell (shell 2), run server instance 2 of the example:
```
python3 ./examples/hello_world/server.py
```
4. In the last shell (shell 3), run the example client:
```
python3 ./examples/hello_world/client.py
```
If you run the example client in rapid succession and you started more than
one server instance above, you should see the client's requests being
distributed across the server instances in each server's output. If only one
server instance is started, every request should go to that server.
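The distribution comes from the client choosing a server instance per request (the Rust client does this with its `client.random(...)` call). A rough round-robin stand-in, for illustration only since the real selection is random:

```rust
// Illustration only: a round-robin stand-in for per-request instance
// selection. The real client selects randomly among discovered instances.
fn pick_instance(num_instances: usize, request_index: usize) -> usize {
    request_index % num_instances
}

fn main() {
    // With two server instances, successive requests alternate between them.
    let picks: Vec<usize> = (0..4).map(|i| pick_instance(2, i)).collect();
    assert_eq!(picks, vec![0, 1, 0, 1]);
}
```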
@@ -40,6 +40,7 @@ pub mod worker;
pub mod distributed;
pub use futures::stream;
pub use tokio_util::sync::CancellationToken;
pub use worker::Worker;