Commit cdcdce96 authored by Ryan Olson, committed by GitHub

docs: readme + examples (#116)


Co-authored-by: aflowers <aflowers@nvidia.com>
parent c9efcce6
@@ -14,36 +14,103 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Triton Distributed Runtime

<h4>A Datacenter Scale Distributed Inference Serving Framework</h4>

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

This repository contains the core components of a distributed GenAI inference framework written in Rust.

Features:

- A high-level API defined in [component.rs](src/component.rs) for building distributed applications.
- A [distributed runtime](src/distributed.rs) to manage the distributed execution of the inference graph.
- [NATS](src/transports/nats.rs) for component communication and [etcd](src/transports/etcd.rs) for service discovery, allowing engines to be distributed across multiple nodes while maintaining a unified processing graph.
- A modular design that makes it easy to build inference pipelines by composing reusable components.

## 🛠️ Prerequisites

### Install Rust and Cargo using [rustup](https://rustup.rs/):

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

See the [Rust Installation Guide](https://www.rust-lang.org/tools/install) for more options.
## Build and Test

```bash
cargo build
cargo test
```
## Running

### Start Dependencies

#### Docker Compose

The simplest way to deploy the prerequisite services is with
[docker-compose](https://docs.docker.com/compose/install/linux/),
using the [docker-compose.yml](docker-compose.yml) in the project's root:

```bash
docker-compose up -d
```
This deploys a [NATS.io](https://nats.io/) server and an [etcd](https://etcd.io/)
server, which components use at runtime to communicate with and discover each other.
#### Local (alternative)

To run the prerequisite services locally instead of using `docker-compose`, you can launch each manually:
- [NATS.io](https://docs.nats.io/running-a-nats-service/introduction/installation) server with [Jetstream](https://docs.nats.io/nats-concepts/jetstream)
- example: `nats-server -js --trace`
- [etcd](https://etcd.io) server
  - follow the instructions in [etcd installation](https://etcd.io/docs/v3.5/install/) to start an `etcd-server` locally
### Run Examples

When developing or running examples, any process or user that shares your core services (`etcd` and `nats.io`) will
be operating within your distributed runtime.
The current examples use a hard-coded `namespace`; `namespace` collisions will be addressed in this
[issue](https://github.com/triton-inference-server/triton_distributed/issues/114).

All examples require the `etcd` and `nats.io` prerequisites to be running and available.
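Until that issue lands, one hypothetical workaround (not part of the current examples) is to read the namespace from an environment variable instead of the hard-coded constant, falling back to the examples' `DEFAULT_NAMESPACE` value of `"triton-init"`. The variable name `TRITON_NAMESPACE` here is an assumption, not an existing convention:

```rust
use std::env;

// Hypothetical helper (not in the examples today): pick the namespace from
// an environment variable so concurrent users can avoid collisions.
// "triton-init" mirrors the examples' DEFAULT_NAMESPACE constant.
fn namespace() -> String {
    env::var("TRITON_NAMESPACE").unwrap_or_else(|_| "triton-init".to_string())
}

fn main() {
    println!("using namespace: {}", namespace());
}
```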
#### Rust `hello_world`
With two terminals open, run the server in the first:

```bash
cd examples/hello_world
cargo run --bin server
```
In the second terminal, run the client:

```bash
cd examples/hello_world
cargo run --bin client
```
which should produce output similar to:
```
Finished `dev` profile [unoptimized + debuginfo] target(s) in 6.25s
Running `target/debug/client`
Annotated { data: Some("h"), id: None, event: None, comment: None }
Annotated { data: Some("e"), id: None, event: None, comment: None }
Annotated { data: Some("l"), id: None, event: None, comment: None }
Annotated { data: Some("l"), id: None, event: None, comment: None }
Annotated { data: Some("o"), id: None, event: None, comment: None }
Annotated { data: Some(" "), id: None, event: None, comment: None }
Annotated { data: Some("w"), id: None, event: None, comment: None }
Annotated { data: Some("o"), id: None, event: None, comment: None }
Annotated { data: Some("r"), id: None, event: None, comment: None }
Annotated { data: Some("l"), id: None, event: None, comment: None }
Annotated { data: Some("d"), id: None, event: None, comment: None }
```
#### Python

See the [README.md](./python-wheel/README.md) for details.
The Python and Rust `hello_world` client and server examples are interchangeable,
so you can start the Python `server.py` and talk to it from the Rust `client`.
# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
services:
nats-server:
image: nats
command: [ "-js", "--trace" ]
ports:
- 4222:4222
- 6222:6222
- 8222:8222
etcd-server:
image: bitnami/etcd
environment:
- ALLOW_NONE_AUTHENTICATION=yes
ports:
- 2379:2379
- 2380:2380
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
[package]
name = "hello_world"
version = "0.1.0"
edition = "2021"
[dependencies]
triton-distributed = { path = "../../" }
# third-party
env_logger = "0.11.6"
// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use hello_world::DEFAULT_NAMESPACE;
use triton_distributed::{
protocols::annotated::Annotated, stream::StreamExt, DistributedRuntime, Result, Runtime, Worker,
};
fn main() -> Result<()> {
env_logger::init();
let worker = Worker::from_settings()?;
worker.execute(app)
}
async fn app(runtime: Runtime) -> Result<()> {
let distributed = DistributedRuntime::from_settings(runtime.clone()).await?;
let client = distributed
.namespace(DEFAULT_NAMESPACE)?
.component("backend")?
.endpoint("generate")
.client::<String, Annotated<String>>()
.await?;
client.wait_for_endpoints().await?;
let mut stream = client.random("hello world".to_string().into()).await?;
while let Some(resp) = stream.next().await {
println!("{:?}", resp);
}
runtime.shutdown();
Ok(())
}
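Each streamed item the client prints wraps one character in its `data` field, so concatenating the fields reconstructs the input string. A self-contained sketch of that consumption step, using a mock struct rather than the library's `Annotated` type:

```rust
// Mock stand-in for the library's Annotated<String> (illustration only).
#[derive(Debug)]
struct MockAnnotated {
    data: Option<String>,
}

// Concatenate the data fields, as a client might after draining the stream.
fn reassemble(responses: &[MockAnnotated]) -> String {
    responses.iter().filter_map(|r| r.data.clone()).collect()
}

fn main() {
    // Simulate the per-character responses the server streams back.
    let responses: Vec<MockAnnotated> = "hello world"
        .chars()
        .map(|c| MockAnnotated { data: Some(c.to_string()) })
        .collect();
    assert_eq!(reassemble(&responses), "hello world");
}
```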
// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use hello_world::DEFAULT_NAMESPACE;
use std::sync::Arc;
use triton_distributed::{
pipeline::{
async_trait, network::Ingress, AsyncEngine, AsyncEngineContextProvider, Error, ManyOut,
ResponseStream, SingleIn,
},
protocols::annotated::Annotated,
stream, DistributedRuntime, Result, Runtime, Worker,
};
fn main() -> Result<()> {
env_logger::init();
let worker = Worker::from_settings()?;
worker.execute(app)
}
async fn app(runtime: Runtime) -> Result<()> {
let distributed = DistributedRuntime::from_settings(runtime.clone()).await?;
backend(distributed).await
}
struct RequestHandler {}
impl RequestHandler {
fn new() -> Arc<Self> {
Arc::new(Self {})
}
}
#[async_trait]
impl AsyncEngine<SingleIn<String>, ManyOut<Annotated<String>>, Error> for RequestHandler {
async fn generate(&self, input: SingleIn<String>) -> Result<ManyOut<Annotated<String>>> {
let (data, ctx) = input.into_parts();
let chars = data
.chars()
.map(|c| Annotated::from_data(c.to_string()))
.collect::<Vec<_>>();
let stream = stream::iter(chars);
Ok(ResponseStream::new(Box::pin(stream), ctx.context()))
}
}
async fn backend(runtime: DistributedRuntime) -> Result<()> {
// attach an ingress to an engine
let ingress = Ingress::for_engine(RequestHandler::new())?;
// make the ingress discoverable via a component service
// we must first create a service, then we can attach one or more endpoints
runtime
.namespace(DEFAULT_NAMESPACE)?
.component("backend")?
.service_builder()
.create()
.await?
.endpoint("generate")
.endpoint_builder()
.handler(ingress)
.start()
.await
}
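The core of the handler above is the character split: `generate` turns one input string into a stream with one `Annotated` item per character. That transformation in isolation, as plain Rust with no library types:

```rust
// Plain-Rust illustration of the transformation inside generate():
// split the request string into one item per character.
fn split_chars(input: &str) -> Vec<String> {
    input.chars().map(|c| c.to_string()).collect()
}

fn main() {
    assert_eq!(split_chars("hi"), vec!["h".to_string(), "i".to_string()]);
}
```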
// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
pub const DEFAULT_NAMESPACE: &str = "triton-init";
<!--
SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Triton Distributed Python Bindings
Python bindings for the Triton distributed runtime system, enabling distributed computing capabilities for machine learning workloads.
## 🚀 Quick Start
1. Install `uv`: https://docs.astral.sh/uv/#getting-started
```
curl -LsSf https://astral.sh/uv/install.sh | sh
```
2. Install the `protoc` protobuf compiler: https://grpc.io/docs/protoc-installation/.
For example, on an Ubuntu/Debian system:
```
apt install protobuf-compiler
```
3. Set up a virtualenv
```
cd python-wheels/triton-distributed
uv venv
source .venv/bin/activate
uv pip install maturin
```
4. Build and install the triton_distributed wheel
```
maturin develop --uv
```
# Run Examples

## Prerequisites

See [README.md](../README.md).
## Hello World Example
1. Start 3 separate shells, and activate the virtual environment in each:
```
cd python-wheels/triton-distributed
source .venv/bin/activate
```
2. In one shell (shell 1), run server instance 1 of the example:
```
python3 ./examples/hello_world/server.py
```
3. (Optional) In another shell (shell 2), run server instance 2 of the example:
```
python3 ./examples/hello_world/server.py
```
4. In the last shell (shell 3), run the example client:
```
python3 ./examples/hello_world/client.py
```
If you run the example client in rapid succession and you started more than
one server instance above, you should see the client's requests being
distributed across the server instances in each server's output. If only one
server instance is started, every request should go to that server.
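The distribution comes from the client choosing a server instance per request (the Rust client does this with its `client.random(...)` call). A rough round-robin stand-in, for illustration only since the real selection is random:

```rust
// Illustration only: a round-robin stand-in for per-request instance
// selection. The real client selects randomly among discovered instances.
fn pick_instance(num_instances: usize, request_index: usize) -> usize {
    request_index % num_instances
}

fn main() {
    // With two server instances, successive requests alternate between them.
    let picks: Vec<usize> = (0..4).map(|i| pick_instance(2, i)).collect();
    assert_eq!(picks, vec![0, 1, 0, 1]);
}
```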
@@ -40,6 +40,7 @@ pub mod worker;
pub mod distributed;
pub use futures::stream;
pub use tokio_util::sync::CancellationToken;
pub use worker::Worker;