Commit b4d56a57 authored by Dmitry Tokarev, committed by GitHub

chore: Renamed Triton Distributed to Dynamo (#56)

parent dd7646ef
@@ -25,20 +25,20 @@ Contributions intended to add significant new functionality must
follow a more collaborative path described in the following
points. Before submitting a large PR that adds a major enhancement or
extension, be sure to submit a GitHub issue that describes the
proposed change so that the Dynamo team can provide feedback.
- As part of the GitHub issue discussion, a design for your change
  will be agreed upon. An up-front design discussion is required to
  ensure that your enhancement is done in a manner that is consistent
  with Dynamo's overall architecture.
- The Dynamo project is spread across multiple GitHub repositories.
  The Dynamo team will provide guidance about how and where your enhancement
  should be implemented.
- Testing is a critical part of any Dynamo
  enhancement. You should plan on spending significant time on
  creating tests for your change. The Dynamo team will help you to
  design your testing so that it is compatible with existing testing
  infrastructure.
@@ -75,7 +75,7 @@ proposed change so that the Triton team can provide feedback.
- Make sure all tests pass.
- Dynamo's default build assumes recent versions of
  dependencies (CUDA, TensorFlow, PyTorch, TensorRT,
  etc.). Contributions that add compatibility with older versions of
  those dependencies will be considered, but NVIDIA cannot guarantee
@@ -85,7 +85,7 @@ proposed change so that the Triton team can provide feedback.
- Make sure that you can contribute your work to open source (no
  license and/or patent conflict is introduced by your code).
  You must certify compliance with the
  [license terms](https://github.com/ai-dynamo/dynamo/blob/main/LICENSE)
  and sign off on the [Developer Certificate of Origin (DCO)](https://developercertificate.org)
  described below before your pull request (PR) can be merged.
@@ -96,7 +96,7 @@ proposed change so that the Triton team can provide feedback.
All pull requests are checked against the
[pre-commit hooks](https://github.com/pre-commit/pre-commit-hooks)
located [in the repository's top-level .pre-commit-config.yaml](https://github.com/ai-dynamo/dynamo/blob/main/.pre-commit-config.yaml).
The hooks perform sanity checks such as linting and formatting.
These checks must pass before a change can be merged.
@@ -123,7 +123,7 @@ Also you can use vscode extension [GitHub Local Actions](https://marketplace.vis
# Developer Certificate of Origin

Dynamo is an open source product released under
the Apache 2.0 license (see either
[the Apache site](https://www.apache.org/licenses/LICENSE-2.0) or
the [LICENSE file](./LICENSE)). The Apache 2.0 license allows you
@@ -177,7 +177,7 @@ By making a contribution to this project, I certify that:
this project or the open source license(s) involved.
```
We require that every contribution to Dynamo is signed with
a Developer Certificate of Origin. Additionally, please use your real name.
We do not accept anonymous contributors nor those utilizing pseudonyms.
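The sign-off itself is a one-line trailer appended to the commit message. As a minimal sketch of its format, the snippet below builds the trailer by hand with placeholder name and email; in practice `git commit -s` derives the real values from your `user.name` and `user.email` git configuration:

```shell
# Placeholder identity for illustration only; `git commit -s` reads the
# real values from git config (user.name / user.email).
name="Jane Doe"
email="jane@example.com"
# Print the trailer in the exact format the DCO check expects:
printf 'Signed-off-by: %s <%s>\n' "$name" "$email"
```

You never write this line manually: `git commit -s -m "your message"` appends it automatically, and the PR's DCO check verifies it is present on every commit.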
@@ -63,7 +63,7 @@ TENSORRTLLM_BASE_IMAGE_TAG=${TENSORRTLLM_BASE_VERSION}-trtllm-python-py3
# used in the base image above.
TENSORRTLLM_BACKEND_REPO_TAG=triton-llm/v0.17.0
# Set this to 1 to rebuild and replace the trtllm backend bits in the container.
# This allows building the Dynamo container image with a custom
# trt-llm backend repo branch.
TENSORRTLLM_BACKEND_REBUILD=0
# Set this to 1 to skip cloning the trt-llm backend repo. If cloning is skipped, trt-llm
@@ -247,7 +247,7 @@ get_options() {
    fi
    if [ -z "$TAG" ]; then
        TAG="--tag dynamo:${VERSION}-${FRAMEWORK,,}"
        if [ ! -z ${TARGET} ]; then
            TAG="${TAG}-${TARGET}"
        fi
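This tag-building logic composes the default image tag from the version, the lowercased framework name (bash's `${VAR,,}` case-modification expansion, bash 4+), and an optional target suffix. A standalone sketch with sample values (the VERSION/FRAMEWORK/TARGET values here are illustrative, not the script's real defaults):

```shell
#!/usr/bin/env bash
# Sample inputs; the real build.sh parses these from its command line.
VERSION="0.1.0"
FRAMEWORK="VLLM"
TARGET="dev"

# ${FRAMEWORK,,} lowercases the framework name (requires bash 4+).
TAG="--tag dynamo:${VERSION}-${FRAMEWORK,,}"
if [ ! -z "${TARGET}" ]; then
    TAG="${TAG}-${TARGET}"
fi
echo "${TAG}"   # --tag dynamo:0.1.0-vllm-dev
```

With these sample inputs the script would pass `--tag dynamo:0.1.0-vllm-dev` to `docker build`.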
@@ -265,7 +265,7 @@ get_options() {
show_image_options() {
    echo ""
    echo "Building Dynamo Image: '${TAG}'"
    echo ""
    echo " Base: '${BASE_IMAGE}'"
    echo " Base_Image_Tag: '${BASE_IMAGE_TAG}'"
@@ -340,7 +340,7 @@ if [ ! -z ${HF_TOKEN} ]; then
    BUILD_ARGS+=" --build-arg HF_TOKEN=${HF_TOKEN} "
fi
LATEST_TAG="--tag dynamo:latest-${FRAMEWORK,,}"
if [ ! -z ${TARGET} ]; then
    LATEST_TAG="${LATEST_TAG}-${TARGET}"
fi
@@ -178,7 +178,7 @@ get_options() {
    fi
    if [ -z "$IMAGE" ]; then
        IMAGE="dynamo:latest-${FRAMEWORK,,}"
        if [ ! -z ${TARGET} ]; then
            IMAGE="${IMAGE}-${TARGET}"
        fi
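run.sh applies the same naming convention as build.sh: when no image is given, it falls back to the latest tag for the chosen framework, with an optional target suffix. A standalone sketch (sample values are illustrative; the real script parses them from its command line):

```shell
#!/usr/bin/env bash
# Sample inputs for illustration only.
FRAMEWORK="TENSORRTLLM"
TARGET=""
IMAGE=""

# Fall back to the latest per-framework tag when no image was supplied.
if [ -z "$IMAGE" ]; then
    IMAGE="dynamo:latest-${FRAMEWORK,,}"   # lowercased framework name
    if [ ! -z "${TARGET}" ]; then
        IMAGE="${IMAGE}-${TARGET}"
    fi
fi
echo "${IMAGE}"   # dynamo:latest-tensorrtllm
```

This is why a container built with `./container/build.sh --framework TENSORRTLLM` is found automatically by `./container/run.sh --framework TENSORRTLLM` without passing an explicit image name.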
@@ -15,9 +15,9 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

# TensorRT-LLM Integration with Dynamo

This example demonstrates how to use Dynamo to serve large language models with the tensorrt_llm engine, enabling efficient model serving with both monolithic and disaggregated deployment options.

## Prerequisites
@@ -58,7 +58,7 @@ python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt -a native
cp build/tensorrt_llm-*.whl /home
```
- Build the Dynamo container
```bash
# Build image
./container/build.sh --base-image gitlab-master.nvidia.com:5005/dl/dgx/tritonserver/tensorrt-llm/amd64 --base-image-tag krish-fix-trtllm-build.23766174
@@ -73,7 +73,7 @@ Alternatively, you can build with latest tensorrt_llm pipeline like below:

## Launching the Environment

```
# Run the image interactively from within the Dynamo root directory.
./container/run.sh --framework TENSORRTLLM -it -v /home/:/home/

# Install the TRT-LLM wheel. No need to do this if you are using the latest tensorrt_llm image.
@@ -306,7 +306,7 @@ export ETCD_ENDPOINTS="http://node1:2379,http://node2:2379"
3. Launch the workers from node1 or the login node. WORLD_SIZE is the same as in a single-node deployment.
```bash
srun --mpi pmix -N NUM_NODES --ntasks WORLD_SIZE --ntasks-per-node=WORLD_SIZE --no-container-mount-home --overlap --container-image IMAGE --output batch_%x_%j.log --err batch_%x_%j.err --container-mounts PATH_TO_DYNAMO:/workspace --container-env=NATS_SERVER,ETCD_ENDPOINTS bash -c 'cd /workspace/examples/python_rs/llm/tensorrt_llm && python3 -m disaggregated.worker --engine_args llm_api_config.yaml -c disaggregated/llmapi_disaggregated_configs/multi_node_config.yaml' &
```
Once the workers are launched, you should see output similar to the following in the worker logs.
@@ -323,7 +323,7 @@ Once the workers are launched, you should see the output similar to the followin
4. Launch the router from node1 or the login node.
```bash
srun --mpi pmix -N 1 --ntasks 1 --ntasks-per-node=1 --overlap --container-image IMAGE --output batch_router_%x_%j.log --err batch_router_%x_%j.err --container-mounts PATH_TO_DYNAMO:/workspace --container-env=NATS_SERVER,ETCD_ENDPOINTS bash -c 'cd /workspace/examples/python_rs/llm/tensorrt_llm && python3 -m disaggregated.router' &
```
5. Send requests to the router.
@@ -15,9 +15,9 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

# vLLM Integration with Dynamo

This example demonstrates how to use Dynamo to serve large language models with the vLLM engine, enabling efficient model serving with both monolithic and disaggregated deployment options.

## Prerequisites
@@ -38,7 +38,7 @@ Start required services (etcd and NATS):

## Building the Environment

The example is designed to run in a containerized environment using Dynamo, vLLM, and associated dependencies. To build the container:
```bash
# Build image
@@ -15,9 +15,9 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

# Dynamo Python Bindings

Python bindings for the Dynamo runtime system, enabling distributed computing capabilities for machine learning workloads.

## 🚀 Quick Start
@@ -56,7 +56,7 @@ See [README.md](/lib/runtime/README.md).
1. Start 3 separate shells and activate the virtual environment in each:
```
cd python-wheels/dynamo
source .venv/bin/activate
```
@@ -15,13 +15,13 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

# Dynamo Runtime

<h4>A Datacenter Scale Distributed Inference Serving Framework</h4>

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

Rust implementation of the Dynamo runtime system, enabling distributed computing capabilities for machine learning workloads.

## 🛠️ Prerequisites