"vscode:/vscode.git/clone" did not exist on "0e4fffbc6b28b65f894ebcc520b13cea59db369d"
Commit 548578f4 authored by Dmitry Tokarev, committed by GitHub

docs: fix links in docs (#256)


Co-authored-by: Anant Sharma <anants@nvidia.com>
parent 792b747c
@@ -41,7 +41,7 @@ The following examples require a few system level packages.
 apt-get update
 DEBIAN_FRONTEND=noninteractive apt-get install -yq python3-dev libucx0
-pip install ai-dynamo nixl vllm==0.7.2+dynamo
+pip install ai-dynamo[all]
 ```
 > [!NOTE]
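As a quick sanity check that the consolidated `pip install ai-dynamo[all]` pulled in the expected packages (the exact distribution names of the extras are assumptions based on the old install line):

```bash
# Confirm the core package is installed.
pip show ai-dynamo
# List related distributions; names other than ai-dynamo are assumptions.
pip list | grep -i -E 'dynamo|nixl|vllm'
```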
@@ -65,7 +65,7 @@ metrics --component my_component --endpoint my_endpoint
 ### Real Worker
 To run a more realistic deployment to gather metrics from,
-see the examples in [deploy/examples/llm](deploy/examples/llm).
+see the examples in [examples/llm](../../examples/llm).
 For example, for a VLLM + KV Routing based deployment that
 exposes statistics on an endpoint labeled
@@ -88,7 +88,7 @@ endpoint name used for python-based workers that register a `KvMetricsPublisher`
 To visualize the metrics being exposed on the Prometheus endpoint,
 see the Prometheus and Grafana configurations in
-[deploy/metrics](deploy/metrics):
+[deploy/metrics](../../deploy/metrics):
 ```bash
 docker compose -f deploy/docker-compose.yml --profile metrics up -d
 ```
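Once the metrics profile is up, a minimal smoke test, assuming the stock Prometheus (9090) and Grafana (3000) ports; adjust to whatever the compose file actually maps:

```bash
# Prometheus readiness probe (port is an assumption).
curl -s http://localhost:9090/-/ready
# Grafana health endpoint (port is an assumption).
curl -s http://localhost:3000/api/health
```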
@@ -11,7 +11,7 @@
 # Introduction
-Dynamo is a flexible and performant distributed inferencing solution for large-scale deployments. It is an ecosystem of tools, frameworks, and abstractions that makes the design, customization, and deployment of frontier-level models onto datacenter-scale infrastructure easy to reason about and optimized for your specific inferencing workloads. Dynamo's core is written in Rust and contains a set of well-defined Python bindings. Docs and examples for those can be found [here](../../../../README.md).
+Dynamo is a flexible and performant distributed inferencing solution for large-scale deployments. It is an ecosystem of tools, frameworks, and abstractions that makes the design, customization, and deployment of frontier-level models onto datacenter-scale infrastructure easy to reason about and optimized for your specific inferencing workloads. Dynamo's core is written in Rust and contains a set of well-defined Python bindings. Docs and examples for those can be found [here](../../../../../README.md).
 Dynamo SDK is a layer on top of the core. It is a Python framework that makes it easy to create inference graphs and deploy them locally and onto a target K8s cluster. The SDK was heavily inspired by [BentoML's](https://github.com/bentoml/BentoML) open source deployment patterns and leverages many of its core primitives. The Dynamo CLI is a companion tool that allows you to spin up an inference pipeline locally, containerize it, and deploy it. You can find a toy hello-world example [here](../../README.md).
@@ -64,7 +64,7 @@ Distributed deployment where prefill and decode are done by separate workers tha
 ### Prerequisites
-Start required services (etcd and NATS) using [Docker Compose](/deploy/docker-compose.yml)
+Start required services (etcd and NATS) using [Docker Compose](../../deploy/docker-compose.yml)
 ```bash
 docker compose -f deploy/docker-compose.yml up -d
 ```
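To verify the prerequisite services came up, a sketch assuming the default etcd client port (2379) and NATS monitoring port (8222); the port mappings are assumptions about the compose file:

```bash
# Both containers should report a running state.
docker compose -f deploy/docker-compose.yml ps
# etcd health endpoint on its default client port (assumed mapping).
curl -s http://localhost:2379/health
# NATS server variables, available only if the monitoring port is exposed.
curl -s http://localhost:8222/varz
```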
@@ -12,7 +12,7 @@ The Dynamo KV Cache Manager feature addresses this challenge by enabling the off
 The Dynamo KV Cache Manager uses advanced caching policies that prioritize placing frequently accessed data in GPU memory, while less accessed data is moved to shared CPU memory, SSDs, or networked object storage. It incorporates eviction policies that strike a balance between over-caching (which can introduce lookup latencies) and under-caching (which leads to missed lookups and KV cache re-computation).
 Additionally, this feature can manage KV cache across multiple GPU nodes, supporting both distributed and disaggregated inference serving, and offers hierarchical caching capabilities, creating offloading strategies at the GPU, node, and cluster levels.
-The Dynamo KV Cache Manager is designed to be framework-agnostic to support various backends, including TensorRT-LLM, vLLM, and SGLang, and to facilitate the scaling of KV cache storage across large, distributed clusters using NVLink, NVIDIA Quantum switches, and NVIDIA Spectrum switches. It integrates with [NIXL](https://github.com/ai-dynamo/nixl/blob/omrik/documentation/docs/nixl.md) to enable data transfers across different worker instances and storage backends.
+The Dynamo KV Cache Manager is designed to be framework-agnostic to support various backends, including TensorRT-LLM, vLLM, and SGLang, and to facilitate the scaling of KV cache storage across large, distributed clusters using NVLink, NVIDIA Quantum switches, and NVIDIA Spectrum switches. It integrates with [NIXL](https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md) to enable data transfers across different worker instances and storage backends.
 ## Design
@@ -64,7 +64,7 @@ sequenceDiagram
 ### Prerequisites
-Start required services (etcd and NATS) using [Docker Compose](/deploy/docker-compose.yml)
+Start required services (etcd and NATS) using [Docker Compose](../../deploy/docker-compose.yml)
 ```bash
 docker compose -f deploy/docker-compose.yml up -d
 ```
@@ -77,7 +77,7 @@ E.g. https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/blob/main/Llama
 Download model file:
 ```
-curl -L -o Llama-3.2-3B-Instruct-Q4_K_M.gguf "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/blob/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf?download=true"
+curl -L -o Llama-3.2-3B-Instruct-Q4_K_M.gguf "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf?download=true"
 ```
 ## Run a model from local file
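The `blob/` to `resolve/` change matters because Hugging Face serves the HTML file viewer at `blob/` URLs and the raw bytes at `resolve/` URLs. A quick check that the download is a real GGUF file rather than a saved HTML page:

```bash
# A valid GGUF file begins with the 4-byte magic "GGUF".
head -c 4 Llama-3.2-3B-Instruct-Q4_K_M.gguf && echo
# An unexpectedly tiny size (a few KB) usually means an HTML page was saved instead.
ls -lh Llama-3.2-3B-Instruct-Q4_K_M.gguf
```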
@@ -50,7 +50,7 @@ maturin develop --uv
 ## Pre-requisite
-See [README.md](/lib/runtime/README.md).
+See [README.md](../../runtime/README.md#️-prerequisites).
 ## Hello World Example
@@ -44,7 +44,7 @@ cargo test
 The simplest way to deploy the pre-requisite services is using
 [docker-compose](https://docs.docker.com/compose/install/linux/),
-defined in the project's root [docker-compose.yml](docker-compose.yml).
+defined in [deploy/docker-compose.yml](../../deploy/docker-compose.yml).
 ```
 docker-compose up -d
@@ -109,7 +109,7 @@ Annotated { data: Some("d"), id: None, event: None, comment: None }
 #### Python
-See the [README.md](/lib/bindings/python/README.md) for details
+See the [README.md](../bindings/python/README.md) for details
 The Python and Rust `hello_world` client and server examples are interchangeable,
 so you can start the Python `server.py` and talk to it from the Rust `client`.
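Mixing the two languages looks roughly like the sketch below; the working directory, path, and binary name are assumptions, so check the linked READMEs for the real invocations:

```bash
# Start the Python hello_world server in the background (path assumed).
python3 server.py &
# Then drive it from the Rust client (binary name assumed).
cargo run --bin client
```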
@@ -39,12 +39,12 @@ If you are using a **GPU**, the following GPU models and architectures are suppo
 | **Dependency** | **Version** |
 |------------------|-------------|
 |**Base Container**| 25.01 |
-| **vLLM** |0.7.2+dynamo*|
+|**ai-dynamo-vllm**| 0.7.2* |
 |**TensorRT-LLM** | 0.19.0** |
 |**NIXL** | 0.1.0 |
 > **Note**:
-> - *v0.7.2+dynamo is a customized patch of v0.7.2 from vLLM.
+> - *ai-dynamo-vllm v0.7.2 is a customized patch of v0.7.2 from vLLM.
 > - **The specific version of TensorRT-LLM (planned v0.19.0) that will be supported by Dynamo is subject to change.
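To compare a local environment against this matrix, something like the following works, assuming `ai-dynamo-vllm` and `nixl` are the installed Python distribution names:

```bash
# Print name/version pairs for the pinned Python packages (names assumed).
pip show ai-dynamo-vllm nixl | grep -E '^(Name|Version):'
```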
@@ -54,4 +54,4 @@ If you are using a **GPU**, the following GPU models and architectures are suppo
 - **Wheels**: Pre-built Python wheels are only available for **x86_64 Linux**. No wheels are available for other platforms at this time.
 - **Container Images**: We distribute only the source code for container images, and only **x86_64 Linux** is supported for these. Users must build the container image from source if they require it.
-Once you've confirmed that your platform and architecture are compatible, you can install **Dynamo** by following the instructions in the [Quick Start Guide](https://github.com/ai-dynamo/dynamo/?tab=readme-ov-file#quick-start).
+Once you've confirmed that your platform and architecture are compatible, you can install **Dynamo** by following the instructions in the [Quick Start Guide](https://github.com/ai-dynamo/dynamo/blob/main/README.md#installation).
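A one-liner to confirm the platform before expecting pre-built wheels:

```bash
# Pre-built wheels exist only for x86_64 Linux; this should print "Linux x86_64".
uname -ms
```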