Unverified commit 03360b84, authored by dagil-nvidia and committed by GitHub

docs: remove duplicate H1 headings from Fern pages (#6410)


Signed-off-by: Dan Gil <dagil@nvidia.com>
parent 01ecc8c7
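
The hunks below all follow one pattern: a page whose frontmatter already carries a `title:` loses the `# ...` heading that repeated it. The commit does not say how the edit was produced, so the following is only a minimal sketch of how such a cleanup could be scripted. The helper name `strip_leading_h1` and the choice to also drop one blank line after the heading are assumptions for illustration, not part of the commit.

```python
import re

# Hypothetical helper, not taken from the commit: it illustrates the pattern
# visible in the hunks (frontmatter title present, first H1 removed).
FRONTMATTER = re.compile(r"\A---\n(.*?\n)---\n", re.DOTALL)


def strip_leading_h1(text: str) -> str:
    """Remove the first H1 after the frontmatter when the frontmatter has a title."""
    match = FRONTMATTER.match(text)
    if match is None or not re.search(r"^title:", match.group(1), re.MULTILINE):
        # No frontmatter title: the H1 is the page's only title, so keep it.
        return text

    head, body = text[: match.end()], text[match.end():]
    lines = body.split("\n")

    # Skip blank lines between the frontmatter and the first content line.
    i = 0
    while i < len(lines) and lines[i].strip() == "":
        i += 1

    # Only a level-1 heading ("# ...") is removed; "## ..." is left alone.
    if i < len(lines) and re.match(r"#\s+\S", lines[i]):
        del lines[i]
        # Assumption: also drop one following blank line to avoid a double gap.
        if i < len(lines) and lines[i].strip() == "":
            del lines[i]

    return head + "\n".join(lines)


if __name__ == "__main__":
    page = "---\ntitle: Connector\n---\n\n# dynamo.nixl_connect.Connector\n\nCore class.\n"
    print(strip_leading_h1(page))
```

Pages without a frontmatter `title:` are returned unchanged, so the sketch never removes a page's only title.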
@@ -4,8 +4,6 @@
 title: NIXL Connect API
 ---
-# Dynamo NIXL Connect
 Dynamo NIXL Connect specializes in moving data between models/workers in a Dynamo Graph, and for the use cases where registration and memory regions need to be dynamic.
 Dynamo connect provides utilities for such use cases, using the NIXL-based I/O subsystem via a set of Python classes.
 The relaxed registration comes with some performance overheads, but simplifies the integration process.
......
@@ -4,8 +4,6 @@
 title: Connector
 ---
-# dynamo.nixl_connect.Connector
 Core class for managing the connection between workers in a distributed environment.
 Use this class to create readable and writable operations, or read and write data to remote workers.
......
@@ -4,8 +4,6 @@
 title: Descriptor
 ---
-# dynamo.nixl_connect.Descriptor
 Memory descriptor that ensures memory is registered with the NIXL-base I/O subsystem.
 Memory must be registered with the NIXL subsystem to enable interaction with the memory.
......
@@ -4,8 +4,6 @@
 title: Device Kind
 ---
-# dynamo.nixl_connect.DeviceKind(IntEnum)
 Represents the kind of device a [`Device`](device.md) object represents.
......
@@ -4,8 +4,6 @@
 title: Device
 ---
-# dynamo.nixl_connect.Device
 `Device` class describes the device a given allocation resides in.
 Usually host (`"cpu"`) or GPU (`"cuda"`) memory.
......
@@ -4,8 +4,6 @@
 title: Operation Status
 ---
-# dynamo.nixl_connect.OperationStatus(IntEnum)
 Represents the current state or status of an operation.
......
@@ -4,8 +4,6 @@
 title: RDMA Metadata
 ---
-# dynamo.nixl_connect.RdmaMetadata
 A Pydantic type intended to provide JSON serialized NIXL metadata about a [`ReadableOperation`](readable-operation.md) or [`WritableOperation`](writable-operation.md) object.
 NIXL metadata contains detailed information about a worker process and how to access memory regions registered with the corresponding agent.
 This data is required to perform data transfers using the NIXL-based I/O subsystem.
......
@@ -4,8 +4,6 @@
 title: Read Operation
 ---
-# dynamo.nixl_connect.ReadOperation
 An operation which transfers data from a remote worker to the local worker.
 To create the operation, NIXL metadata ([RdmaMetadata](rdma-metadata.md)) from a remote worker's [`ReadableOperation`](readable-operation.md)
......
@@ -4,8 +4,6 @@
 title: Readable Operation
 ---
-# dynamo.nixl_connect.ReadableOperation
 An operation which enables a remote worker to read data from the local worker.
 To create the operation, a set of local [`Descriptor`](descriptor.md) objects must be provided that reference memory intended to be transferred to a remote worker.
......
@@ -4,8 +4,6 @@
 title: Writable Operation
 ---
-# dynamo.nixl_connect.WritableOperation
 An operation which enables a remote worker to write data to the local worker.
 To create the operation, a set of local [`Descriptor`](descriptor.md) objects must be provided which reference memory intended to receive data from a remote worker.
......
@@ -4,8 +4,6 @@
 title: Write Operation
 ---
-# dynamo.nixl_connect.WriteOperation
 An operation which transfers data from the local worker to a remote worker.
 To create the operation, NIXL metadata ([RdmaMetadata](rdma-metadata.md)) from a remote worker's [`WritableOperation`](writable-operation.md)
......
@@ -4,8 +4,6 @@
 title: SGLang
 ---
-# Running SGLang with Dynamo
 ## Use the Latest Release
 We recommend using the latest stable release of Dynamo to avoid breaking changes:
......
@@ -4,8 +4,6 @@
 title: Diffusion
 ---
-# Diffusion Models
 Dynamo SGLang supports three types of diffusion-based generation: **LLM diffusion** (text generation via iterative refinement), **image diffusion** (text-to-image), and **video generation** (text-to-video). Each uses a different worker flag and handler, but all integrate with SGLang's `DiffGenerator`.
 ## Overview
......
@@ -4,8 +4,6 @@
 title: Disaggregation
 ---
-# SGLang Disaggregated Serving
 This document explains how SGLang's disaggregated prefill-decode architecture works, both standalone and within Dynamo.
 ## Overview
......
@@ -4,8 +4,6 @@
 title: Examples
 ---
-# SGLang Examples
 For quick start instructions, see the [SGLang README](README.md). This document provides all deployment patterns for running SGLang with Dynamo, including LLMs, multimodal, and diffusion models, and Kubernetes deployment.
 ## Table of Contents
......
@@ -4,8 +4,6 @@
 title: Observability
 ---
-# SGLang Observability
 This guide covers metrics, tracing, and visualization for SGLang deployments running through Dynamo.
 ## Prometheus Metrics
......
@@ -5,8 +5,6 @@ title: Reference Guide
 subtitle: Architecture, configuration, and operational details for the SGLang backend
 ---
-# Reference Guide
 ## Overview
 The SGLang backend in Dynamo uses a modular architecture where `main.py` dispatches to specialized initialization modules based on the worker type. Each worker type has its own init module, request handler, health check, and registration logic.
......
@@ -4,8 +4,6 @@
 title: TensorRT-LLM
 ---
-# LLM Deployment using TensorRT-LLM
 This directory contains examples and reference implementations for deploying Large Language Models (LLMs) in various configurations using TensorRT-LLM.
 ## Use the Latest Release
......
@@ -4,8 +4,6 @@
 title: Gemma3 Sliding Window
 ---
-# Gemma 3 with Variable Sliding Window Attention
 This guide demonstrates how to deploy google/gemma-3-1b-it with Variable Sliding Window Attention (VSWA) using Dynamo. Since google/gemma-3-1b-it is a small model, each aggregated, decode, or prefill worker only requires one H100 GPU or one GB200 GPU.
 VSWA is a mechanism in which a model’s layers alternate between multiple sliding window sizes. An example of this is Gemma 3, which incorporates both global attention layers and sliding window layers.
......
@@ -4,8 +4,6 @@
 title: GPT-OSS
 ---
-# Running gpt-oss-120b Disaggregated with TensorRT-LLM
 Dynamo supports disaggregated serving of gpt-oss-120b with TensorRT-LLM. This guide demonstrates how to deploy gpt-oss-120b using disaggregated prefill/decode serving on a single B200 node with 8 GPUs, running 1 prefill worker on 4 GPUs and 1 decode worker on 4 GPUs.
 ## Overview
......