Unverified commit 03360b84 authored by dagil-nvidia, committed by GitHub

docs: remove duplicate H1 headings from Fern pages (#6410)


Signed-off-by: Dan Gil <dagil@nvidia.com>
parent 01ecc8c7
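Every hunk below has the same shape, `-4,8 +4,6` (or `-5,8 +5,6`). Each one drops two lines: the body-level `#` heading directly under the frontmatter, and the blank line after it. The sketch below is a hypothetical helper, not part of this commit. It assumes Fern already renders the frontmatter `title` as the page's H1, so any leading body H1 duplicates it. The function name `strip_leading_h1` is illustrative only.

```python
def strip_leading_h1(text: str) -> str:
    """Hypothetical helper: drop the first H1 (plus one trailing blank line)
    that sits directly after a '---'-delimited YAML frontmatter block."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text  # no frontmatter: leave the page untouched
    # Locate the closing '---' of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return text  # unterminated frontmatter: do nothing
    i = end + 1
    # Skip blank lines between the frontmatter and the first body line.
    while i < len(lines) and lines[i].strip() == "":
        i += 1
    # Only a level-1 heading ("# ") is removed; "## " and deeper are kept.
    if i < len(lines) and lines[i].startswith("# "):
        j = i + 1
        if j < len(lines) and lines[j].strip() == "":
            j += 1  # also drop one blank line so spacing stays tidy
        del lines[i:j]
    return "".join(lines)
```

Run against a page shaped like the `Connector` hunk, the helper removes exactly two lines, which matches the `8 -> 6` line counts in the hunk headers.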
......@@ -4,8 +4,6 @@
title: NIXL Connect API
---
# Dynamo NIXL Connect
Dynamo NIXL Connect specializes in moving data between models/workers in a Dynamo Graph, and for the use cases where registration and memory regions need to be dynamic.
Dynamo connect provides utilities for such use cases, using the NIXL-based I/O subsystem via a set of Python classes.
The relaxed registration comes with some performance overheads, but simplifies the integration process.
......
......@@ -4,8 +4,6 @@
title: Connector
---
# dynamo.nixl_connect.Connector
Core class for managing the connection between workers in a distributed environment.
Use this class to create readable and writable operations, or read and write data to remote workers.
......
......@@ -4,8 +4,6 @@
title: Descriptor
---
# dynamo.nixl_connect.Descriptor
Memory descriptor that ensures memory is registered with the NIXL-based I/O subsystem.
Memory must be registered with the NIXL subsystem to enable interaction with the memory.
......
......@@ -4,8 +4,6 @@
title: Device Kind
---
# dynamo.nixl_connect.DeviceKind(IntEnum)
Represents the kind of device a [`Device`](device.md) object represents.
......
......@@ -4,8 +4,6 @@
title: Device
---
# dynamo.nixl_connect.Device
`Device` class describes the device a given allocation resides in.
Usually host (`"cpu"`) or GPU (`"cuda"`) memory.
......
......@@ -4,8 +4,6 @@
title: Operation Status
---
# dynamo.nixl_connect.OperationStatus(IntEnum)
Represents the current state or status of an operation.
......
......@@ -4,8 +4,6 @@
title: RDMA Metadata
---
# dynamo.nixl_connect.RdmaMetadata
A Pydantic type intended to provide JSON serialized NIXL metadata about a [`ReadableOperation`](readable-operation.md) or [`WritableOperation`](writable-operation.md) object.
NIXL metadata contains detailed information about a worker process and how to access memory regions registered with the corresponding agent.
This data is required to perform data transfers using the NIXL-based I/O subsystem.
......
......@@ -4,8 +4,6 @@
title: Read Operation
---
# dynamo.nixl_connect.ReadOperation
An operation which transfers data from a remote worker to the local worker.
To create the operation, NIXL metadata ([RdmaMetadata](rdma-metadata.md)) from a remote worker's [`ReadableOperation`](readable-operation.md)
......
......@@ -4,8 +4,6 @@
title: Readable Operation
---
# dynamo.nixl_connect.ReadableOperation
An operation which enables a remote worker to read data from the local worker.
To create the operation, a set of local [`Descriptor`](descriptor.md) objects must be provided that reference memory intended to be transferred to a remote worker.
......
......@@ -4,8 +4,6 @@
title: Writable Operation
---
# dynamo.nixl_connect.WritableOperation
An operation which enables a remote worker to write data to the local worker.
To create the operation, a set of local [`Descriptor`](descriptor.md) objects must be provided which reference memory intended to receive data from a remote worker.
......
......@@ -4,8 +4,6 @@
title: Write Operation
---
# dynamo.nixl_connect.WriteOperation
An operation which transfers data from the local worker to a remote worker.
To create the operation, NIXL metadata ([RdmaMetadata](rdma-metadata.md)) from a remote worker's [`WritableOperation`](writable-operation.md)
......
......@@ -4,8 +4,6 @@
title: SGLang
---
# Running SGLang with Dynamo
## Use the Latest Release
We recommend using the latest stable release of Dynamo to avoid breaking changes:
......
......@@ -4,8 +4,6 @@
title: Diffusion
---
# Diffusion Models
Dynamo SGLang supports three types of diffusion-based generation: **LLM diffusion** (text generation via iterative refinement), **image diffusion** (text-to-image), and **video generation** (text-to-video). Each uses a different worker flag and handler, but all integrate with SGLang's `DiffGenerator`.
## Overview
......
......@@ -4,8 +4,6 @@
title: Disaggregation
---
# SGLang Disaggregated Serving
This document explains how SGLang's disaggregated prefill-decode architecture works, both standalone and within Dynamo.
## Overview
......
......@@ -4,8 +4,6 @@
title: Examples
---
# SGLang Examples
For quick start instructions, see the [SGLang README](README.md). This document provides all deployment patterns for running SGLang with Dynamo, including LLMs, multimodal, and diffusion models, and Kubernetes deployment.
## Table of Contents
......
......@@ -4,8 +4,6 @@
title: Observability
---
# SGLang Observability
This guide covers metrics, tracing, and visualization for SGLang deployments running through Dynamo.
## Prometheus Metrics
......
......@@ -5,8 +5,6 @@ title: Reference Guide
subtitle: Architecture, configuration, and operational details for the SGLang backend
---
# Reference Guide
## Overview
The SGLang backend in Dynamo uses a modular architecture where `main.py` dispatches to specialized initialization modules based on the worker type. Each worker type has its own init module, request handler, health check, and registration logic.
......
......@@ -4,8 +4,6 @@
title: TensorRT-LLM
---
# LLM Deployment using TensorRT-LLM
This directory contains examples and reference implementations for deploying Large Language Models (LLMs) in various configurations using TensorRT-LLM.
## Use the Latest Release
......
......@@ -4,8 +4,6 @@
title: Gemma3 Sliding Window
---
# Gemma 3 with Variable Sliding Window Attention
This guide demonstrates how to deploy google/gemma-3-1b-it with Variable Sliding Window Attention (VSWA) using Dynamo. Since google/gemma-3-1b-it is a small model, each aggregated, decode, or prefill worker only requires one H100 GPU or one GB200 GPU.
VSWA is a mechanism in which a model’s layers alternate between multiple sliding window sizes. An example of this is Gemma 3, which incorporates both global attention layers and sliding window layers.
......
......@@ -4,8 +4,6 @@
title: GPT-OSS
---
# Running gpt-oss-120b Disaggregated with TensorRT-LLM
Dynamo supports disaggregated serving of gpt-oss-120b with TensorRT-LLM. This guide demonstrates how to deploy gpt-oss-120b using disaggregated prefill/decode serving on a single B200 node with 8 GPUs, running 1 prefill worker on 4 GPUs and 1 decode worker on 4 GPUs.
## Overview
......