docs: remove duplicate H1 headings from Fern pages (#6410)

Signed-off-by: Dan Gil <dagil@nvidia.com>

Commit 03360b84 (unverified), authored by dagil-nvidia and committed by GitHub on Feb 25, 2026.
20 changed files
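Every hunk below follows the same pattern. The page's frontmatter already sets `title`, and the commit title implies that Fern renders this as the page H1. A body-level `# ` heading therefore appears as a second H1. The snippet below is a minimal, hypothetical lint sketch for catching this pattern. It is not part of this commit. `has_duplicate_h1` and `find_duplicates` are illustrative names, and the sketch assumes that frontmatter opens with `---` on the first line and closes with the next `---` line.

```python
import re
from pathlib import Path


def has_duplicate_h1(text: str) -> bool:
    """Return True if the page sets a frontmatter `title` AND its body opens with an H1.

    Hypothetical helper: assumes `---`-delimited frontmatter starting on line 1.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no frontmatter, so there is no title for an H1 to duplicate
    # Locate the closing `---` of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return False  # unterminated frontmatter; leave it to other checks
    # Only real `title:` keys count. `#`-prefixed YAML comment lines inside the
    # frontmatter are never treated as headings because the body scan starts after `end`.
    if not any(re.match(r"title\s*:", line) for line in lines[1:end]):
        return False
    for line in lines[end + 1:]:
        if line.strip():
            # `# X` is an ATX H1; `## X` fails because `\s` must follow the single `#`.
            return re.match(r"#\s", line) is not None
    return False


def find_duplicates(root: str) -> list[str]:
    """List Markdown files under `root` whose body repeats the title as an H1."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*.md")
        if has_duplicate_h1(p.read_text(encoding="utf-8"))
    )
```

The check flags any leading H1 rather than comparing its text against `title`. Several of the removed headings differ from their titles ("Dynamo Runtime" vs. "Runtime Guide", "Minikube Setup Guide" vs. "Minikube Setup"), so an exact-match rule would miss them. Running `find_duplicates("docs/pages")` after this change should, under these assumptions, return no paths for the 20 files touched here.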
--- a/docs/pages/development/runtime-guide.md
+++ b/docs/pages/development/runtime-guide.md
@@ -4,8 +4,6 @@
 title: Runtime Guide
 ---

-# Dynamo Runtime
-
 <h4>A Datacenter Scale Distributed Inference Serving Framework</h4>

 [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

--- a/docs/pages/fault-tolerance/README.md
+++ b/docs/pages/fault-tolerance/README.md
@@ -5,8 +5,6 @@ title: Fault Tolerance
 subtitle: Handle failures gracefully with request migration, cancellation, and graceful shutdown
 ---

-# Fault Tolerance
-
 Dynamo provides comprehensive fault tolerance mechanisms to ensure reliable LLM inference in production deployments. This section covers the various strategies and features that enable Dynamo to handle failures gracefully and maintain service availability.

 ## Overview

--- a/docs/pages/fault-tolerance/graceful-shutdown.md
+++ b/docs/pages/fault-tolerance/graceful-shutdown.md
@@ -4,8 +4,6 @@
 title: Graceful Shutdown
 ---

-# Graceful Shutdown
-
 This document describes how Dynamo components handle shutdown signals to ensure in-flight requests complete successfully and resources are properly cleaned up.

 ## Overview

--- a/docs/pages/fault-tolerance/request-rejection.md
+++ b/docs/pages/fault-tolerance/request-rejection.md
@@ -4,8 +4,6 @@
 title: Request Rejection
 ---

-# Request Rejection (Load Shedding)
-
 This document describes how Dynamo implements request rejection to prevent system overload and maintain service stability under high load conditions.

 ## Overview

--- a/docs/pages/fault-tolerance/testing.md
+++ b/docs/pages/fault-tolerance/testing.md
@@ -4,8 +4,6 @@
 title: Testing
 ---

-# Fault Tolerance Testing
-
 This document describes the test infrastructure for validating Dynamo's fault tolerance mechanisms. The testing framework supports request cancellation, migration, etcd HA, and hardware fault injection scenarios.

 ## Overview

--- a/docs/pages/features/disaggregated-serving/README.md
+++ b/docs/pages/features/disaggregated-serving/README.md
@@ -5,8 +5,6 @@ title: Disaggregated Serving
 subtitle: Find optimal prefill/decode configuration for disaggregated serving deployments
 ---

-# Disaggregated Serving Guide
-
 [AIConfigurator](https://github.com/ai-dynamo/aiconfigurator/tree/main) is a performance optimization tool that helps you find the optimal configuration for deploying LLMs with Dynamo. It automatically determines the best number of prefill and decode workers, parallelism settings, and deployment parameters to meet your SLA targets while maximizing throughput.

 ## Why Use AIConfigurator?

--- a/docs/pages/features/lora/README.md
+++ b/docs/pages/features/lora/README.md
@@ -5,8 +5,6 @@ title: LoRA Adapters
 subtitle: Serve fine-tuned LoRA adapters with dynamic loading and routing in Dynamo
 ---

-# LoRA Adapters
-
 LoRA (Low-Rank Adaptation) enables efficient fine-tuning and serving of specialized model variants without duplicating full model weights. Dynamo provides built-in support for dynamic LoRA adapter loading, caching, and inference routing.

 ## Backend Support

--- a/docs/pages/features/multimodal/README.md
+++ b/docs/pages/features/multimodal/README.md
@@ -5,8 +5,6 @@ title: Multimodality Support
 subtitle: Deploy multimodal models with image, video, and audio support in Dynamo
 ---

-# Multimodal Inference in Dynamo
-
 Dynamo supports multimodal inference across multiple LLM backends, enabling models to process images, video, and audio alongside text. This section provides comprehensive documentation for deploying multimodal models.

 > [!IMPORTANT]

--- a/docs/pages/features/multimodal/multimodal-sglang.md
+++ b/docs/pages/features/multimodal/multimodal-sglang.md
@@ -4,8 +4,6 @@
 title: SGLang Multimodal
 ---

-# SGLang Multimodal
-
 This document provides a comprehensive guide for multimodal inference using SGLang backend in Dynamo. SGLang multimodal supports **EPD**, **E/PD**, and **E/P/D** flows, with NIXL (RDMA) for zero-copy tensor transfer in disaggregated modes.

 ## Support Matrix

--- a/docs/pages/features/multimodal/multimodal-trtllm.md
+++ b/docs/pages/features/multimodal/multimodal-trtllm.md
@@ -4,8 +4,6 @@
 title: TensorRT-LLM Multimodal
 ---

-# TensorRT-LLM Multimodal
-
 This document provides a comprehensive guide for multimodal inference using TensorRT-LLM backend in Dynamo.

 You can provide multimodal inputs in the following ways:

--- a/docs/pages/features/multimodal/multimodal-vllm.md
+++ b/docs/pages/features/multimodal/multimodal-vllm.md
@@ -4,8 +4,6 @@
 title: vLLM Multimodal
 ---

-# vLLM Multimodal
-
 This document provides a comprehensive guide for multimodal inference using vLLM backend in Dynamo.

 <Warning>

--- a/docs/pages/features/speculative-decoding/README.md
+++ b/docs/pages/features/speculative-decoding/README.md
@@ -4,8 +4,6 @@
 title: Speculative Decoding
 ---

-# Speculative Decoding
-
 Speculative decoding is an optimization technique that uses a smaller "draft" model to predict multiple tokens, which are then verified by the main model in parallel. This can significantly reduce latency for autoregressive generation.

 ## Backend Support

--- a/docs/pages/features/speculative-decoding/speculative-decoding-vllm.md
+++ b/docs/pages/features/speculative-decoding/speculative-decoding-vllm.md
@@ -4,8 +4,6 @@
 title: Speculative Decoding with vLLM
 ---

-# Speculative Decoding with vLLM
-
 Using Speculative Decoding with the vLLM backend.

 > **See also**: [Speculative Decoding Overview](./README.md) for cross-backend documentation.

--- a/docs/pages/integrations/flexkv-integration.md
+++ b/docs/pages/integrations/flexkv-integration.md
@@ -4,8 +4,6 @@
 title: FlexKV
 ---

-# FlexKV Integration in Dynamo
-
 ## Introduction

 [FlexKV](https://github.com/taco-project/FlexKV) is a scalable, distributed runtime for KV cache offloading developed by Tencent Cloud's TACO team in collaboration with the community. It acts as a unified KV caching layer for inference engines like SGLang, TensorRT-LLM, and vLLM.

--- a/docs/pages/integrations/kv-events-custom-engines.md
+++ b/docs/pages/integrations/kv-events-custom-engines.md
@@ -4,8 +4,6 @@
 title: KV Events for Custom Engines
 ---

-# KV Event Publishing for Custom Engines
-
 This document explains how to implement KV event publishing for custom inference engines, enabling them to participate in Dynamo's KV cache-aware routing.

 ## Overview

--- a/docs/pages/integrations/lmcache-integration.md
+++ b/docs/pages/integrations/lmcache-integration.md
@@ -4,8 +4,6 @@
 title: LMCache
 ---

-# LMCache Integration in Dynamo
-
 ## Introduction

 LMCache is a high-performance KV cache layer that supercharges LLM serving by enabling **prefill-once, reuse-everywhere** semantics. As described in the [official documentation](https://docs.lmcache.ai/index.html), LMCache lets LLMs prefill each text only once by storing the KV caches of all reusable texts, allowing reuse of KV caches for any reused text (not necessarily prefix) across any serving engine instance.

--- a/docs/pages/integrations/sglang-hicache.md
+++ b/docs/pages/integrations/sglang-hicache.md
@@ -4,8 +4,6 @@
 title: SGLang HiCache
 ---

-# Enable SGLang Hierarchical Cache (HiCache)
-
 This guide shows how to enable SGLang's Hierarchical Cache (HiCache) inside Dynamo.

 ## 1) Start the SGLang worker with HiCache enabled

--- a/docs/pages/kubernetes/README.md
+++ b/docs/pages/kubernetes/README.md
@@ -4,8 +4,6 @@
 title: Deployment Guide
 ---

-# Deploying Dynamo on Kubernetes
-
 High-level guide to Dynamo Kubernetes deployments. Start here, then dive into specific guides.

 ## Important Terminology

--- a/docs/pages/kubernetes/chrek/dynamo.md
+++ b/docs/pages/kubernetes/chrek/dynamo.md
@@ -4,8 +4,6 @@
 title: Integration with Dynamo
 ---

-# Checkpoint/Restore for Fast Pod Startup
-
 > ⚠️ **Experimental Feature**: ChReK is currently in **beta/preview**. The ChReK DaemonSet runs in privileged mode to perform CRIU operations. See [Limitations](#limitations) for details.

 Checkpointing captures the complete state of a running worker pod (including GPU memory) and saves it to storage. New pods can restore from this checkpoint instead of performing a full cold start.

--- a/docs/pages/kubernetes/deployment/minikube.md
+++ b/docs/pages/kubernetes/deployment/minikube.md
@@ -4,8 +4,6 @@
 title: Minikube Setup
 ---

-# Minikube Setup Guide
-
 Don't have a Kubernetes cluster? No problem! You can set up a local development environment using Minikube. This guide walks through the set up of everything you need to run Dynamo Kubernetes Platform locally.

 ## 1. Install Minikube