docs: migrate Fern docs from fern/ into docs/ (#6206)

Signed-off-by: Jont828 <jt572@cornell.edu>

Commit 39d645e5 (unverified), authored Feb 11, 2026 by Jonathan Tong, committed by GitHub Feb 11, 2026
20 changed files
--- a/fern/pages/backends/sglang/profiling.md
+++ b/fern/pages/backends/sglang/profiling.md
--- a/fern/pages/backends/sglang/prometheus.md
+++ b/fern/pages/backends/sglang/prometheus.md
--- a/fern/pages/backends/sglang/sglang-disaggregation.md
+++ b/fern/pages/backends/sglang/sglang-disaggregation.md
--- a/fern/pages/backends/trtllm/README.md
+++ b/fern/pages/backends/trtllm/README.md
@@ -30,6 +30,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 - [Client](#client)
 - [Benchmarking](#benchmarking)
 - [Multimodal Support](#multimodal-support)
+- [Video Diffusion Support](#video-diffusion-support-experimental)
 - [Logits Processing](#logits-processing)
 - [DP Rank Routing](#dp-rank-routing-attention-data-parallelism)
 - [Performance Sweep](#performance-sweep)
@@ -80,12 +81,12 @@ docker compose -f deploy/docker-compose.yml up -d
 apt-get update && apt-get -y install git git-lfs
 # On an x86 machine:
-python container/render.py --framework sglang --output-short-filename
+python container/render.py --framework=trtllm --target=runtime --output-short-filename
-docker build -f container/rendered.Dockerfile -t dynamo:latest-trtllm .
+docker build -t dynamo:trtllm-latest -f container/rendered.Dockerfile .
 # On an ARM machine:
-python container/render.py --framework trtllm --platform arm64 --output-short-filename
+python container/render.py --framework=trtllm --target=runtime --platform=arm64 --output-short-filename
-docker build -f container/rendered.Dockerfile -t dynamo:latest-trtllm .
+docker build -t dynamo:trtllm-latest -f container/rendered.Dockerfile .
 ```
 ### Run container
@@ -208,6 +209,70 @@ To benchmark your deployment with AIPerf, see this utility script, configuring t
 Dynamo with the TensorRT-LLM backend supports multimodal models, enabling you to process both text and images (or pre-computed embeddings) in a single request. For detailed setup instructions, example requests, and best practices, see the [TensorRT-LLM Multimodal Guide](../../features/multimodal/multimodal-trtllm.md).
+## Video Diffusion Support (Experimental)
+Dynamo supports video generation using diffusion models through the `--modality video_diffusion` flag.
+### Requirements
+- **visual_gen**: Part of TensorRT-LLM, located at `tensorrt_llm/visual_gen/`. Currently available **only** on the [`feat/visual_gen`](https://github.com/NVIDIA/TensorRT-LLM/tree/feat/visual_gen/tensorrt_llm/visual_gen) branch (not yet merged to main or any release). Install from source:
+  ```bash
+  git clone https://github.com/NVIDIA/TensorRT-LLM.git
+  cd TensorRT-LLM && git checkout feat/visual_gen
+  cd tensorrt_llm/visual_gen && pip install -e .
+  ```
+- **dynamo-runtime with video API**: The Dynamo runtime must include `ModelType.Videos` support. Ensure you're using a compatible version.
+### Supported Models
+| Diffusers Pipeline | Description | Example Model |
+|--------------------|-------------|---------------|
+| `WanPipeline` | Wan 2.1/2.2 Text-to-Video | `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` |
+The pipeline type is **auto-detected** from the model's `model_index.json` — no `--model-type` flag is needed.
+### Quick Start
+```bash
+python -m dynamo.trtllm \
+  --modality video_diffusion \
+  --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
+  --output-dir /tmp/videos
+```
+### API Endpoint
+Video generation uses the `/v1/videos/generations` endpoint:
+```bash
+curl -X POST http://localhost:8000/v1/videos/generations \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "A cat playing piano",
+    "model": "wan_t2v",
+    "size": "832x480",
+    "seconds": 4,
+    "fps": 24
+  }'
+```
+### Configuration Options
+| Flag | Description | Default |
+|------|-------------|---------|
+| `--output-dir` | Directory for generated videos | `/tmp/dynamo_videos` |
+| `--default-height` | Default video height | `480` |
+| `--default-width` | Default video width | `832` |
+| `--default-num-frames` | Default frame count | `81` |
+| `--enable-teacache` | Enable TeaCache optimization | `False` |
+| `--disable-torch-compile` | Disable torch.compile | `False` |
+### Limitations
+- Video diffusion is experimental and not recommended for production use
+- Only text-to-video is supported in this release (image-to-video planned)
+- Requires GPU with sufficient VRAM for the diffusion model
 ## Logits Processing
 Logits processors let you modify the next-token logits at every decoding step (e.g., to apply custom constraints or sampling transforms). Dynamo provides a backend-agnostic interface and an adapter for TensorRT-LLM so you can plug in custom processors.
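
Note on the hunk above: the new section says the pipeline type is auto-detected from the model's `model_index.json`. The diff does not show how Dynamo does this, so the following is only a minimal sketch of the idea. It assumes the standard Diffusers layout, where `model_index.json` records the pipeline class under `_class_name`. The `SUPPORTED_PIPELINES` mapping and the `detect_pipeline` helper are hypothetical names used for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical lookup for illustration; only WanPipeline is listed in the docs table.
SUPPORTED_PIPELINES = {"WanPipeline": "Wan 2.1/2.2 Text-to-Video"}


def detect_pipeline(model_dir):
    """Return the Diffusers pipeline class name recorded in model_index.json."""
    index = json.loads((Path(model_dir) / "model_index.json").read_text())
    name = index.get("_class_name")
    if name not in SUPPORTED_PIPELINES:
        raise ValueError(f"unsupported or missing pipeline class: {name!r}")
    return name


# Demo: a throwaway directory stands in for a downloaded checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "model_index.json"
    index_file.write_text(json.dumps({"_class_name": "WanPipeline"}))
    detected = detect_pipeline(tmp)
```

A lookup of this kind would explain why the docs say no `--model-type` flag is needed: the checkpoint itself names its pipeline class.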

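Note on the `/v1/videos/generations` example in the hunk above: the same request can be built from Python using only the standard library. This is a sketch under stated assumptions, not a client shipped with Dynamo. The `build_video_request` helper is hypothetical, the field names and values are copied from the curl example, and the response format is not shown in the diff, so the sketch stops at constructing the request.

```python
import json
import urllib.request


def build_video_request(prompt, model="wan_t2v", size="832x480", seconds=4, fps=24,
                        base_url="http://localhost:8000"):
    """Build (but do not send) a POST request mirroring the curl example."""
    # Fail early on a malformed size such as "832" or "832xabc".
    width, height = (int(v) for v in size.split("x"))
    body = {
        "prompt": prompt,
        "model": model,
        "size": f"{width}x{height}",
        "seconds": seconds,
        "fps": fps,
    }
    return urllib.request.Request(
        f"{base_url}/v1/videos/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_video_request("A cat playing piano")
# To send it against a running deployment: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the payload easy to inspect before a GPU-backed deployment is available.
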
--- a/fern/pages/backends/trtllm/gemma3-sliding-window-attention.md
+++ b/fern/pages/backends/trtllm/gemma3-sliding-window-attention.md
--- a/fern/pages/backends/trtllm/gpt-oss.md
+++ b/fern/pages/backends/trtllm/gpt-oss.md
--- a/fern/pages/backends/trtllm/kv-cache-transfer.md
+++ b/fern/pages/backends/trtllm/kv-cache-transfer.md
--- a/fern/pages/backends/trtllm/llama4-plus-eagle.md
+++ b/fern/pages/backends/trtllm/llama4-plus-eagle.md
--- a/fern/pages/backends/trtllm/multinode/multinode-examples.md
+++ b/fern/pages/backends/trtllm/multinode/multinode-examples.md
--- a/docs/backends/trtllm/prometheus.md
+++ b/docs/backends/trtllm/prometheus.md
-<!--
+---
-SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-SPDX-License-Identifier: Apache-2.0
+# SPDX-License-Identifier: Apache-2.0
-->
+---
 # TensorRT-LLM Prometheus Metrics

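Note on the hunk above: the SPDX header moves from an HTML comment into a `---` delimited front-matter block with `#`-prefixed lines. The commit does not say whether this was done by hand or by script. The following is a minimal sketch of the same transformation, assuming the comment opens on the first line of the file; `html_comment_to_frontmatter` is a hypothetical helper, not part of the repository.

```python
def html_comment_to_frontmatter(text):
    """Rewrite a leading <!-- ... --> header as a front-matter block of # comments."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "<!--":
        return text  # no leading HTML comment; leave the file untouched
    end = next((i for i, line in enumerate(lines) if line.strip() == "-->"), None)
    if end is None:
        return text  # unterminated comment; do not guess
    header = [f"# {line.strip()}" for line in lines[1:end] if line.strip()]
    return "\n".join(["---", *header, "---", *lines[end + 1:]]) + "\n"


old = (
    "<!--\n"
    "SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.\n"
    "SPDX-License-Identifier: Apache-2.0\n"
    "-->\n"
    "# TensorRT-LLM Prometheus Metrics\n"
)
new = html_comment_to_frontmatter(old)
```
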
--- a/fern/pages/backends/vllm/README.md
+++ b/fern/pages/backends/vllm/README.md
--- a/fern/pages/backends/vllm/deepseek-r1.md
+++ b/fern/pages/backends/vllm/deepseek-r1.md
--- a/fern/pages/backends/vllm/gpt-oss.md
+++ b/fern/pages/backends/vllm/gpt-oss.md
--- a/fern/pages/backends/vllm/multi-node.md
+++ b/fern/pages/backends/vllm/multi-node.md
--- a/fern/pages/backends/vllm/prometheus.md
+++ b/fern/pages/backends/vllm/prometheus.md
--- a/fern/pages/backends/vllm/prompt-embeddings.md
+++ b/fern/pages/backends/vllm/prompt-embeddings.md
--- a/docs/backends/vllm/vllm-omni.md
+++ b/docs/backends/vllm/vllm-omni.md
@@ -9,7 +9,7 @@ Dynamo supports omni (multimodal generation) models via the [vLLM-Omni](https://
 ## Prerequisites
-This guide assumes familiarity with deploying Dynamo with vLLM as described in [README.md](/docs/backends/vllm/README.md).
+This guide assumes familiarity with deploying Dynamo with vLLM as described in [README.md](/docs/pages/backends/vllm/README.md).
 ## Quick Start

--- a/fern/pages/benchmarks/benchmarking.md
+++ b/fern/pages/benchmarks/benchmarking.md
--- a/fern/pages/benchmarks/kv-router-ab-testing.md
+++ b/fern/pages/benchmarks/kv-router-ab-testing.md
--- a/fern/pages/components/frontend/README.md
+++ b/fern/pages/components/frontend/README.md