Commit ece08dc9 authored by Neal Vaidya, committed by GitHub

docs: restructure docs directory and move fern config to fern/ (#6700)


Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
parent 1412e44b
@@ -8,7 +8,7 @@ This document provides an in-depth look at the architecture, components, framewo
## KVBM Components
-![Internal Components of Dynamo KVBM](../../assets/img/kvbm-components.png)
+![Internal Components of Dynamo KVBM](../assets/img/kvbm-components.png)
*Internal Components of Dynamo KVBM*
@@ -39,7 +39,7 @@ This document provides an in-depth look at the architecture, components, framewo
## KVBM Data Flows
-![KVBM Data Flows](../../assets/img/kvbm-data-flows.png)
+![KVBM Data Flows](../assets/img/kvbm-data-flows.png)
*KVBM Data Flows from device to other memory hierarchies*
@@ -72,7 +72,7 @@ This document provides an in-depth look at the architecture, components, framewo
## Internal Architecture Deep Dive
-![Internal architecture and key modules in the Dynamo KVBM](../../assets/img/kvbm-internal-arch.png)
+![Internal architecture and key modules in the Dynamo KVBM](../assets/img/kvbm-internal-arch.png)
*Internal architecture and key modules in the Dynamo KVBM*
@@ -320,23 +320,23 @@ There are two components of the interface:
- **Scheduler (Leader)**: Responsible for orchestration of KV block offload/onboard, builds metadata specifying transfer data to the workers. It also maintains hooks for handling asynchronous transfer completion.
- **Worker**: Responsible for reading metadata built by the scheduler (leader), performs async onboarding/offloading at the end of the forward pass.
-![vLLM KVBM Integration](../../assets/img/kvbm-integrations.png)
+![vLLM KVBM Integration](../assets/img/kvbm-integrations.png)
*Typical integration of KVBM with inference frameworks (vLLM shown as example)*
### Onboarding Operations
-![Onboarding blocks from Host to Device](../../assets/img/kvbm-onboard-host2device.png)
+![Onboarding blocks from Host to Device](../assets/img/kvbm-onboard-host2device.png)
*Onboarding blocks from Host to Device*
-![Onboarding blocks from Disk to Device](../../assets/img/kvbm-onboard-disk2device.png)
+![Onboarding blocks from Disk to Device](../assets/img/kvbm-onboard-disk2device.png)
*Onboarding blocks from Disk to Device*
### Offloading Operations
-![Offloading blocks from Device to Host & Disk](../../assets/img/kvbm-offload.png)
+![Offloading blocks from Device to Host & Disk](../assets/img/kvbm-offload.png)
*Offloading blocks from Device to Host & Disk*
......
@@ -12,7 +12,7 @@ The Planner is Dynamo's autoscaling controller. It supports two scaling modes: *
## Throughput-Based Scaling
-![Planner architecture showing Metric Collector, Load Predictor, and Performance Interpolator feeding into the Scaling Algorithm and Connector Layer](../../assets/img/planner-architecture.svg)
+![Planner architecture showing Metric Collector, Load Predictor, and Performance Interpolator feeding into the Scaling Algorithm and Connector Layer](../assets/img/planner-architecture.svg)
## Scaling Algorithm
......
@@ -23,17 +23,17 @@ AIConfigurator answers these questions in seconds, providing:
### End-to-End Workflow
-![AIConfigurator end-to-end workflow](../../../assets/img/e2e-workflow.svg)
+![AIConfigurator end-to-end workflow](../../assets/img/e2e-workflow.svg)
### Aggregated vs Disaggregated Architecture
AIConfigurator evaluates two deployment architectures and recommends the best one for your workload:
-![Aggregated vs Disaggregated architecture comparison](../../../assets/img/arch-comparison.svg)
+![Aggregated vs Disaggregated architecture comparison](../../assets/img/arch-comparison.svg)
### When to Use Each Architecture
-![Decision flowchart for choosing aggregated vs disaggregated](../../../assets/img/decision-flowchart.svg)
+![Decision flowchart for choosing aggregated vs disaggregated](../../assets/img/decision-flowchart.svg)
## Quick Start
@@ -287,7 +287,7 @@ Run AIPerf **inside the cluster** to avoid network latency affecting measurement
To use AIPerf to benchmark an AIC-recommended configuration, you'll need to translate AIC parameters into AIPerf profiling arguments (we are working to automate this):
-![AIC-to-AIPerf parameter mapping](../../../assets/img/param-mapping.svg)
+![AIC-to-AIPerf parameter mapping](../../assets/img/param-mapping.svg)
| AIC Output | AIPerf Parameter | Notes |
|------------|-----------------|-------|
......