Commit 912a4d4b authored by Abhishek Gupta, committed by GitHub

docs: add NIXL backend configuration and fix multiple typos (#5564)



Signed-off-by: Abhishek Kumar Gupta (AbhiOnGithub)
Signed-off-by: Abhishek Kumar Gupta <mail2abhishekgupta@gmail.com>
Co-authored-by: dagil-nvidia <dagil@nvidia.com>
parent 92ecd308
......@@ -61,7 +61,7 @@ class Config:
# multimodal options
multimodal_processor: bool = False
# Emebdding Cache Processor is different from the regular processor
# Embedding Cache Processor is different from the regular processor
# TODO: Have a single processor for all cases and adopting rust based processor
ec_processor: bool = False
multimodal_encode_worker: bool = False
......
......@@ -30,7 +30,39 @@ By default, TensorRT-LLM uses **NIXL** (NVIDIA Inference Xfer Library) with UCX
### Specify Backends for NIXL
TODO: Add instructions for how to specify different backends for NIXL.
NIXL supports multiple communication backends that can be configured via environment variables. By default, UCX is used if no backends are explicitly specified.
**Environment Variable Format:**
```bash
DYN_KVBM_NIXL_BACKEND_<BACKEND>=<value>
```
**Supported Backends:**
- `UCX` - Unified Communication X (default)
- `GDS` - GPU Direct Storage
**Examples:**
```bash
# Enable UCX backend (default behavior)
export DYN_KVBM_NIXL_BACKEND_UCX=true
# Enable GDS backend
export DYN_KVBM_NIXL_BACKEND_GDS=true
# Enable multiple backends
export DYN_KVBM_NIXL_BACKEND_UCX=true
export DYN_KVBM_NIXL_BACKEND_GDS=true
# Explicitly disable a backend
export DYN_KVBM_NIXL_BACKEND_GDS=false
```
**Valid Values:**
- `true`, `1`, `on`, `yes` - Enable the backend
- `false`, `0`, `off`, `no` - Disable the backend
> [!Note]
> If no `DYN_KVBM_NIXL_BACKEND_*` environment variables are set, UCX is used as the default backend.
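
The rules above can be sketched as a small resolver. This is an illustrative sketch only, not the parser used by Dynamo or NIXL: the function name `enabled_nixl_backends` is hypothetical, and the case-insensitive matching, the error on unrecognized values, and the empty result when every listed backend is disabled are assumptions that the text above does not specify.

```python
import os
from typing import List, Mapping, Optional

PREFIX = "DYN_KVBM_NIXL_BACKEND_"
TRUTHY = {"true", "1", "on", "yes"}
FALSY = {"false", "0", "off", "no"}


def enabled_nixl_backends(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the backends enabled via DYN_KVBM_NIXL_BACKEND_* variables.

    Hypothetical helper: falls back to ["UCX"] when no matching variable
    is set, as described in the note above.
    """
    env = os.environ if env is None else env
    # Collect every variable that uses the documented prefix.
    matches = {k[len(PREFIX):]: v for k, v in env.items() if k.startswith(PREFIX)}
    if not matches:
        return ["UCX"]  # documented default

    enabled = []
    for backend, raw in sorted(matches.items()):
        value = raw.strip().lower()  # assumption: values are case-insensitive
        if value in TRUTHY:
            enabled.append(backend.upper())
        elif value not in FALSY:
            # Assumption: unrecognized values are rejected rather than ignored.
            raise ValueError(f"invalid value for {PREFIX}{backend}: {raw!r}")
    return enabled
```

Calling `enabled_nixl_backends({"DYN_KVBM_NIXL_BACKEND_GDS": "true"})` returns `["GDS"]`, while an environment with no matching variables yields `["UCX"]`.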
## Alternative Method: UCX
......
......@@ -5,7 +5,7 @@ SPDX-License-Identifier: Apache-2.0
# Running Deepseek R1 with Wide EP
Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a seperate dynamo component that will emit its own KV Events and Metrics. vLLM controls the expert parallelism using the flag `--enable-expert-parallel`
Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a separate dynamo component that will emit its own KV Events and Metrics. vLLM controls the expert parallelism using the flag `--enable-expert-parallel`
## Instructions
......
......@@ -1284,7 +1284,7 @@ if __name__ == "__main__":
"--percentile-metrics",
type=str,
default="ttft,tpot,itl",
help="Comma-seperated list of selected metrics to report percentils. "
help="Comma-separated list of selected metrics to report percentiles. "
"This argument specifies the metrics to report percentiles. "
'Allowed metric names are "ttft", "tpot", "itl", "e2el". '
'Default value is "ttft,tpot,itl".',
......@@ -1293,7 +1293,7 @@ if __name__ == "__main__":
"--metric-percentiles",
type=str,
default="99",
help="Comma-seperated list of percentiles for selected metrics. "
help="Comma-separated list of percentiles for selected metrics. "
'To report 25-th, 50-th, and 75-th percentiles, use "25,50,75". '
'Default value is "99". '
'Use "--percentile-metrics" to select metrics.',
......
......@@ -1318,7 +1318,7 @@ mod tests_startup_helpers {
}
assert!(no_blocks, "worker should have no blocks after removal");
// Global kvindexer should have recieved two events (create/remove)
// Global kvindexer should have received two events (create/remove)
let published = published.lock().unwrap();
assert_eq!(
published.len(),
......@@ -1397,7 +1397,7 @@ mod tests_startup_helpers {
}
assert!(no_blocks, "worker should have no blocks after clearing");
// Global kvindexer should have recieved two events (create/remove)
// Global kvindexer should have received two events (create/remove)
let published = published.lock().unwrap();
assert_eq!(
published.len(),
......