Unverified commit 19ecf46f, authored by Dr. Stefan Schimanski, committed by GitHub

docs(operator): clarify Interconnect and RDMA field semantics in HardwareSpec (#8300)


Signed-off-by: Dr. Stefan Schimanski <sschimanski@nvidia.com>
parent 5103efdb
@@ -221,11 +221,11 @@ class HardwareSpec(BaseModel):
     )
     interconnect: Optional[str] = Field(
         default=None,
-        description='Interconnect describes the GPU interconnect type within a node. Examples: "pcie", "nvlink", "infiniband".',
+        description='Interconnect describes the primary GPU-to-GPU interconnect *within a node*. Semantics / usage: - This is capability metadata used for profiling, planning, and deployment decisions. - It does NOT configure or enable any GPU interconnect; it only describes what is available/assumed. - When omitted, the operator may attempt best-effort discovery (currently distinguishes "nvlink" vs "pcie" based on DCGM NVLink link count). If discovery is unavailable, it may remain empty. Impact of wrong / missing values: - If set more optimistically than reality (e.g., "nvlink" when only PCIe is present), performance models may overestimate intra-node bandwidth and choose overly aggressive parallelism or layouts, resulting in degraded performance compared to expectations. - If set more pessimistically than reality (e.g., "pcie" when NVLink is present), the system may choose conservative plans and leave performance on the table. - If unset and undiscovered, consumers should treat the interconnect as unknown and fall back to conservative assumptions. Example values: "pcie", "nvlink". Other values may be accepted but may not be auto-detected. ',
     )
     rdma: Optional[bool] = Field(
         default=None,
-        description="RDMA indicates whether RDMA is available on the cluster.",
+        description="RDMA indicates whether the cluster has RDMA-capable networking available for Dynamo data movement. Semantics / usage: - This is capability metadata used for profiling, planning, and deployment decisions. - It does NOT install, enable, or configure RDMA (e.g., drivers, SR-IOV, NVIDIA network operator, GPUDirect settings). It only expresses availability/intent. - When omitted, the operator may attempt best-effort discovery (e.g., via node labels indicating RDMA/SR-IOV capability and/or presence of NVIDIA network-operator RDMA components). If discovery is unavailable, it may remain unset. Impact of wrong / missing values: - False positive (set true when RDMA is not actually usable end-to-end) may cause plans or deployments to assume RDMA is available; depending on the runtime transport selection and fallback behavior, this can lead to connection/setup failures or performance regressions. - False negative (set false when RDMA is available) will typically avoid RDMA-optimized paths and fall back to non-RDMA transports, usually remaining functional but potentially slower. - If unset and undiscovered, consumers should treat RDMA availability as unknown and use conservative defaults / fallback transports. ",
     )
......
@@ -615,15 +615,49 @@ spec:
   type: string
 interconnect:
   description: |-
-    Interconnect describes the GPU interconnect type within a node.
-    Examples: "pcie", "nvlink", "infiniband".
+    Interconnect describes the primary GPU-to-GPU interconnect *within a node*.
+
+    Semantics / usage:
+    - This is capability metadata used for profiling, planning, and deployment decisions.
+    - It does NOT configure or enable any GPU interconnect; it only describes what is available/assumed.
+    - When omitted, the operator may attempt best-effort discovery (currently distinguishes "nvlink"
+      vs "pcie" based on DCGM NVLink link count). If discovery is unavailable, it may remain empty.
+
+    Impact of wrong / missing values:
+    - If set more optimistically than reality (e.g., "nvlink" when only PCIe is present), performance
+      models may overestimate intra-node bandwidth and choose overly aggressive parallelism or layouts,
+      resulting in degraded performance compared to expectations.
+    - If set more pessimistically than reality (e.g., "pcie" when NVLink is present), the system may
+      choose conservative plans and leave performance on the table.
+    - If unset and undiscovered, consumers should treat the interconnect as unknown and fall back to
+      conservative assumptions.
+
+    Example values: "pcie", "nvlink". Other values may be accepted but may not be auto-detected.
   type: string
 numGpusPerNode:
   description: NumGPUsPerNode is the number of GPUs per node.
   format: int32
   type: integer
 rdma:
-  description: RDMA indicates whether RDMA is available on the cluster.
+  description: |-
+    RDMA indicates whether the cluster has RDMA-capable networking available for Dynamo data movement.
+
+    Semantics / usage:
+    - This is capability metadata used for profiling, planning, and deployment decisions.
+    - It does NOT install, enable, or configure RDMA (e.g., drivers, SR-IOV, NVIDIA network operator,
+      GPUDirect settings). It only expresses availability/intent.
+    - When omitted, the operator may attempt best-effort discovery (e.g., via node labels indicating
+      RDMA/SR-IOV capability and/or presence of NVIDIA network-operator RDMA components). If discovery
+      is unavailable, it may remain unset.
+
+    Impact of wrong / missing values:
+    - False positive (set true when RDMA is not actually usable end-to-end) may cause plans or
+      deployments to assume RDMA is available; depending on the runtime transport selection and
+      fallback behavior, this can lead to connection/setup failures or performance regressions.
+    - False negative (set false when RDMA is available) will typically avoid RDMA-optimized paths and
+      fall back to non-RDMA transports, usually remaining functional but potentially slower.
+    - If unset and undiscovered, consumers should treat RDMA availability as unknown and use
+      conservative defaults / fallback transports.
   type: boolean
 totalGpus:
   description: TotalGPUs is the total number of GPUs available in the cluster.
......
@@ -353,11 +353,47 @@ type HardwareSpec struct {
 	// NumGPUsPerNode is the number of GPUs per node.
 	// +optional
 	NumGPUsPerNode *int32 `json:"numGpusPerNode,omitempty"`
-	// Interconnect describes the GPU interconnect type within a node.
-	// Examples: "pcie", "nvlink", "infiniband".
+	// Interconnect describes the primary GPU-to-GPU interconnect *within a node*.
+	//
+	// Semantics / usage:
+	// - This is capability metadata used for profiling, planning, and deployment decisions.
+	// - It does NOT configure or enable any GPU interconnect; it only describes what is available/assumed.
+	// - When omitted, the operator may attempt best-effort discovery (currently distinguishes "nvlink"
+	//   vs "pcie" based on DCGM NVLink link count). If discovery is unavailable, it may remain empty.
+	//
+	// Impact of wrong / missing values:
+	// - If set more optimistically than reality (e.g., "nvlink" when only PCIe is present), performance
+	//   models may overestimate intra-node bandwidth and choose overly aggressive parallelism or layouts,
+	//   resulting in degraded performance compared to expectations.
+	// - If set more pessimistically than reality (e.g., "pcie" when NVLink is present), the system may
+	//   choose conservative plans and leave performance on the table.
+	// - If unset and undiscovered, consumers should treat the interconnect as unknown and fall back to
+	//   conservative assumptions.
+	//
+	// Example values: "pcie", "nvlink". Other values may be accepted but may not be auto-detected.
+	//
 	// +optional
 	Interconnect string `json:"interconnect,omitempty"`
-	// RDMA indicates whether RDMA is available on the cluster.
+
+	// RDMA indicates whether the cluster has RDMA-capable networking available for Dynamo data movement.
+	//
+	// Semantics / usage:
+	// - This is capability metadata used for profiling, planning, and deployment decisions.
+	// - It does NOT install, enable, or configure RDMA (e.g., drivers, SR-IOV, NVIDIA network operator,
+	//   GPUDirect settings). It only expresses availability/intent.
+	// - When omitted, the operator may attempt best-effort discovery (e.g., via node labels indicating
+	//   RDMA/SR-IOV capability and/or presence of NVIDIA network-operator RDMA components). If discovery
+	//   is unavailable, it may remain unset.
+	//
+	// Impact of wrong / missing values:
+	// - False positive (set true when RDMA is not actually usable end-to-end) may cause plans or
+	//   deployments to assume RDMA is available; depending on the runtime transport selection and
+	//   fallback behavior, this can lead to connection/setup failures or performance regressions.
+	// - False negative (set false when RDMA is available) will typically avoid RDMA-optimized paths and
+	//   fall back to non-RDMA transports, usually remaining functional but potentially slower.
+	// - If unset and undiscovered, consumers should treat RDMA availability as unknown and use
+	//   conservative defaults / fallback transports.
+	//
 	// +optional
 	RDMA *bool `json:"rdma,omitempty"`
 }
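
Reviewer note: the comments above give each field three states a consumer has to handle: declared, declared negative, and unknown. A minimal sketch of that handling follows, assuming a local copy of just these two fields. The `Plan` type, the `planFor` function, and the `"rdma"`/`"tcp"` transport labels are hypothetical illustrations and are not part of the operator.

```go
package main

import "fmt"

// HardwareSpec mirrors only the two fields discussed in this hunk.
type HardwareSpec struct {
	Interconnect string `json:"interconnect,omitempty"`
	RDMA         *bool  `json:"rdma,omitempty"`
}

// Plan is a hypothetical planning result, used for illustration only.
type Plan struct {
	AssumeFastIntraNode bool   // true only when "nvlink" is declared
	Transport           string // "rdma" or "tcp" (hypothetical labels)
}

// planFor applies the conservative-fallback rule from the field comments:
// an empty Interconnect or a nil RDMA pointer is treated as unknown and
// handled like the pessimistic case.
func planFor(hw HardwareSpec) Plan {
	p := Plan{Transport: "tcp"}
	if hw.Interconnect == "nvlink" {
		p.AssumeFastIntraNode = true
	}
	if hw.RDMA != nil && *hw.RDMA {
		p.Transport = "rdma"
	}
	return p
}

func main() {
	yes := true
	fmt.Printf("%+v\n", planFor(HardwareSpec{}))
	fmt.Printf("%+v\n", planFor(HardwareSpec{Interconnect: "nvlink", RDMA: &yes}))
}
```

In this sketch an explicit `false` and an unset `RDMA` lead to the same plan; the pointer type still lets a consumer tell them apart if it wants to log or report the unknown case separately.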
......
@@ -615,15 +615,49 @@ spec:
   type: string
 interconnect:
   description: |-
-    Interconnect describes the GPU interconnect type within a node.
-    Examples: "pcie", "nvlink", "infiniband".
+    Interconnect describes the primary GPU-to-GPU interconnect *within a node*.
+
+    Semantics / usage:
+    - This is capability metadata used for profiling, planning, and deployment decisions.
+    - It does NOT configure or enable any GPU interconnect; it only describes what is available/assumed.
+    - When omitted, the operator may attempt best-effort discovery (currently distinguishes "nvlink"
+      vs "pcie" based on DCGM NVLink link count). If discovery is unavailable, it may remain empty.
+
+    Impact of wrong / missing values:
+    - If set more optimistically than reality (e.g., "nvlink" when only PCIe is present), performance
+      models may overestimate intra-node bandwidth and choose overly aggressive parallelism or layouts,
+      resulting in degraded performance compared to expectations.
+    - If set more pessimistically than reality (e.g., "pcie" when NVLink is present), the system may
+      choose conservative plans and leave performance on the table.
+    - If unset and undiscovered, consumers should treat the interconnect as unknown and fall back to
+      conservative assumptions.
+
+    Example values: "pcie", "nvlink". Other values may be accepted but may not be auto-detected.
   type: string
 numGpusPerNode:
   description: NumGPUsPerNode is the number of GPUs per node.
   format: int32
   type: integer
 rdma:
-  description: RDMA indicates whether RDMA is available on the cluster.
+  description: |-
+    RDMA indicates whether the cluster has RDMA-capable networking available for Dynamo data movement.
+
+    Semantics / usage:
+    - This is capability metadata used for profiling, planning, and deployment decisions.
+    - It does NOT install, enable, or configure RDMA (e.g., drivers, SR-IOV, NVIDIA network operator,
+      GPUDirect settings). It only expresses availability/intent.
+    - When omitted, the operator may attempt best-effort discovery (e.g., via node labels indicating
+      RDMA/SR-IOV capability and/or presence of NVIDIA network-operator RDMA components). If discovery
+      is unavailable, it may remain unset.
+
+    Impact of wrong / missing values:
+    - False positive (set true when RDMA is not actually usable end-to-end) may cause plans or
+      deployments to assume RDMA is available; depending on the runtime transport selection and
+      fallback behavior, this can lead to connection/setup failures or performance regressions.
+    - False negative (set false when RDMA is available) will typically avoid RDMA-optimized paths and
+      fall back to non-RDMA transports, usually remaining functional but potentially slower.
+    - If unset and undiscovered, consumers should treat RDMA availability as unknown and use
+      conservative defaults / fallback transports.
   type: boolean
 totalGpus:
   description: TotalGPUs is the total number of GPUs available in the cluster.
......
@@ -1584,8 +1584,8 @@ _Appears in:_
 | `vramMb` _float_ | VRAMMB is the VRAM per GPU in MiB. | | Optional: \{\} <br /> |
 | `totalGpus` _integer_ | TotalGPUs is the total number of GPUs available in the cluster. | | Optional: \{\} <br /> |
 | `numGpusPerNode` _integer_ | NumGPUsPerNode is the number of GPUs per node. | | Optional: \{\} <br /> |
-| `interconnect` _string_ | Interconnect describes the GPU interconnect type within a node.<br />Examples: "pcie", "nvlink", "infiniband". | | Optional: \{\} <br /> |
-| `rdma` _boolean_ | RDMA indicates whether RDMA is available on the cluster. | | Optional: \{\} <br /> |
+| `interconnect` _string_ | Interconnect describes the primary GPU-to-GPU interconnect *within a node*.<br />Semantics / usage:<br /> - This is capability metadata used for profiling, planning, and deployment decisions.<br /> - It does NOT configure or enable any GPU interconnect; it only describes what is available/assumed.<br /> - When omitted, the operator may attempt best-effort discovery (currently distinguishes "nvlink"<br /> vs "pcie" based on DCGM NVLink link count). If discovery is unavailable, it may remain empty.<br />Impact of wrong / missing values:<br /> - If set more optimistically than reality (e.g., "nvlink" when only PCIe is present), performance<br /> models may overestimate intra-node bandwidth and choose overly aggressive parallelism or layouts,<br /> resulting in degraded performance compared to expectations.<br /> - If set more pessimistically than reality (e.g., "pcie" when NVLink is present), the system may<br /> choose conservative plans and leave performance on the table.<br /> - If unset and undiscovered, consumers should treat the interconnect as unknown and fall back to<br /> conservative assumptions.<br />Example values: "pcie", "nvlink". Other values may be accepted but may not be auto-detected. | | Optional: \{\} <br /> |
+| `rdma` _boolean_ | RDMA indicates whether the cluster has RDMA-capable networking available for Dynamo data movement.<br />Semantics / usage:<br /> - This is capability metadata used for profiling, planning, and deployment decisions.<br /> - It does NOT install, enable, or configure RDMA (e.g., drivers, SR-IOV, NVIDIA network operator,<br /> GPUDirect settings). It only expresses availability/intent.<br /> - When omitted, the operator may attempt best-effort discovery (e.g., via node labels indicating<br /> RDMA/SR-IOV capability and/or presence of NVIDIA network-operator RDMA components). If discovery<br /> is unavailable, it may remain unset.<br />Impact of wrong / missing values:<br /> - False positive (set true when RDMA is not actually usable end-to-end) may cause plans or<br /> deployments to assume RDMA is available; depending on the runtime transport selection and<br /> fallback behavior, this can lead to connection/setup failures or performance regressions.<br /> - False negative (set false when RDMA is available) will typically avoid RDMA-optimized paths and<br /> fall back to non-RDMA transports, usually remaining functional but potentially slower.<br /> - If unset and undiscovered, consumers should treat RDMA availability as unknown and use<br /> conservative defaults / fallback transports. | | Optional: \{\} <br /> |
......
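
Reviewer note: the `interconnect` documentation describes a precedence order: a declared value is used as given, and only an omitted value may be filled in by best-effort discovery from the NVLink link count. The sketch below illustrates that order under stated assumptions. `resolveInterconnect` is a hypothetical helper, the negative-count convention for "discovery unavailable" is invented for this example, and the "more than zero links means nvlink" threshold is an assumption, since the commit text does not state the exact rule.

```go
package main

import "fmt"

// resolveInterconnect returns the effective interconnect value.
// declared is the user-supplied field; nvlinkLinks is a hypothetical
// NVLink link count, where a negative value means discovery was unavailable.
func resolveInterconnect(declared string, nvlinkLinks int) string {
	if declared != "" {
		// A declared value is metadata only; it is returned as given
		// and nothing is probed or configured.
		return declared
	}
	switch {
	case nvlinkLinks < 0:
		return "" // discovery unavailable: the value stays empty (unknown)
	case nvlinkLinks > 0:
		return "nvlink" // assumed threshold: any NVLink link present
	default:
		return "pcie"
	}
}

func main() {
	fmt.Println(resolveInterconnect("", 4))
	fmt.Println(resolveInterconnect("pcie", 4))
	fmt.Printf("%q\n", resolveInterconnect("", -1))
}
```

The second call shows the pessimistic mismatch the documentation warns about: a declared `"pcie"` is kept even though links were detected, so a planner would choose conservative layouts.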