Commit e3fd943a authored by Henry Li, committed by GitHub

Bug fix - update IB_DEVICES specification logic to fix ib-loopback test regression (#762)

**Description**

The ib-loopback test regressed after this recent
[change](https://github.com/microsoft/superbenchmark/commit/c65ae56713d6bfcc4a3be37d7fe24779590f9791).
When ib-loopback runs with the standard
[config](https://github.com/microsoft/superbenchmark/blob/c65ae56713d6bfcc4a3be37d7fe24779590f9791/superbench/config/default.yaml#L69),
the test fails because numeric values such as `0` are passed into the
test command (for example `-d 0`), and `0` is not a valid IB device name.

Example failure:

```
[2025-11-25 22:08:38,100 vmssnc6ec000003:141056][micro_base.py:200][INFO] Execute command - round: 0, benchmark: ib-loopback, command: /usr/local/bin/run_perftest_loopback 47 45 /usr/local/bin/ib_write_bw -s 8388608 -F --iters=20000 -d 0 -p 45617 -x 0 --report_gbits.
[0]: IB device 0 not found
 Unable to find the Infiniband/RoCE device
IB device 0 not found
 Unable to find the Infiniband/RoCE device
[2025-11-25 22:08:39,113 vmssnc6ec000003:141056][micro_base.py:209][ERROR] Microbenchmark execution failed - round: 0, benchmark: ib-loopback, error message: IB device 0 not found
 Unable to find the Infiniband/RoCE device
IB device 0 not found
 Unable to find the Infiniband/RoCE device
```


---------
Co-authored-by: Henry Li <lihl@microsoft.com>
parent c65ae567
@@ -7,6 +7,7 @@
 import re
 import os
 from pathlib import Path
+from superbench.common.utils import logger
 def get_free_port():
@@ -33,7 +34,26 @@ def get_ib_devices():
         ib_devices_port (list): IB devices with available ports in current system.
     """
     if os.getenv('IB_DEVICES', None):
-        return os.getenv('IB_DEVICES').split(',')
+        ib_devices_env = os.getenv('IB_DEVICES').split(',')
+        # Validate that IB_DEVICES contains either all
+        # numeric indices or all device names, not mixed
+        numeric_flags = [device.strip().isdigit() for device in ib_devices_env]
+        all_numeric = all(numeric_flags)
+        any_numeric = any(numeric_flags)
+        # Check for mixed case (some numeric, some not)
+        if any_numeric and not all_numeric:
+            logger.log_and_raise(
+                exception=ValueError,
+                msg='IB_DEVICES contains mixed numeric indices and device names: {}. '
+                'All values must be either numeric indices (e.g., "0,2,4,6") '
+                'or device names (e.g., "mlx5_ib0,mlx5_ib2").'.format(os.getenv('IB_DEVICES'))
+            )
+        # If all numeric, fall through to discover actual devices; otherwise use provided names
+        if not all_numeric:
+            # All are device names, use them directly
+            return ib_devices_env
     devices = list(p.name for p in Path('/sys/class/infiniband').glob('*'))
     ib_devices_port_dict = {}
     for device in devices:
......
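
The validation added in the second hunk can be tried in isolation. The following is a minimal sketch, not the code in the repository: `classify_ib_devices` is a hypothetical helper name, it takes the raw `IB_DEVICES` string as an argument instead of reading the environment, and it raises `ValueError` directly rather than going through `logger.log_and_raise`.

```python
def classify_ib_devices(value):
    """Hypothetical helper mirroring the all-numeric / all-names check.

    Returns 'indices' when every comma-separated entry is numeric,
    'names' when none is, and raises ValueError on a mix.
    """
    entries = value.split(',')
    numeric_flags = [entry.strip().isdigit() for entry in entries]
    # Mixed input (some numeric, some not) is rejected outright.
    if any(numeric_flags) and not all(numeric_flags):
        raise ValueError(
            'IB_DEVICES contains mixed numeric indices and device names: {}'.format(value)
        )
    return 'indices' if all(numeric_flags) else 'names'


print(classify_ib_devices('0,2,4,6'))            # all numeric
print(classify_ib_devices('mlx5_ib0,mlx5_ib2'))  # all device names
```

In the commit itself, the all-names case returns the list as given, while the all-numeric case falls through to device discovery under `/sys/class/infiniband`, so a value like `0` is no longer handed to the test command as a device name.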