Unverified Commit 63e9b2d1 authored by Yifan Xiong, committed by GitHub

Release - SuperBench v0.6.0 (#409)



**Description**

Cherry-pick bug fixes from v0.6.0 to main.

**Major Revisions**

* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
parent 733860d7
......@@ -15,7 +15,7 @@
__SuperBench__ is a validation and profiling tool for AI infrastructure.
📢 [v0.5.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.5.0) has been released!
📢 [v0.6.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.6.0) has been released!
## _Check [aka.ms/superbench](https://aka.ms/superbench) for more details._
......
......@@ -26,6 +26,7 @@ RUN apt-get update && \
curl \
dmidecode \
git \
iproute2 \
jq \
libaio-dev \
libcap2 \
......@@ -38,6 +39,7 @@ RUN apt-get update && \
openssh-client \
openssh-server \
pciutils \
sudo \
util-linux \
vim \
wget \
......
......@@ -31,6 +31,7 @@ RUN apt-get update && \
dmidecode \
git \
hipify-clang \
iproute2 \
jq \
libaio-dev \
libboost-program-options-dev \
......@@ -46,6 +47,7 @@ RUN apt-get update && \
openssh-server \
pciutils \
rsync \
sudo \
util-linux \
vim \
wget \
......
......@@ -30,6 +30,7 @@ RUN apt-get update && \
dmidecode \
git \
hipify-clang \
iproute2 \
jq \
libaio-dev \
libboost-program-options-dev \
......@@ -46,6 +47,7 @@ RUN apt-get update && \
openssh-server \
pciutils \
rsync \
sudo \
util-linux \
vim \
wget \
......
......@@ -180,16 +180,16 @@ sb result diagnosis --baseline-file
#### Required arguments
| Name | Description |
|------------------------|------------------------|
| `--baseline-file` `-b` | Path to baseline file. |
| `--data-file` `-d` | Path to raw data file. |
| `--rule-file` `-r` | Path to rule file. |
| Name | Description |
|--------------------|------------------------|
| `--data-file` `-d` | Path to raw data file. |
| `--rule-file` `-r` | Path to rule file. |
#### Optional arguments
| Name | Default | Description |
|-------------------------|---------|-----------------------------------------------------------------------------|
| `--baseline-file` `-b` | `None` | Path to baseline file. |
| `--decimal-place-value` | 2 | Number of valid decimal places to show in output. |
| `--output-all` | N/A | Output diagnosis results for all nodes. |
| `--output-dir` | `None` | Path to output directory, outputs/{datetime} will be used if not specified. |
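For example, a typical invocation looks like the following (file paths are illustrative):

```bash
# baseline file is optional since v0.6.0
sb result diagnosis --data-file outputs/results-summary.jsonl --rule-file rules.yaml
# with an explicit baseline
sb result diagnosis -d outputs/results-summary.jsonl -r rules.yaml -b baseline.json
```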
......
......@@ -61,7 +61,7 @@ You can clone the source from GitHub and build it.
:::note Note
You should check out the corresponding tag to use a release version, for example,
`git clone -b v0.5.0 https://github.com/microsoft/superbenchmark`
`git clone -b v0.6.0 https://github.com/microsoft/superbenchmark`
:::
```bash
......@@ -96,7 +96,7 @@ Here're the system requirements for all managed GPU nodes.
* Latest version of Linux; you're highly encouraged to use Ubuntu 18.04 or later.
* Compatible GPU drivers should be installed correctly. Driver version can be checked by running `nvidia-smi`.
* [Docker CE](https://docs.docker.com/engine/install/) version 19.03 or later (which can be checked by running `docker --version`).
* [Docker CE](https://docs.docker.com/engine/install/) version 20.10 or later (which can be checked by running `docker --version`).
* NVIDIA GPU support in Docker, install
[nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit).
......@@ -106,7 +106,7 @@ Here're the system requirements for all managed GPU nodes.
* Latest version of Linux; you're highly encouraged to use Ubuntu 18.04 or later.
* Compatible GPU drivers should be installed correctly, and group permission should be set to access GPU resources.
You should be able to run `rocm-smi` and `rocminfo` directly to check GPU usage and information.
* [Docker CE](https://docs.docker.com/engine/install/) version 19.03 or later (which can be checked by running `docker --version`).
* [Docker CE](https://docs.docker.com/engine/install/) version 20.10 or later (which can be checked by running `docker --version`).
</TabItem>
</Tabs>
......@@ -27,7 +27,7 @@ sb deploy -f remote.ini --host-password [password]
:::note Note
You should deploy the corresponding Docker image to use a release version, for example,
`sb deploy -f local.ini -i superbench/superbench:v0.5.0-cuda11.1.1`
`sb deploy -f local.ini -i superbench/superbench:v0.6.0-cuda11.1.1`
Note that the version of the git repo only determines the version of the sb CLI, not the sb container. You should specify the container version explicitly even if you cloned a release tag.
......
......@@ -70,7 +70,7 @@ superbench:
<TabItem value='example'>
```yaml
version: v0.5
version: v0.6
superbench:
  enable: benchmark_1
  monitor:
......
......@@ -29,6 +29,7 @@ available tags are listed below for all stable versions.
| Tag | Description |
|-------------------|------------------------------------|
| v0.6.0-cuda11.1.1 | SuperBench v0.6.0 with CUDA 11.1.1 |
| v0.5.0-cuda11.1.1 | SuperBench v0.5.0 with CUDA 11.1.1 |
| v0.4.0-cuda11.1.1 | SuperBench v0.4.0 with CUDA 11.1.1 |
| v0.3.0-cuda11.1.1 | SuperBench v0.3.0 with CUDA 11.1.1 |
......@@ -40,6 +41,10 @@ available tags are listed below for all stable versions.
| Tag | Description |
|-------------------------------|--------------------------------------------------|
| v0.6.0-rocm5.1.3 | SuperBench v0.6.0 with ROCm 5.1.3 |
| v0.6.0-rocm5.1.1 | SuperBench v0.6.0 with ROCm 5.1.1 |
| v0.6.0-rocm5.0.1 | SuperBench v0.6.0 with ROCm 5.0.1 |
| v0.6.0-rocm5.0 | SuperBench v0.6.0 with ROCm 5.0 |
| v0.5.0-rocm5.0.1-pytorch1.9.0 | SuperBench v0.5.0 with ROCm 5.0.1, PyTorch 1.9.0 |
| v0.5.0-rocm5.0-pytorch1.9.0 | SuperBench v0.5.0 with ROCm 5.0, PyTorch 1.9.0 |
| v0.5.0-rocm4.2-pytorch1.7.0 | SuperBench v0.5.0 with ROCm 4.2, PyTorch 1.7.0 |
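To use one of these images, pull it by tag, for example:

```bash
docker pull superbench/superbench:v0.6.0-cuda11.1.1
```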
......
......@@ -32,7 +32,7 @@ The input mainly includes 3 files:
- **rule file**: It uses YAML format and includes each metric's rules to filter defective machines for diagnosis.
- **baseline file**: json file including the baseline values for the metrics.
- **baseline file (optional)**: json file including the baseline values for the metrics.
`Tips`: this file for some representative machine types will be published in [SuperBench Results Repo](https://github.com/microsoft/superbench-results/tree/main) with each SuperBench release.
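For illustration, a baseline file is a flat json object mapping metric names to expected values (the metric names below are only examples):

```json
{
  "kernel-launch/event_overhead:0": 0.0055,
  "kernel-launch/wall_overhead:0": 0.045
}
```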
......@@ -52,8 +52,8 @@ superbench:
  ${var_name}: dict
  rules:
    ${rule_name}:
      function: string
      criteria: string
      function: (optional)string
      criteria: (optional)string
      store: (optional)bool
      categories: string
      metrics:
......@@ -65,11 +65,11 @@ superbench:
example:
```yaml
# SuperBench rules
version: v0.5
version: v0.6
superbench:
  rules:
    failure-rule:
      function: value
      function: failure_check
      criteria: lambda x:x>0
      categories: Failed
      metrics:
......@@ -125,8 +125,17 @@ superbench:
        - vgg_models/pytorch-vgg.*/throughput_train_.*\
    rule6:
      function: multi_rules
      criteria: 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False'
      criteria: 'lambda label: bool(label["rule4"]+label["rule5"]>=2)'
      categories: CNN
    rule7:
      categories: MODEL_DIST
      store: True
      metrics:
        - model-benchmarks:stress-run.*/pytorch-gpt2-large/fp32_train_throughput
    rule8:
      function: multi_rules
      criteria: 'lambda label: bool(min(label["rule7"].values()) < 1)'
      categories: MODEL_DIST
```
This rule file describes the rules used for data diagnosis.
......@@ -147,15 +156,18 @@ The criterion used for this rule, which indicates how to compare the data with t
#### `store`
True if the current rule is not used alone to filter the defective machine, but will be used by other subsequent rules. False (default) if this rule is used to label the defective machine directly.
- True: this rule is used to store metrics which will be used by subsequent rules (see the sketch below).
  - If store is True and criteria/function are not None in the rule, it stores how many metrics in this rule meet the criteria into label["rule_name"]; for example, label["rule_name"]=2 means 2 metrics are identified as defective by this rule.
  - If store is True and criteria/function are None, it stores the dict of {metric_name: value} for this rule's metrics into label["rule_name"].
- False (default): this rule is used to label the defective machine directly.
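A minimal sketch in plain Python (hypothetical values, not the analyzer's actual code) of how stored labels feed a `multi_rules` criteria:

```python
# labels accumulated while evaluating earlier rules
label = {
    'rule4': 1,    # 1 metric in rule4 met its criteria
    'rule5': 1,    # 1 metric in rule5 met its criteria
    # store: True with no criteria/function keeps raw {metric: value} pairs
    'rule7': {'model-benchmarks:stress-run.0/pytorch-gpt2-large/fp32_train_throughput': 0.8},
}

# criteria strings from the rule file are evaluated as lambdas over the labels
rule6 = eval('lambda label: bool(label["rule4"] + label["rule5"] >= 2)')
rule8 = eval('lambda label: bool(min(label["rule7"].values()) < 1)')
print(rule6(label))    # True -> defective under rule6
print(rule8(label))    # True -> defective under rule8
```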
#### `function`
The function used for this rule.
3 types of rules are supported currently:
The supported functions are listed as follows:
- `variance`: the rule is to check if the variance between raw data and baseline violates the criteria. variance = (raw data - criteria) / criteria
- `variance`: the rule is to check if the variance between raw data and baseline violates the criteria. variance = (raw data - baseline) / baseline
For example, if the 'criteria' is `lambda x:x>0.05`, the rule is that if the variance is larger than 5%, it should be defective.
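For instance, with a baseline of 100 and a raw value of 90, variance = (90 - 100) / 100 = -0.10, so a criteria of `lambda x:x<-0.05` would flag the 10% regression.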
......@@ -164,8 +176,16 @@ The function used for this rule.
For example, if the 'criteria' is `lambda x:x>0`, the rule is that if the raw data is larger than 0, the machine is identified as defective.
- `multi_rules`: the rule is to check if the combined results of multiple previous rules and metrics violate the criteria.
Several examples:
  - `criteria: lambda label: bool(label["rule4"]+label["rule5"]>=2)` means this rule is triggered if the sum of labeled metrics in rule4 and rule5 is greater than or equal to 2.
  - `criteria: lambda label: bool(min(label["rule7"].values()) < 1)` means that if the minimum of the metric values stored by rule7 is smaller than 1, the machine is identified as defective.
  - If you reference a non-existent rule, an exception is raised.
  - If the test in a referenced rule failed or was not run, causing an exception in the criteria, no exception is raised, since such cases are caught by the failure rule.
For example, if the 'criteria' is 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False', the rule is that if the sum of labeled metrics in rule4 and rule5 is larger than 2, it should be defective.
- `failure_check`: the rule is to check if any metric in this rule fails or misses the test. The metrics in this rule should look like `{benchmark_name}/.*:return_code`, which is used to identify failures.
  - If any item never matches a metric in the raw data, the rule identifies it as a missed test.
  - If any metric violates the `value` criteria, meaning the return_code is not success (0), the rule identifies it as a failed test.
`Tips`: you must include a default rule for ${benchmark_name}/return_code, as in the example above, which is used to identify failed tests. A sketch of such a rule follows.
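A sketch of such a default failure rule, mirroring the `failure-rule` shown earlier (the benchmark name is illustrative):

```yaml
failure-rule:
  function: failure_check
  criteria: 'lambda x: x > 0'    # non-zero return_code means the test failed
  categories: Failed
  metrics:
    - kernel-launch/return_code
```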
......@@ -182,6 +202,8 @@ The output includes all defective machines' information including index, failure
- Defective Details (diagnosis/issue_details in json format): all violated metrics including metric data and related rule.
- ${metric}: the data of the metrics defined in the rule file. If the rule is `variance`, the form of the data is variance in percentage; if the rule is `value`, the form of the data is raw data.
- `'N/A'` indicates an empty value for the metric in output files.
If you specify '--output-all' in the command, the output includes all machines' information and an extra field indicating whether each machine is defective.
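As a rough illustration of the jsonl format described above (field contents are hypothetical and the details string depends on the rule), one output line resembles:

```json
{"Category": "KernelLaunch", "Defective Details": "kernel-launch/event_overhead:0(VAR: 23.33% Rule:lambda x:x>0.05)", "index": "node-001"}
```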
......
......@@ -58,7 +58,7 @@ superbench:
```yaml title="Example"
# SuperBench rules
version: v0.5
version: v0.6
superbench:
  rules:
    kernel_launch:
......
......@@ -142,7 +142,7 @@ def run(self):
install_requires=[
'ansible_base>=2.10.9;os_name=="posix"',
'ansible_runner>=2.0.0rc1',
'colorlog>=4.7.2',
'colorlog>=6.7.0',
'jinja2>=2.10.1',
'joblib>=1.0.1',
'jsonlines>=2.0.0',
......@@ -155,6 +155,7 @@ def run(self):
'omegaconf==2.0.6',
'openpyxl>=3.0.7',
'pandas>=1.1.5',
'pssh @ git+https://github.com/lilydjwg/pssh.git@v2.3.4',
'pyyaml>=5.3',
'requests>=2.27.1',
'seaborn>=0.11.2',
......@@ -169,8 +170,8 @@ def run(self):
**x,
'develop': x['dev'] + x['test'],
'cpuworker': x['torch'],
'amdworker': x['torch'] + x['ort'] + x['mpi'],
'nvworker': x['torch'] + x['ort'] + x['mpi'] + x['nvidia'],
'amdworker': x['torch'] + x['ort'],
'nvworker': x['torch'] + x['ort'] + x['nvidia'],
}
)(
{
......@@ -199,7 +200,6 @@ def run(self):
'onnx>=1.10.2',
'onnxruntime-gpu==1.10.0',
],
'mpi': ['mpi4py>=3.1.3'],
'nvidia': ['py3nvml>=0.2.6'],
}
),
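For reference, the extras above translate to install targets such as (a sketch, run from the repo root):

```bash
pip install -e .[develop]    # dev + test tooling
pip install .[nvworker]      # torch + ort + nvidia (the mpi extra is removed in v0.6.0)
```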
......
......@@ -6,5 +6,5 @@
Provide hardware and software benchmarks for AI systems.
"""
__version__ = '0.5.0'
__version__ = '0.6.0'
__author__ = 'Microsoft'
......@@ -21,6 +21,7 @@ class DataDiagnosis(RuleBase):
def __init__(self):
"""Init function."""
super().__init__()
self.na = 'N/A'
def _check_and_format_rules(self, rule, name):
"""Check the rule of the metric whether the formart is valid.
......@@ -63,8 +64,6 @@ def _get_baseline_of_metric(self, baseline, metric):
"""
if metric in baseline:
return baseline[metric]
elif 'return_code' in metric:
return 0
else:
short = metric
# exclude rank info, for example, '.*:\d+'->'.*'
......@@ -76,8 +75,7 @@ def _get_baseline_of_metric(self, baseline, metric):
return baseline[short]
# baseline not defined
else:
logger.warning('DataDiagnosis: get baseline - {} baseline not found'.format(metric))
return -1
return None
def __get_metrics_and_baseline(self, rule, benchmark_rules, baseline):
"""Get metrics with baseline in the rule.
......@@ -108,8 +106,7 @@ def _parse_rules_and_baseline(self, rules, baseline):
"""
try:
if not rules:
logger.error('DataDiagnosis: get criteria failed')
return False
logger.log_and_raise(exception=Exception, msg='DataDiagnosis: get criteria failed')
self._sb_rules = {}
self._enable_metrics = set()
benchmark_rules = rules['superbench']['rules']
......@@ -129,8 +126,7 @@ def _parse_rules_and_baseline(self, rules, baseline):
self.__get_metrics_and_baseline(rule, benchmark_rules, baseline)
self._enable_metrics = sorted(list(self._enable_metrics))
except Exception as e:
logger.error('DataDiagnosis: get criteria failed - {}'.format(str(e)))
return False
logger.log_and_raise(exception=Exception, msg='DataDiagnosis: get criteria failed - {}'.format(str(e)))
return True
......@@ -205,32 +201,29 @@ def run_diagnosis_rules(self, rules, baseline):
data_not_accept_df (DataFrame): defective nodes' detailed information
label_df (DataFrame): labels for all nodes
"""
try:
summary_columns = ['Category', 'Defective Details']
data_not_accept_df = pd.DataFrame(columns=summary_columns)
summary_details_df = pd.DataFrame()
label_df = pd.DataFrame(columns=['label'])
if not self._parse_rules_and_baseline(rules, baseline):
return data_not_accept_df, label_df
# run diagnosis rules for each node
for node in self._raw_data_df.index:
details_row, summary_data_row = self._run_diagnosis_rules_for_single_node(node)
if details_row:
data_not_accept_df.loc[node] = details_row
summary_details_df = pd.concat(
[summary_details_df,
pd.DataFrame([summary_data_row.to_dict()], index=[summary_data_row.name])]
)
label_df.loc[node] = 1
else:
label_df.loc[node] = 0
# combine details for defective nodes
if len(data_not_accept_df) != 0:
data_not_accept_df = data_not_accept_df.join(summary_details_df)
data_not_accept_df = data_not_accept_df.sort_values(by=summary_columns, ascending=False)
summary_columns = ['Category', 'Defective Details']
data_not_accept_df = pd.DataFrame(columns=summary_columns)
summary_details_df = pd.DataFrame()
label_df = pd.DataFrame(columns=['label'])
if not self._parse_rules_and_baseline(rules, baseline):
return data_not_accept_df, label_df
# run diagnosis rules for each node
for node in self._raw_data_df.index:
details_row, summary_data_row = self._run_diagnosis_rules_for_single_node(node)
if details_row:
data_not_accept_df.loc[node] = details_row
summary_details_df = pd.concat(
[summary_details_df,
pd.DataFrame([summary_data_row.to_dict()], index=[summary_data_row.name])]
)
label_df.loc[node] = 1
else:
label_df.loc[node] = 0
# combine details for defective nodes
if len(data_not_accept_df) != 0:
data_not_accept_df = data_not_accept_df.join(summary_details_df)
data_not_accept_df = data_not_accept_df.sort_values(by=summary_columns, ascending=False)
except Exception as e:
logger.error('DataDiagnosis: run diagnosis rules failed, message: {}'.format(str(e)))
return data_not_accept_df, label_df
def output_all_nodes_results(self, raw_data_df, data_not_accept_df):
......@@ -258,24 +251,21 @@ def output_all_nodes_results(self, raw_data_df, data_not_accept_df):
data_not_accept_df['Number Of Issues'] = data_not_accept_df['Defective Details'].map(
lambda x: len(x.split(','))
)
for index in range(len(append_columns)):
for index in range(len(append_columns) - 1, -1, -1):
if append_columns[index] not in data_not_accept_df:
logger.warning(
'DataDiagnosis: output_all_nodes_results - column {} not found in data_not_accept_df.'.format(
append_columns[index]
)
logger.log_and_raise(
Exception,
msg='DataDiagnosis: output_all_nodes_results - column {} not found in data_not_accept_df.'.
format(append_columns[index])
)
all_data_df[append_columns[index]] = None
else:
all_data_df = all_data_df.merge(
data_not_accept_df[[append_columns[index]]], left_index=True, right_index=True, how='left'
)
all_data_df = data_not_accept_df[[
append_columns[index]
]].merge(all_data_df, left_index=True, right_index=True, how='right')
all_data_df['Accept'] = all_data_df['Accept'].replace(np.nan, True)
all_data_df['Number Of Issues'] = all_data_df['Number Of Issues'].replace(np.nan, 0)
all_data_df['Number Of Issues'] = all_data_df['Number Of Issues'].astype(int)
all_data_df = all_data_df.replace(np.nan, '')
return all_data_df
def output_diagnosis_in_excel(self, raw_data_df, data_not_accept_df, output_path, rules):
......@@ -288,16 +278,16 @@ def output_diagnosis_in_excel(self, raw_data_df, data_not_accept_df, output_path
rules (dict): the rules of DataDiagnosis
"""
try:
data_not_accept_df = data_not_accept_df.convert_dtypes()
writer = pd.ExcelWriter(output_path, engine='xlsxwriter')
# Check whether writer is valid
if not isinstance(writer, pd.ExcelWriter):
logger.error('DataDiagnosis: excel_data_output - invalid file path.')
return
logger.log_and_raise(exception=IOError, msg='DataDiagnosis: excel_data_output - invalid file path.')
file_handler.output_excel_raw_data(writer, raw_data_df, 'Raw Data')
file_handler.output_excel_data_not_accept(writer, data_not_accept_df, rules)
writer.save()
except Exception as e:
logger.error('DataDiagnosis: excel_data_output - {}'.format(str(e)))
logger.log_and_raise(exception=Exception, msg='DataDiagnosis: excel_data_output - {}'.format(str(e)))
def output_diagnosis_in_jsonl(self, data_not_accept_df, output_path):
"""Output data_not_accept_df into jsonl file.
......@@ -306,24 +296,29 @@ def output_diagnosis_in_jsonl(self, data_not_accept_df, output_path):
data_not_accept_df (DataFrame): the DataFrame to output
output_path (str): the path of output jsonl file
"""
data_not_accept_df = data_not_accept_df.convert_dtypes().astype('object').fillna(self.na)
p = Path(output_path)
try:
data_not_accept_json = data_not_accept_df.to_json(orient='index')
data_not_accept = json.loads(data_not_accept_json)
if not isinstance(data_not_accept_df, pd.DataFrame):
logger.warning('DataDiagnosis: output json data - data_not_accept_df is not DataFrame.')
return
logger.log_and_raise(
Exception, msg='DataDiagnosis: output json data - data_not_accept_df is not DataFrame.'
)
if data_not_accept_df.empty:
logger.warning('DataDiagnosis: output json data - data_not_accept_df is empty.')
with p.open('w') as f:
pass
return
with p.open('w') as f:
for node in data_not_accept:
line = data_not_accept[node]
line['Index'] = node
line['index'] = node
json_str = json.dumps(line)
f.write(json_str + '\n')
except Exception as e:
logger.error('DataDiagnosis: output json data failed, msg: {}'.format(str(e)))
logger.log_and_raise(
exception=Exception, msg='DataDiagnosis: output json data failed, msg: {}'.format(str(e))
)
def output_diagnosis_in_json(self, data_not_accept_df, output_path):
"""Output data_not_accept_df into json file.
......@@ -332,7 +327,8 @@ def output_diagnosis_in_json(self, data_not_accept_df, output_path):
data_not_accept_df (DataFrame): the DataFrame to output
output_path (str): the path of output jsonl file
"""
data_not_accept_df['Index'] = data_not_accept_df.index
data_not_accept_df = data_not_accept_df.convert_dtypes().astype('object').fillna(self.na)
data_not_accept_df = data_not_accept_df.reset_index()
data_not_accept_df = data_not_accept_df.rename(
columns={
'Defective Details': 'diagnosis/issue_details',
......@@ -358,29 +354,31 @@ def generate_md_lines(self, data_not_accept_df, rules, round):
Returns:
list: lines in markdown format
"""
data_not_accept_df['machine'] = data_not_accept_df.index
if len(data_not_accept_df) == 0:
return []
data_not_accept_df = data_not_accept_df.reset_index()
header = data_not_accept_df.columns.tolist()
header = header[-1:] + header[:-1]
data_not_accept_df = data_not_accept_df[header]
# format precision of values to n decimal digits
for rule in rules:
for metric in rules[rule]['metrics']:
if rules[rule]['function'] == 'variance':
if round and isinstance(round, int):
if 'function' in rules[rule]:
for metric in rules[rule]['metrics']:
if rules[rule]['function'] == 'variance':
if round and isinstance(round, int):
data_not_accept_df[metric] = data_not_accept_df[metric].map(
lambda x: x * 100, na_action='ignore'
)
data_not_accept_df = data_analysis.round_significant_decimal_places(
data_not_accept_df, round, [metric]
)
data_not_accept_df[metric] = data_not_accept_df[metric].map(
lambda x: x * 100, na_action='ignore'
)
data_not_accept_df = data_analysis.round_significant_decimal_places(
data_not_accept_df, round, [metric]
)
data_not_accept_df[metric] = data_not_accept_df[metric].map(
lambda x: '{}%'.format(x), na_action='ignore'
)
elif rules[rule]['function'] == 'value':
if round and isinstance(round, int):
data_not_accept_df = data_analysis.round_significant_decimal_places(
data_not_accept_df, round, [metric]
lambda x: '{}%'.format(x), na_action='ignore'
)
elif rules[rule]['function'] == 'value':
if round and isinstance(round, int):
data_not_accept_df = data_analysis.round_significant_decimal_places(
data_not_accept_df, round, [metric]
)
data_not_accept_df = data_not_accept_df.convert_dtypes().astype('object').fillna(self.na)
lines = file_handler.generate_md_table(data_not_accept_df, header)
return lines
......@@ -401,7 +399,7 @@ def run(
try:
rules = self._preprocess(raw_data_file, rule_file)
# read baseline
baseline = file_handler.read_baseline(baseline_file)
baseline = file_handler.read_baseline(baseline_file) if baseline_file is not None else {}
logger.info('DataDiagnosis: Begin to process {} nodes'.format(len(self._raw_data_df)))
output_df, label_df = self.run_diagnosis_rules(rules, baseline)
logger.info('DataDiagnosis: Processed finished')
......@@ -424,7 +422,9 @@ def run(
else:
file_handler.output_lines_in_html(lines, output_path)
else:
logger.error('DataDiagnosis: output failed - unsupported output format')
logger.log_and_raise(
exception=Exception, msg='DataDiagnosis: output failed - unsupported output format'
)
logger.info('DataDiagnosis: Output results to {}'.format(output_path))
except Exception as e:
logger.error('DataDiagnosis: run failed - {}'.format(str(e)))
logger.log_and_raise(exception=Exception, msg='DataDiagnosis: run failed - {}'.format(str(e)))
......@@ -66,7 +66,7 @@ def check_criterion_with_a_value(rule):
"""
# parse criteria and check if valid
if not isinstance(eval(rule['criteria'])(0), bool):
logger.log_and_raise(exception=Exception, msg='invalid criteria format')
logger.log_and_raise(exception=ValueError, msg='invalid criteria format')
@staticmethod
def miss_test(metric, rule, data_row, details, categories):
......@@ -130,8 +130,10 @@ def variance(data_row, rule, summary_data_row, details, categories):
# check if metric pass the rule
val = data_row[metric]
baseline = rule['metrics'][metric]
if baseline == 0:
logger.log_and_raise(exception=Exception, msg='invalid baseline 0 in variance rule')
if baseline is None or baseline == 0:
logger.log_and_raise(
exception=ValueError, msg='invalid baseline 0 or baseline not found in variance rule'
)
var = (val - baseline) / baseline
summary_data_row[metric] = var
violate_metric = eval(rule['criteria'])(var)
......@@ -203,13 +205,20 @@ def multi_rules(rule, details, categories, store_values):
Returns:
number: 0 if the rule is passed, otherwise 1
"""
violated = eval(rule['criteria'])(store_values)
if not isinstance(violated, bool):
logger.log_and_raise(exception=Exception, msg='invalid upper criteria format')
if violated:
info = '{}:{}'.format(rule['name'], rule['criteria'])
RuleOp.add_categories_and_details(info, rule['categories'], details, categories)
return 1 if violated else 0
try:
violated = eval(rule['criteria'])(store_values)
if not isinstance(violated, bool):
logger.log_and_raise(exception=ValueError, msg='invalid criteria format')
if violated:
info = '{}:{}'.format(rule['name'], rule['criteria'])
RuleOp.add_categories_and_details(info, rule['categories'], details, categories)
return 1 if violated else 0
# the key defined in criteria is not found
except KeyError as e:
logger.log_and_raise(exception=KeyError, msg='invalid criteria format - {}'.format(str(e)))
# miss/failed test
except Exception:
return 0
@staticmethod
def failure_check(data_row, rule, summary_data_row, details, categories, raw_rule):
......
......@@ -28,8 +28,9 @@ def read_raw_data(raw_data_path):
p = Path(raw_data_path)
raw_data_df = pd.DataFrame()
if not p.is_file():
logger.error('FileHandler: invalid raw data path - {}'.format(raw_data_path))
return raw_data_df
logger.log_and_raise(
exception=FileNotFoundError, msg='FileHandler: invalid raw data path - {}'.format(raw_data_path)
)
try:
with p.open(encoding='utf-8') as f:
......@@ -38,7 +39,7 @@ def read_raw_data(raw_data_path):
raw_data_df = raw_data_df.rename(raw_data_df['node'])
raw_data_df = raw_data_df.drop(columns=['node'])
except Exception as e:
logger.error('Analyzer: invalid raw data format - {}'.format(str(e)))
logger.log_and_raise(exception=IOError, msg='Analyzer: invalid raw data format - {}'.format(str(e)))
return raw_data_df
......@@ -54,8 +55,9 @@ def read_rules(rule_file=None):
default_rule_file = Path(__file__).parent / 'rule/default_rule.yaml'
p = Path(rule_file) if rule_file else default_rule_file
if not p.is_file():
logger.error('FileHandler: invalid rule file path - {}'.format(str(p.resolve())))
return None
logger.log_and_raise(
exception=FileNotFoundError, msg='FileHandler: invalid rule file path - {}'.format(str(p.resolve()))
)
baseline = None
with p.open() as f:
baseline = yaml.load(f, Loader=yaml.SafeLoader)
......@@ -73,8 +75,9 @@ def read_baseline(baseline_file):
"""
p = Path(baseline_file)
if not p.is_file():
logger.error('FileHandler: invalid baseline file path - {}'.format(str(p.resolve())))
return None
logger.log_and_raise(
exception=FileNotFoundError, msg='FileHandler: invalid baseline file path - {}'.format(str(p.resolve()))
)
baseline = None
with p.open() as f:
baseline = json.load(f)
......@@ -119,45 +122,46 @@ def output_excel_data_not_accept(writer, data_not_accept_df, rules):
worksheet = writer.sheets['Not Accept']
for rule in rules:
for metric in rules[rule]['metrics']:
# The column index of the metrics should start from 1
col_index = columns.index(metric) + 1
# Apply percent format for the columns whose rules are variance type.
if rules[rule]['function'] == 'variance':
worksheet.conditional_format(
row_start,
col_index,
row_end,
col_index, # start_row, start_col, end_row, end_col
{
'type': 'no_blanks',
'format': percent_format
}
)
# Apply red format if the value violates the rule.
if rules[rule]['function'] == 'value' or rules[rule]['function'] == 'variance':
match = re.search(r'(>|<|<=|>=|==|!=)(.+)', rules[rule]['criteria'])
if not match:
continue
symbol = match.group(1)
condition = float(match.group(2))
worksheet.conditional_format(
row_start,
col_index,
row_end,
col_index, # start_row, start_col, end_row, end_col
{
'type': 'cell',
'criteria': symbol,
'value': condition,
'format': color_format_red
}
)
if 'function' in rules[rule]:
for metric in rules[rule]['metrics']:
# The column index of the metrics should start from 1
col_index = columns.index(metric) + 1
# Apply percent format for the columns whose rules are variance type.
if rules[rule]['function'] == 'variance':
worksheet.conditional_format(
row_start,
col_index,
row_end,
col_index, # start_row, start_col, end_row, end_col
{
'type': 'no_blanks',
'format': percent_format
}
)
# Apply red format if the value violates the rule.
if rules[rule]['function'] == 'value' or rules[rule]['function'] == 'variance':
match = re.search(r'(>|<|<=|>=|==|!=)(.+)', rules[rule]['criteria'])
if not match:
continue
symbol = match.group(1)
condition = float(match.group(2))
worksheet.conditional_format(
row_start,
col_index,
row_end,
col_index, # start_row, start_col, end_row, end_col
{
'type': 'cell',
'criteria': symbol,
'value': condition,
'format': color_format_red
}
)
else:
logger.warning('FileHandler: excel_data_output - data_not_accept_df is empty.')
else:
logger.warning('FileHandler: excel_data_output - data_not_accept_df is not DataFrame.')
logger.log_and_raise(RuntimeError, msg='FileHandler: excel_data_output - data_not_accept_df is not DataFrame.')
def generate_md_table(data_df, header):
......@@ -198,12 +202,11 @@ def output_lines_in_md(lines, output_path):
"""
try:
if len(lines) == 0:
logger.error('FileHandler: md_data_output failed')
return
logger.warning('FileHandler: md_data_output is empty')
with open(output_path, 'w') as f:
f.writelines(lines)
except Exception as e:
logger.error('FileHandler: md_data_output - {}'.format(str(e)))
logger.log_and_raise(exception=IOError, msg='FileHandler: md_data_output - {}'.format(str(e)))
def output_lines_in_html(lines, output_path):
......@@ -215,14 +218,13 @@ def output_lines_in_html(lines, output_path):
"""
try:
if len(lines) == 0:
logger.error('FileHandler: html_data_output failed')
return
logger.warning('FileHandler: html_data_output is empty')
lines = ''.join(lines)
html_str = markdown.markdown(lines, extensions=['markdown.extensions.tables'])
with open(output_path, 'w') as f:
f.writelines(html_str)
except Exception as e:
logger.error('FileHandler: html_data_output - {}'.format(str(e)))
logger.log_and_raise(exception=IOError, msg='FileHandler: html_data_output - {}'.format(str(e)))
def merge_column_in_excel(ws, row, column):
......
......@@ -103,8 +103,7 @@ def _preprocess(self, raw_data_file, rule_file):
self._benchmark_metrics_dict = self._get_metrics_by_benchmarks(list(self._raw_data_df.columns))
# check raw data whether empty
if len(self._raw_data_df) == 0:
logger.error('RuleBase: empty raw data')
return None
logger.log_and_raise(exception=Exception, msg='RuleBase: empty raw data')
# read rules
rules = file_handler.read_rules(rule_file)
return rules
......@@ -3,6 +3,7 @@
"""Module of the base class."""
import shlex
import signal
import traceback
import argparse
......@@ -39,7 +40,7 @@ def __init__(self, name, parameters=''):
parameters (str): benchmark parameters.
"""
self._name = name
self._argv = list(filter(None, parameters.split(' '))) if parameters is not None else list()
self._argv = list(filter(None, shlex.split(parameters))) if parameters is not None else list()
self._benchmark_type = None
self._parser = argparse.ArgumentParser(
add_help=False,
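The switch from `parameters.split(' ')` to `shlex.split(parameters)` is what allows spaces inside quoted parameter values (#397). A quick illustration:

```python
import shlex

params = '--name "a b" --num 1'
print(params.split(' '))    # ['--name', '"a', 'b"', '--num', '1'] -- quoted value broken apart
print(shlex.split(params))  # ['--name', 'a b', '--num', '1'] -- quotes respected
```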
......@@ -170,10 +171,11 @@ def run(self):
except BaseException as e:
self._result.set_return_code(ReturnCode.RUNTIME_EXCEPTION_ERROR)
logger.error('Run benchmark failed - benchmark: {}, message: {}'.format(self._name, str(e)))
else:
ret &= self._postprocess()
finally:
self._end_time = datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S')
self._result.set_timestamp(self._start_time, self._end_time)
ret &= self._postprocess()
return ret
......
......@@ -254,7 +254,7 @@ def __prepare_config(self):
if not self._args.hostfile:
self._args.hostfile = os.path.join(os.environ.get('SB_WORKSPACE', '.'), 'hostfile')
with open(self._args.hostfile, 'r') as f:
hosts = f.readlines()
hosts = f.read().splitlines()
# Generate the config file if not define
if self._args.config is None:
self.gen_traffic_pattern(hosts, self._args.pattern, self.__config_path)
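Unlike `readlines()`, `read().splitlines()` drops the trailing newline from each host name, so entries compare cleanly later. For example:

```python
text = 'node0\nnode1\n'
print(text.splitlines())    # ['node0', 'node1']
# f.readlines() on the same content would give ['node0\n', 'node1\n']
```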
......@@ -297,15 +297,18 @@ def __prepare_general_ib_command_params(self):
# Add GPUDirect for ib command
gpu_dev = ''
if self._args.gpu_dev is not None:
gpu = GPU()
if gpu.vendor == 'nvidia':
gpu_dev = f'--use_cuda={self._args.gpu_dev}'
elif gpu.vendor == 'amd':
gpu_dev = f'--use_rocm={self._args.gpu_dev}'
else:
self._result.set_return_code(ReturnCode.INVALID_ARGUMENT)
logger.error('No GPU found - benchmark: {}'.format(self._name))
return False
if 'bw' in self._args.command:
gpu = GPU()
if gpu.vendor == 'nvidia':
gpu_dev = f'--use_cuda={self._args.gpu_dev}'
elif gpu.vendor == 'amd':
gpu_dev = f'--use_rocm={self._args.gpu_dev}'
else:
self._result.set_return_code(ReturnCode.INVALID_ARGUMENT)
logger.error('No GPU found - benchmark: {}'.format(self._name))
return False
elif 'lat' in self._args.command:
logger.warning('Wrong configuration: Perftest supports CUDA/ROCM only in BW tests')
# Generate ib command params
command_params = f'-F -n {self._args.iters} -d {self._args.ib_dev} {msg_size} {gpu_dev}'
command_params = f'{command_params.strip()} --report_gbits'
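For a bandwidth command on an NVIDIA node (illustrative values: 2000 iterations, device mlx5_0, GPU 0), this yields params along the lines of `-F -n 2000 -d mlx5_0 --use_cuda=0 --report_gbits` plus the message-size flags; for latency commands the GPU flag is omitted with a warning, since perftest supports CUDA/ROCm only in bandwidth tests.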
......
......@@ -260,13 +260,28 @@ void gather_hostnames(vector<string> &hostnames, string filename) {
}
// Parse raw output of ib command
// TODO: does not work for latency tests
// Sample of ib bw command raw output
// #bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
// 8388608 5000 196.08 195.76 0.002917
// Sample of ib latency command raw output
// #bytes #iterations t_min t_max t_typical t_avg t_stdev 99% percentile 99.9% percentile
// 8388608 5000 581.27 876.26 594.87 595.50 3.33 601.65 621.14
// parsed result:
// 195.76 (BW average)
// 595.50 (t_avg)
float process_raw_output(string output) {
float res = -1.0;
try {
string pattern;
vector<string> lines;
boost::split(lines, output, boost::is_any_of("\n"), boost::token_compress_on);
regex re("\\d+\\s+\\d+\\s+\\d+\\.\\d+\\s+(\\d+\\.\\d+)\\s+\\d+\\.\\d+");
if (output.find("BW") != string::npos) {
pattern = "\\d+\\s+\\d+\\s+\\d+\\.\\d+\\s+(\\d+\\.\\d+)\\s+\\d+\\.\\d+";
} else {
pattern = "\\d+\\s+\\d+\\s+\\d+\\.\\d+\\s+\\d+\\.\\d+\\s+\\d+\\.\\d+"
"\\s+(\\d+\\.\\d+)\\s+\\d+\\.\\d+\\s+\\d+\\.\\d+\\s+\\d+\\.\\d+";
}
regex re(pattern);
for (string line : lines) {
smatch m;
if (regex_search(line, m, re))
......