Unverified Commit db842892 authored by Yuting Jiang's avatar Yuting Jiang Committed by GitHub

Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)

**Description**
 Make baseline check optional in data diagnosis and fix bugs.

**Major Revision**
- make baseline file optional in data diagnosis
- fix bugs in md and excel output when 'function' is not set in the rule
- fix bug in the multi_rules function where a missed/failed test could fail the whole process

**Minor Revision**
- revise docs related to data diagnosis
- resolve warning message in the baseline-not-found check; only raise an exception if the baseline is not found in the 'variance' function
- move summary fields to the top of the json file
- unify 'Index'/'machine' -> 'index' in output files
parent 4c215578
......@@ -181,8 +181,7 @@ sb result diagnosis --baseline-file
#### Required arguments
| Name | Description |
|------------------------|------------------------|
| `--baseline-file` `-b` | Path to baseline file. |
|--------------------|------------------------|
| `--data-file` `-d` | Path to raw data file. |
| `--rule-file` `-r` | Path to rule file. |
......@@ -190,6 +189,7 @@ sb result diagnosis --baseline-file
| Name | Default | Description |
|-------------------------|---------|-----------------------------------------------------------------------------|
| `--baseline-file` `-b`  | `None`  | Path to baseline file.                                                      |
| `--decimal-place-value` | 2 | Number of valid decimal places to show in output. Default: 2. |
| `--output-all` | N/A | Output diagnosis results for all nodes. |
| `--output-dir` | `None` | Path to output directory, outputs/{datetime} will be used if not specified. |
......
......@@ -32,7 +32,7 @@ The input mainly includes 3 files:
- **rule file**: It uses YAML format and includes each metric's rules to filter defective machines for diagnosis.
- **baseline file**: json file including the baseline values for the metrics.
- **baseline file (optional)**: json file including the baseline values for the metrics.
`Tips`: this file for some representative machine types will be published in [SuperBench Results Repo](https://github.com/microsoft/superbench-results/tree/main) with the release of Superbench.
......@@ -52,8 +52,8 @@ superbench:
${var_name}: dict
rules:
${rule_name}:
function: string
criteria: string
function: (optional)string
criteria: (optional)string
store: (optional)bool
categories: string
metrics:
......@@ -69,7 +69,7 @@ version: v0.5
superbench:
rules:
failure-rule:
function: value
function: failure_check
criteria: lambda x:x>0
categories: Failed
metrics:
......@@ -125,8 +125,17 @@ superbench:
- vgg_models/pytorch-vgg.*/throughput_train_.*\
rule6:
function: multi_rules
criteria: 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False'
criteria: 'lambda label: bool(label["rule4"]+label["rule5"]>=2)'
categories: CNN
rule7:
categories: MODEL_DIST
store: True
metrics:
- model-benchmarks:stress-run.*/pytorch-gpt2-large/fp32_train_throughput
rule8:
function: multi_rules
criteria: 'lambda label: bool(min(label["rule7"].values())<1)'
categories: MODEL_DIST
```
This rule file describes the rules used for data diagnosis.
......@@ -147,15 +156,18 @@ The criterion used for this rule, which indicates how to compare the data with t
#### `store`
True if the current rule is not used alone to filter the defective machine, but will be used by other subsequent rules. False(default) if this rule is used to label the defective machine directly.
- True: this rule is used to store metrics which will be used by other subsequent rules.
- If store is True and criteria/function are not None in the rule, it will store how many metrics in this rule meet the criteria into label["rule_name"]; for example, label["rule_name"]=2 means 2 metrics are identified as defective by this rule;
- If store is True and criteria/function are None, it will store the dict of {metric_name: value} of the metrics into label["rule_name"]
- False (default): this rule is used to label the defective machine directly.
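The two storage behaviors above can be sketched as follows (an illustrative snippet, not SuperBench code; rule names and metric values are hypothetical):

```python
def store_rule_result(label, rule_name, metrics, criteria=None):
    """Store either a defect count (criteria given) or the raw metric values."""
    if criteria is not None:
        # Count how many metrics in this rule meet the criteria,
        # e.g. label["rule_name"] = 2 means 2 metrics were flagged.
        label[rule_name] = sum(1 for v in metrics.values() if criteria(v))
    else:
        # No criteria/function: store the {metric_name: value} dict itself.
        label[rule_name] = dict(metrics)

label = {}
store_rule_result(label, 'rule4', {'m1': 0.9, 'm2': 0.02}, criteria=lambda x: x > 0.5)
store_rule_result(label, 'rule7', {'throughput:0': 0.8, 'throughput:1': 1.2})
```

A subsequent `multi_rules` rule can then read these entries from the label dict.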
#### `function`
The function used for this rule.
3 types of rules are supported currently:
The supported functions are listed as follows:
- `variance`: the rule is to check if the variance between raw data and baseline violates the criteria. variance = (raw data - criteria) / criteria
- `variance`: the rule is to check if the variance between raw data and baseline violates the criteria. variance = (raw data - baseline) / baseline
For example, if the 'criteria' is `lambda x:x>0.05`, the rule is that if the variance is larger than 5%, it should be defective.
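A minimal sketch of this check (a simplified form, not the real `RuleOp.variance`; the baseline 0.00596 and raw value 0.1 are illustrative):

```python
def variance_violates(raw_value, baseline, criteria):
    """Return (violated, variance) for one metric under a variance rule."""
    if baseline is None or baseline == 0:
        # Mirrors the behavior described above: a missing or zero baseline
        # in the 'variance' function is an error.
        raise ValueError('invalid baseline 0 or baseline not found in variance rule')
    var = (raw_value - baseline) / baseline
    return criteria(var), var

violated, var = variance_violates(0.1, 0.00596, lambda x: x > 0.05)
```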
......@@ -164,8 +176,16 @@ The function used for this rule.
For example, if the 'criteria' is `lambda x:x>0`, the rule is that if the raw data is larger than 0, it should be defective.
- `multi_rules`: the rule is to check if the combined results of multiple previous rules and metrics violate the criteria.
Several examples are listed as follows:
- `criteria: lambda label: bool(label["rule4"]+label["rule5"]>=2)` means that this rule will be triggered if the sum of labeled metrics in rule4 and rule5 is at least 2.
- `criteria: lambda label: bool(min(label["rule7"].values())<1)` means that if the minimum of the metrics' values in rule7 is smaller than 1, it should be defective.
- If you reference a non-existent rule, an exception will be raised.
- If the test in the referenced rule failed or did not run, causing an exception in the criteria, no exception is raised here since the case will be caught by the failure_check rule.
- `failure_check`: the rule is to check if any metric in this rule fails or misses the test. The metrics in this rule should be like `{benchmark_name}/.*:return_code`, which is used to identify failures.
For example, if the 'criteria' is 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False', the rule is that if the sum of labeled metrics in rule4 and rule5 is larger than 2, it should be defective.
- If any item is never matched with the metrics of the raw data, the rule will identify it as a missed test.
- If any metric violates the `value` criteria, which means the return_code is not success (0), the rule will identify it as a failed test.
`Tips`: you must include a default rule for ${benchmark_name}/return_code, as in the example above, which is used to identify failed tests.
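The error handling described for `multi_rules` can be sketched as follows (illustrative only; assumes criteria are plain lambda strings evaluated against the label dict, with hypothetical rule names):

```python
def check_multi_rule(criteria_str, label):
    """Evaluate a multi_rules criteria string against stored labels.

    Returns 1 if violated, 0 otherwise.
    """
    try:
        return 1 if eval(criteria_str)(label) else 0
    except KeyError as e:
        # Referencing a non-existent rule is a configuration error.
        raise KeyError('invalid criteria format - {}'.format(e))
    except Exception:
        # A missed/failed test is tolerated here; failure_check reports it.
        return 0

label = {'rule4': 1, 'rule5': 1}
check_multi_rule('lambda label: bool(label["rule4"]+label["rule5"]>=2)', label)  # -> 1
```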
......
......@@ -63,8 +63,6 @@ def _get_baseline_of_metric(self, baseline, metric):
"""
if metric in baseline:
return baseline[metric]
elif 'return_code' in metric:
return 0
else:
short = metric
# exclude rank info, for example, '.*:\d+'->'.*'
......@@ -76,8 +74,7 @@ def _get_baseline_of_metric(self, baseline, metric):
return baseline[short]
# baseline not defined
else:
logger.warning('DataDiagnosis: get baseline - {} baseline not found'.format(metric))
return -1
return None
def __get_metrics_and_baseline(self, rule, benchmark_rules, baseline):
"""Get metrics with baseline in the rule.
......@@ -253,7 +250,7 @@ def output_all_nodes_results(self, raw_data_df, data_not_accept_df):
data_not_accept_df['Number Of Issues'] = data_not_accept_df['Defective Details'].map(
lambda x: len(x.split(','))
)
for index in range(len(append_columns)):
for index in range(len(append_columns) - 1, -1, -1):
if append_columns[index] not in data_not_accept_df:
logger.log_and_raise(
Exception,
......@@ -261,9 +258,9 @@ def output_all_nodes_results(self, raw_data_df, data_not_accept_df):
format(append_columns[index])
)
else:
all_data_df = all_data_df.merge(
data_not_accept_df[[append_columns[index]]], left_index=True, right_index=True, how='left'
)
all_data_df = data_not_accept_df[[
append_columns[index]
]].merge(all_data_df, left_index=True, right_index=True, how='right')
all_data_df['Accept'] = all_data_df['Accept'].replace(np.nan, True)
all_data_df['Number Of Issues'] = all_data_df['Number Of Issues'].replace(np.nan, 0)
all_data_df['Number Of Issues'] = all_data_df['Number Of Issues'].astype(int)
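The merge direction in this hunk can be illustrated with toy data (hypothetical node names; assumes pandas): merging from the data_not_accept side with `how='right'` keeps every node from the raw data while placing the appended column in front, after which the NaN placeholders are filled with defaults.

```python
import numpy as np
import pandas as pd

# All raw-data nodes vs. only the nodes flagged as not accepted.
all_data_df = pd.DataFrame({'metric': [1.0, 2.0]}, index=['node-1', 'node-2'])
not_accept_df = pd.DataFrame({'Accept': [False]}, index=['node-1'])

# Right merge: every node survives, 'Accept' becomes the leading column.
merged = not_accept_df[['Accept']].merge(all_data_df, left_index=True, right_index=True, how='right')
# Nodes without diagnosis results default to accepted.
merged['Accept'] = merged['Accept'].replace(np.nan, True)
```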
......@@ -329,7 +326,7 @@ def output_diagnosis_in_json(self, data_not_accept_df, output_path):
data_not_accept_df (DataFrame): the DataFrame to output
output_path (str): the path of output jsonl file
"""
data_not_accept_df['Index'] = data_not_accept_df.index
data_not_accept_df = data_not_accept_df.reset_index()
data_not_accept_df = data_not_accept_df.rename(
columns={
'Defective Details': 'diagnosis/issue_details',
......@@ -357,12 +354,11 @@ def generate_md_lines(self, data_not_accept_df, rules, round):
"""
if len(data_not_accept_df) == 0:
return []
data_not_accept_df['machine'] = data_not_accept_df.index
data_not_accept_df = data_not_accept_df.reset_index()
header = data_not_accept_df.columns.tolist()
header = header[-1:] + header[:-1]
data_not_accept_df = data_not_accept_df[header]
# format precision of values to n decimal digits
for rule in rules:
if 'function' in rules[rule]:
for metric in rules[rule]['metrics']:
if rules[rule]['function'] == 'variance':
if round and isinstance(round, int):
......@@ -400,7 +396,7 @@ def run(
try:
rules = self._preprocess(raw_data_file, rule_file)
# read baseline
baseline = file_handler.read_baseline(baseline_file)
baseline = file_handler.read_baseline(baseline_file) if baseline_file is not None else {}
logger.info('DataDiagnosis: Begin to process {} nodes'.format(len(self._raw_data_df)))
output_df, label_df = self.run_diagnosis_rules(rules, baseline)
logger.info('DataDiagnosis: Processed finished')
......
......@@ -66,7 +66,7 @@ def check_criterion_with_a_value(rule):
"""
# parse criteria and check if valid
if not isinstance(eval(rule['criteria'])(0), bool):
logger.log_and_raise(exception=Exception, msg='invalid criteria format')
logger.log_and_raise(exception=ValueError, msg='invalid criteria format')
@staticmethod
def miss_test(metric, rule, data_row, details, categories):
......@@ -130,8 +130,10 @@ def variance(data_row, rule, summary_data_row, details, categories):
# check if metric pass the rule
val = data_row[metric]
baseline = rule['metrics'][metric]
if baseline == 0:
logger.log_and_raise(exception=Exception, msg='invalid baseline 0 in variance rule')
if baseline is None or baseline == 0:
logger.log_and_raise(
exception=ValueError, msg='invalid baseline 0 or baseline not found in variance rule'
)
var = (val - baseline) / baseline
summary_data_row[metric] = var
violate_metric = eval(rule['criteria'])(var)
......@@ -203,13 +205,20 @@ def multi_rules(rule, details, categories, store_values):
Returns:
number: 0 if the rule is passed, otherwise 1
"""
try:
violated = eval(rule['criteria'])(store_values)
if not isinstance(violated, bool):
logger.log_and_raise(exception=Exception, msg='invalid upper criteria format')
logger.log_and_raise(exception=ValueError, msg='invalid criteria format')
if violated:
info = '{}:{}'.format(rule['name'], rule['criteria'])
RuleOp.add_categories_and_details(info, rule['categories'], details, categories)
return 1 if violated else 0
# the key defined in criteria is not found
except KeyError as e:
logger.log_and_raise(exception=KeyError, msg='invalid criteria format - {}'.format(str(e)))
# miss/failed test
except Exception:
return 0
@staticmethod
def failure_check(data_row, rule, summary_data_row, details, categories, raw_rule):
......
......@@ -122,6 +122,7 @@ def output_excel_data_not_accept(writer, data_not_accept_df, rules):
worksheet = writer.sheets['Not Accept']
for rule in rules:
if 'function' in rules[rule]:
for metric in rules[rule]['metrics']:
# The column index of the metrics should start from 1
col_index = columns.index(metric) + 1
......
......@@ -14,7 +14,7 @@
def diagnosis_command_handler(
raw_data_file,
rule_file,
baseline_file,
baseline_file=None,
output_dir=None,
output_file_format='excel',
output_all=False,
......@@ -40,6 +40,7 @@ def diagnosis_command_handler(
raise CLIError('Output format must be in {}.'.format(str(supported_output_format)))
check_argument_file('raw_data_file', raw_data_file)
check_argument_file('rule_file', rule_file)
if baseline_file:
check_argument_file('baseline_file', baseline_file)
# Run data diagnosis
DataDiagnosis().run(
......
......@@ -53,7 +53,7 @@ def test_data_diagnosis(self):
test_raw_data_fake = str(self.parent_path / 'test_results_fake.jsonl')
test_rule_file_fake = str(self.parent_path / 'test_rules_fake.yaml')
diag2 = DataDiagnosis()
self.assertRaises(Exception, file_handler.read_raw_data, test_raw_data_fake)
self.assertRaises(FileNotFoundError, file_handler.read_raw_data, test_raw_data_fake)
diag2._benchmark_metrics_dict = diag2._get_metrics_by_benchmarks([])
assert (len(diag2._benchmark_metrics_dict) == 0)
metric_list = [
......@@ -67,7 +67,7 @@ def test_data_diagnosis(self):
}
)
# Test - read rules
self.assertRaises(Exception, file_handler.read_rules, test_rule_file_fake)
self.assertRaises(FileNotFoundError, file_handler.read_rules, test_rule_file_fake)
rules = file_handler.read_rules(test_rule_file)
assert (rules)
# Test - _check_and_format_rules
......@@ -129,7 +129,7 @@ def test_data_diagnosis(self):
baseline = file_handler.read_baseline(test_baseline_file)
assert (diag1._get_baseline_of_metric(baseline, 'kernel-launch/event_overhead:0') == 0.00596)
assert (diag1._get_baseline_of_metric(baseline, 'kernel-launch/return_code') == 0)
assert (diag1._get_baseline_of_metric(baseline, 'mem-bw/H2D:0') == -1)
assert (diag1._get_baseline_of_metric(baseline, 'mem-bw/H2D:0') is None)
# Test - _parse_rules_and_baseline
# Negative case
fake_rules = []
......@@ -291,6 +291,38 @@ def test_data_diagnosis_run(self):
expect_result = f.read()
assert (data_not_accept_read_from_json == expect_result)
def test_data_diagnosis_run_without_baseline(self):
"""Test for the run process of rule-based data diagnosis."""
test_raw_data = str(self.parent_path / 'test_results.jsonl')
test_rule_file = str(self.parent_path / 'test_rules_without_baseline.yaml')
test_baseline_file = None
# Test - output in excel
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'excel')
assert (Path(self.output_excel_file).is_file())
# Test - output in json
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'json')
assert (Path(self.output_json_file).is_file())
# Test - output in jsonl
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'jsonl')
assert (Path(self.output_jsonl_file).is_file())
# Test - output in md
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'md', round=2)
assert (Path(self.output_md_file).is_file())
# Test - output in html
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'html', round=2)
assert (Path(self.output_html_file).is_file())
# Test - output all nodes results
DataDiagnosis().run(
test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'json', output_all=True
)
assert (Path(self.output_all_json_file).is_file())
def test_mutli_rules(self):
"""Test multi rules check feature."""
diag1 = DataDiagnosis()
......
......@@ -36,13 +36,13 @@ def test_file_handler(self):
# Test - read_raw_data
raw_data_df = file_handler.read_raw_data(test_raw_data)
assert (not raw_data_df.empty)
self.assertRaises(Exception, file_handler.read_raw_data, test_raw_data_fake)
self.assertRaises(FileNotFoundError, file_handler.read_raw_data, test_raw_data_fake)
# Test - read rules
self.assertRaises(Exception, file_handler.read_rules, test_rule_file_fake)
self.assertRaises(FileNotFoundError, file_handler.read_rules, test_rule_file_fake)
rules = file_handler.read_rules(test_rule_file)
assert (rules)
# Test - read baseline
self.assertRaises(Exception, file_handler.read_baseline, test_aseline_file_fake)
self.assertRaises(FileNotFoundError, file_handler.read_baseline, test_aseline_file_fake)
baseline = file_handler.read_baseline(test_baseline_file)
assert (baseline)
# Test - generate_md_table
......
......@@ -31,7 +31,7 @@ def test_rule_base(self):
test_raw_data_fake = str(self.parent_path / 'test_results_fake.jsonl')
rulebase2 = RuleBase()
self.assertRaises(Exception, file_handler.read_raw_data, test_raw_data_fake)
self.assertRaises(FileNotFoundError, file_handler.read_raw_data, test_raw_data_fake)
rulebase2._benchmark_metrics_dict = rulebase2._get_metrics_by_benchmarks([])
assert (len(rulebase2._benchmark_metrics_dict) == 0)
metric_list = [
......
......@@ -65,6 +65,29 @@ def test_rule_op(self):
self.assertRaises(Exception, RuleOp.variance, data_row, rule, summary_data_row, details, categories)
self.assertRaises(Exception, RuleOp.value, data_row, rule, summary_data_row, details, categories)
# Negative case, if baseline is 0 or None in 'variance' function, raise error
false_rule_and_baselines = [
{
'categories': 'KernelLaunch',
'criteria': 'lambda x:x>0.5',
'function': 'variance',
'metrics': {
'kernel-launch/event_overhead:0': 0,
}
},
{
'categories': 'KernelLaunch',
'criteria': 'lambda x:x>0.5',
'function': 'variance',
'metrics': {
'kernel-launch/event_overhead:1': None,
}
},
]
for rule in false_rule_and_baselines:
self.assertRaises(ValueError, RuleOp.variance, data_row, rule, summary_data_row, details, categories)
# Positive case
true_baselines = [
{
......@@ -132,7 +155,7 @@ def test_multi_rules_op(self):
]
label = {}
for rule in false_baselines:
self.assertRaises(Exception, RuleOp.multi_rules, rule, details, categories, label)
self.assertRaises(KeyError, RuleOp.multi_rules, rule, details, categories, label)
true_baselines = [
{
......
# SuperBench rules
version: v0.6
superbench:
rules:
rule0:
function: value
criteria: lambda x:x>0
categories: KernelLaunch
metrics:
- kernel-launch/event_overhead:\d+
- kernel-launch/wall_overhead:\d+
rule1:
categories: Mem
store: True
metrics:
- mem-bw/H2D_Mem_BW:\d+
- mem-bw/D2H_Mem_BW:\d+
rule2:
function: multi_rules
criteria: 'lambda label: bool(min(label["rule1"].values())/max(label["rule1"].values())<0.95)'
categories: Mem
......@@ -118,6 +118,11 @@ def test_sb_result_diagnosis(self):
'sb result diagnosis -d {dir}/test_results.jsonl -r {dir}/test_rules.yaml -b {dir}/test_baseline.json'.
format(dir=test_analyzer_dir) + ' --output-dir outputs/test-diagnosis/ --output-all'
)
self.cmd(
'sb result diagnosis -d {dir}/test_results.jsonl -r {dir}/test_rules_without_baseline.yaml'.
format(dir=test_analyzer_dir) +
' --output-dir outputs/test-diagnosis/ --output-all --output-file-format json'
)
# test invalid output format
self.cmd(
'sb result diagnosis -d {dir}/test_results.jsonl -r {dir}/test_rules.yaml -b {dir}/test_baseline.json'.
......
<table>
<thead>
<tr>
<th>machine</th>
<th>index</th>
<th>Category</th>
<th>Defective Details</th>
<th>kernel-launch/event_overhead:0</th>
......
[
{
"index": "sb-validation-01",
"diagnosis/accept": false,
"diagnosis/issue_num": 1,
"diagnosis/category": "KernelLaunch",
"diagnosis/issue_details": "kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05)",
"bert_models/pytorch-bert-base/steptime_train_float32": 114.5916701062,
"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591,
"bert_models/pytorch-bert-base/steptime_train_float16": 83.8895108318,
......@@ -461,14 +466,14 @@
"vgg_models/pytorch-vgg19/steptime_train_float32": 74.9348710524,
"vgg_models/pytorch-vgg19/throughput_train_float32": 429.8092158311,
"vgg_models/pytorch-vgg19/steptime_train_float16": 45.2033062465,
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377,
"diagnosis/accept": false,
"diagnosis/issue_num": 1,
"diagnosis/category": "KernelLaunch",
"diagnosis/issue_details": "kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05)",
"Index": "sb-validation-01"
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377
},
{
"index": "sb-validation-02",
"diagnosis/accept": true,
"diagnosis/issue_num": 0,
"diagnosis/category": "",
"diagnosis/issue_details": "",
"bert_models/pytorch-bert-base/steptime_train_float32": 114.5916701062,
"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591,
"bert_models/pytorch-bert-base/steptime_train_float16": 83.8895108318,
......@@ -930,14 +935,14 @@
"vgg_models/pytorch-vgg19/steptime_train_float32": 74.9348710524,
"vgg_models/pytorch-vgg19/throughput_train_float32": 429.8092158311,
"vgg_models/pytorch-vgg19/steptime_train_float16": 45.2033062465,
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377,
"diagnosis/accept": true,
"diagnosis/issue_num": 0,
"diagnosis/category": "",
"diagnosis/issue_details": "",
"Index": "sb-validation-02"
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377
},
{
"index": "sb-validation-03",
"diagnosis/accept": false,
"diagnosis/issue_num": 17,
"diagnosis/category": "FailedTest",
"diagnosis/issue_details": "mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0)",
"bert_models/pytorch-bert-base/steptime_train_float32": 114.5916701062,
"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591,
"bert_models/pytorch-bert-base/steptime_train_float16": 83.8895108318,
......@@ -1399,11 +1404,6 @@
"vgg_models/pytorch-vgg19/steptime_train_float32": 74.9348710524,
"vgg_models/pytorch-vgg19/throughput_train_float32": 429.8092158311,
"vgg_models/pytorch-vgg19/steptime_train_float16": 45.2033062465,
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377,
"diagnosis/accept": false,
"diagnosis/issue_num": 17,
"diagnosis/category": "FailedTest",
"diagnosis/issue_details": "mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0)",
"Index": "sb-validation-03"
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377
}
]
\ No newline at end of file
| machine | Category | Defective Details | kernel-launch/event_overhead:0 | kernel-launch/event_overhead:1 | kernel-launch/event_overhead:2 | kernel-launch/event_overhead:3 | kernel-launch/event_overhead:4 | kernel-launch/event_overhead:5 | kernel-launch/event_overhead:6 | kernel-launch/event_overhead:7 | kernel-launch/return_code | kernel-launch/wall_overhead:0 | kernel-launch/wall_overhead:1 | kernel-launch/wall_overhead:2 | kernel-launch/wall_overhead:3 | kernel-launch/wall_overhead:4 | kernel-launch/wall_overhead:5 | kernel-launch/wall_overhead:6 | kernel-launch/wall_overhead:7 | mem-bw/D2H_Mem_BW:0 | mem-bw/D2H_Mem_BW:1 | mem-bw/D2H_Mem_BW:2 | mem-bw/D2H_Mem_BW:3 | mem-bw/D2H_Mem_BW:4 | mem-bw/D2H_Mem_BW:5 | mem-bw/D2H_Mem_BW:6 | mem-bw/D2H_Mem_BW:7 | mem-bw/H2D_Mem_BW:0 | mem-bw/H2D_Mem_BW:1 | mem-bw/H2D_Mem_BW:2 | mem-bw/H2D_Mem_BW:3 | mem-bw/H2D_Mem_BW:4 | mem-bw/H2D_Mem_BW:5 | mem-bw/H2D_Mem_BW:6 | mem-bw/H2D_Mem_BW:7 | mem-bw/return_code |
| index | Category | Defective Details | kernel-launch/event_overhead:0 | kernel-launch/event_overhead:1 | kernel-launch/event_overhead:2 | kernel-launch/event_overhead:3 | kernel-launch/event_overhead:4 | kernel-launch/event_overhead:5 | kernel-launch/event_overhead:6 | kernel-launch/event_overhead:7 | kernel-launch/return_code | kernel-launch/wall_overhead:0 | kernel-launch/wall_overhead:1 | kernel-launch/wall_overhead:2 | kernel-launch/wall_overhead:3 | kernel-launch/wall_overhead:4 | kernel-launch/wall_overhead:5 | kernel-launch/wall_overhead:6 | kernel-launch/wall_overhead:7 | mem-bw/D2H_Mem_BW:0 | mem-bw/D2H_Mem_BW:1 | mem-bw/D2H_Mem_BW:2 | mem-bw/D2H_Mem_BW:3 | mem-bw/D2H_Mem_BW:4 | mem-bw/D2H_Mem_BW:5 | mem-bw/D2H_Mem_BW:6 | mem-bw/D2H_Mem_BW:7 | mem-bw/H2D_Mem_BW:0 | mem-bw/H2D_Mem_BW:1 | mem-bw/H2D_Mem_BW:2 | mem-bw/H2D_Mem_BW:3 | mem-bw/H2D_Mem_BW:4 | mem-bw/H2D_Mem_BW:5 | mem-bw/H2D_Mem_BW:6 | mem-bw/H2D_Mem_BW:7 | mem-bw/return_code |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sb-validation-01 | KernelLaunch | kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05) | 1577.85% | -0.17% | -6.54% | -7.72% | -0.67% | -1.17% | -4.03% | -1.01% | 0.0 | 0.0% | 0.0% | 1.95% | 2.24% | 3.61% | -1.95% | 1.85% | 4.39% | 0.0% | 1.23% | 0.82% | 1.23% | 0.0% | 0.0% | -1.65% | 1.23% | 0.0% | 0.78% | 1.56% | 1.95% | 2.34% | 0.78% | -1.17% | 1.95% | 0.0 |
| sb-validation-03 | FailedTest | mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0) | 0.0% | -0.17% | -6.54% | -7.72% | -0.67% | -1.17% | -4.03% | -1.01% | 0.0 | 0.0% | 0.0% | 1.95% | 2.24% | 3.61% | -1.95% | 1.85% | 4.39% | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1.0 |
[
{
"index": "sb-validation-01",
"diagnosis/category": "KernelLaunch",
"diagnosis/issue_details": "kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05)",
"kernel-launch/event_overhead:0": 15.7785234899,
......@@ -35,10 +36,10 @@
"mem-bw/H2D_Mem_BW:5": 0.0078125,
"mem-bw/H2D_Mem_BW:6": -0.01171875,
"mem-bw/H2D_Mem_BW:7": 0.01953125,
"mem-bw/return_code": 0.0,
"Index": "sb-validation-01"
"mem-bw/return_code": 0.0
},
{
"index": "sb-validation-03",
"diagnosis/category": "FailedTest",
"diagnosis/issue_details": "mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0)",
"kernel-launch/event_overhead:0": 0.0,
......@@ -74,7 +75,6 @@
"mem-bw/H2D_Mem_BW:5": null,
"mem-bw/H2D_Mem_BW:6": null,
"mem-bw/H2D_Mem_BW:7": null,
"mem-bw/return_code": 1.0,
"Index": "sb-validation-03"
"mem-bw/return_code": 1.0
}
]
\ No newline at end of file