Unverified Commit db842892 authored by Yuting Jiang's avatar Yuting Jiang Committed by GitHub

Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)

**Description**
 Make baseline check optional in data diagnosis and fix bugs.

**Major Revision**
- make baseline file optional in data diagnosis
- fix bugs in md and excel output when 'function' is not set in the rule
- fix bug in the multi_rules function where a missed/failed test could fail the whole process

**Minor Revision**
- revise docs related to data diagnosis
- resolve warning message in the baseline-not-found check; only raise an exception if the baseline is not found in the 'variance' function
- move summary fields to the top of the json file
- unify 'Index'/'machine' -> 'index' in output files
parent 4c215578
......@@ -181,8 +181,7 @@ sb result diagnosis --baseline-file
#### Required arguments
| Name | Description |
|------------------------|------------------------|
| `--baseline-file` `-b` | Path to baseline file. |
|--------------------|------------------------|
| `--data-file` `-d` | Path to raw data file. |
| `--rule-file` `-r` | Path to rule file. |
......@@ -190,6 +189,7 @@ sb result diagnosis --baseline-file
| Name | Default | Description |
|-------------------------|---------|-----------------------------------------------------------------------------|
| `--baseline-file` `-b`  | `None`  | Path to baseline file.                                                      |
| `--decimal-place-value` | 2 | Number of valid decimal places to show in output. Default: 2. |
| `--output-all` | N/A | Output diagnosis results for all nodes. |
| `--output-dir` | `None` | Path to output directory, outputs/{datetime} will be used if not specified. |
......
......@@ -32,7 +32,7 @@ The input mainly includes 3 files:
- **rule file**: It uses YAML format and includes each metric's rules to filter defective machines for diagnosis.
- **baseline file**: json file including the baseline values for the metrics.
- **baseline file (optional)**: json file including the baseline values for the metrics.
`Tips`: this file for some representative machine types will be published in [SuperBench Results Repo](https://github.com/microsoft/superbench-results/tree/main) with the release of Superbench.
......@@ -52,8 +52,8 @@ superbench:
${var_name}: dict
rules:
${rule_name}:
function: string
criteria: string
function: (optional)string
criteria: (optional)string
store: (optional)bool
categories: string
metrics:
......@@ -69,7 +69,7 @@ version: v0.5
superbench:
rules:
failure-rule:
function: value
function: failure_check
criteria: lambda x:x>0
categories: Failed
metrics:
......@@ -125,8 +125,17 @@ superbench:
- vgg_models/pytorch-vgg.*/throughput_train_.*\
rule6:
function: multi_rules
criteria: 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False'
criteria: 'lambda label: bool(label["rule4"]+label["rule5"]>=2)'
categories: CNN
rule7:
categories: MODEL_DIST
store: True
metrics:
- model-benchmarks:stress-run.*/pytorch-gpt2-large/fp32_train_throughput
rule8:
function: multi_rules
criteria: 'lambda label: bool(min(label["rule7"].values())<1)'
categories: MODEL_DIST
```
This rule file describes the rules used for data diagnosis.
......@@ -147,15 +156,18 @@ The criterion used for this rule, which indicates how to compare the data with t
#### `store`
True if the current rule is not used alone to filter the defective machine, but will be used by other subsequent rules. False(default) if this rule is used to label the defective machine directly.
- True: this rule is used to store metrics which will be used by other subsequent rules.
- If store is True and criteria/function are not None in the rule, it will store how many metrics in this rule meet the criteria into label["rule_name"]; for example, label["rule_name"]=2 means 2 metrics are identified as defective by this rule;
- If store is True and criteria/function are None, it will store the dict of {metric_name: value} of the metrics into label["rule_name"]
- False (default): this rule is used to label the defective machine directly.
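The two storage behaviors above can be sketched as follows (an illustrative snippet, not SuperBench code; rule names and metric values are hypothetical):

```python
def store_rule_result(label, rule_name, metrics, criteria=None):
    """Store either a defect count (criteria given) or the raw metric values."""
    if criteria is not None:
        # Count how many metrics in this rule meet the criteria,
        # e.g. label["rule_name"] = 2 means 2 metrics were flagged.
        label[rule_name] = sum(1 for v in metrics.values() if criteria(v))
    else:
        # No criteria/function: store the {metric_name: value} dict itself.
        label[rule_name] = dict(metrics)

label = {}
store_rule_result(label, 'rule4', {'m1': 0.9, 'm2': 0.02}, criteria=lambda x: x > 0.5)
store_rule_result(label, 'rule7', {'throughput:0': 0.8, 'throughput:1': 1.2})
```

A subsequent `multi_rules` rule can then read these entries from the label dict.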
#### `function`
The function used for this rule.
3 types of rules are supported currently:
The supported functions are listed as follows:
- `variance`: the rule is to check if the variance between raw data and baseline violates the criteria. variance = (raw data - criteria) / criteria
- `variance`: the rule is to check if the variance between raw data and baseline violates the criteria. variance = (raw data - baseline) / baseline
For example, if the 'criteria' is `lambda x:x>0.05`, the rule is that if the variance is larger than 5%, it should be defective.
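A minimal sketch of this check (a simplified form, not the real `RuleOp.variance`; the baseline 0.00596 and raw value 0.1 are illustrative):

```python
def variance_violates(raw_value, baseline, criteria):
    """Return (violated, variance) for one metric under a variance rule."""
    if baseline is None or baseline == 0:
        # Mirrors the behavior described above: a missing or zero baseline
        # in the 'variance' function is an error.
        raise ValueError('invalid baseline 0 or baseline not found in variance rule')
    var = (raw_value - baseline) / baseline
    return criteria(var), var

violated, var = variance_violates(0.1, 0.00596, lambda x: x > 0.05)
```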
......@@ -164,8 +176,16 @@ The function used for this rule.
For example, if the 'criteria' is `lambda x:x>0`, the rule is that if the raw data is larger than 0, it should be defective.
- `multi_rules`: the rule is to check if the combined results of multiple previous rules and metrics violate the criteria.
Several examples are listed as follows:
- `criteria: lambda label: bool(label["rule4"]+label["rule5"]>=2)` means that this rule will be triggered if the sum of labeled metrics in rule4 and rule5 is at least 2.
- `criteria: lambda label: bool(min(label["rule7"].values())<1)` means that if the minimum of the metrics' values in rule7 is smaller than 1, it should be defective.
- If you reference a non-existent rule, an exception will be raised.
- If the test in the referenced rule failed or did not run, causing an exception in the criteria, no exception is raised here since the case will be caught by the failure_check rule.
- `failure_check`: the rule is to check if any metric in this rule fails or misses the test. The metrics in this rule should be like `{benchmark_name}/.*:return_code`, which is used to identify failures.
For example, if the 'criteria' is 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False', the rule is that if the sum of labeled metrics in rule4 and rule5 is larger than 2, it should be defective.
- If any item is never matched with the metrics of the raw data, the rule will identify it as a missed test.
- If any metric violates the `value` criteria, which means the return_code is not success (0), the rule will identify it as a failed test.
`Tips`: you must include a default rule for ${benchmark_name}/return_code, as in the example above, which is used to identify failed tests.
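The error handling described for `multi_rules` can be sketched as follows (illustrative only; assumes criteria are plain lambda strings evaluated against the label dict, with hypothetical rule names):

```python
def check_multi_rule(criteria_str, label):
    """Evaluate a multi_rules criteria string against stored labels.

    Returns 1 if violated, 0 otherwise.
    """
    try:
        return 1 if eval(criteria_str)(label) else 0
    except KeyError as e:
        # Referencing a non-existent rule is a configuration error.
        raise KeyError('invalid criteria format - {}'.format(e))
    except Exception:
        # A missed/failed test is tolerated here; failure_check reports it.
        return 0

label = {'rule4': 1, 'rule5': 1}
check_multi_rule('lambda label: bool(label["rule4"]+label["rule5"]>=2)', label)  # -> 1
```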
......
......@@ -63,8 +63,6 @@ def _get_baseline_of_metric(self, baseline, metric):
"""
if metric in baseline:
return baseline[metric]
elif 'return_code' in metric:
return 0
else:
short = metric
# exclude rank info, for example, '.*:\d+'->'.*'
......@@ -76,8 +74,7 @@ def _get_baseline_of_metric(self, baseline, metric):
return baseline[short]
# baseline not defined
else:
logger.warning('DataDiagnosis: get baseline - {} baseline not found'.format(metric))
return -1
return None
def __get_metrics_and_baseline(self, rule, benchmark_rules, baseline):
"""Get metrics with baseline in the rule.
......@@ -253,7 +250,7 @@ def output_all_nodes_results(self, raw_data_df, data_not_accept_df):
data_not_accept_df['Number Of Issues'] = data_not_accept_df['Defective Details'].map(
lambda x: len(x.split(','))
)
for index in range(len(append_columns)):
for index in range(len(append_columns) - 1, -1, -1):
if append_columns[index] not in data_not_accept_df:
logger.log_and_raise(
Exception,
......@@ -261,9 +258,9 @@ def output_all_nodes_results(self, raw_data_df, data_not_accept_df):
format(append_columns[index])
)
else:
all_data_df = all_data_df.merge(
data_not_accept_df[[append_columns[index]]], left_index=True, right_index=True, how='left'
)
all_data_df = data_not_accept_df[[
append_columns[index]
]].merge(all_data_df, left_index=True, right_index=True, how='right')
all_data_df['Accept'] = all_data_df['Accept'].replace(np.nan, True)
all_data_df['Number Of Issues'] = all_data_df['Number Of Issues'].replace(np.nan, 0)
all_data_df['Number Of Issues'] = all_data_df['Number Of Issues'].astype(int)
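The merge direction in this hunk can be illustrated with toy data (hypothetical node names; assumes pandas): merging from the data_not_accept side with `how='right'` keeps every node from the raw data while placing the appended column in front, after which the NaN placeholders are filled with defaults.

```python
import numpy as np
import pandas as pd

# All raw-data nodes vs. only the nodes flagged as not accepted.
all_data_df = pd.DataFrame({'metric': [1.0, 2.0]}, index=['node-1', 'node-2'])
not_accept_df = pd.DataFrame({'Accept': [False]}, index=['node-1'])

# Right merge: every node survives, 'Accept' becomes the leading column.
merged = not_accept_df[['Accept']].merge(all_data_df, left_index=True, right_index=True, how='right')
# Nodes without diagnosis results default to accepted.
merged['Accept'] = merged['Accept'].replace(np.nan, True)
```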
......@@ -329,7 +326,7 @@ def output_diagnosis_in_json(self, data_not_accept_df, output_path):
data_not_accept_df (DataFrame): the DataFrame to output
output_path (str): the path of output jsonl file
"""
data_not_accept_df['Index'] = data_not_accept_df.index
data_not_accept_df = data_not_accept_df.reset_index()
data_not_accept_df = data_not_accept_df.rename(
columns={
'Defective Details': 'diagnosis/issue_details',
......@@ -357,12 +354,11 @@ def generate_md_lines(self, data_not_accept_df, rules, round):
"""
if len(data_not_accept_df) == 0:
return []
data_not_accept_df['machine'] = data_not_accept_df.index
data_not_accept_df = data_not_accept_df.reset_index()
header = data_not_accept_df.columns.tolist()
header = header[-1:] + header[:-1]
data_not_accept_df = data_not_accept_df[header]
# format precision of values to n decimal digits
for rule in rules:
if 'function' in rules[rule]:
for metric in rules[rule]['metrics']:
if rules[rule]['function'] == 'variance':
if round and isinstance(round, int):
......@@ -400,7 +396,7 @@ def run(
try:
rules = self._preprocess(raw_data_file, rule_file)
# read baseline
baseline = file_handler.read_baseline(baseline_file)
baseline = file_handler.read_baseline(baseline_file) if baseline_file is not None else {}
logger.info('DataDiagnosis: Begin to process {} nodes'.format(len(self._raw_data_df)))
output_df, label_df = self.run_diagnosis_rules(rules, baseline)
logger.info('DataDiagnosis: Processed finished')
......
......@@ -66,7 +66,7 @@ def check_criterion_with_a_value(rule):
"""
# parse criteria and check if valid
if not isinstance(eval(rule['criteria'])(0), bool):
logger.log_and_raise(exception=Exception, msg='invalid criteria format')
logger.log_and_raise(exception=ValueError, msg='invalid criteria format')
@staticmethod
def miss_test(metric, rule, data_row, details, categories):
......@@ -130,8 +130,10 @@ def variance(data_row, rule, summary_data_row, details, categories):
# check if metric pass the rule
val = data_row[metric]
baseline = rule['metrics'][metric]
if baseline == 0:
logger.log_and_raise(exception=Exception, msg='invalid baseline 0 in variance rule')
if baseline is None or baseline == 0:
logger.log_and_raise(
exception=ValueError, msg='invalid baseline 0 or baseline not found in variance rule'
)
var = (val - baseline) / baseline
summary_data_row[metric] = var
violate_metric = eval(rule['criteria'])(var)
......@@ -203,13 +205,20 @@ def multi_rules(rule, details, categories, store_values):
Returns:
number: 0 if the rule is passed, otherwise 1
"""
try:
violated = eval(rule['criteria'])(store_values)
if not isinstance(violated, bool):
logger.log_and_raise(exception=Exception, msg='invalid upper criteria format')
logger.log_and_raise(exception=ValueError, msg='invalid criteria format')
if violated:
info = '{}:{}'.format(rule['name'], rule['criteria'])
RuleOp.add_categories_and_details(info, rule['categories'], details, categories)
return 1 if violated else 0
# the key defined in criteria is not found
except KeyError as e:
logger.log_and_raise(exception=KeyError, msg='invalid criteria format - {}'.format(str(e)))
# miss/failed test
except Exception:
return 0
@staticmethod
def failure_check(data_row, rule, summary_data_row, details, categories, raw_rule):
......
......@@ -122,6 +122,7 @@ def output_excel_data_not_accept(writer, data_not_accept_df, rules):
worksheet = writer.sheets['Not Accept']
for rule in rules:
if 'function' in rules[rule]:
for metric in rules[rule]['metrics']:
# The column index of the metrics should start from 1
col_index = columns.index(metric) + 1
......
......@@ -14,7 +14,7 @@
def diagnosis_command_handler(
raw_data_file,
rule_file,
baseline_file,
baseline_file=None,
output_dir=None,
output_file_format='excel',
output_all=False,
......@@ -40,6 +40,7 @@ def diagnosis_command_handler(
raise CLIError('Output format must be in {}.'.format(str(supported_output_format)))
check_argument_file('raw_data_file', raw_data_file)
check_argument_file('rule_file', rule_file)
if baseline_file:
check_argument_file('baseline_file', baseline_file)
# Run data diagnosis
DataDiagnosis().run(
......
......@@ -53,7 +53,7 @@ def test_data_diagnosis(self):
test_raw_data_fake = str(self.parent_path / 'test_results_fake.jsonl')
test_rule_file_fake = str(self.parent_path / 'test_rules_fake.yaml')
diag2 = DataDiagnosis()
self.assertRaises(Exception, file_handler.read_raw_data, test_raw_data_fake)
self.assertRaises(FileNotFoundError, file_handler.read_raw_data, test_raw_data_fake)
diag2._benchmark_metrics_dict = diag2._get_metrics_by_benchmarks([])
assert (len(diag2._benchmark_metrics_dict) == 0)
metric_list = [
......@@ -67,7 +67,7 @@ def test_data_diagnosis(self):
}
)
# Test - read rules
self.assertRaises(Exception, file_handler.read_rules, test_rule_file_fake)
self.assertRaises(FileNotFoundError, file_handler.read_rules, test_rule_file_fake)
rules = file_handler.read_rules(test_rule_file)
assert (rules)
# Test - _check_and_format_rules
......@@ -129,7 +129,7 @@ def test_data_diagnosis(self):
baseline = file_handler.read_baseline(test_baseline_file)
assert (diag1._get_baseline_of_metric(baseline, 'kernel-launch/event_overhead:0') == 0.00596)
assert (diag1._get_baseline_of_metric(baseline, 'kernel-launch/return_code') == 0)
assert (diag1._get_baseline_of_metric(baseline, 'mem-bw/H2D:0') == -1)
assert (diag1._get_baseline_of_metric(baseline, 'mem-bw/H2D:0') is None)
# Test - _parse_rules_and_baseline
# Negative case
fake_rules = []
......@@ -291,6 +291,38 @@ def test_data_diagnosis_run(self):
expect_result = f.read()
assert (data_not_accept_read_from_json == expect_result)
def test_data_diagnosis_run_without_baseline(self):
"""Test for the run process of rule-based data diagnosis."""
test_raw_data = str(self.parent_path / 'test_results.jsonl')
test_rule_file = str(self.parent_path / 'test_rules_without_baseline.yaml')
test_baseline_file = None
# Test - output in excel
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'excel')
assert (Path(self.output_excel_file).is_file())
# Test - output in json
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'json')
assert (Path(self.output_json_file).is_file())
# Test - output in jsonl
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'jsonl')
assert (Path(self.output_jsonl_file).is_file())
# Test - output in md
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'md', round=2)
assert (Path(self.output_md_file).is_file())
# Test - output in html
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'html', round=2)
assert (Path(self.output_html_file).is_file())
# Test - output all nodes results
DataDiagnosis().run(
test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'json', output_all=True
)
assert (Path(self.output_all_json_file).is_file())
def test_mutli_rules(self):
"""Test multi rules check feature."""
diag1 = DataDiagnosis()
......
......@@ -36,13 +36,13 @@ def test_file_handler(self):
# Test - read_raw_data
raw_data_df = file_handler.read_raw_data(test_raw_data)
assert (not raw_data_df.empty)
self.assertRaises(Exception, file_handler.read_raw_data, test_raw_data_fake)
self.assertRaises(FileNotFoundError, file_handler.read_raw_data, test_raw_data_fake)
# Test - read rules
self.assertRaises(Exception, file_handler.read_rules, test_rule_file_fake)
self.assertRaises(FileNotFoundError, file_handler.read_rules, test_rule_file_fake)
rules = file_handler.read_rules(test_rule_file)
assert (rules)
# Test - read baseline
self.assertRaises(Exception, file_handler.read_baseline, test_aseline_file_fake)
self.assertRaises(FileNotFoundError, file_handler.read_baseline, test_aseline_file_fake)
baseline = file_handler.read_baseline(test_baseline_file)
assert (baseline)
# Test - generate_md_table
......
......@@ -31,7 +31,7 @@ def test_rule_base(self):
test_raw_data_fake = str(self.parent_path / 'test_results_fake.jsonl')
rulebase2 = RuleBase()
self.assertRaises(Exception, file_handler.read_raw_data, test_raw_data_fake)
self.assertRaises(FileNotFoundError, file_handler.read_raw_data, test_raw_data_fake)
rulebase2._benchmark_metrics_dict = rulebase2._get_metrics_by_benchmarks([])
assert (len(rulebase2._benchmark_metrics_dict) == 0)
metric_list = [
......
......@@ -65,6 +65,29 @@ def test_rule_op(self):
self.assertRaises(Exception, RuleOp.variance, data_row, rule, summary_data_row, details, categories)
self.assertRaises(Exception, RuleOp.value, data_row, rule, summary_data_row, details, categories)
# Negative case, if baseline is 0 or None in 'variance' function, raise error
false_rule_and_baselines = [
{
'categories': 'KernelLaunch',
'criteria': 'lambda x:x>0.5',
'function': 'variance',
'metrics': {
'kernel-launch/event_overhead:0': 0,
}
},
{
'categories': 'KernelLaunch',
'criteria': 'lambda x:x>0.5',
'function': 'variance',
'metrics': {
'kernel-launch/event_overhead:1': None,
}
},
]
for rule in false_rule_and_baselines:
self.assertRaises(ValueError, RuleOp.variance, data_row, rule, summary_data_row, details, categories)
# Positive case
true_baselines = [
{
......@@ -132,7 +155,7 @@ def test_multi_rules_op(self):
]
label = {}
for rule in false_baselines:
self.assertRaises(Exception, RuleOp.multi_rules, rule, details, categories, label)
self.assertRaises(KeyError, RuleOp.multi_rules, rule, details, categories, label)
true_baselines = [
{
......
# SuperBench rules
version: v0.6
superbench:
rules:
rule0:
function: value
criteria: lambda x:x>0
categories: KernelLaunch
metrics:
- kernel-launch/event_overhead:\d+
- kernel-launch/wall_overhead:\d+
rule1:
categories: Mem
store: True
metrics:
- mem-bw/H2D_Mem_BW:\d+
- mem-bw/D2H_Mem_BW:\d+
rule2:
function: multi_rules
criteria: 'lambda label: bool(min(label["rule1"].values())/max(label["rule1"].values())<0.95)'
categories: Mem
......@@ -118,6 +118,11 @@ def test_sb_result_diagnosis(self):
'sb result diagnosis -d {dir}/test_results.jsonl -r {dir}/test_rules.yaml -b {dir}/test_baseline.json'.
format(dir=test_analyzer_dir) + ' --output-dir outputs/test-diagnosis/ --output-all'
)
self.cmd(
'sb result diagnosis -d {dir}/test_results.jsonl -r {dir}/test_rules_without_baseline.yaml'.
format(dir=test_analyzer_dir) +
' --output-dir outputs/test-diagnosis/ --output-all --output-file-format json'
)
# test invalid output format
self.cmd(
'sb result diagnosis -d {dir}/test_results.jsonl -r {dir}/test_rules.yaml -b {dir}/test_baseline.json'.
......
<table>
<thead>
<tr>
<th>machine</th>
<th>index</th>
<th>Category</th>
<th>Defective Details</th>
<th>kernel-launch/event_overhead:0</th>
......
[
{
"index": "sb-validation-01",
"diagnosis/accept": false,
"diagnosis/issue_num": 1,
"diagnosis/category": "KernelLaunch",
"diagnosis/issue_details": "kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05)",
"bert_models/pytorch-bert-base/steptime_train_float32": 114.5916701062,
"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591,
"bert_models/pytorch-bert-base/steptime_train_float16": 83.8895108318,
......@@ -461,14 +466,14 @@
"vgg_models/pytorch-vgg19/steptime_train_float32": 74.9348710524,
"vgg_models/pytorch-vgg19/throughput_train_float32": 429.8092158311,
"vgg_models/pytorch-vgg19/steptime_train_float16": 45.2033062465,
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377,
"diagnosis/accept": false,
"diagnosis/issue_num": 1,
"diagnosis/category": "KernelLaunch",
"diagnosis/issue_details": "kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05)",
"Index": "sb-validation-01"
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377
},
{
"index": "sb-validation-02",
"diagnosis/accept": true,
"diagnosis/issue_num": 0,
"diagnosis/category": "",
"diagnosis/issue_details": "",
"bert_models/pytorch-bert-base/steptime_train_float32": 114.5916701062,
"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591,
"bert_models/pytorch-bert-base/steptime_train_float16": 83.8895108318,
......@@ -930,14 +935,14 @@
"vgg_models/pytorch-vgg19/steptime_train_float32": 74.9348710524,
"vgg_models/pytorch-vgg19/throughput_train_float32": 429.8092158311,
"vgg_models/pytorch-vgg19/steptime_train_float16": 45.2033062465,
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377,
"diagnosis/accept": true,
"diagnosis/issue_num": 0,
"diagnosis/category": "",
"diagnosis/issue_details": "",
"Index": "sb-validation-02"
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377
},
{
"index": "sb-validation-03",
"diagnosis/accept": false,
"diagnosis/issue_num": 17,
"diagnosis/category": "FailedTest",
"diagnosis/issue_details": "mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0)",
"bert_models/pytorch-bert-base/steptime_train_float32": 114.5916701062,
"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591,
"bert_models/pytorch-bert-base/steptime_train_float16": 83.8895108318,
......@@ -1399,11 +1404,6 @@
"vgg_models/pytorch-vgg19/steptime_train_float32": 74.9348710524,
"vgg_models/pytorch-vgg19/throughput_train_float32": 429.8092158311,
"vgg_models/pytorch-vgg19/steptime_train_float16": 45.2033062465,
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377,
"diagnosis/accept": false,
"diagnosis/issue_num": 17,
"diagnosis/category": "FailedTest",
"diagnosis/issue_details": "mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0)",
"Index": "sb-validation-03"
"vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377
}
]
\ No newline at end of file
| machine | Category | Defective Details | kernel-launch/event_overhead:0 | kernel-launch/event_overhead:1 | kernel-launch/event_overhead:2 | kernel-launch/event_overhead:3 | kernel-launch/event_overhead:4 | kernel-launch/event_overhead:5 | kernel-launch/event_overhead:6 | kernel-launch/event_overhead:7 | kernel-launch/return_code | kernel-launch/wall_overhead:0 | kernel-launch/wall_overhead:1 | kernel-launch/wall_overhead:2 | kernel-launch/wall_overhead:3 | kernel-launch/wall_overhead:4 | kernel-launch/wall_overhead:5 | kernel-launch/wall_overhead:6 | kernel-launch/wall_overhead:7 | mem-bw/D2H_Mem_BW:0 | mem-bw/D2H_Mem_BW:1 | mem-bw/D2H_Mem_BW:2 | mem-bw/D2H_Mem_BW:3 | mem-bw/D2H_Mem_BW:4 | mem-bw/D2H_Mem_BW:5 | mem-bw/D2H_Mem_BW:6 | mem-bw/D2H_Mem_BW:7 | mem-bw/H2D_Mem_BW:0 | mem-bw/H2D_Mem_BW:1 | mem-bw/H2D_Mem_BW:2 | mem-bw/H2D_Mem_BW:3 | mem-bw/H2D_Mem_BW:4 | mem-bw/H2D_Mem_BW:5 | mem-bw/H2D_Mem_BW:6 | mem-bw/H2D_Mem_BW:7 | mem-bw/return_code |
| index | Category | Defective Details | kernel-launch/event_overhead:0 | kernel-launch/event_overhead:1 | kernel-launch/event_overhead:2 | kernel-launch/event_overhead:3 | kernel-launch/event_overhead:4 | kernel-launch/event_overhead:5 | kernel-launch/event_overhead:6 | kernel-launch/event_overhead:7 | kernel-launch/return_code | kernel-launch/wall_overhead:0 | kernel-launch/wall_overhead:1 | kernel-launch/wall_overhead:2 | kernel-launch/wall_overhead:3 | kernel-launch/wall_overhead:4 | kernel-launch/wall_overhead:5 | kernel-launch/wall_overhead:6 | kernel-launch/wall_overhead:7 | mem-bw/D2H_Mem_BW:0 | mem-bw/D2H_Mem_BW:1 | mem-bw/D2H_Mem_BW:2 | mem-bw/D2H_Mem_BW:3 | mem-bw/D2H_Mem_BW:4 | mem-bw/D2H_Mem_BW:5 | mem-bw/D2H_Mem_BW:6 | mem-bw/D2H_Mem_BW:7 | mem-bw/H2D_Mem_BW:0 | mem-bw/H2D_Mem_BW:1 | mem-bw/H2D_Mem_BW:2 | mem-bw/H2D_Mem_BW:3 | mem-bw/H2D_Mem_BW:4 | mem-bw/H2D_Mem_BW:5 | mem-bw/H2D_Mem_BW:6 | mem-bw/H2D_Mem_BW:7 | mem-bw/return_code |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sb-validation-01 | KernelLaunch | kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05) | 1577.85% | -0.17% | -6.54% | -7.72% | -0.67% | -1.17% | -4.03% | -1.01% | 0.0 | 0.0% | 0.0% | 1.95% | 2.24% | 3.61% | -1.95% | 1.85% | 4.39% | 0.0% | 1.23% | 0.82% | 1.23% | 0.0% | 0.0% | -1.65% | 1.23% | 0.0% | 0.78% | 1.56% | 1.95% | 2.34% | 0.78% | -1.17% | 1.95% | 0.0 |
| sb-validation-03 | FailedTest | mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0) | 0.0% | -0.17% | -6.54% | -7.72% | -0.67% | -1.17% | -4.03% | -1.01% | 0.0 | 0.0% | 0.0% | 1.95% | 2.24% | 3.61% | -1.95% | 1.85% | 4.39% | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1.0 |
[
{
"index": "sb-validation-01",
"diagnosis/category": "KernelLaunch",
"diagnosis/issue_details": "kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05)",
"kernel-launch/event_overhead:0": 15.7785234899,
......@@ -35,10 +36,10 @@
"mem-bw/H2D_Mem_BW:5": 0.0078125,
"mem-bw/H2D_Mem_BW:6": -0.01171875,
"mem-bw/H2D_Mem_BW:7": 0.01953125,
"mem-bw/return_code": 0.0,
"Index": "sb-validation-01"
"mem-bw/return_code": 0.0
},
{
"index": "sb-validation-03",
"diagnosis/category": "FailedTest",
"diagnosis/issue_details": "mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0)",
"kernel-launch/event_overhead:0": 0.0,
......@@ -74,7 +75,6 @@
"mem-bw/H2D_Mem_BW:5": null,
"mem-bw/H2D_Mem_BW:6": null,
"mem-bw/H2D_Mem_BW:7": null,
"mem-bw/return_code": 1.0,
"Index": "sb-validation-03"
"mem-bw/return_code": 1.0
}
]
\ No newline at end of file