Unverified Commit 97ed12f9 authored by user4543, committed by GitHub

Analyzer: Add Feature - Add multi-rules feature for data diagnosis (#289)

**Description**
Add a multi-rules feature for data diagnosis to support combined checks across multiple rules.

**Major Revision**
- revise the rule design to support combined checks over multiple rules
- update related code and tests
parent 1f48268b
@@ -54,6 +54,7 @@ superbench:
  ${rule_name}:
    function: string
    criteria: string
    store: (optional) bool
    categories: string
    metrics:
      - ${benchmark_name}/regex
@@ -108,11 +109,29 @@ superbench:
      - bert_models/pytorch-bert-base/throughput_train_float(32|16)
      - bert_models/pytorch-bert-large/throughput_train_float(32|16)
      - gpt_models/pytorch-gpt-large/throughput_train_float(32|16)
  rule4:
    function: variance
    criteria: "lambda x:x<-0.05"
    store: True
    categories: CNN
    metrics:
      - resnet_models/pytorch-resnet.*/throughput_train_.*
  rule5:
    function: variance
    criteria: "lambda x:x<-0.05"
    store: True
    categories: CNN
    metrics:
      - vgg_models/pytorch-vgg.*/throughput_train_.*
  rule6:
    function: multi_rules
    criteria: 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False'
    categories: CNN
```
This rule file describes the rules used for data diagnosis.
They are first organized by rule name, and each rule mainly includes several elements:
#### `metrics`
@@ -124,21 +143,29 @@ The categories belong to this rule.
#### `criteria`
The criterion used for this rule, which indicates how to compare the data with the baseline value for each metric. The format should be a lambda function supported by Python.
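As an illustration of how such a lambda string is evaluated (a minimal sketch in plain Python, not the analyzer's own code; the variable names are made up):

```python
# Parse a criteria string into a callable and apply it to a variance value.
criteria = 'lambda x: x > 0.05'
check = eval(criteria)  # turn the lambda string into a Python callable
# the callable must return a bool, mirroring the per-rule validity check
assert isinstance(check(0), bool)
print(check(0.10))  # a 10% variance violates the >5% threshold -> True
```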
#### `store`
`True` if the current rule is not used on its own to label a defective machine but is consumed by subsequent rules; `False` (default) if the rule labels defective machines directly.
#### `function`
The function used for this rule.
3 types of rules are supported currently:
- `variance`: checks whether the variance between the raw data and the baseline violates the criteria, where variance = (raw data - baseline) / baseline.
For example, if the `criteria` is `lambda x:x>0.05`, the rule is that if the variance is larger than 5%, the metric is labeled defective.
- `value`: checks whether the raw data violates the criteria.
For example, if the `criteria` is `lambda x:x>0`, the rule is that if the raw data is larger than 0, the metric is labeled defective.
- `multi_rules`: checks whether the combined results of multiple previous rules and their metrics violate the criteria.
For example, if the `criteria` is `'lambda label:True if label["rule4"]+label["rule5"]>=2 else False'`, the rule is that if the total number of violated metrics in rule4 and rule5 is 2 or more, the node is labeled defective.
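The interaction between stored rules and a `multi_rules` criteria can be sketched as follows (a hypothetical illustration: the `label` dict mimics the per-rule counts of violated metrics that rules with `store: True` accumulate):

```python
# rule4 and rule5 each recorded one violated metric for this node.
label = {'rule4': 1, 'rule5': 1}
criteria = 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False'
is_defective = eval(criteria)(label)  # combine the stored counts
print(is_defective)  # the combined count reaches 2 -> True
```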
`Tips`: you must include a default rule for `${benchmark_name}/return_code`, as shown in the example above, which is used to identify failed tests.
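Such a default rule could look like this once loaded into Python (a hedged sketch; the category name is illustrative, not prescribed by the schema):

```python
# A return_code rule expressed as the dict the YAML would load into.
default_rule = {
    'function': 'value',
    'criteria': 'lambda x: x > 0',  # any non-zero return code means failure
    'categories': 'FailedTest',     # illustrative category name
    'metrics': ['kernel-launch/return_code'],
}
check = eval(default_rule['criteria'])
print(check(1), check(0))  # a return code of 1 violates the rule, 0 passes
```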
@@ -19,7 +19,7 @@ class DataDiagnosis():
def __init__(self):
"""Init function."""
self._sb_rules = {}
self._benchmark_metrics_dict = {}
def _get_metrics_by_benchmarks(self, metrics_list):
"""Get mappings of benchmarks:metrics of metrics_list.
@@ -65,10 +65,13 @@ def _check_rules(self, rule, name):
logger.log_and_raise(exception=Exception, msg='invalid criteria format')
if 'categories' not in rule:
logger.log_and_raise(exception=Exception, msg='{} lack of category'.format(name))
if rule['function'] != 'multi_rules':
if 'metrics' not in rule:
logger.log_and_raise(exception=Exception, msg='{} lack of metrics'.format(name))
if isinstance(rule['metrics'], str):
rule['metrics'] = [rule['metrics']]
if 'store' in rule and not isinstance(rule['store'], bool):
logger.log_and_raise(exception=Exception, msg='{} store must be bool type'.format(name))
return rule
def _get_baseline_of_metric(self, baseline, metric):
@@ -93,53 +96,67 @@ def _get_baseline_of_metric(self, baseline, metric):
logger.warning('DataDiagnosis: get baseline - {} baseline not found'.format(metric))
return -1
def __get_metrics_and_baseline(self, rule, benchmark_rules, baseline):
"""Get metrics with baseline in the rule.
Parse metric regex in the rule, and store the (baseline, metric) pair
in _sb_rules[rule]['metrics'] and the metric in _enable_metrics.
Args:
rule (str): the name of the rule
benchmark_rules (dict): the dict of rules
baseline (dict): the dict of baseline of metrics
"""
if self._sb_rules[rule]['function'] == 'multi_rules':
return
metrics_in_rule = benchmark_rules[rule]['metrics']
benchmark_metrics_dict_in_rule = self._get_metrics_by_benchmarks(metrics_in_rule)
for benchmark_name in benchmark_metrics_dict_in_rule:
if benchmark_name not in self._benchmark_metrics_dict:
logger.warning('DataDiagnosis: get criteria failed - {}'.format(benchmark_name))
continue
# get rules and criteria for each metric
for metric in self._benchmark_metrics_dict[benchmark_name]:
# metric full name in baseline
if metric in metrics_in_rule:
self._sb_rules[rule]['metrics'][metric] = self._get_baseline_of_metric(baseline, metric)
self._enable_metrics.add(metric)
continue
# metric full name not in baseline, use regex to match
for metric_regex in benchmark_metrics_dict_in_rule[benchmark_name]:
if re.search(metric_regex, metric):
self._sb_rules[rule]['metrics'][metric] = self._get_baseline_of_metric(baseline, metric)
self._enable_metrics.add(metric)
def _parse_rules_and_baseline(self, rules, baseline):
"""Parse and merge rules and baseline read from file.
Args:
rules (dict): rules from rule yaml file
baseline (dict): baseline of metrics from baseline json file
Returns:
bool: return True if successfully get the criteria for all rules, otherwise False.
"""
try:
if not rules:
logger.error('DataDiagnosis: get criteria failed')
return False
self._sb_rules = {}
self._enable_metrics = set()
benchmark_rules = rules['superbench']['rules']
for rule in benchmark_rules:
benchmark_rules[rule] = self._check_rules(benchmark_rules[rule], rule)
self._sb_rules[rule] = {}
self._sb_rules[rule]['name'] = rule
self._sb_rules[rule]['function'] = benchmark_rules[rule]['function']
self._sb_rules[rule]['store'] = True if 'store' in benchmark_rules[rule] and benchmark_rules[rule]['store'] is True else False
self._sb_rules[rule]['criteria'] = benchmark_rules[rule]['criteria']
self._sb_rules[rule]['categories'] = benchmark_rules[rule]['categories']
self._sb_rules[rule]['metrics'] = {}
self.__get_metrics_and_baseline(rule, benchmark_rules, baseline)
self._enable_metrics = sorted(list(self._enable_metrics))
except Exception as e:
logger.error('DataDiagnosis: get criteria failed - {}'.format(str(e)))
return False
@@ -166,15 +183,22 @@ def _run_diagnosis_rules_for_single_node(self, node):
issue_label = False
details = []
categories = set()
violation = {}
summary_data_row = pd.Series(index=self._enable_metrics, name=node, dtype=float)
# Check each rule
for rule in self._sb_rules:
# Get rule op function and run the rule
function_name = self._sb_rules[rule]['function']
rule_op = RuleOp.get_rule_func(DiagnosisRuleType(function_name))
violated_num = 0
if rule_op == RuleOp.multi_rules:
violated_num = rule_op(self._sb_rules[rule], details, categories, violation)
else:
violated_num = rule_op(data_row, self._sb_rules[rule], summary_data_row, details, categories)
# label the node as defective one
if self._sb_rules[rule]['store']:
violation[rule] = violated_num
elif violated_num:
issue_label = True
if issue_label:
# Add category information
@@ -210,7 +234,9 @@ def run_diagnosis_rules(self, rule_file, baseline_file):
logger.error('DataDiagnosis: empty raw data')
return data_not_accept_df, label_df
# get criteria
rules = file_handler.read_rules(rule_file)
baseline = file_handler.read_baseline(baseline_file)
if not self._parse_rules_and_baseline(rules, baseline):
return data_not_accept_df, label_df
# run diagnosis rules for each node
for node in self._raw_data_df.index:
@@ -242,7 +268,7 @@ def run(self, raw_data_file, rule_file, baseline_file, output_dir, output_format
"""
try:
self._raw_data_df = file_handler.read_raw_data(raw_data_file)
self._benchmark_metrics_dict = self._get_metrics_by_benchmarks(list(self._raw_data_df.columns))
logger.info('DataDiagnosis: Begin to process {} nodes'.format(len(self._raw_data_df)))
data_not_accept_df, label_df = self.run_diagnosis_rules(rule_file, baseline_file)
logger.info('DataDiagnosis: Processed finished')
@@ -16,6 +16,7 @@ class DiagnosisRuleType(Enum):
VARIANCE = 'variance'
VALUE = 'value'
MULTI_RULES = 'multi_rules'
class RuleOp:
@@ -54,37 +55,77 @@ def get_rule_func(cls, rule_type):
return None
@staticmethod
def check_criterion_with_a_value(rule):
"""Check if the criterion is valid with a numeric variable and return bool type.
Args:
rule (dict): rule including function, criteria, metrics with their baseline values and categories
"""
# parse criteria and check if valid
if not isinstance(eval(rule['criteria'])(0), bool):
logger.log_and_raise(exception=Exception, msg='invalid criteria format')
@staticmethod
def miss_test(metric, rule, data_row, details, categories):
"""Check if the metric in the rule missed test and if so add details and categories.
Args:
metric (str): the name of the metric
data_row (pd.Series): raw data of the metrics
rule (dict): rule including function, criteria, metrics with their baseline values and categories
details (list): details about violated rules and related data
categories (set): categories of violated rules
Returns:
bool: if the metric in the rule missed test, return True, otherwise return False
"""
# metric not in raw_data or the value is none, miss test
if metric not in data_row or pd.isna(data_row[metric]):
RuleOp.add_categories_and_details(metric + '_miss', rule['categories'], details, categories)
return True
return False
@staticmethod
def add_categories_and_details(detail, category, details, categories):
"""Add details and categories.
Args:
detail (str): violated rule and related data
category (str): category of violated rule
details (list): list of details about violated rules and related data
categories (set): set of categories of violated rules
"""
details.append(detail)
categories.add(category)
@staticmethod
def variance(data_row, rule, summary_data_row, details, categories):
"""Rule op function of variance.
Each metric in the rule will calculate the variance ((val - baseline) / baseline),
and use the criteria in the rule to determine whether the metric's variance meets the criteria;
if any metric meets the criteria, the rule is not passed.
Args:
data_row (pd.Series): raw data of the metrics
rule (dict): rule including function, criteria, metrics with their baseline values and categories
summary_data_row (pd.Series): results of the metrics processed after the function
details (list): details about violated rules and related data
categories (set): categories of violated rules
Returns:
number: the number of metrics that violate the rule if the rule is not passed, otherwise 0
"""
violated_metric_num = 0
RuleOp.check_criterion_with_a_value(rule)
# every metric should pass the rule
for metric in rule['metrics']:
# metric not in raw_data or the value is none, miss test
if RuleOp.miss_test(metric, rule, data_row, details, categories):
violated_metric_num += 1
else:
violate_metric = False
# check if metric pass the rule
val = data_row[metric]
baseline = rule['metrics'][metric]
@@ -95,13 +136,12 @@ def variance(data_row, rule, summary_data_row, details, categories):
violate_metric = eval(rule['criteria'])(var)
# add issued details and categories
if violate_metric:
violated_metric_num += 1
info = '(B/L: {:.4f} VAL: {:.4f} VAR: {:.2f}% Rule:{})'.format(
baseline, val, var * 100, rule['criteria']
)
RuleOp.add_categories_and_details(metric + info, rule['categories'], details, categories)
return violated_metric_num
@staticmethod
def value(data_row, rule, summary_data_row, details, categories):
@@ -109,43 +149,63 @@ def value(data_row, rule, summary_data_row, details, categories):
Each metric in the rule will use the criteria in the rule
to determine whether the metric's value meets the criteria;
if any metric meets the criteria, the rule is not passed.
Args:
data_row (pd.Series): raw data of the metrics
rule (dict): rule including function, criteria, metrics with their baseline values and categories
summary_data_row (pd.Series): results of the metrics processed after the function
details (list): details about violated rules and related data
categories (set): categories of violated rules
Returns:
number: the number of metrics that violate the rule if the rule is not passed, otherwise 0
"""
violated_metric_num = 0
# parse criteria and check if valid
RuleOp.check_criterion_with_a_value(rule)
# every metric should pass the rule
for metric in rule['metrics']:
# metric not in raw_data or the value is none, miss test
if RuleOp.miss_test(metric, rule, data_row, details, categories):
violated_metric_num += 1
else:
violate_metric = False
# check if metric pass the rule
val = data_row[metric]
summary_data_row[metric] = val
violate_metric = eval(rule['criteria'])(val)
# add issued details and categories
if violate_metric:
violated_metric_num += 1
info = '(VAL: {:.4f} Rule:{})'.format(val, rule['criteria'])
RuleOp.add_categories_and_details(metric + info, rule['categories'], details, categories)
return violated_metric_num
@staticmethod
def multi_rules(rule, details, categories, violation):
"""Rule op function of multi_rules.
The criteria in this rule will use the combined results of multiple previous rules and their metrics
which has been stored in advance to determine whether this rule is passed.
Args:
rule (dict): rule including function, criteria, metrics with their baseline values and categories
details (list): details about violated rules and related data
categories (set): categories of violated rules
violation (dict): the number of the metrics that violate the rules
Returns:
number: 0 if the rule is passed, otherwise 1
"""
violated = eval(rule['criteria'])(violation)
if not isinstance(violated, bool):
logger.log_and_raise(exception=Exception, msg='invalid upper criteria format')
if violated:
info = '{}:{}'.format(rule['name'], rule['criteria'])
RuleOp.add_categories_and_details(info, rule['categories'], details, categories)
return 1 if violated else 0
RuleOp.add_rule_func(DiagnosisRuleType.VARIANCE)(RuleOp.variance)
RuleOp.add_rule_func(DiagnosisRuleType.VALUE)(RuleOp.value)
RuleOp.add_rule_func(DiagnosisRuleType.MULTI_RULES)(RuleOp.multi_rules)
@@ -6,4 +6,4 @@
"mem-bw/D2H_Mem_BW": 24.3,
"mem-bw/D2D_Mem_BW": 1118.0,
"mem-bw/return_code": 0
}
\ No newline at end of file
@@ -39,16 +39,16 @@ def test_data_diagnosis(self):
test_baseline_file = str(self.parent_path / 'test_baseline.json')
diag1 = DataDiagnosis()
diag1._raw_data_df = file_handler.read_raw_data(test_raw_data)
diag1._benchmark_metrics_dict = diag1._get_metrics_by_benchmarks(list(diag1._raw_data_df))
assert (len(diag1._raw_data_df) == 3)
# Negative case
test_raw_data_fake = str(self.parent_path / 'test_results_fake.jsonl')
test_rule_file_fake = str(self.parent_path / 'test_rules_fake.yaml')
diag2 = DataDiagnosis()
diag2._raw_data_df = file_handler.read_raw_data(test_raw_data_fake)
diag2._benchmark_metrics_dict = diag2._get_metrics_by_benchmarks(list(diag2._raw_data_df))
assert (len(diag2._raw_data_df) == 0)
assert (len(diag2._benchmark_metrics_dict) == 0)
metric_list = [
'gpu_temperature', 'gpu_power_limit', 'gemm-flops/FP64',
'bert_models/pytorch-bert-base/steptime_train_float32'
@@ -124,21 +124,24 @@ def test_data_diagnosis(self):
assert (diag1._get_baseline_of_metric(baseline, 'kernel-launch/event_overhead:0') == 0.00596)
assert (diag1._get_baseline_of_metric(baseline, 'kernel-launch/return_code') == 0)
assert (diag1._get_baseline_of_metric(baseline, 'mem-bw/H2D:0') == -1)
# Test - _parse_rules_and_baseline
# Negative case
fake_rules = file_handler.read_rules(test_rule_file_fake)
baseline = file_handler.read_baseline(test_baseline_file)
assert (diag2._parse_rules_and_baseline(fake_rules, baseline) is False)
diag2 = DataDiagnosis()
diag2._raw_data_df = file_handler.read_raw_data(test_raw_data)
diag2._benchmark_metrics_dict = diag2._get_metrics_by_benchmarks(list(diag2._raw_data_df))
p = Path(test_rule_file)
with p.open() as f:
rules = yaml.load(f, Loader=yaml.SafeLoader)
rules['superbench']['rules']['fake'] = false_rules[0]
with open(test_rule_file_fake, 'w') as f:
yaml.dump(rules, f)
assert (diag1._parse_rules_and_baseline(fake_rules, baseline) is False)
# Positive case
rules = file_handler.read_rules(test_rule_file)
assert (diag1._parse_rules_and_baseline(rules, baseline))
# Test - _run_diagnosis_rules_for_single_node
(details_row, summary_data_row) = diag1._run_diagnosis_rules_for_single_node('sb-validation-01')
assert (details_row)
@@ -211,3 +214,80 @@ def test_data_diagnosis_run(self):
with Path(expect_result_file).open() as f:
expect_result = f.read()
assert (data_not_accept_read_from_json == expect_result)
def test_multi_rules(self):
"""Test multi rules check feature."""
diag1 = DataDiagnosis()
# test _check_rules
false_rules = [
{
'criteria': 'lambda x:x>0',
'categories': 'KernelLaunch',
'store': 'true',
'metrics': ['kernel-launch/event_overhead:\\d+']
}
]
metric = 'kernel-launch/event_overhead:0'
for rules in false_rules:
self.assertRaises(Exception, diag1._check_rules, rules, metric)
# Positive case
true_rules = [
{
'categories': 'KernelLaunch',
'criteria': 'lambda x:x>0.05',
'store': True,
'function': 'variance',
'metrics': ['kernel-launch/event_overhead:\\d+']
}, {
'categories': 'CNN',
'function': 'multi_rules',
'criteria': 'lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
}
]
for rules in true_rules:
assert (diag1._check_rules(rules, metric))
# test _run_diagnosis_rules_for_single_node
rules = {
'superbench': {
'rules': {
'rule1': {
'categories': 'CNN',
'criteria': 'lambda x:x<-0.5',
'store': True,
'function': 'variance',
'metrics': ['mem-bw/D2H_Mem_BW']
},
'rule2': {
'categories': 'CNN',
'criteria': 'lambda x:x<-0.5',
'function': 'variance',
'store': True,
'metrics': ['kernel-launch/wall_overhead']
},
'rule3': {
'categories': 'CNN',
'function': 'multi_rules',
'criteria': 'lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
}
}
}
}
baseline = {
'kernel-launch/wall_overhead': 0.01026,
'mem-bw/D2H_Mem_BW': 24.3,
}
data = {'kernel-launch/wall_overhead': [0.005, 0.005], 'mem-bw/D2H_Mem_BW': [25, 10]}
diag1._raw_data_df = pd.DataFrame(data, index=['sb-validation-04', 'sb-validation-05'])
diag1._benchmark_metrics_dict = diag1._get_metrics_by_benchmarks(list(diag1._raw_data_df.columns))
diag1._parse_rules_and_baseline(rules, baseline)
(details_row, summary_data_row) = diag1._run_diagnosis_rules_for_single_node('sb-validation-04')
assert (not details_row)
(details_row, summary_data_row) = diag1._run_diagnosis_rules_for_single_node('sb-validation-05')
assert (details_row)
assert ('CNN' in details_row[0])
assert (
details_row[1] == 'kernel-launch/wall_overhead(B/L: 0.0103 VAL: 0.0050 VAR: -51.27% Rule:lambda x:x<-0.5),'
+ 'mem-bw/D2H_Mem_BW(B/L: 24.3000 VAL: 10.0000 VAR: -58.85% Rule:lambda x:x<-0.5),' +
'rule3:lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
)
...@@ -99,21 +99,101 @@ def test_rule_op(self):
    # variance
    data = {'kernel-launch/event_overhead:0': 3.1, 'kernel-launch/event_overhead:1': 2}
    data_row = pd.Series(data)
    violated_metric_num = rule_op(data_row, true_baselines[0], summary_data_row, details, categories)
    assert (violated_metric_num == 1)
    assert (categories == {'KernelLaunch'})
    assert (details == ['kernel-launch/event_overhead:0(B/L: 2.0000 VAL: 3.1000 VAR: 55.00% Rule:lambda x:x>0.5)'])
    data = {'kernel-launch/event_overhead:0': 1.5, 'kernel-launch/event_overhead:1': 1.5}
    data_row = pd.Series(data)
    violated_metric_num = rule_op(data_row, true_baselines[1], summary_data_row, details, categories)
    assert (violated_metric_num == 0)
    assert (categories == {'KernelLaunch'})
    # value
    rule_op = RuleOp.get_rule_func(DiagnosisRuleType.VALUE)
    violated_metric_num = rule_op(data_row, true_baselines[2], summary_data_row, details, categories)
    assert (violated_metric_num == 1)
    assert (categories == {'KernelLaunch', 'KernelLaunch2'})
    assert ('kernel-launch/event_overhead:0(VAL: 1.5000 Rule:lambda x:x>0)' in details)
    assert ('kernel-launch/event_overhead:0(B/L: 2.0000 VAL: 3.1000 VAR: 55.00% Rule:lambda x:x>0.5)' in details)
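The variance assertions above follow a simple formula: the variance is the relative deviation of the measured value from the baseline, and a metric is flagged when the rule's criteria lambda holds for that variance. A minimal standalone sketch (not the library's implementation) of how the `B/L: 2.0000 VAL: 3.1000 VAR: 55.00%` detail string is derived:

```python
# Hedged sketch of the variance check encoded in the assertions above.
# baseline (B/L) and val (VAL) are taken from the test data; the criteria
# string matches the rule in the test ('lambda x:x>0.5').
baseline, val = 2.0, 3.1
var = (val - baseline) / baseline          # relative deviation: 0.55
criteria = eval('lambda x:x>0.5')          # rule criteria from the config
violated = criteria(var)                   # True -> metric counts as violated
print(f'B/L: {baseline:.4f} VAL: {val:.4f} VAR: {var:.2%} violated: {violated}')
```

This mirrors why `violated_metric_num == 1` in the variance case: only `event_overhead:0` deviates beyond the 50% threshold.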
def test_multi_rules_op(self):
"""multi-rule check."""
details = []
categories = set()
    data_row = pd.Series(dtype=float)
summary_data_row = pd.Series(index=['kernel-launch/event_overhead:0'], dtype=float)
false_baselines = [
{
'categories': 'KernelLaunch',
'criteria': 'lambda label:True if label["rule2"]>=2 else False',
'function': 'multi_rules'
}
]
label = {}
for rule in false_baselines:
self.assertRaises(Exception, RuleOp.multi_rules, rule, details, categories, label)
true_baselines = [
{
'name': 'rule1',
'categories': 'CNN',
'criteria': 'lambda x:x<-0.5',
'store': True,
'function': 'variance',
'metrics': {
'resnet_models/pytorch-resnet152/throughput_train_float32': 300,
}
}, {
'name': 'rule2',
'categories': 'CNN',
'criteria': 'lambda x:x<-0.5',
'store': True,
'function': 'variance',
'metrics': {
'vgg_models/pytorch-vgg11/throughput_train_float32': 300
}
}, {
'name': 'rule3',
'categories': 'KernelLaunch',
'criteria': 'lambda label:True if label["rule1"]+label["rule2"]>=2 else False',
'store': False,
'function': 'multi_rules'
}
]
# label["rule1"]+label["rule2"]=1, rule3 pass
data = {
'resnet_models/pytorch-resnet152/throughput_train_float32': 300,
'vgg_models/pytorch-vgg11/throughput_train_float32': 100
}
data_row = pd.Series(data)
rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[0]['function']))
label[true_baselines[0]['name']] = rule_op(data_row, true_baselines[0], summary_data_row, details, categories)
label[true_baselines[1]['name']] = rule_op(data_row, true_baselines[1], summary_data_row, details, categories)
rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[2]['function']))
violated_metric_num = rule_op(true_baselines[2], details, categories, label)
assert (violated_metric_num == 0)
# label["rule1"]+label["rule2"]=2, rule3 not pass
data = {
'resnet_models/pytorch-resnet152/throughput_train_float32': 100,
'vgg_models/pytorch-vgg11/throughput_train_float32': 100
}
data_row = pd.Series(data)
details = []
categories = set()
rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[0]['function']))
label[true_baselines[0]['name']] = rule_op(data_row, true_baselines[0], summary_data_row, details, categories)
label[true_baselines[1]['name']] = rule_op(data_row, true_baselines[1], summary_data_row, details, categories)
rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[2]['function']))
violated_metric_num = rule_op(true_baselines[2], details, categories, label)
assert (violated_metric_num)
assert ('CNN' in categories)
assert (
details == [
'resnet_models/pytorch-resnet152/throughput_train_float32' +
'(B/L: 300.0000 VAL: 100.0000 VAR: -66.67% Rule:lambda x:x<-0.5)',
'vgg_models/pytorch-vgg11/throughput_train_float32' +
'(B/L: 300.0000 VAL: 100.0000 VAR: -66.67% Rule:lambda x:x<-0.5)',
'rule3:lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
]
)
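The combination step that `rule3` performs can also be illustrated on its own. Below is a minimal sketch (assumed, not the library's code) of how a `multi_rules` criteria consumes the `label` dict: each rule with `store: True` records its violated-metric count under its rule name, and the `multi_rules` lambda marks the node defective only when the combined count crosses the threshold. The rule names `rule1`/`rule2` match the stored rules in the test above:

```python
# Hedged sketch of multi_rules combination: label maps stored rule names
# to their violated-metric counts, and the criteria lambda (as it would
# appear in the rule file) decides the final verdict.
criteria = eval('lambda label:True if label["rule1"]+label["rule2"]>=2 else False')

# Case 1: only the vgg rule is violated -> combined count 1 -> rule3 passes.
print(criteria({'rule1': 0, 'rule2': 1}))

# Case 2: both CNN rules are violated -> combined count 2 -> rule3 fires.
print(criteria({'rule1': 1, 'rule2': 1}))
```

This is exactly the pattern the test exercises: the first data set yields a combined count of 1 (pass), the second yields 2 (defect).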