Unverified Commit 97ed12f9 authored by user4543 and committed by GitHub

Analyzer: Add Feature - Add multi-rules feature for data diagnosis (#289)

**Description**
Add a multi-rules feature for data diagnosis to support combined checks across multiple rules.

**Major Revision**
- revise the rule design to support combined checks across multiple rules
- update related code and tests
parent 1f48268b
@@ -54,6 +54,7 @@ superbench:
    ${rule_name}:
      function: string
      criteria: string
      store: (optional) bool
      categories: string
      metrics:
        - ${benchmark_name}/regex
@@ -108,11 +109,29 @@ superbench:
        - bert_models/pytorch-bert-base/throughput_train_float(32|16)
        - bert_models/pytorch-bert-large/throughput_train_float(32|16)
        - gpt_models/pytorch-gpt-large/throughput_train_float(32|16)
    rule4:
      function: variance
      criteria: "lambda x:x<-0.05"
      store: True
      categories: CNN
      metrics:
        - resnet_models/pytorch-resnet.*/throughput_train_.*
    rule5:
      function: variance
      criteria: "lambda x:x<-0.05"
      store: True
      categories: CNN
      metrics:
        - vgg_models/pytorch-vgg.*/throughput_train_.*
    rule6:
      function: multi_rules
      criteria: 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False'
      categories: CNN
```
This rule file describes the rules used for data diagnosis.
They are first organized by rule name, and each rule mainly includes several elements:
#### `metrics`
@@ -124,21 +143,29 @@ The categories belong to this rule.
#### `criteria`
The criterion used for this rule, which indicates how to compare the data with the baseline value for each metric. The format should be a lambda function supported by Python.
#### `store`
True if the rule is not used on its own to filter out defective machines, but its result is stored for use by subsequent rules. False (default) if the rule is used to label defective machines directly.
#### `function`
The function used for this rule.
3 types of rules are supported currently:
- `variance`: the rule checks whether the variance between the raw data and the baseline violates the criteria. variance = (raw data - baseline) / baseline
  For example, if the `criteria` is `lambda x:x>0.05`, the rule is that if the variance is larger than 5%, the metric is labeled as defective.
- `value`: the rule checks whether the raw data violates the criteria.
  For example, if the `criteria` is `lambda x:x>0`, the rule is that if the raw data is larger than 0, the metric is labeled as defective.
- `multi_rules`: the rule checks whether the combined results of multiple previous rules and their metrics violate the criteria.
  For example, if the `criteria` is `'lambda label:True if label["rule4"]+label["rule5"]>=2 else False'`, the rule is that if the total number of labeled metrics in rule4 and rule5 is at least 2, the node is labeled as defective. A minimal sketch of this combination is shown after the tips below.

`Tips`: you must include a default rule for `${benchmark_name}/return_code`, as in the example above, which is used to identify failed tests.
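To make the combination concrete, here is a minimal Python sketch (illustrative only, not part of the rule file) of how the `variance` criteria of two stored rules and the `multi_rules` criteria of a combining rule are evaluated; the metric names and numbers are made up:

```python
# Hypothetical raw values and baselines for two stored rules (rule4 and rule5 above).
baseline = {'resnet_throughput': 760.0, 'vgg_throughput': 796.0}
raw_data = {'resnet_throughput': 740.0, 'vgg_throughput': 520.0}

# rule4/rule5 use the variance function with criteria "lambda x:x<-0.05".
criteria = eval('lambda x:x<-0.05')
label = {}
label['rule4'] = int(criteria((raw_data['resnet_throughput'] - baseline['resnet_throughput']) / baseline['resnet_throughput']))
label['rule5'] = int(criteria((raw_data['vgg_throughput'] - baseline['vgg_throughput']) / baseline['vgg_throughput']))

# rule6 uses multi_rules: its lambda receives the per-rule violation counts.
combine = eval('lambda label:True if label["rule4"]+label["rule5"]>=2 else False')
print(label, combine(label))  # {'rule4': 0, 'rule5': 1} False
```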
...
@@ -19,7 +19,7 @@ class DataDiagnosis():
    def __init__(self):
        """Init function."""
        self._sb_rules = {}
        self._benchmark_metrics_dict = {}

    def _get_metrics_by_benchmarks(self, metrics_list):
        """Get mappings of benchmarks:metrics of metrics_list.
@@ -65,10 +65,13 @@ def _check_rules(self, rule, name):
            logger.log_and_raise(exception=Exception, msg='invalid criteria format')
        if 'categories' not in rule:
            logger.log_and_raise(exception=Exception, msg='{} lack of category'.format(name))
        if rule['function'] != 'multi_rules':
            if 'metrics' not in rule:
                logger.log_and_raise(exception=Exception, msg='{} lack of metrics'.format(name))
            if isinstance(rule['metrics'], str):
                rule['metrics'] = [rule['metrics']]
        if 'store' in rule and not isinstance(rule['store'], bool):
            logger.log_and_raise(exception=Exception, msg='{} store must be bool type'.format(name))
        return rule
    def _get_baseline_of_metric(self, baseline, metric):
@@ -93,53 +96,67 @@ def _get_baseline_of_metric(self, baseline, metric):
            logger.warning('DataDiagnosis: get baseline - {} baseline not found'.format(metric))
            return -1
    def __get_metrics_and_baseline(self, rule, benchmark_rules, baseline):
        """Get metrics with baseline in the rule.

        Parse metric regex in the rule, and store the (baseline, metric) pair
        in _sb_rules[rule]['metrics'] and metric in _enable_metrics.

        Args:
            rule (str): the name of the rule
            benchmark_rules (dict): the dict of rules
            baseline (dict): the dict of baseline of metrics
        """
        if self._sb_rules[rule]['function'] == 'multi_rules':
            return
        metrics_in_rule = benchmark_rules[rule]['metrics']
        benchmark_metrics_dict_in_rule = self._get_metrics_by_benchmarks(metrics_in_rule)
        for benchmark_name in benchmark_metrics_dict_in_rule:
            if benchmark_name not in self._benchmark_metrics_dict:
                logger.warning('DataDiagnosis: get criteria failed - {}'.format(benchmark_name))
                continue
            # get rules and criteria for each metric
            for metric in self._benchmark_metrics_dict[benchmark_name]:
                # metric full name in baseline
                if metric in metrics_in_rule:
                    self._sb_rules[rule]['metrics'][metric] = self._get_baseline_of_metric(baseline, metric)
                    self._enable_metrics.add(metric)
                    continue
                # metric full name not in baseline, use regex to match
                for metric_regex in benchmark_metrics_dict_in_rule[benchmark_name]:
                    if re.search(metric_regex, metric):
                        self._sb_rules[rule]['metrics'][metric] = self._get_baseline_of_metric(baseline, metric)
                        self._enable_metrics.add(metric)
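As a side note (not part of this change), a minimal sketch of the regex expansion this helper performs, using hypothetical metric full names in the same style as the raw data:

```python
import re

# Metric full names as they appear in the raw data (hypothetical examples).
raw_metrics = ['kernel-launch/event_overhead:0', 'kernel-launch/event_overhead:1', 'mem-bw/H2D_Mem_BW:0']
# A metric regex taken from a rule.
rule_regex = 'kernel-launch/event_overhead:\\d+'
# Every matching full name is added to the rule with its baseline attached.
matched = [m for m in raw_metrics if re.search(rule_regex, m)]
print(matched)  # ['kernel-launch/event_overhead:0', 'kernel-launch/event_overhead:1']
```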
    def _parse_rules_and_baseline(self, rules, baseline):
        """Parse and merge rules and baseline read from file.

        Args:
            rules (dict): rules from rule yaml file
            baseline (dict): baseline of metrics from baseline json file

        Returns:
            bool: return True if successfully get the criteria for all rules, otherwise False.
        """
        try:
            if not rules:
                logger.error('DataDiagnosis: get criteria failed')
                return False
            self._sb_rules = {}
            self._enable_metrics = set()
            benchmark_rules = rules['superbench']['rules']
            for rule in benchmark_rules:
                benchmark_rules[rule] = self._check_rules(benchmark_rules[rule], rule)
                self._sb_rules[rule] = {}
                self._sb_rules[rule]['name'] = rule
                self._sb_rules[rule]['function'] = benchmark_rules[rule]['function']
                self._sb_rules[rule]['store'] = True if 'store' in benchmark_rules[
                    rule] and benchmark_rules[rule]['store'] is True else False
                self._sb_rules[rule]['criteria'] = benchmark_rules[rule]['criteria']
                self._sb_rules[rule]['categories'] = benchmark_rules[rule]['categories']
                self._sb_rules[rule]['metrics'] = {}
                self.__get_metrics_and_baseline(rule, benchmark_rules, baseline)
            self._enable_metrics = sorted(list(self._enable_metrics))
        except Exception as e:
            logger.error('DataDiagnosis: get criteria failed - {}'.format(str(e)))
            return False
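For orientation (illustrative, not produced by this diff), roughly what `self._sb_rules` holds after `_parse_rules_and_baseline` runs on the example rules; the expanded metric name and baseline value are hypothetical:

```python
# Shape of self._sb_rules after parsing (values are made-up examples).
sb_rules = {
    'rule4': {
        'name': 'rule4',
        'function': 'variance',
        'store': True,
        'criteria': 'lambda x:x<-0.05',
        'categories': 'CNN',
        # metric regexes expanded to full names, each mapped to its baseline
        'metrics': {'resnet_models/pytorch-resnet50/throughput_train_float32': 760.33},
    },
    'rule6': {
        'name': 'rule6',
        'function': 'multi_rules',
        'store': False,
        'criteria': 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False',
        'categories': 'CNN',
        'metrics': {},  # multi_rules rules skip metric expansion
    },
}
```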
@@ -166,15 +183,22 @@ def _run_diagnosis_rules_for_single_node(self, node):
        issue_label = False
        details = []
        categories = set()
        violation = {}
        summary_data_row = pd.Series(index=self._enable_metrics, name=node, dtype=float)
        # Check each rule
        for rule in self._sb_rules:
            # Get rule op function and run the rule
            function_name = self._sb_rules[rule]['function']
            rule_op = RuleOp.get_rule_func(DiagnosisRuleType(function_name))
            violated_num = 0
            if rule_op == RuleOp.multi_rules:
                violated_num = rule_op(self._sb_rules[rule], details, categories, violation)
            else:
                violated_num = rule_op(data_row, self._sb_rules[rule], summary_data_row, details, categories)
            # label the node as defective one
            if self._sb_rules[rule]['store']:
                violation[rule] = violated_num
            elif violated_num:
                issue_label = True
        if issue_label:
            # Add category information
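A short trace (illustrative only) of the loop above for one node, assuming two `store` rules that each reported one violated metric and a `multi_rules` rule combining them, as in the documentation example:

```python
violation = {}
issue_label = False

# store=True rules stash their violated-metric counts instead of labeling the node.
violation['rule4'] = 1
violation['rule5'] = 1
# The multi_rules rule evaluates its criteria against the stashed counts.
combine = eval('lambda label:True if label["rule4"]+label["rule5"]>=2 else False')
violated_num = 1 if combine(violation) else 0
if violated_num:    # this rule has store=False, so a violation labels the node as defective
    issue_label = True
print(issue_label)  # True
```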
@@ -210,7 +234,9 @@ def run_diagnosis_rules(self, rule_file, baseline_file):
            logger.error('DataDiagnosis: empty raw data')
            return data_not_accept_df, label_df
        # get criteria
        rules = file_handler.read_rules(rule_file)
        baseline = file_handler.read_baseline(baseline_file)
        if not self._parse_rules_and_baseline(rules, baseline):
            return data_not_accept_df, label_df
        # run diagnosis rules for each node
        for node in self._raw_data_df.index:
@@ -242,7 +268,7 @@ def run(self, raw_data_file, rule_file, baseline_file, output_dir, output_format
        """
        try:
            self._raw_data_df = file_handler.read_raw_data(raw_data_file)
            self._benchmark_metrics_dict = self._get_metrics_by_benchmarks(list(self._raw_data_df.columns))
            logger.info('DataDiagnosis: Begin to process {} nodes'.format(len(self._raw_data_df)))
            data_not_accept_df, label_df = self.run_diagnosis_rules(rule_file, baseline_file)
            logger.info('DataDiagnosis: Processed finished')
...
@@ -16,6 +16,7 @@ class DiagnosisRuleType(Enum):
    VARIANCE = 'variance'
    VALUE = 'value'
    MULTI_RULES = 'multi_rules'


class RuleOp:
@@ -54,37 +55,77 @@ def get_rule_func(cls, rule_type):
        return None
    @staticmethod
    def check_criterion_with_a_value(rule):
        """Check if the criterion is valid with a numeric variable and return bool type.

        Args:
            rule (dict): rule including function, criteria, metrics with their baseline values and categories
        """
        # parse criteria and check if valid
        if not isinstance(eval(rule['criteria'])(0), bool):
            logger.log_and_raise(exception=Exception, msg='invalid criteria format')

    @staticmethod
    def miss_test(metric, rule, data_row, details, categories):
        """Check if the metric in the rule missed test and if so add details and categories.

        Args:
            metric (str): the name of the metric
            data_row (pd.Series): raw data of the metrics
            rule (dict): rule including function, criteria, metrics with their baseline values and categories
            details (list): details about violated rules and related data
            categories (set): categories of violated rules

        Returns:
            bool: if the metric in the rule missed test, return True, otherwise return False
        """
        # metric not in raw_data or the value is none, miss test
        if metric not in data_row or pd.isna(data_row[metric]):
            RuleOp.add_categories_and_details(metric + '_miss', rule['categories'], details, categories)
            return True
        return False

    @staticmethod
    def add_categories_and_details(detail, category, details, categories):
        """Add details and categories.

        Args:
            detail (str): violated rule and related data
            category (str): category of violated rule
            details (list): list of details about violated rules and related data
            categories (set): set of categories of violated rules
        """
        details.append(detail)
        categories.add(category)
    @staticmethod
    def variance(data_row, rule, summary_data_row, details, categories):
        """Rule op function of variance.

        Each metric in the rule will calculate the variance ((val - baseline) / baseline),
        and use criteria in the rule to determine whether the metric's variance meets the criteria;
        if any metric meets the criteria, the rule is not passed.

        Args:
            data_row (pd.Series): raw data of the metrics
            rule (dict): rule including function, criteria, metrics with their baseline values and categories
            summary_data_row (pd.Series): results of the metrics processed after the function
            details (list): details about violated rules and related data
            categories (set): categories of violated rules

        Returns:
            number: the number of the metrics that violate the rule if the rule is not passed, otherwise 0
        """
        violated_metric_num = 0
        RuleOp.check_criterion_with_a_value(rule)
        # every metric should pass the rule
        for metric in rule['metrics']:
            # metric not in raw_data or the value is none, miss test
            if RuleOp.miss_test(metric, rule, data_row, details, categories):
                violated_metric_num += 1
            else:
                violate_metric = False
                # check if metric pass the rule
                val = data_row[metric]
                baseline = rule['metrics'][metric]
@@ -95,13 +136,12 @@ def variance(data_row, rule, summary_data_row, details, categories):
                violate_metric = eval(rule['criteria'])(var)
                # add issued details and categories
                if violate_metric:
                    violated_metric_num += 1
                    info = '(B/L: {:.4f} VAL: {:.4f} VAR: {:.2f}% Rule:{})'.format(
                        baseline, val, var * 100, rule['criteria']
                    )
                    RuleOp.add_categories_and_details(metric + info, rule['categories'], details, categories)
        return violated_metric_num
    @staticmethod
    def value(data_row, rule, summary_data_row, details, categories):
@@ -109,43 +149,63 @@ def value(data_row, rule, summary_data_row, details, categories):
        Each metric in the rule will use criteria in the rule
        to determine whether the metric's value meets the criteria;
        if any metric meets the criteria, the rule is not passed.

        Args:
            data_row (pd.Series): raw data of the metrics
            rule (dict): rule including function, criteria, metrics with their baseline values and categories
            summary_data_row (pd.Series): results of the metrics processed after the function
            details (list): details about violated rules and related data
            categories (set): categories of violated rules

        Returns:
            number: the number of the metrics that violate the rule if the rule is not passed, otherwise 0
        """
        violated_metric_num = 0
        # parse criteria and check if valid
        RuleOp.check_criterion_with_a_value(rule)
        # every metric should pass the rule
        for metric in rule['metrics']:
            # metric not in raw_data or the value is none, miss test
            if RuleOp.miss_test(metric, rule, data_row, details, categories):
                violated_metric_num += 1
            else:
                violate_metric = False
                # check if metric pass the rule
                val = data_row[metric]
                summary_data_row[metric] = val
                violate_metric = eval(rule['criteria'])(val)
                # add issued details and categories
                if violate_metric:
                    violated_metric_num += 1
                    info = '(VAL: {:.4f} Rule:{})'.format(val, rule['criteria'])
                    RuleOp.add_categories_and_details(metric + info, rule['categories'], details, categories)
        return violated_metric_num
    @staticmethod
    def multi_rules(rule, details, categories, violation):
        """Rule op function of multi_rules.

        The criteria in this rule will use the combined results of multiple previous rules and their metrics,
        which have been stored in advance, to determine whether this rule is passed.

        Args:
            rule (dict): rule including function, criteria, metrics with their baseline values and categories
            details (list): details about violated rules and related data
            categories (set): categories of violated rules
            violation (dict): the number of the metrics that violate the rules

        Returns:
            number: 0 if the rule is passed, otherwise 1
        """
        violated = eval(rule['criteria'])(violation)
        if not isinstance(violated, bool):
            logger.log_and_raise(exception=Exception, msg='invalid upper criteria format')
        if violated:
            info = '{}:{}'.format(rule['name'], rule['criteria'])
            RuleOp.add_categories_and_details(info, rule['categories'], details, categories)
        return 1 if violated else 0
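A minimal usage sketch (not part of the change) of `RuleOp.multi_rules` with a hypothetical rule dict shaped like rule6 above and a pre-filled violation dict; the import path is assumed here and not shown in this diff:

```python
# Assumption: RuleOp lives in superbench.analyzer.rule_op (path not shown in this diff).
from superbench.analyzer.rule_op import RuleOp

rule = {
    'name': 'rule6',
    'function': 'multi_rules',
    'criteria': 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False',
    'categories': 'CNN',
    'metrics': {},
}
violation = {'rule4': 1, 'rule5': 1}   # counts stored earlier by the store=True rules
details, categories = [], set()

violated_num = RuleOp.multi_rules(rule, details, categories, violation)
print(violated_num)   # 1
print(details)        # ['rule6:lambda label:True if label["rule4"]+label["rule5"]>=2 else False']
print(categories)     # {'CNN'}
```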
RuleOp.add_rule_func(DiagnosisRuleType.VARIANCE)(RuleOp.variance)
RuleOp.add_rule_func(DiagnosisRuleType.VALUE)(RuleOp.value)
RuleOp.add_rule_func(DiagnosisRuleType.MULTI_RULES)(RuleOp.multi_rules)
{
  "kernel-launch/event_overhead": 0.00596,
  "kernel-launch/wall_overhead": 0.01026,
  "kernel-launch/return_code": 0,
  "mem-bw/H2D_Mem_BW": 25.6,
  "mem-bw/D2H_Mem_BW": 24.3,
  "mem-bw/D2D_Mem_BW": 1118.0,
  "mem-bw/return_code": 0
}
@@ -39,16 +39,16 @@ def test_data_diagnosis(self):
        test_baseline_file = str(self.parent_path / 'test_baseline.json')
        diag1 = DataDiagnosis()
        diag1._raw_data_df = file_handler.read_raw_data(test_raw_data)
        diag1._benchmark_metrics_dict = diag1._get_metrics_by_benchmarks(list(diag1._raw_data_df))
        assert (len(diag1._raw_data_df) == 3)
        # Negative case
        test_raw_data_fake = str(self.parent_path / 'test_results_fake.jsonl')
        test_rule_file_fake = str(self.parent_path / 'test_rules_fake.yaml')
        diag2 = DataDiagnosis()
        diag2._raw_data_df = file_handler.read_raw_data(test_raw_data_fake)
        diag2._benchmark_metrics_dict = diag2._get_metrics_by_benchmarks(list(diag2._raw_data_df))
        assert (len(diag2._raw_data_df) == 0)
        assert (len(diag2._benchmark_metrics_dict) == 0)
        metric_list = [
            'gpu_temperature', 'gpu_power_limit', 'gemm-flops/FP64',
            'bert_models/pytorch-bert-base/steptime_train_float32'
@@ -124,21 +124,24 @@ def test_data_diagnosis(self):
        assert (diag1._get_baseline_of_metric(baseline, 'kernel-launch/event_overhead:0') == 0.00596)
        assert (diag1._get_baseline_of_metric(baseline, 'kernel-launch/return_code') == 0)
        assert (diag1._get_baseline_of_metric(baseline, 'mem-bw/H2D:0') == -1)
        # Test - _parse_rules_and_baseline
        # Negative case
        fake_rules = file_handler.read_rules(test_rule_file_fake)
        baseline = file_handler.read_baseline(test_baseline_file)
        assert (diag2._parse_rules_and_baseline(fake_rules, baseline) is False)
        diag2 = DataDiagnosis()
        diag2._raw_data_df = file_handler.read_raw_data(test_raw_data)
        diag2._benchmark_metrics_dict = diag2._get_metrics_by_benchmarks(list(diag2._raw_data_df))
        p = Path(test_rule_file)
        with p.open() as f:
            rules = yaml.load(f, Loader=yaml.SafeLoader)
        rules['superbench']['rules']['fake'] = false_rules[0]
        with open(test_rule_file_fake, 'w') as f:
            yaml.dump(rules, f)
        assert (diag1._parse_rules_and_baseline(fake_rules, baseline) is False)
        # Positive case
        rules = file_handler.read_rules(test_rule_file)
        assert (diag1._parse_rules_and_baseline(rules, baseline))
        # Test - _run_diagnosis_rules_for_single_node
        (details_row, summary_data_row) = diag1._run_diagnosis_rules_for_single_node('sb-validation-01')
        assert (details_row)
@@ -211,3 +214,80 @@ def test_data_diagnosis_run(self):
        with Path(expect_result_file).open() as f:
            expect_result = f.read()
        assert (data_not_accept_read_from_json == expect_result)
    def test_multi_rules(self):
        """Test multi rules check feature."""
        diag1 = DataDiagnosis()
        # test _check_rules
        false_rules = [
            {
                'criteria': 'lambda x:x>0',
                'categories': 'KernelLaunch',
                'store': 'true',
                'metrics': ['kernel-launch/event_overhead:\\d+']
            }
        ]
        metric = 'kernel-launch/event_overhead:0'
        for rules in false_rules:
            self.assertRaises(Exception, diag1._check_rules, rules, metric)
        # Positive case
        true_rules = [
            {
                'categories': 'KernelLaunch',
                'criteria': 'lambda x:x>0.05',
                'store': True,
                'function': 'variance',
                'metrics': ['kernel-launch/event_overhead:\\d+']
            }, {
                'categories': 'CNN',
                'function': 'multi_rules',
                'criteria': 'lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
            }
        ]
        for rules in true_rules:
            assert (diag1._check_rules(rules, metric))
        # test _run_diagnosis_rules_for_single_node
        rules = {
            'superbench': {
                'rules': {
                    'rule1': {
                        'categories': 'CNN',
                        'criteria': 'lambda x:x<-0.5',
                        'store': True,
                        'function': 'variance',
                        'metrics': ['mem-bw/D2H_Mem_BW']
                    },
                    'rule2': {
                        'categories': 'CNN',
                        'criteria': 'lambda x:x<-0.5',
                        'function': 'variance',
                        'store': True,
                        'metrics': ['kernel-launch/wall_overhead']
                    },
                    'rule3': {
                        'categories': 'CNN',
                        'function': 'multi_rules',
                        'criteria': 'lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
                    }
                }
            }
        }
        baseline = {
            'kernel-launch/wall_overhead': 0.01026,
            'mem-bw/D2H_Mem_BW': 24.3,
        }
        data = {'kernel-launch/wall_overhead': [0.005, 0.005], 'mem-bw/D2H_Mem_BW': [25, 10]}
        diag1._raw_data_df = pd.DataFrame(data, index=['sb-validation-04', 'sb-validation-05'])
        diag1._benchmark_metrics_dict = diag1._get_metrics_by_benchmarks(list(diag1._raw_data_df.columns))
        diag1._parse_rules_and_baseline(rules, baseline)
        (details_row, summary_data_row) = diag1._run_diagnosis_rules_for_single_node('sb-validation-04')
        assert (not details_row)
        (details_row, summary_data_row) = diag1._run_diagnosis_rules_for_single_node('sb-validation-05')
        assert (details_row)
        assert ('CNN' in details_row[0])
        assert (
            details_row[1] == 'kernel-launch/wall_overhead(B/L: 0.0103 VAL: 0.0050 VAR: -51.27% Rule:lambda x:x<-0.5),'
            + 'mem-bw/D2H_Mem_BW(B/L: 24.3000 VAL: 10.0000 VAR: -58.85% Rule:lambda x:x<-0.5),' +
            'rule3:lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
        )
{"node": "sb-validation-01","bert_models/pytorch-bert-base/steptime_train_float32": 114.59167010616511,"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591105, "bert_models/pytorch-bert-base/steptime_train_float16": 83.88951083179563, "bert_models/pytorch-bert-base/throughput_train_float16": 382.0672582741963, "bert_models/pytorch-bert-large/steptime_train_float32": 307.9359371913597, "bert_models/pytorch-bert-large/throughput_train_float32": 103.94876097417632, "bert_models/pytorch-bert-large/steptime_train_float16": 206.81141689419746, "bert_models/pytorch-bert-large/throughput_train_float16": 154.84089117113942, "pytorch-computation-communication-overlap/mul_cost:0": 44.182206214372854, "pytorch-computation-communication-overlap/mul_cost:1": 44.18221393893873, "pytorch-computation-communication-overlap/mul_cost:2": 43.970147078084665, "pytorch-computation-communication-overlap/mul_cost:3": 43.97014787559783, "pytorch-computation-communication-overlap/mul_cost:4": 43.97017793166924, "pytorch-computation-communication-overlap/mul_cost:5": 43.97015716062924, "pytorch-computation-communication-overlap/mul_cost:6": 43.97016519828867, "pytorch-computation-communication-overlap/mul_cost:7": 44.179544478538446, "pytorch-computation-communication-overlap/matmul_cost:0": 137.04773705558182, "pytorch-computation-communication-overlap/matmul_cost:1": 137.0478344692856, "pytorch-computation-communication-overlap/matmul_cost:2": 137.04777220902997, "pytorch-computation-communication-overlap/matmul_cost:3": 137.04779697263803, "pytorch-computation-communication-overlap/matmul_cost:4": 137.04813674314664, "pytorch-computation-communication-overlap/matmul_cost:5": 137.04821988767435, "pytorch-computation-communication-overlap/matmul_cost:6": 137.04775322366913, "pytorch-computation-communication-overlap/matmul_cost:7": 137.0478081606734, "densenet_models/pytorch-densenet169/steptime_train_float32": 150.64155543223023, "densenet_models/pytorch-densenet169/throughput_train_float32": 212.47017192425312, "densenet_models/pytorch-densenet169/steptime_train_float16": 145.30819572973996, "densenet_models/pytorch-densenet169/throughput_train_float16": 220.293233730535, "densenet_models/pytorch-densenet201/steptime_train_float32": 182.91117786429822, "densenet_models/pytorch-densenet201/throughput_train_float32": 174.9742278232251, "densenet_models/pytorch-densenet201/steptime_train_float16": 176.3489063596353, "densenet_models/pytorch-densenet201/throughput_train_float16": 181.54465736033902, "gemm-flops/FP64:0": 9031.23, "gemm-flops/FP64:1": 9040.85, "gemm-flops/FP64:2": 9010.56, "gemm-flops/FP64:3": 9041.26, "gemm-flops/FP64:4": 9039.19, "gemm-flops/FP64:5": 9015.69, "gemm-flops/FP64:6": 9022.19, "gemm-flops/FP64:7": 9030.2, "gemm-flops/FP32:0": 18362.1, "gemm-flops/FP32:1": 18375.6, "gemm-flops/FP32:2": 18314.9, "gemm-flops/FP32:3": 18375.6, "gemm-flops/FP32:4": 18368.6, "gemm-flops/FP32:5": 18347.1, "gemm-flops/FP32:6": 18247.4, "gemm-flops/FP32:7": 18318.4, "gemm-flops/FP16:0": 33878.0, "gemm-flops/FP16:1": 33911.1, "gemm-flops/FP16:2": 33769.3, "gemm-flops/FP16:3": 33909.9, "gemm-flops/FP16:4": 33896.5, "gemm-flops/FP16:5": 33798.1, "gemm-flops/FP16:6": 33647.3, "gemm-flops/FP16:7": 33764.8, "gemm-flops/FP64_TC:0": 18963.6, "gemm-flops/FP64_TC:1": 18924.2, "gemm-flops/FP64_TC:2": 18930.3, "gemm-flops/FP64_TC:3": 18971.9, "gemm-flops/FP64_TC:4": 18946.0, "gemm-flops/FP64_TC:5": 18945.0, "gemm-flops/FP64_TC:6": 18822.9, "gemm-flops/FP64_TC:7": 18911.1, "gemm-flops/TF32_TC:0": 127900.0, 
"gemm-flops/TF32_TC:1": 129094.0, "gemm-flops/TF32_TC:2": 127831.0, "gemm-flops/TF32_TC:3": 128709.0, "gemm-flops/TF32_TC:4": 127388.0, "gemm-flops/TF32_TC:5": 127861.0, "gemm-flops/TF32_TC:6": 128492.0, "gemm-flops/TF32_TC:7": 127720.0, "gemm-flops/BF16_TC:0": 264965.0, "gemm-flops/BF16_TC:1": 266638.0, "gemm-flops/BF16_TC:2": 263151.0, "gemm-flops/BF16_TC:3": 264752.0, "gemm-flops/BF16_TC:4": 263049.0, "gemm-flops/BF16_TC:5": 266605.0, "gemm-flops/BF16_TC:6": 267501.0, "gemm-flops/BF16_TC:7": 263880.0, "gemm-flops/FP16_TC:0": 279474.0, "gemm-flops/FP16_TC:1": 281256.0, "gemm-flops/FP16_TC:2": 277403.0, "gemm-flops/FP16_TC:3": 279147.0, "gemm-flops/FP16_TC:4": 277587.0, "gemm-flops/FP16_TC:5": 281537.0, "gemm-flops/FP16_TC:6": 282132.0, "gemm-flops/FP16_TC:7": 277788.0, "gemm-flops/INT8_TC:0": 475160.0, "gemm-flops/INT8_TC:1": 477725.0, "gemm-flops/INT8_TC:2": 471621.0, "gemm-flops/INT8_TC:3": 473716.0, "gemm-flops/INT8_TC:4": 472124.0, "gemm-flops/INT8_TC:5": 479972.0, "gemm-flops/INT8_TC:6": 481327.0, "gemm-flops/INT8_TC:7": 474710.0, "gemm-flops/INT4_TC:0": 970330.0, "gemm-flops/INT4_TC:1": 976837.0, "gemm-flops/INT4_TC:2": 966003.0, "gemm-flops/INT4_TC:3": 971315.0, "gemm-flops/INT4_TC:4": 964441.0, "gemm-flops/INT4_TC:5": 982461.0, "gemm-flops/INT4_TC:6": 979610.0, "gemm-flops/INT4_TC:7": 968359.0, "gpt_models/pytorch-gpt2-large/steptime_train_float32": 295.0526971835643, "gpt_models/pytorch-gpt2-large/throughput_train_float32": 27.115454396866486, "gpt_models/pytorch-gpt2-large/steptime_train_float16": 194.4957742234692, "gpt_models/pytorch-gpt2-large/throughput_train_float16": 41.139449941061585, "gpu-sm-copy-bw/dtoh:0": 3.91755, "gpu-sm-copy-bw/dtoh:1": 4.45414, "gpu-sm-copy-bw/dtoh:2": 1.26483, "gpu-sm-copy-bw/dtoh:3": 1.30041, "gpu-sm-copy-bw/dtoh:4": 1.31577, "gpu-sm-copy-bw/dtoh:5": 1.27968, "gpu-sm-copy-bw/dtoh:6": 4.47849, "gpu-sm-copy-bw/dtoh:7": 3.96231, "gpu-sm-copy-bw/dtoh:8": 3.91705, "gpu-sm-copy-bw/dtoh:9": 4.45487, "gpu-sm-copy-bw/dtoh:10": 1.26352, "gpu-sm-copy-bw/dtoh:11": 1.2999, "gpu-sm-copy-bw/dtoh:12": 1.31677, "gpu-sm-copy-bw/dtoh:13": 1.27885, "gpu-sm-copy-bw/dtoh:14": 4.47913, "gpu-sm-copy-bw/dtoh:15": 3.95893, "gpu-sm-copy-bw/dtoh:16": 3.91729, "gpu-sm-copy-bw/dtoh:17": 4.45627, "gpu-sm-copy-bw/dtoh:18": 1.26437, "gpu-sm-copy-bw/dtoh:19": 1.30144, "gpu-sm-copy-bw/dtoh:20": 1.31704, "gpu-sm-copy-bw/dtoh:21": 1.27857, "gpu-sm-copy-bw/dtoh:22": 4.47889, "gpu-sm-copy-bw/dtoh:23": 3.95984, "gpu-sm-copy-bw/dtoh:24": 3.92025, "gpu-sm-copy-bw/dtoh:25": 4.45423, "gpu-sm-copy-bw/dtoh:26": 1.26449, "gpu-sm-copy-bw/dtoh:27": 1.29954, "gpu-sm-copy-bw/dtoh:28": 1.31731, "gpu-sm-copy-bw/dtoh:29": 1.27916, "gpu-sm-copy-bw/dtoh:30": 4.4797, "gpu-sm-copy-bw/dtoh:31": 3.96124, "gpu-sm-copy-bw/htod:0": 23.9685, "gpu-sm-copy-bw/htod:1": 23.967, "gpu-sm-copy-bw/htod:2": 19.9898, "gpu-sm-copy-bw/htod:3": 20.2848, "gpu-sm-copy-bw/htod:4": 20.3931, "gpu-sm-copy-bw/htod:5": 20.0888, "gpu-sm-copy-bw/htod:6": 23.9766, "gpu-sm-copy-bw/htod:7": 23.9792, "gpu-sm-copy-bw/htod:8": 23.9497, "gpu-sm-copy-bw/htod:9": 23.9438, "gpu-sm-copy-bw/htod:10": 20.0039, "gpu-sm-copy-bw/htod:11": 20.2469, "gpu-sm-copy-bw/htod:12": 20.4344, "gpu-sm-copy-bw/htod:13": 20.1005, "gpu-sm-copy-bw/htod:14": 23.9491, "gpu-sm-copy-bw/htod:15": 23.9898, "gpu-sm-copy-bw/htod:16": 23.962, "gpu-sm-copy-bw/htod:17": 23.97, "gpu-sm-copy-bw/htod:18": 19.987, "gpu-sm-copy-bw/htod:19": 20.1549, "gpu-sm-copy-bw/htod:20": 20.3931, "gpu-sm-copy-bw/htod:21": 20.1013, "gpu-sm-copy-bw/htod:22": 23.9865, 
"gpu-sm-copy-bw/htod:23": 23.972, "gpu-sm-copy-bw/htod:24": 23.9569, "gpu-sm-copy-bw/htod:25": 23.9762, "gpu-sm-copy-bw/htod:26": 19.92, "gpu-sm-copy-bw/htod:27": 20.2638, "gpu-sm-copy-bw/htod:28": 20.4419, "gpu-sm-copy-bw/htod:29": 20.1054, "gpu-sm-copy-bw/htod:30": 23.9752, "gpu-sm-copy-bw/htod:31": 23.9631, "ib-loopback/IB_write_512_Avg_0:0": 1492.89, "ib-loopback/IB_write_1024_Avg_0:0": 3224.92, "ib-loopback/IB_write_2048_Avg_0:0": 6714.5, "ib-loopback/IB_write_4096_Avg_0:0": 12871.93, "ib-loopback/IB_write_8192_Avg_0:0": 19990.78, "ib-loopback/IB_write_16384_Avg_0:0": 22172.25, "ib-loopback/IB_write_32768_Avg_0:0": 23073.19, "ib-loopback/IB_write_65536_Avg_0:0": 23527.09, "ib-loopback/IB_write_131072_Avg_0:0": 23805.92, "ib-loopback/IB_write_262144_Avg_0:0": 23380.51, "ib-loopback/IB_write_524288_Avg_0:0": 23856.63, "ib-loopback/IB_write_1048576_Avg_0:0": 23869.58, "ib-loopback/IB_write_2097152_Avg_0:0": 23885.7, "ib-loopback/IB_write_4194304_Avg_0:0": 23914.56, "ib-loopback/IB_write_8388608_Avg_0:0": 23935.21, "ib-loopback/IB_write_512_Avg_1:0": 1479.1, "ib-loopback/IB_write_1024_Avg_1:0": 3246.48, "ib-loopback/IB_write_2048_Avg_1:0": 6754.94, "ib-loopback/IB_write_4096_Avg_1:0": 13101.09, "ib-loopback/IB_write_8192_Avg_1:0": 19945.81, "ib-loopback/IB_write_16384_Avg_1:0": 22631.62, "ib-loopback/IB_write_32768_Avg_1:0": 23381.58, "ib-loopback/IB_write_65536_Avg_1:0": 23515.29, "ib-loopback/IB_write_131072_Avg_1:0": 23756.01, "ib-loopback/IB_write_262144_Avg_1:0": 23875.91, "ib-loopback/IB_write_524288_Avg_1:0": 23911.13, "ib-loopback/IB_write_1048576_Avg_1:0": 23935.42, "ib-loopback/IB_write_2097152_Avg_1:0": 23941.25, "ib-loopback/IB_write_4194304_Avg_1:0": 23922.06, "ib-loopback/IB_write_8388608_Avg_1:0": 23928.59, "ib-loopback/IB_write_512_Avg_2:0": 1505.29, "ib-loopback/IB_write_1024_Avg_2:0": 3215.97, "ib-loopback/IB_write_2048_Avg_2:0": 6745.49, "ib-loopback/IB_write_4096_Avg_2:0": 11548.88, "ib-loopback/IB_write_8192_Avg_2:0": 19432.15, "ib-loopback/IB_write_16384_Avg_2:0": 22765.51, "ib-loopback/IB_write_32768_Avg_2:0": 23235.07, "ib-loopback/IB_write_65536_Avg_2:0": 23620.08, "ib-loopback/IB_write_131072_Avg_2:0": 23759.08, "ib-loopback/IB_write_262144_Avg_2:0": 23859.82, "ib-loopback/IB_write_524288_Avg_2:0": 23775.01, "ib-loopback/IB_write_1048576_Avg_2:0": 23885.96, "ib-loopback/IB_write_2097152_Avg_2:0": 23894.73, "ib-loopback/IB_write_4194304_Avg_2:0": 23909.98, "ib-loopback/IB_write_8388608_Avg_2:0": 23927.21, "ib-loopback/IB_write_512_Avg_3:0": 1474.47, "ib-loopback/IB_write_1024_Avg_3:0": 3344.68, "ib-loopback/IB_write_2048_Avg_3:0": 6606.35, "ib-loopback/IB_write_4096_Avg_3:0": 12071.4, "ib-loopback/IB_write_8192_Avg_3:0": 18408.33, "ib-loopback/IB_write_16384_Avg_3:0": 20789.27, "ib-loopback/IB_write_32768_Avg_3:0": 22469.43, "ib-loopback/IB_write_65536_Avg_3:0": 22777.1, "ib-loopback/IB_write_131072_Avg_3:0": 23461.03, "ib-loopback/IB_write_262144_Avg_3:0": 23397.19, "ib-loopback/IB_write_524288_Avg_3:0": 23526.55, "ib-loopback/IB_write_1048576_Avg_3:0": 23854.76, "ib-loopback/IB_write_2097152_Avg_3:0": 23862.23, "ib-loopback/IB_write_4194304_Avg_3:0": 23931.15, "ib-loopback/IB_write_8388608_Avg_3:0": 23924.44, "ib-loopback/IB_write_512_Avg_4:0": 1523.33, "ib-loopback/IB_write_1024_Avg_4:0": 3233.23, "ib-loopback/IB_write_2048_Avg_4:0": 6792.88, "ib-loopback/IB_write_4096_Avg_4:0": 12616.05, "ib-loopback/IB_write_8192_Avg_4:0": 19324.05, "ib-loopback/IB_write_16384_Avg_4:0": 22082.51, "ib-loopback/IB_write_32768_Avg_4:0": 23294.23, 
"ib-loopback/IB_write_65536_Avg_4:0": 23546.22, "ib-loopback/IB_write_131072_Avg_4:0": 23727.91, "ib-loopback/IB_write_262144_Avg_4:0": 23843.93, "ib-loopback/IB_write_524288_Avg_4:0": 23905.96, "ib-loopback/IB_write_1048576_Avg_4:0": 23902.37, "ib-loopback/IB_write_2097152_Avg_4:0": 23921.03, "ib-loopback/IB_write_4194304_Avg_4:0": 23921.06, "ib-loopback/IB_write_8388608_Avg_4:0": 23922.4, "ib-loopback/IB_write_512_Avg_5:0": 1506.63, "ib-loopback/IB_write_1024_Avg_5:0": 3261.71, "ib-loopback/IB_write_2048_Avg_5:0": 6752.38, "ib-loopback/IB_write_4096_Avg_5:0": 13592.41, "ib-loopback/IB_write_8192_Avg_5:0": 19989.31, "ib-loopback/IB_write_16384_Avg_5:0": 22560.57, "ib-loopback/IB_write_32768_Avg_5:0": 23389.29, "ib-loopback/IB_write_65536_Avg_5:0": 23503.32, "ib-loopback/IB_write_131072_Avg_5:0": 23741.51, "ib-loopback/IB_write_262144_Avg_5:0": 23866.43, "ib-loopback/IB_write_524288_Avg_5:0": 23898.95, "ib-loopback/IB_write_1048576_Avg_5:0": 23876.36, "ib-loopback/IB_write_2097152_Avg_5:0": 23919.63, "ib-loopback/IB_write_4194304_Avg_5:0": 23924.68, "ib-loopback/IB_write_8388608_Avg_5:0": 23930.37, "ib-loopback/IB_write_512_Avg_6:0": 1467.69, "ib-loopback/IB_write_1024_Avg_6:0": 3157.04, "ib-loopback/IB_write_2048_Avg_6:0": 6494.61, "ib-loopback/IB_write_4096_Avg_6:0": 12883.51, "ib-loopback/IB_write_8192_Avg_6:0": 19207.67, "ib-loopback/IB_write_16384_Avg_6:0": 22519.39, "ib-loopback/IB_write_32768_Avg_6:0": 23323.46, "ib-loopback/IB_write_65536_Avg_6:0": 23523.6, "ib-loopback/IB_write_131072_Avg_6:0": 23626.67, "ib-loopback/IB_write_262144_Avg_6:0": 23836.99, "ib-loopback/IB_write_524288_Avg_6:0": 23904.51, "ib-loopback/IB_write_1048576_Avg_6:0": 23919.07, "ib-loopback/IB_write_2097152_Avg_6:0": 23943.82, "ib-loopback/IB_write_4194304_Avg_6:0": 23936.77, "ib-loopback/IB_write_8388608_Avg_6:0": 23941.57, "ib-loopback/IB_write_512_Avg_7:0": 1505.63, "ib-loopback/IB_write_1024_Avg_7:0": 3259.93, "ib-loopback/IB_write_2048_Avg_7:0": 6738.6, "ib-loopback/IB_write_4096_Avg_7:0": 13352.06, "ib-loopback/IB_write_8192_Avg_7:0": 19941.35, "ib-loopback/IB_write_16384_Avg_7:0": 22566.09, "ib-loopback/IB_write_32768_Avg_7:0": 23244.77, "ib-loopback/IB_write_65536_Avg_7:0": 23377.67, "ib-loopback/IB_write_131072_Avg_7:0": 23736.17, "ib-loopback/IB_write_262144_Avg_7:0": 23829.25, "ib-loopback/IB_write_524288_Avg_7:0": 23879.6, "ib-loopback/IB_write_1048576_Avg_7:0": 23895.1, "ib-loopback/IB_write_2097152_Avg_7:0": 23930.64, "ib-loopback/IB_write_4194304_Avg_7:0": 23845.63, "ib-loopback/IB_write_8388608_Avg_7:0": 23896.94, "kernel-launch/return_code":0, "kernel-launch/event_overhead:0": 0.1, "kernel-launch/event_overhead:1": 0.00595, "kernel-launch/event_overhead:2": 0.00557, "kernel-launch/event_overhead:3": 0.0055, "kernel-launch/event_overhead:4": 0.00592, "kernel-launch/event_overhead:5": 0.00589, "kernel-launch/event_overhead:6": 0.00572, "kernel-launch/event_overhead:7": 0.0059, "kernel-launch/wall_overhead:0": 0.01026, "kernel-launch/wall_overhead:1": 0.01026, "kernel-launch/wall_overhead:2": 0.01046, "kernel-launch/wall_overhead:3": 0.01049, "kernel-launch/wall_overhead:4": 0.01063, "kernel-launch/wall_overhead:5": 0.01006, "kernel-launch/wall_overhead:6": 0.01045, "kernel-launch/wall_overhead:7": 0.01071, "lstm_models/pytorch-lstm/steptime_train_float32": 48.07024518959224, "lstm_models/pytorch-lstm/throughput_train_float32": 4806.472441132788, "lstm_models/pytorch-lstm/steptime_train_float16": 25.95312986522913, "lstm_models/pytorch-lstm/throughput_train_float16": 9069.90809255883, 
"pytorch-matmul/nosharding": 34.64499759674072, "mem-bw/return_code":0, "mem-bw/H2D_Mem_BW:0": 25.6, "mem-bw/H2D_Mem_BW:1": 25.8, "mem-bw/H2D_Mem_BW:2": 26.0, "mem-bw/H2D_Mem_BW:3": 26.1, "mem-bw/H2D_Mem_BW:4": 26.2, "mem-bw/H2D_Mem_BW:5": 25.8, "mem-bw/H2D_Mem_BW:6": 25.3, "mem-bw/H2D_Mem_BW:7": 26.1, "mem-bw/D2H_Mem_BW:0": 24.3, "mem-bw/D2H_Mem_BW:1": 24.6, "mem-bw/D2H_Mem_BW:2": 24.5, "mem-bw/D2H_Mem_BW:3": 24.6, "mem-bw/D2H_Mem_BW:4": 24.3, "mem-bw/D2H_Mem_BW:5": 24.3, "mem-bw/D2H_Mem_BW:6": 23.9, "mem-bw/D2H_Mem_BW:7": 24.6, "mem-bw/D2D_Mem_BW:0": 1118.0, "mem-bw/D2D_Mem_BW:1": 1114.6, "mem-bw/D2D_Mem_BW:2": 1119.7, "mem-bw/D2D_Mem_BW:3": 1121.9, "mem-bw/D2D_Mem_BW:4": 1109.7, "mem-bw/D2D_Mem_BW:5": 1110.1, "mem-bw/D2D_Mem_BW:6": 1123.3, "mem-bw/D2D_Mem_BW:7": 1117.6, "nccl-bw/allreduce_8_busbw:0": 0.0, "nccl-bw/allreduce_8_algbw:0": 0.0, "nccl-bw/allreduce_8_time:0": 37.84, "nccl-bw/allreduce_16_busbw:0": 0.0, "nccl-bw/allreduce_16_algbw:0": 0.0, "nccl-bw/allreduce_16_time:0": 36.42, "nccl-bw/allreduce_32_busbw:0": 0.0, "nccl-bw/allreduce_32_algbw:0": 0.0, "nccl-bw/allreduce_32_time:0": 36.87, "nccl-bw/allreduce_64_busbw:0": 0.0, "nccl-bw/allreduce_64_algbw:0": 0.0, "nccl-bw/allreduce_64_time:0": 35.83, "nccl-bw/allreduce_128_busbw:0": 0.01, "nccl-bw/allreduce_128_algbw:0": 0.0, "nccl-bw/allreduce_128_time:0": 36.91, "nccl-bw/allreduce_256_busbw:0": 0.01, "nccl-bw/allreduce_256_algbw:0": 0.01, "nccl-bw/allreduce_256_time:0": 37.58, "nccl-bw/allreduce_512_busbw:0": 0.02, "nccl-bw/allreduce_512_algbw:0": 0.01, "nccl-bw/allreduce_512_time:0": 36.98, "nccl-bw/allreduce_1024_busbw:0": 0.05, "nccl-bw/allreduce_1024_algbw:0": 0.03, "nccl-bw/allreduce_1024_time:0": 36.93, "nccl-bw/allreduce_2048_busbw:0": 0.1, "nccl-bw/allreduce_2048_algbw:0": 0.06, "nccl-bw/allreduce_2048_time:0": 36.06, "nccl-bw/allreduce_4096_busbw:0": 0.19, "nccl-bw/allreduce_4096_algbw:0": 0.11, "nccl-bw/allreduce_4096_time:0": 37.2, "nccl-bw/allreduce_8192_busbw:0": 0.39, "nccl-bw/allreduce_8192_algbw:0": 0.22, "nccl-bw/allreduce_8192_time:0": 37.04, "nccl-bw/allreduce_16384_busbw:0": 0.77, "nccl-bw/allreduce_16384_algbw:0": 0.44, "nccl-bw/allreduce_16384_time:0": 37.46, "nccl-bw/allreduce_32768_busbw:0": 1.52, "nccl-bw/allreduce_32768_algbw:0": 0.87, "nccl-bw/allreduce_32768_time:0": 37.64, "nccl-bw/allreduce_65536_busbw:0": 3.0, "nccl-bw/allreduce_65536_algbw:0": 1.71, "nccl-bw/allreduce_65536_time:0": 38.22, "nccl-bw/allreduce_131072_busbw:0": 5.31, "nccl-bw/allreduce_131072_algbw:0": 3.04, "nccl-bw/allreduce_131072_time:0": 43.17, "nccl-bw/allreduce_262144_busbw:0": 9.5, "nccl-bw/allreduce_262144_algbw:0": 5.43, "nccl-bw/allreduce_262144_time:0": 48.29, "nccl-bw/allreduce_524288_busbw:0": 15.11, "nccl-bw/allreduce_524288_algbw:0": 8.64, "nccl-bw/allreduce_524288_time:0": 60.71, "nccl-bw/allreduce_1048576_busbw:0": 24.1, "nccl-bw/allreduce_1048576_algbw:0": 13.77, "nccl-bw/allreduce_1048576_time:0": 76.13, "nccl-bw/allreduce_2097152_busbw:0": 38.12, "nccl-bw/allreduce_2097152_algbw:0": 21.78, "nccl-bw/allreduce_2097152_time:0": 96.28, "nccl-bw/allreduce_4194304_busbw:0": 65.75, "nccl-bw/allreduce_4194304_algbw:0": 37.57, "nccl-bw/allreduce_4194304_time:0": 111.6, "nccl-bw/allreduce_8388608_busbw:0": 89.51, "nccl-bw/allreduce_8388608_algbw:0": 51.15, "nccl-bw/allreduce_8388608_time:0": 164.0, "nccl-bw/allreduce_16777216_busbw:0": 114.38, "nccl-bw/allreduce_16777216_algbw:0": 65.36, "nccl-bw/allreduce_16777216_time:0": 256.7, "nccl-bw/allreduce_33554432_busbw:0": 154.89, "nccl-bw/allreduce_33554432_algbw:0": 88.51, 
"nccl-bw/allreduce_33554432_time:0": 379.1, "nccl-bw/allreduce_67108864_busbw:0": 200.01, "nccl-bw/allreduce_67108864_algbw:0": 114.29, "nccl-bw/allreduce_67108864_time:0": 587.2, "nccl-bw/allreduce_134217728_busbw:0": 202.97, "nccl-bw/allreduce_134217728_algbw:0": 115.98, "nccl-bw/allreduce_134217728_time:0": 1157.2, "nccl-bw/allreduce_268435456_busbw:0": 221.82, "nccl-bw/allreduce_268435456_algbw:0": 126.75, "nccl-bw/allreduce_268435456_time:0": 2117.8, "nccl-bw/allreduce_536870912_busbw:0": 224.54, "nccl-bw/allreduce_536870912_algbw:0": 128.31, "nccl-bw/allreduce_536870912_time:0": 4184.2, "nccl-bw/allreduce_1073741824_busbw:0": 230.15, "nccl-bw/allreduce_1073741824_algbw:0": 131.51, "nccl-bw/allreduce_1073741824_time:0": 8164.5, "nccl-bw/allreduce_2147483648_busbw:0": 231.89, "nccl-bw/allreduce_2147483648_algbw:0": 132.51, "nccl-bw/allreduce_2147483648_time:0": 16207.0, "nccl-bw/allreduce_4294967296_busbw:0": 234.45, "nccl-bw/allreduce_4294967296_algbw:0": 133.97, "nccl-bw/allreduce_4294967296_time:0": 32059.0, "nccl-bw/allreduce_8589934592_busbw:0": 235.36, "nccl-bw/allreduce_8589934592_algbw:0": 134.49, "nccl-bw/allreduce_8589934592_time:0": 63870.0, "resnet_models/pytorch-resnet50/steptime_train_float32": 253.95522732287645, "resnet_models/pytorch-resnet50/throughput_train_float32": 760.3348099129964, "resnet_models/pytorch-resnet50/steptime_train_float16": 200.08606184273958, "resnet_models/pytorch-resnet50/throughput_train_float16": 971.0651430922575, "resnet_models/pytorch-resnet101/steptime_train_float32": 389.08605091273785, "resnet_models/pytorch-resnet101/throughput_train_float32": 496.11747409298965, "resnet_models/pytorch-resnet101/steptime_train_float16": 308.6274107918143, "resnet_models/pytorch-resnet101/throughput_train_float16": 627.2056272195069, "resnet_models/pytorch-resnet152/steptime_train_float32": 547.6558278314769, "resnet_models/pytorch-resnet152/throughput_train_float32": 352.07099543348215, "resnet_models/pytorch-resnet152/steptime_train_float16": 424.5809856802225, "resnet_models/pytorch-resnet152/throughput_train_float16": 454.8335998153649, "pytorch-sharding-matmul/allreduce": 10.574411869049072, "pytorch-sharding-matmul/allgather": 10.084696769714355, "vgg_models/pytorch-vgg11/steptime_train_float32": 40.35283671692014, "vgg_models/pytorch-vgg11/throughput_train_float32": 796.3615936949874, "vgg_models/pytorch-vgg11/steptime_train_float16": 24.133514845743775, "vgg_models/pytorch-vgg11/throughput_train_float16": 1330.411361458461, "vgg_models/pytorch-vgg13/steptime_train_float32": 55.466310936026275, "vgg_models/pytorch-vgg13/throughput_train_float32": 580.234107444399, "vgg_models/pytorch-vgg13/steptime_train_float16": 33.35228993091732, "vgg_models/pytorch-vgg13/throughput_train_float16": 962.5332023901524, "vgg_models/pytorch-vgg16/steptime_train_float32": 65.22519944701344, "vgg_models/pytorch-vgg16/throughput_train_float32": 493.4268638875934, "vgg_models/pytorch-vgg16/steptime_train_float16": 39.25287735182792, "vgg_models/pytorch-vgg16/throughput_train_float16": 817.2008546147621, "vgg_models/pytorch-vgg19/steptime_train_float32": 74.93487105239183, "vgg_models/pytorch-vgg19/throughput_train_float32": 429.80921583106164, "vgg_models/pytorch-vgg19/steptime_train_float16": 45.20330624654889, "vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377091} {"node": "sb-validation-01","bert_models/pytorch-bert-base/steptime_train_float32": 114.59167010616511,"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591105, 
"bert_models/pytorch-bert-base/steptime_train_float16": 83.88951083179563, "bert_models/pytorch-bert-base/throughput_train_float16": 382.0672582741963, "bert_models/pytorch-bert-large/steptime_train_float32": 307.9359371913597, "bert_models/pytorch-bert-large/throughput_train_float32": 103.94876097417632, "bert_models/pytorch-bert-large/steptime_train_float16": 206.81141689419746, "bert_models/pytorch-bert-large/throughput_train_float16": 154.84089117113942, "pytorch-computation-communication-overlap/mul_cost:0": 44.182206214372854, "pytorch-computation-communication-overlap/mul_cost:1": 44.18221393893873, "pytorch-computation-communication-overlap/mul_cost:2": 43.970147078084665, "pytorch-computation-communication-overlap/mul_cost:3": 43.97014787559783, "pytorch-computation-communication-overlap/mul_cost:4": 43.97017793166924, "pytorch-computation-communication-overlap/mul_cost:5": 43.97015716062924, "pytorch-computation-communication-overlap/mul_cost:6": 43.97016519828867, "pytorch-computation-communication-overlap/mul_cost:7": 44.179544478538446, "pytorch-computation-communication-overlap/matmul_cost:0": 137.04773705558182, "pytorch-computation-communication-overlap/matmul_cost:1": 137.0478344692856, "pytorch-computation-communication-overlap/matmul_cost:2": 137.04777220902997, "pytorch-computation-communication-overlap/matmul_cost:3": 137.04779697263803, "pytorch-computation-communication-overlap/matmul_cost:4": 137.04813674314664, "pytorch-computation-communication-overlap/matmul_cost:5": 137.04821988767435, "pytorch-computation-communication-overlap/matmul_cost:6": 137.04775322366913, "pytorch-computation-communication-overlap/matmul_cost:7": 137.0478081606734, "densenet_models/pytorch-densenet169/steptime_train_float32": 150.64155543223023, "densenet_models/pytorch-densenet169/throughput_train_float32": 212.47017192425312, "densenet_models/pytorch-densenet169/steptime_train_float16": 145.30819572973996, "densenet_models/pytorch-densenet169/throughput_train_float16": 220.293233730535, "densenet_models/pytorch-densenet201/steptime_train_float32": 182.91117786429822, "densenet_models/pytorch-densenet201/throughput_train_float32": 174.9742278232251, "densenet_models/pytorch-densenet201/steptime_train_float16": 176.3489063596353, "densenet_models/pytorch-densenet201/throughput_train_float16": 181.54465736033902, "gemm-flops/FP64:0": 9031.23, "gemm-flops/FP64:1": 9040.85, "gemm-flops/FP64:2": 9010.56, "gemm-flops/FP64:3": 9041.26, "gemm-flops/FP64:4": 9039.19, "gemm-flops/FP64:5": 9015.69, "gemm-flops/FP64:6": 9022.19, "gemm-flops/FP64:7": 9030.2, "gemm-flops/FP32:0": 18362.1, "gemm-flops/FP32:1": 18375.6, "gemm-flops/FP32:2": 18314.9, "gemm-flops/FP32:3": 18375.6, "gemm-flops/FP32:4": 18368.6, "gemm-flops/FP32:5": 18347.1, "gemm-flops/FP32:6": 18247.4, "gemm-flops/FP32:7": 18318.4, "gemm-flops/FP16:0": 33878.0, "gemm-flops/FP16:1": 33911.1, "gemm-flops/FP16:2": 33769.3, "gemm-flops/FP16:3": 33909.9, "gemm-flops/FP16:4": 33896.5, "gemm-flops/FP16:5": 33798.1, "gemm-flops/FP16:6": 33647.3, "gemm-flops/FP16:7": 33764.8, "gemm-flops/FP64_TC:0": 18963.6, "gemm-flops/FP64_TC:1": 18924.2, "gemm-flops/FP64_TC:2": 18930.3, "gemm-flops/FP64_TC:3": 18971.9, "gemm-flops/FP64_TC:4": 18946.0, "gemm-flops/FP64_TC:5": 18945.0, "gemm-flops/FP64_TC:6": 18822.9, "gemm-flops/FP64_TC:7": 18911.1, "gemm-flops/TF32_TC:0": 127900.0, "gemm-flops/TF32_TC:1": 129094.0, "gemm-flops/TF32_TC:2": 127831.0, "gemm-flops/TF32_TC:3": 128709.0, "gemm-flops/TF32_TC:4": 127388.0, "gemm-flops/TF32_TC:5": 127861.0, 
"gemm-flops/TF32_TC:6": 128492.0, "gemm-flops/TF32_TC:7": 127720.0, "gemm-flops/BF16_TC:0": 264965.0, "gemm-flops/BF16_TC:1": 266638.0, "gemm-flops/BF16_TC:2": 263151.0, "gemm-flops/BF16_TC:3": 264752.0, "gemm-flops/BF16_TC:4": 263049.0, "gemm-flops/BF16_TC:5": 266605.0, "gemm-flops/BF16_TC:6": 267501.0, "gemm-flops/BF16_TC:7": 263880.0, "gemm-flops/FP16_TC:0": 279474.0, "gemm-flops/FP16_TC:1": 281256.0, "gemm-flops/FP16_TC:2": 277403.0, "gemm-flops/FP16_TC:3": 279147.0, "gemm-flops/FP16_TC:4": 277587.0, "gemm-flops/FP16_TC:5": 281537.0, "gemm-flops/FP16_TC:6": 282132.0, "gemm-flops/FP16_TC:7": 277788.0, "gemm-flops/INT8_TC:0": 475160.0, "gemm-flops/INT8_TC:1": 477725.0, "gemm-flops/INT8_TC:2": 471621.0, "gemm-flops/INT8_TC:3": 473716.0, "gemm-flops/INT8_TC:4": 472124.0, "gemm-flops/INT8_TC:5": 479972.0, "gemm-flops/INT8_TC:6": 481327.0, "gemm-flops/INT8_TC:7": 474710.0, "gemm-flops/INT4_TC:0": 970330.0, "gemm-flops/INT4_TC:1": 976837.0, "gemm-flops/INT4_TC:2": 966003.0, "gemm-flops/INT4_TC:3": 971315.0, "gemm-flops/INT4_TC:4": 964441.0, "gemm-flops/INT4_TC:5": 982461.0, "gemm-flops/INT4_TC:6": 979610.0, "gemm-flops/INT4_TC:7": 968359.0, "gpt_models/pytorch-gpt2-large/steptime_train_float32": 295.0526971835643, "gpt_models/pytorch-gpt2-large/throughput_train_float32": 27.115454396866486, "gpt_models/pytorch-gpt2-large/steptime_train_float16": 194.4957742234692, "gpt_models/pytorch-gpt2-large/throughput_train_float16": 41.139449941061585, "gpu-sm-copy-bw/dtoh:0": 3.91755, "gpu-sm-copy-bw/dtoh:1": 4.45414, "gpu-sm-copy-bw/dtoh:2": 1.26483, "gpu-sm-copy-bw/dtoh:3": 1.30041, "gpu-sm-copy-bw/dtoh:4": 1.31577, "gpu-sm-copy-bw/dtoh:5": 1.27968, "gpu-sm-copy-bw/dtoh:6": 4.47849, "gpu-sm-copy-bw/dtoh:7": 3.96231, "gpu-sm-copy-bw/dtoh:8": 3.91705, "gpu-sm-copy-bw/dtoh:9": 4.45487, "gpu-sm-copy-bw/dtoh:10": 1.26352, "gpu-sm-copy-bw/dtoh:11": 1.2999, "gpu-sm-copy-bw/dtoh:12": 1.31677, "gpu-sm-copy-bw/dtoh:13": 1.27885, "gpu-sm-copy-bw/dtoh:14": 4.47913, "gpu-sm-copy-bw/dtoh:15": 3.95893, "gpu-sm-copy-bw/dtoh:16": 3.91729, "gpu-sm-copy-bw/dtoh:17": 4.45627, "gpu-sm-copy-bw/dtoh:18": 1.26437, "gpu-sm-copy-bw/dtoh:19": 1.30144, "gpu-sm-copy-bw/dtoh:20": 1.31704, "gpu-sm-copy-bw/dtoh:21": 1.27857, "gpu-sm-copy-bw/dtoh:22": 4.47889, "gpu-sm-copy-bw/dtoh:23": 3.95984, "gpu-sm-copy-bw/dtoh:24": 3.92025, "gpu-sm-copy-bw/dtoh:25": 4.45423, "gpu-sm-copy-bw/dtoh:26": 1.26449, "gpu-sm-copy-bw/dtoh:27": 1.29954, "gpu-sm-copy-bw/dtoh:28": 1.31731, "gpu-sm-copy-bw/dtoh:29": 1.27916, "gpu-sm-copy-bw/dtoh:30": 4.4797, "gpu-sm-copy-bw/dtoh:31": 3.96124, "gpu-sm-copy-bw/htod:0": 23.9685, "gpu-sm-copy-bw/htod:1": 23.967, "gpu-sm-copy-bw/htod:2": 19.9898, "gpu-sm-copy-bw/htod:3": 20.2848, "gpu-sm-copy-bw/htod:4": 20.3931, "gpu-sm-copy-bw/htod:5": 20.0888, "gpu-sm-copy-bw/htod:6": 23.9766, "gpu-sm-copy-bw/htod:7": 23.9792, "gpu-sm-copy-bw/htod:8": 23.9497, "gpu-sm-copy-bw/htod:9": 23.9438, "gpu-sm-copy-bw/htod:10": 20.0039, "gpu-sm-copy-bw/htod:11": 20.2469, "gpu-sm-copy-bw/htod:12": 20.4344, "gpu-sm-copy-bw/htod:13": 20.1005, "gpu-sm-copy-bw/htod:14": 23.9491, "gpu-sm-copy-bw/htod:15": 23.9898, "gpu-sm-copy-bw/htod:16": 23.962, "gpu-sm-copy-bw/htod:17": 23.97, "gpu-sm-copy-bw/htod:18": 19.987, "gpu-sm-copy-bw/htod:19": 20.1549, "gpu-sm-copy-bw/htod:20": 20.3931, "gpu-sm-copy-bw/htod:21": 20.1013, "gpu-sm-copy-bw/htod:22": 23.9865, "gpu-sm-copy-bw/htod:23": 23.972, "gpu-sm-copy-bw/htod:24": 23.9569, "gpu-sm-copy-bw/htod:25": 23.9762, "gpu-sm-copy-bw/htod:26": 19.92, "gpu-sm-copy-bw/htod:27": 20.2638, 
"gpu-sm-copy-bw/htod:28": 20.4419, "gpu-sm-copy-bw/htod:29": 20.1054, "gpu-sm-copy-bw/htod:30": 23.9752, "gpu-sm-copy-bw/htod:31": 23.9631, "ib-loopback/IB_write_512_Avg_0:0": 1492.89, "ib-loopback/IB_write_1024_Avg_0:0": 3224.92, "ib-loopback/IB_write_2048_Avg_0:0": 6714.5, "ib-loopback/IB_write_4096_Avg_0:0": 12871.93, "ib-loopback/IB_write_8192_Avg_0:0": 19990.78, "ib-loopback/IB_write_16384_Avg_0:0": 22172.25, "ib-loopback/IB_write_32768_Avg_0:0": 23073.19, "ib-loopback/IB_write_65536_Avg_0:0": 23527.09, "ib-loopback/IB_write_131072_Avg_0:0": 23805.92, "ib-loopback/IB_write_262144_Avg_0:0": 23380.51, "ib-loopback/IB_write_524288_Avg_0:0": 23856.63, "ib-loopback/IB_write_1048576_Avg_0:0": 23869.58, "ib-loopback/IB_write_2097152_Avg_0:0": 23885.7, "ib-loopback/IB_write_4194304_Avg_0:0": 23914.56, "ib-loopback/IB_write_8388608_Avg_0:0": 23935.21, "ib-loopback/IB_write_512_Avg_1:0": 1479.1, "ib-loopback/IB_write_1024_Avg_1:0": 3246.48, "ib-loopback/IB_write_2048_Avg_1:0": 6754.94, "ib-loopback/IB_write_4096_Avg_1:0": 13101.09, "ib-loopback/IB_write_8192_Avg_1:0": 19945.81, "ib-loopback/IB_write_16384_Avg_1:0": 22631.62, "ib-loopback/IB_write_32768_Avg_1:0": 23381.58, "ib-loopback/IB_write_65536_Avg_1:0": 23515.29, "ib-loopback/IB_write_131072_Avg_1:0": 23756.01, "ib-loopback/IB_write_262144_Avg_1:0": 23875.91, "ib-loopback/IB_write_524288_Avg_1:0": 23911.13, "ib-loopback/IB_write_1048576_Avg_1:0": 23935.42, "ib-loopback/IB_write_2097152_Avg_1:0": 23941.25, "ib-loopback/IB_write_4194304_Avg_1:0": 23922.06, "ib-loopback/IB_write_8388608_Avg_1:0": 23928.59, "ib-loopback/IB_write_512_Avg_2:0": 1505.29, "ib-loopback/IB_write_1024_Avg_2:0": 3215.97, "ib-loopback/IB_write_2048_Avg_2:0": 6745.49, "ib-loopback/IB_write_4096_Avg_2:0": 11548.88, "ib-loopback/IB_write_8192_Avg_2:0": 19432.15, "ib-loopback/IB_write_16384_Avg_2:0": 22765.51, "ib-loopback/IB_write_32768_Avg_2:0": 23235.07, "ib-loopback/IB_write_65536_Avg_2:0": 23620.08, "ib-loopback/IB_write_131072_Avg_2:0": 23759.08, "ib-loopback/IB_write_262144_Avg_2:0": 23859.82, "ib-loopback/IB_write_524288_Avg_2:0": 23775.01, "ib-loopback/IB_write_1048576_Avg_2:0": 23885.96, "ib-loopback/IB_write_2097152_Avg_2:0": 23894.73, "ib-loopback/IB_write_4194304_Avg_2:0": 23909.98, "ib-loopback/IB_write_8388608_Avg_2:0": 23927.21, "ib-loopback/IB_write_512_Avg_3:0": 1474.47, "ib-loopback/IB_write_1024_Avg_3:0": 3344.68, "ib-loopback/IB_write_2048_Avg_3:0": 6606.35, "ib-loopback/IB_write_4096_Avg_3:0": 12071.4, "ib-loopback/IB_write_8192_Avg_3:0": 18408.33, "ib-loopback/IB_write_16384_Avg_3:0": 20789.27, "ib-loopback/IB_write_32768_Avg_3:0": 22469.43, "ib-loopback/IB_write_65536_Avg_3:0": 22777.1, "ib-loopback/IB_write_131072_Avg_3:0": 23461.03, "ib-loopback/IB_write_262144_Avg_3:0": 23397.19, "ib-loopback/IB_write_524288_Avg_3:0": 23526.55, "ib-loopback/IB_write_1048576_Avg_3:0": 23854.76, "ib-loopback/IB_write_2097152_Avg_3:0": 23862.23, "ib-loopback/IB_write_4194304_Avg_3:0": 23931.15, "ib-loopback/IB_write_8388608_Avg_3:0": 23924.44, "ib-loopback/IB_write_512_Avg_4:0": 1523.33, "ib-loopback/IB_write_1024_Avg_4:0": 3233.23, "ib-loopback/IB_write_2048_Avg_4:0": 6792.88, "ib-loopback/IB_write_4096_Avg_4:0": 12616.05, "ib-loopback/IB_write_8192_Avg_4:0": 19324.05, "ib-loopback/IB_write_16384_Avg_4:0": 22082.51, "ib-loopback/IB_write_32768_Avg_4:0": 23294.23, "ib-loopback/IB_write_65536_Avg_4:0": 23546.22, "ib-loopback/IB_write_131072_Avg_4:0": 23727.91, "ib-loopback/IB_write_262144_Avg_4:0": 23843.93, "ib-loopback/IB_write_524288_Avg_4:0": 23905.96, 
"ib-loopback/IB_write_1048576_Avg_4:0": 23902.37, "ib-loopback/IB_write_2097152_Avg_4:0": 23921.03, "ib-loopback/IB_write_4194304_Avg_4:0": 23921.06, "ib-loopback/IB_write_8388608_Avg_4:0": 23922.4, "ib-loopback/IB_write_512_Avg_5:0": 1506.63, "ib-loopback/IB_write_1024_Avg_5:0": 3261.71, "ib-loopback/IB_write_2048_Avg_5:0": 6752.38, "ib-loopback/IB_write_4096_Avg_5:0": 13592.41, "ib-loopback/IB_write_8192_Avg_5:0": 19989.31, "ib-loopback/IB_write_16384_Avg_5:0": 22560.57, "ib-loopback/IB_write_32768_Avg_5:0": 23389.29, "ib-loopback/IB_write_65536_Avg_5:0": 23503.32, "ib-loopback/IB_write_131072_Avg_5:0": 23741.51, "ib-loopback/IB_write_262144_Avg_5:0": 23866.43, "ib-loopback/IB_write_524288_Avg_5:0": 23898.95, "ib-loopback/IB_write_1048576_Avg_5:0": 23876.36, "ib-loopback/IB_write_2097152_Avg_5:0": 23919.63, "ib-loopback/IB_write_4194304_Avg_5:0": 23924.68, "ib-loopback/IB_write_8388608_Avg_5:0": 23930.37, "ib-loopback/IB_write_512_Avg_6:0": 1467.69, "ib-loopback/IB_write_1024_Avg_6:0": 3157.04, "ib-loopback/IB_write_2048_Avg_6:0": 6494.61, "ib-loopback/IB_write_4096_Avg_6:0": 12883.51, "ib-loopback/IB_write_8192_Avg_6:0": 19207.67, "ib-loopback/IB_write_16384_Avg_6:0": 22519.39, "ib-loopback/IB_write_32768_Avg_6:0": 23323.46, "ib-loopback/IB_write_65536_Avg_6:0": 23523.6, "ib-loopback/IB_write_131072_Avg_6:0": 23626.67, "ib-loopback/IB_write_262144_Avg_6:0": 23836.99, "ib-loopback/IB_write_524288_Avg_6:0": 23904.51, "ib-loopback/IB_write_1048576_Avg_6:0": 23919.07, "ib-loopback/IB_write_2097152_Avg_6:0": 23943.82, "ib-loopback/IB_write_4194304_Avg_6:0": 23936.77, "ib-loopback/IB_write_8388608_Avg_6:0": 23941.57, "ib-loopback/IB_write_512_Avg_7:0": 1505.63, "ib-loopback/IB_write_1024_Avg_7:0": 3259.93, "ib-loopback/IB_write_2048_Avg_7:0": 6738.6, "ib-loopback/IB_write_4096_Avg_7:0": 13352.06, "ib-loopback/IB_write_8192_Avg_7:0": 19941.35, "ib-loopback/IB_write_16384_Avg_7:0": 22566.09, "ib-loopback/IB_write_32768_Avg_7:0": 23244.77, "ib-loopback/IB_write_65536_Avg_7:0": 23377.67, "ib-loopback/IB_write_131072_Avg_7:0": 23736.17, "ib-loopback/IB_write_262144_Avg_7:0": 23829.25, "ib-loopback/IB_write_524288_Avg_7:0": 23879.6, "ib-loopback/IB_write_1048576_Avg_7:0": 23895.1, "ib-loopback/IB_write_2097152_Avg_7:0": 23930.64, "ib-loopback/IB_write_4194304_Avg_7:0": 23845.63, "ib-loopback/IB_write_8388608_Avg_7:0": 23896.94, "kernel-launch/return_code":0, "kernel-launch/event_overhead:0": 0.1, "kernel-launch/event_overhead:1": 0.00595, "kernel-launch/event_overhead:2": 0.00557, "kernel-launch/event_overhead:3": 0.0055, "kernel-launch/event_overhead:4": 0.00592, "kernel-launch/event_overhead:5": 0.00589, "kernel-launch/event_overhead:6": 0.00572, "kernel-launch/event_overhead:7": 0.0059, "kernel-launch/wall_overhead:0": 0.01026, "kernel-launch/wall_overhead:1": 0.01026, "kernel-launch/wall_overhead:2": 0.01046, "kernel-launch/wall_overhead:3": 0.01049, "kernel-launch/wall_overhead:4": 0.01063, "kernel-launch/wall_overhead:5": 0.01006, "kernel-launch/wall_overhead:6": 0.01045, "kernel-launch/wall_overhead:7": 0.01071, "lstm_models/pytorch-lstm/steptime_train_float32": 48.07024518959224, "lstm_models/pytorch-lstm/throughput_train_float32": 4806.472441132788, "lstm_models/pytorch-lstm/steptime_train_float16": 25.95312986522913, "lstm_models/pytorch-lstm/throughput_train_float16": 9069.90809255883, "pytorch-matmul/nosharding": 34.64499759674072, "mem-bw/return_code":0, "mem-bw/H2D_Mem_BW:0": 25.6, "mem-bw/H2D_Mem_BW:1": 25.8, "mem-bw/H2D_Mem_BW:2": 26.0, "mem-bw/H2D_Mem_BW:3": 26.1, 
"mem-bw/H2D_Mem_BW:4": 26.2, "mem-bw/H2D_Mem_BW:5": 25.8, "mem-bw/H2D_Mem_BW:6": 25.3, "mem-bw/H2D_Mem_BW:7": 26.1, "mem-bw/D2H_Mem_BW:0": 24.3, "mem-bw/D2H_Mem_BW:1": 24.6, "mem-bw/D2H_Mem_BW:2": 24.5, "mem-bw/D2H_Mem_BW:3": 24.6, "mem-bw/D2H_Mem_BW:4": 24.3, "mem-bw/D2H_Mem_BW:5": 24.3, "mem-bw/D2H_Mem_BW:6": 23.9, "mem-bw/D2H_Mem_BW:7": 24.6, "mem-bw/D2D_Mem_BW:0": 1118.0, "mem-bw/D2D_Mem_BW:1": 1114.6, "mem-bw/D2D_Mem_BW:2": 1119.7, "mem-bw/D2D_Mem_BW:3": 1121.9, "mem-bw/D2D_Mem_BW:4": 1109.7, "mem-bw/D2D_Mem_BW:5": 1110.1, "mem-bw/D2D_Mem_BW:6": 1123.3, "mem-bw/D2D_Mem_BW:7": 1117.6, "nccl-bw/allreduce_8_busbw:0": 0.0, "nccl-bw/allreduce_8_algbw:0": 0.0, "nccl-bw/allreduce_8_time:0": 37.84, "nccl-bw/allreduce_16_busbw:0": 0.0, "nccl-bw/allreduce_16_algbw:0": 0.0, "nccl-bw/allreduce_16_time:0": 36.42, "nccl-bw/allreduce_32_busbw:0": 0.0, "nccl-bw/allreduce_32_algbw:0": 0.0, "nccl-bw/allreduce_32_time:0": 36.87, "nccl-bw/allreduce_64_busbw:0": 0.0, "nccl-bw/allreduce_64_algbw:0": 0.0, "nccl-bw/allreduce_64_time:0": 35.83, "nccl-bw/allreduce_128_busbw:0": 0.01, "nccl-bw/allreduce_128_algbw:0": 0.0, "nccl-bw/allreduce_128_time:0": 36.91, "nccl-bw/allreduce_256_busbw:0": 0.01, "nccl-bw/allreduce_256_algbw:0": 0.01, "nccl-bw/allreduce_256_time:0": 37.58, "nccl-bw/allreduce_512_busbw:0": 0.02, "nccl-bw/allreduce_512_algbw:0": 0.01, "nccl-bw/allreduce_512_time:0": 36.98, "nccl-bw/allreduce_1024_busbw:0": 0.05, "nccl-bw/allreduce_1024_algbw:0": 0.03, "nccl-bw/allreduce_1024_time:0": 36.93, "nccl-bw/allreduce_2048_busbw:0": 0.1, "nccl-bw/allreduce_2048_algbw:0": 0.06, "nccl-bw/allreduce_2048_time:0": 36.06, "nccl-bw/allreduce_4096_busbw:0": 0.19, "nccl-bw/allreduce_4096_algbw:0": 0.11, "nccl-bw/allreduce_4096_time:0": 37.2, "nccl-bw/allreduce_8192_busbw:0": 0.39, "nccl-bw/allreduce_8192_algbw:0": 0.22, "nccl-bw/allreduce_8192_time:0": 37.04, "nccl-bw/allreduce_16384_busbw:0": 0.77, "nccl-bw/allreduce_16384_algbw:0": 0.44, "nccl-bw/allreduce_16384_time:0": 37.46, "nccl-bw/allreduce_32768_busbw:0": 1.52, "nccl-bw/allreduce_32768_algbw:0": 0.87, "nccl-bw/allreduce_32768_time:0": 37.64, "nccl-bw/allreduce_65536_busbw:0": 3.0, "nccl-bw/allreduce_65536_algbw:0": 1.71, "nccl-bw/allreduce_65536_time:0": 38.22, "nccl-bw/allreduce_131072_busbw:0": 5.31, "nccl-bw/allreduce_131072_algbw:0": 3.04, "nccl-bw/allreduce_131072_time:0": 43.17, "nccl-bw/allreduce_262144_busbw:0": 9.5, "nccl-bw/allreduce_262144_algbw:0": 5.43, "nccl-bw/allreduce_262144_time:0": 48.29, "nccl-bw/allreduce_524288_busbw:0": 15.11, "nccl-bw/allreduce_524288_algbw:0": 8.64, "nccl-bw/allreduce_524288_time:0": 60.71, "nccl-bw/allreduce_1048576_busbw:0": 24.1, "nccl-bw/allreduce_1048576_algbw:0": 13.77, "nccl-bw/allreduce_1048576_time:0": 76.13, "nccl-bw/allreduce_2097152_busbw:0": 38.12, "nccl-bw/allreduce_2097152_algbw:0": 21.78, "nccl-bw/allreduce_2097152_time:0": 96.28, "nccl-bw/allreduce_4194304_busbw:0": 65.75, "nccl-bw/allreduce_4194304_algbw:0": 37.57, "nccl-bw/allreduce_4194304_time:0": 111.6, "nccl-bw/allreduce_8388608_busbw:0": 89.51, "nccl-bw/allreduce_8388608_algbw:0": 51.15, "nccl-bw/allreduce_8388608_time:0": 164.0, "nccl-bw/allreduce_16777216_busbw:0": 114.38, "nccl-bw/allreduce_16777216_algbw:0": 65.36, "nccl-bw/allreduce_16777216_time:0": 256.7, "nccl-bw/allreduce_33554432_busbw:0": 154.89, "nccl-bw/allreduce_33554432_algbw:0": 88.51, "nccl-bw/allreduce_33554432_time:0": 379.1, "nccl-bw/allreduce_67108864_busbw:0": 200.01, "nccl-bw/allreduce_67108864_algbw:0": 114.29, "nccl-bw/allreduce_67108864_time:0": 587.2, 
"nccl-bw/allreduce_134217728_busbw:0": 202.97, "nccl-bw/allreduce_134217728_algbw:0": 115.98, "nccl-bw/allreduce_134217728_time:0": 1157.2, "nccl-bw/allreduce_268435456_busbw:0": 221.82, "nccl-bw/allreduce_268435456_algbw:0": 126.75, "nccl-bw/allreduce_268435456_time:0": 2117.8, "nccl-bw/allreduce_536870912_busbw:0": 224.54, "nccl-bw/allreduce_536870912_algbw:0": 128.31, "nccl-bw/allreduce_536870912_time:0": 4184.2, "nccl-bw/allreduce_1073741824_busbw:0": 230.15, "nccl-bw/allreduce_1073741824_algbw:0": 131.51, "nccl-bw/allreduce_1073741824_time:0": 8164.5, "nccl-bw/allreduce_2147483648_busbw:0": 231.89, "nccl-bw/allreduce_2147483648_algbw:0": 132.51, "nccl-bw/allreduce_2147483648_time:0": 16207.0, "nccl-bw/allreduce_4294967296_busbw:0": 234.45, "nccl-bw/allreduce_4294967296_algbw:0": 133.97, "nccl-bw/allreduce_4294967296_time:0": 32059.0, "nccl-bw/allreduce_8589934592_busbw:0": 235.36, "nccl-bw/allreduce_8589934592_algbw:0": 134.49, "nccl-bw/allreduce_8589934592_time:0": 63870.0, "resnet_models/pytorch-resnet50/steptime_train_float32": 253.95522732287645, "resnet_models/pytorch-resnet50/throughput_train_float32": 760.3348099129964, "resnet_models/pytorch-resnet50/steptime_train_float16": 200.08606184273958, "resnet_models/pytorch-resnet50/throughput_train_float16": 971.0651430922575, "resnet_models/pytorch-resnet101/steptime_train_float32": 389.08605091273785, "resnet_models/pytorch-resnet101/throughput_train_float32": 496.11747409298965, "resnet_models/pytorch-resnet101/steptime_train_float16": 308.6274107918143, "resnet_models/pytorch-resnet101/throughput_train_float16": 627.2056272195069, "resnet_models/pytorch-resnet152/steptime_train_float32": 547.6558278314769, "resnet_models/pytorch-resnet152/throughput_train_float32": 352.07099543348215, "resnet_models/pytorch-resnet152/steptime_train_float16": 424.5809856802225, "resnet_models/pytorch-resnet152/throughput_train_float16": 454.8335998153649, "pytorch-sharding-matmul/allreduce": 10.574411869049072, "pytorch-sharding-matmul/allgather": 10.084696769714355, "vgg_models/pytorch-vgg11/steptime_train_float32": 40.35283671692014, "vgg_models/pytorch-vgg11/throughput_train_float32": 796.3615936949874, "vgg_models/pytorch-vgg11/steptime_train_float16": 24.133514845743775, "vgg_models/pytorch-vgg11/throughput_train_float16": 1330.411361458461, "vgg_models/pytorch-vgg13/steptime_train_float32": 55.466310936026275, "vgg_models/pytorch-vgg13/throughput_train_float32": 580.234107444399, "vgg_models/pytorch-vgg13/steptime_train_float16": 33.35228993091732, "vgg_models/pytorch-vgg13/throughput_train_float16": 962.5332023901524, "vgg_models/pytorch-vgg16/steptime_train_float32": 65.22519944701344, "vgg_models/pytorch-vgg16/throughput_train_float32": 493.4268638875934, "vgg_models/pytorch-vgg16/steptime_train_float16": 39.25287735182792, "vgg_models/pytorch-vgg16/throughput_train_float16": 817.2008546147621, "vgg_models/pytorch-vgg19/steptime_train_float32": 74.93487105239183, "vgg_models/pytorch-vgg19/throughput_train_float32": 429.80921583106164, "vgg_models/pytorch-vgg19/steptime_train_float16": 45.20330624654889, "vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377091}
{"node": "sb-validation-02","bert_models/pytorch-bert-base/steptime_train_float32": 114.59167010616511,"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591105, "bert_models/pytorch-bert-base/steptime_train_float16": 83.88951083179563, "bert_models/pytorch-bert-base/throughput_train_float16": 382.0672582741963, "bert_models/pytorch-bert-large/steptime_train_float32": 307.9359371913597, "bert_models/pytorch-bert-large/throughput_train_float32": 103.94876097417632, "bert_models/pytorch-bert-large/steptime_train_float16": 206.81141689419746, "bert_models/pytorch-bert-large/throughput_train_float16": 154.84089117113942, "pytorch-computation-communication-overlap/mul_cost:0": 44.182206214372854, "pytorch-computation-communication-overlap/mul_cost:1": 44.18221393893873, "pytorch-computation-communication-overlap/mul_cost:2": 43.970147078084665, "pytorch-computation-communication-overlap/mul_cost:3": 43.97014787559783, "pytorch-computation-communication-overlap/mul_cost:4": 43.97017793166924, "pytorch-computation-communication-overlap/mul_cost:5": 43.97015716062924, "pytorch-computation-communication-overlap/mul_cost:6": 43.97016519828867, "pytorch-computation-communication-overlap/mul_cost:7": 44.179544478538446, "pytorch-computation-communication-overlap/matmul_cost:0": 137.04773705558182, "pytorch-computation-communication-overlap/matmul_cost:1": 137.0478344692856, "pytorch-computation-communication-overlap/matmul_cost:2": 137.04777220902997, "pytorch-computation-communication-overlap/matmul_cost:3": 137.04779697263803, "pytorch-computation-communication-overlap/matmul_cost:4": 137.04813674314664, "pytorch-computation-communication-overlap/matmul_cost:5": 137.04821988767435, "pytorch-computation-communication-overlap/matmul_cost:6": 137.04775322366913, "pytorch-computation-communication-overlap/matmul_cost:7": 137.0478081606734, "densenet_models/pytorch-densenet169/steptime_train_float32": 150.64155543223023, "densenet_models/pytorch-densenet169/throughput_train_float32": 212.47017192425312, "densenet_models/pytorch-densenet169/steptime_train_float16": 145.30819572973996, "densenet_models/pytorch-densenet169/throughput_train_float16": 220.293233730535, "densenet_models/pytorch-densenet201/steptime_train_float32": 182.91117786429822, "densenet_models/pytorch-densenet201/throughput_train_float32": 174.9742278232251, "densenet_models/pytorch-densenet201/steptime_train_float16": 176.3489063596353, "densenet_models/pytorch-densenet201/throughput_train_float16": 181.54465736033902, "gemm-flops/FP64:0": 9031.23, "gemm-flops/FP64:1": 9040.85, "gemm-flops/FP64:2": 9010.56, "gemm-flops/FP64:3": 9041.26, "gemm-flops/FP64:4": 9039.19, "gemm-flops/FP64:5": 9015.69, "gemm-flops/FP64:6": 9022.19, "gemm-flops/FP64:7": 9030.2, "gemm-flops/FP32:0": 18362.1, "gemm-flops/FP32:1": 18375.6, "gemm-flops/FP32:2": 18314.9, "gemm-flops/FP32:3": 18375.6, "gemm-flops/FP32:4": 18368.6, "gemm-flops/FP32:5": 18347.1, "gemm-flops/FP32:6": 18247.4, "gemm-flops/FP32:7": 18318.4, "gemm-flops/FP16:0": 33878.0, "gemm-flops/FP16:1": 33911.1, "gemm-flops/FP16:2": 33769.3, "gemm-flops/FP16:3": 33909.9, "gemm-flops/FP16:4": 33896.5, "gemm-flops/FP16:5": 33798.1, "gemm-flops/FP16:6": 33647.3, "gemm-flops/FP16:7": 33764.8, "gemm-flops/FP64_TC:0": 18963.6, "gemm-flops/FP64_TC:1": 18924.2, "gemm-flops/FP64_TC:2": 18930.3, "gemm-flops/FP64_TC:3": 18971.9, "gemm-flops/FP64_TC:4": 18946.0, "gemm-flops/FP64_TC:5": 18945.0, "gemm-flops/FP64_TC:6": 18822.9, "gemm-flops/FP64_TC:7": 18911.1, "gemm-flops/TF32_TC:0": 127900.0, 
"gemm-flops/TF32_TC:1": 129094.0, "gemm-flops/TF32_TC:2": 127831.0, "gemm-flops/TF32_TC:3": 128709.0, "gemm-flops/TF32_TC:4": 127388.0, "gemm-flops/TF32_TC:5": 127861.0, "gemm-flops/TF32_TC:6": 128492.0, "gemm-flops/TF32_TC:7": 127720.0, "gemm-flops/BF16_TC:0": 264965.0, "gemm-flops/BF16_TC:1": 266638.0, "gemm-flops/BF16_TC:2": 263151.0, "gemm-flops/BF16_TC:3": 264752.0, "gemm-flops/BF16_TC:4": 263049.0, "gemm-flops/BF16_TC:5": 266605.0, "gemm-flops/BF16_TC:6": 267501.0, "gemm-flops/BF16_TC:7": 263880.0, "gemm-flops/FP16_TC:0": 279474.0, "gemm-flops/FP16_TC:1": 281256.0, "gemm-flops/FP16_TC:2": 277403.0, "gemm-flops/FP16_TC:3": 279147.0, "gemm-flops/FP16_TC:4": 277587.0, "gemm-flops/FP16_TC:5": 281537.0, "gemm-flops/FP16_TC:6": 282132.0, "gemm-flops/FP16_TC:7": 277788.0, "gemm-flops/INT8_TC:0": 475160.0, "gemm-flops/INT8_TC:1": 477725.0, "gemm-flops/INT8_TC:2": 471621.0, "gemm-flops/INT8_TC:3": 473716.0, "gemm-flops/INT8_TC:4": 472124.0, "gemm-flops/INT8_TC:5": 479972.0, "gemm-flops/INT8_TC:6": 481327.0, "gemm-flops/INT8_TC:7": 474710.0, "gemm-flops/INT4_TC:0": 970330.0, "gemm-flops/INT4_TC:1": 976837.0, "gemm-flops/INT4_TC:2": 966003.0, "gemm-flops/INT4_TC:3": 971315.0, "gemm-flops/INT4_TC:4": 964441.0, "gemm-flops/INT4_TC:5": 982461.0, "gemm-flops/INT4_TC:6": 979610.0, "gemm-flops/INT4_TC:7": 968359.0, "gpt_models/pytorch-gpt2-large/steptime_train_float32": 295.0526971835643, "gpt_models/pytorch-gpt2-large/throughput_train_float32": 27.115454396866486, "gpt_models/pytorch-gpt2-large/steptime_train_float16": 194.4957742234692, "gpt_models/pytorch-gpt2-large/throughput_train_float16": 41.139449941061585, "gpu-sm-copy-bw/dtoh:0": 3.91755, "gpu-sm-copy-bw/dtoh:1": 4.45414, "gpu-sm-copy-bw/dtoh:2": 1.26483, "gpu-sm-copy-bw/dtoh:3": 1.30041, "gpu-sm-copy-bw/dtoh:4": 1.31577, "gpu-sm-copy-bw/dtoh:5": 1.27968, "gpu-sm-copy-bw/dtoh:6": 4.47849, "gpu-sm-copy-bw/dtoh:7": 3.96231, "gpu-sm-copy-bw/dtoh:8": 3.91705, "gpu-sm-copy-bw/dtoh:9": 4.45487, "gpu-sm-copy-bw/dtoh:10": 1.26352, "gpu-sm-copy-bw/dtoh:11": 1.2999, "gpu-sm-copy-bw/dtoh:12": 1.31677, "gpu-sm-copy-bw/dtoh:13": 1.27885, "gpu-sm-copy-bw/dtoh:14": 4.47913, "gpu-sm-copy-bw/dtoh:15": 3.95893, "gpu-sm-copy-bw/dtoh:16": 3.91729, "gpu-sm-copy-bw/dtoh:17": 4.45627, "gpu-sm-copy-bw/dtoh:18": 1.26437, "gpu-sm-copy-bw/dtoh:19": 1.30144, "gpu-sm-copy-bw/dtoh:20": 1.31704, "gpu-sm-copy-bw/dtoh:21": 1.27857, "gpu-sm-copy-bw/dtoh:22": 4.47889, "gpu-sm-copy-bw/dtoh:23": 3.95984, "gpu-sm-copy-bw/dtoh:24": 3.92025, "gpu-sm-copy-bw/dtoh:25": 4.45423, "gpu-sm-copy-bw/dtoh:26": 1.26449, "gpu-sm-copy-bw/dtoh:27": 1.29954, "gpu-sm-copy-bw/dtoh:28": 1.31731, "gpu-sm-copy-bw/dtoh:29": 1.27916, "gpu-sm-copy-bw/dtoh:30": 4.4797, "gpu-sm-copy-bw/dtoh:31": 3.96124, "gpu-sm-copy-bw/htod:0": 23.9685, "gpu-sm-copy-bw/htod:1": 23.967, "gpu-sm-copy-bw/htod:2": 19.9898, "gpu-sm-copy-bw/htod:3": 20.2848, "gpu-sm-copy-bw/htod:4": 20.3931, "gpu-sm-copy-bw/htod:5": 20.0888, "gpu-sm-copy-bw/htod:6": 23.9766, "gpu-sm-copy-bw/htod:7": 23.9792, "gpu-sm-copy-bw/htod:8": 23.9497, "gpu-sm-copy-bw/htod:9": 23.9438, "gpu-sm-copy-bw/htod:10": 20.0039, "gpu-sm-copy-bw/htod:11": 20.2469, "gpu-sm-copy-bw/htod:12": 20.4344, "gpu-sm-copy-bw/htod:13": 20.1005, "gpu-sm-copy-bw/htod:14": 23.9491, "gpu-sm-copy-bw/htod:15": 23.9898, "gpu-sm-copy-bw/htod:16": 23.962, "gpu-sm-copy-bw/htod:17": 23.97, "gpu-sm-copy-bw/htod:18": 19.987, "gpu-sm-copy-bw/htod:19": 20.1549, "gpu-sm-copy-bw/htod:20": 20.3931, "gpu-sm-copy-bw/htod:21": 20.1013, "gpu-sm-copy-bw/htod:22": 23.9865, 
"gpu-sm-copy-bw/htod:23": 23.972, "gpu-sm-copy-bw/htod:24": 23.9569, "gpu-sm-copy-bw/htod:25": 23.9762, "gpu-sm-copy-bw/htod:26": 19.92, "gpu-sm-copy-bw/htod:27": 20.2638, "gpu-sm-copy-bw/htod:28": 20.4419, "gpu-sm-copy-bw/htod:29": 20.1054, "gpu-sm-copy-bw/htod:30": 23.9752, "gpu-sm-copy-bw/htod:31": 23.9631, "ib-loopback/IB_write_512_Avg_0:0": 1492.89, "ib-loopback/IB_write_1024_Avg_0:0": 3224.92, "ib-loopback/IB_write_2048_Avg_0:0": 6714.5, "ib-loopback/IB_write_4096_Avg_0:0": 12871.93, "ib-loopback/IB_write_8192_Avg_0:0": 19990.78, "ib-loopback/IB_write_16384_Avg_0:0": 22172.25, "ib-loopback/IB_write_32768_Avg_0:0": 23073.19, "ib-loopback/IB_write_65536_Avg_0:0": 23527.09, "ib-loopback/IB_write_131072_Avg_0:0": 23805.92, "ib-loopback/IB_write_262144_Avg_0:0": 23380.51, "ib-loopback/IB_write_524288_Avg_0:0": 23856.63, "ib-loopback/IB_write_1048576_Avg_0:0": 23869.58, "ib-loopback/IB_write_2097152_Avg_0:0": 23885.7, "ib-loopback/IB_write_4194304_Avg_0:0": 23914.56, "ib-loopback/IB_write_8388608_Avg_0:0": 23935.21, "ib-loopback/IB_write_512_Avg_1:0": 1479.1, "ib-loopback/IB_write_1024_Avg_1:0": 3246.48, "ib-loopback/IB_write_2048_Avg_1:0": 6754.94, "ib-loopback/IB_write_4096_Avg_1:0": 13101.09, "ib-loopback/IB_write_8192_Avg_1:0": 19945.81, "ib-loopback/IB_write_16384_Avg_1:0": 22631.62, "ib-loopback/IB_write_32768_Avg_1:0": 23381.58, "ib-loopback/IB_write_65536_Avg_1:0": 23515.29, "ib-loopback/IB_write_131072_Avg_1:0": 23756.01, "ib-loopback/IB_write_262144_Avg_1:0": 23875.91, "ib-loopback/IB_write_524288_Avg_1:0": 23911.13, "ib-loopback/IB_write_1048576_Avg_1:0": 23935.42, "ib-loopback/IB_write_2097152_Avg_1:0": 23941.25, "ib-loopback/IB_write_4194304_Avg_1:0": 23922.06, "ib-loopback/IB_write_8388608_Avg_1:0": 23928.59, "ib-loopback/IB_write_512_Avg_2:0": 1505.29, "ib-loopback/IB_write_1024_Avg_2:0": 3215.97, "ib-loopback/IB_write_2048_Avg_2:0": 6745.49, "ib-loopback/IB_write_4096_Avg_2:0": 11548.88, "ib-loopback/IB_write_8192_Avg_2:0": 19432.15, "ib-loopback/IB_write_16384_Avg_2:0": 22765.51, "ib-loopback/IB_write_32768_Avg_2:0": 23235.07, "ib-loopback/IB_write_65536_Avg_2:0": 23620.08, "ib-loopback/IB_write_131072_Avg_2:0": 23759.08, "ib-loopback/IB_write_262144_Avg_2:0": 23859.82, "ib-loopback/IB_write_524288_Avg_2:0": 23775.01, "ib-loopback/IB_write_1048576_Avg_2:0": 23885.96, "ib-loopback/IB_write_2097152_Avg_2:0": 23894.73, "ib-loopback/IB_write_4194304_Avg_2:0": 23909.98, "ib-loopback/IB_write_8388608_Avg_2:0": 23927.21, "ib-loopback/IB_write_512_Avg_3:0": 1474.47, "ib-loopback/IB_write_1024_Avg_3:0": 3344.68, "ib-loopback/IB_write_2048_Avg_3:0": 6606.35, "ib-loopback/IB_write_4096_Avg_3:0": 12071.4, "ib-loopback/IB_write_8192_Avg_3:0": 18408.33, "ib-loopback/IB_write_16384_Avg_3:0": 20789.27, "ib-loopback/IB_write_32768_Avg_3:0": 22469.43, "ib-loopback/IB_write_65536_Avg_3:0": 22777.1, "ib-loopback/IB_write_131072_Avg_3:0": 23461.03, "ib-loopback/IB_write_262144_Avg_3:0": 23397.19, "ib-loopback/IB_write_524288_Avg_3:0": 23526.55, "ib-loopback/IB_write_1048576_Avg_3:0": 23854.76, "ib-loopback/IB_write_2097152_Avg_3:0": 23862.23, "ib-loopback/IB_write_4194304_Avg_3:0": 23931.15, "ib-loopback/IB_write_8388608_Avg_3:0": 23924.44, "ib-loopback/IB_write_512_Avg_4:0": 1523.33, "ib-loopback/IB_write_1024_Avg_4:0": 3233.23, "ib-loopback/IB_write_2048_Avg_4:0": 6792.88, "ib-loopback/IB_write_4096_Avg_4:0": 12616.05, "ib-loopback/IB_write_8192_Avg_4:0": 19324.05, "ib-loopback/IB_write_16384_Avg_4:0": 22082.51, "ib-loopback/IB_write_32768_Avg_4:0": 23294.23, 
"ib-loopback/IB_write_65536_Avg_4:0": 23546.22, "ib-loopback/IB_write_131072_Avg_4:0": 23727.91, "ib-loopback/IB_write_262144_Avg_4:0": 23843.93, "ib-loopback/IB_write_524288_Avg_4:0": 23905.96, "ib-loopback/IB_write_1048576_Avg_4:0": 23902.37, "ib-loopback/IB_write_2097152_Avg_4:0": 23921.03, "ib-loopback/IB_write_4194304_Avg_4:0": 23921.06, "ib-loopback/IB_write_8388608_Avg_4:0": 23922.4, "ib-loopback/IB_write_512_Avg_5:0": 1506.63, "ib-loopback/IB_write_1024_Avg_5:0": 3261.71, "ib-loopback/IB_write_2048_Avg_5:0": 6752.38, "ib-loopback/IB_write_4096_Avg_5:0": 13592.41, "ib-loopback/IB_write_8192_Avg_5:0": 19989.31, "ib-loopback/IB_write_16384_Avg_5:0": 22560.57, "ib-loopback/IB_write_32768_Avg_5:0": 23389.29, "ib-loopback/IB_write_65536_Avg_5:0": 23503.32, "ib-loopback/IB_write_131072_Avg_5:0": 23741.51, "ib-loopback/IB_write_262144_Avg_5:0": 23866.43, "ib-loopback/IB_write_524288_Avg_5:0": 23898.95, "ib-loopback/IB_write_1048576_Avg_5:0": 23876.36, "ib-loopback/IB_write_2097152_Avg_5:0": 23919.63, "ib-loopback/IB_write_4194304_Avg_5:0": 23924.68, "ib-loopback/IB_write_8388608_Avg_5:0": 23930.37, "ib-loopback/IB_write_512_Avg_6:0": 1467.69, "ib-loopback/IB_write_1024_Avg_6:0": 3157.04, "ib-loopback/IB_write_2048_Avg_6:0": 6494.61, "ib-loopback/IB_write_4096_Avg_6:0": 12883.51, "ib-loopback/IB_write_8192_Avg_6:0": 19207.67, "ib-loopback/IB_write_16384_Avg_6:0": 22519.39, "ib-loopback/IB_write_32768_Avg_6:0": 23323.46, "ib-loopback/IB_write_65536_Avg_6:0": 23523.6, "ib-loopback/IB_write_131072_Avg_6:0": 23626.67, "ib-loopback/IB_write_262144_Avg_6:0": 23836.99, "ib-loopback/IB_write_524288_Avg_6:0": 23904.51, "ib-loopback/IB_write_1048576_Avg_6:0": 23919.07, "ib-loopback/IB_write_2097152_Avg_6:0": 23943.82, "ib-loopback/IB_write_4194304_Avg_6:0": 23936.77, "ib-loopback/IB_write_8388608_Avg_6:0": 23941.57, "ib-loopback/IB_write_512_Avg_7:0": 1505.63, "ib-loopback/IB_write_1024_Avg_7:0": 3259.93, "ib-loopback/IB_write_2048_Avg_7:0": 6738.6, "ib-loopback/IB_write_4096_Avg_7:0": 13352.06, "ib-loopback/IB_write_8192_Avg_7:0": 19941.35, "ib-loopback/IB_write_16384_Avg_7:0": 22566.09, "ib-loopback/IB_write_32768_Avg_7:0": 23244.77, "ib-loopback/IB_write_65536_Avg_7:0": 23377.67, "ib-loopback/IB_write_131072_Avg_7:0": 23736.17, "ib-loopback/IB_write_262144_Avg_7:0": 23829.25, "ib-loopback/IB_write_524288_Avg_7:0": 23879.6, "ib-loopback/IB_write_1048576_Avg_7:0": 23895.1, "ib-loopback/IB_write_2097152_Avg_7:0": 23930.64, "ib-loopback/IB_write_4194304_Avg_7:0": 23845.63, "ib-loopback/IB_write_8388608_Avg_7:0": 23896.94, "kernel-launch/return_code":0, "kernel-launch/event_overhead:0": 0.00595, "kernel-launch/event_overhead:1": 0.00595, "kernel-launch/event_overhead:2": 0.00557, "kernel-launch/event_overhead:3": 0.0055, "kernel-launch/event_overhead:4": 0.00592, "kernel-launch/event_overhead:5": 0.00589, "kernel-launch/event_overhead:6": 0.00572, "kernel-launch/event_overhead:7": 0.0059, "kernel-launch/wall_overhead:0": 0.01026, "kernel-launch/wall_overhead:1": 0.01026, "kernel-launch/wall_overhead:2": 0.01046, "kernel-launch/wall_overhead:3": 0.01049, "kernel-launch/wall_overhead:4": 0.01063, "kernel-launch/wall_overhead:5": 0.01006, "kernel-launch/wall_overhead:6": 0.01045, "kernel-launch/wall_overhead:7": 0.01071, "lstm_models/pytorch-lstm/steptime_train_float32": 48.07024518959224, "lstm_models/pytorch-lstm/throughput_train_float32": 4806.472441132788, "lstm_models/pytorch-lstm/steptime_train_float16": 25.95312986522913, "lstm_models/pytorch-lstm/throughput_train_float16": 9069.90809255883, 
"pytorch-matmul/nosharding": 34.64499759674072, "mem-bw/return_code":0, "mem-bw/H2D_Mem_BW:0": 25.6, "mem-bw/H2D_Mem_BW:1": 25.8, "mem-bw/H2D_Mem_BW:2": 26.0, "mem-bw/H2D_Mem_BW:3": 26.1, "mem-bw/H2D_Mem_BW:4": 26.2, "mem-bw/H2D_Mem_BW:5": 25.8, "mem-bw/H2D_Mem_BW:6": 25.3, "mem-bw/H2D_Mem_BW:7": 26.1, "mem-bw/D2H_Mem_BW:0": 24.3, "mem-bw/D2H_Mem_BW:1": 24.6, "mem-bw/D2H_Mem_BW:2": 24.5, "mem-bw/D2H_Mem_BW:3": 24.6, "mem-bw/D2H_Mem_BW:4": 24.3, "mem-bw/D2H_Mem_BW:5": 24.3, "mem-bw/D2H_Mem_BW:6": 23.9, "mem-bw/D2H_Mem_BW:7": 24.6, "mem-bw/D2D_Mem_BW:0": 1118.0, "mem-bw/D2D_Mem_BW:1": 1114.6, "mem-bw/D2D_Mem_BW:2": 1119.7, "mem-bw/D2D_Mem_BW:3": 1121.9, "mem-bw/D2D_Mem_BW:4": 1109.7, "mem-bw/D2D_Mem_BW:5": 1110.1, "mem-bw/D2D_Mem_BW:6": 1123.3, "mem-bw/D2D_Mem_BW:7": 1117.6, "nccl-bw/allreduce_8_busbw:0": 0.0, "nccl-bw/allreduce_8_algbw:0": 0.0, "nccl-bw/allreduce_8_time:0": 37.84, "nccl-bw/allreduce_16_busbw:0": 0.0, "nccl-bw/allreduce_16_algbw:0": 0.0, "nccl-bw/allreduce_16_time:0": 36.42, "nccl-bw/allreduce_32_busbw:0": 0.0, "nccl-bw/allreduce_32_algbw:0": 0.0, "nccl-bw/allreduce_32_time:0": 36.87, "nccl-bw/allreduce_64_busbw:0": 0.0, "nccl-bw/allreduce_64_algbw:0": 0.0, "nccl-bw/allreduce_64_time:0": 35.83, "nccl-bw/allreduce_128_busbw:0": 0.01, "nccl-bw/allreduce_128_algbw:0": 0.0, "nccl-bw/allreduce_128_time:0": 36.91, "nccl-bw/allreduce_256_busbw:0": 0.01, "nccl-bw/allreduce_256_algbw:0": 0.01, "nccl-bw/allreduce_256_time:0": 37.58, "nccl-bw/allreduce_512_busbw:0": 0.02, "nccl-bw/allreduce_512_algbw:0": 0.01, "nccl-bw/allreduce_512_time:0": 36.98, "nccl-bw/allreduce_1024_busbw:0": 0.05, "nccl-bw/allreduce_1024_algbw:0": 0.03, "nccl-bw/allreduce_1024_time:0": 36.93, "nccl-bw/allreduce_2048_busbw:0": 0.1, "nccl-bw/allreduce_2048_algbw:0": 0.06, "nccl-bw/allreduce_2048_time:0": 36.06, "nccl-bw/allreduce_4096_busbw:0": 0.19, "nccl-bw/allreduce_4096_algbw:0": 0.11, "nccl-bw/allreduce_4096_time:0": 37.2, "nccl-bw/allreduce_8192_busbw:0": 0.39, "nccl-bw/allreduce_8192_algbw:0": 0.22, "nccl-bw/allreduce_8192_time:0": 37.04, "nccl-bw/allreduce_16384_busbw:0": 0.77, "nccl-bw/allreduce_16384_algbw:0": 0.44, "nccl-bw/allreduce_16384_time:0": 37.46, "nccl-bw/allreduce_32768_busbw:0": 1.52, "nccl-bw/allreduce_32768_algbw:0": 0.87, "nccl-bw/allreduce_32768_time:0": 37.64, "nccl-bw/allreduce_65536_busbw:0": 3.0, "nccl-bw/allreduce_65536_algbw:0": 1.71, "nccl-bw/allreduce_65536_time:0": 38.22, "nccl-bw/allreduce_131072_busbw:0": 5.31, "nccl-bw/allreduce_131072_algbw:0": 3.04, "nccl-bw/allreduce_131072_time:0": 43.17, "nccl-bw/allreduce_262144_busbw:0": 9.5, "nccl-bw/allreduce_262144_algbw:0": 5.43, "nccl-bw/allreduce_262144_time:0": 48.29, "nccl-bw/allreduce_524288_busbw:0": 15.11, "nccl-bw/allreduce_524288_algbw:0": 8.64, "nccl-bw/allreduce_524288_time:0": 60.71, "nccl-bw/allreduce_1048576_busbw:0": 24.1, "nccl-bw/allreduce_1048576_algbw:0": 13.77, "nccl-bw/allreduce_1048576_time:0": 76.13, "nccl-bw/allreduce_2097152_busbw:0": 38.12, "nccl-bw/allreduce_2097152_algbw:0": 21.78, "nccl-bw/allreduce_2097152_time:0": 96.28, "nccl-bw/allreduce_4194304_busbw:0": 65.75, "nccl-bw/allreduce_4194304_algbw:0": 37.57, "nccl-bw/allreduce_4194304_time:0": 111.6, "nccl-bw/allreduce_8388608_busbw:0": 89.51, "nccl-bw/allreduce_8388608_algbw:0": 51.15, "nccl-bw/allreduce_8388608_time:0": 164.0, "nccl-bw/allreduce_16777216_busbw:0": 114.38, "nccl-bw/allreduce_16777216_algbw:0": 65.36, "nccl-bw/allreduce_16777216_time:0": 256.7, "nccl-bw/allreduce_33554432_busbw:0": 154.89, "nccl-bw/allreduce_33554432_algbw:0": 88.51, 
"nccl-bw/allreduce_33554432_time:0": 379.1, "nccl-bw/allreduce_67108864_busbw:0": 200.01, "nccl-bw/allreduce_67108864_algbw:0": 114.29, "nccl-bw/allreduce_67108864_time:0": 587.2, "nccl-bw/allreduce_134217728_busbw:0": 202.97, "nccl-bw/allreduce_134217728_algbw:0": 115.98, "nccl-bw/allreduce_134217728_time:0": 1157.2, "nccl-bw/allreduce_268435456_busbw:0": 221.82, "nccl-bw/allreduce_268435456_algbw:0": 126.75, "nccl-bw/allreduce_268435456_time:0": 2117.8, "nccl-bw/allreduce_536870912_busbw:0": 224.54, "nccl-bw/allreduce_536870912_algbw:0": 128.31, "nccl-bw/allreduce_536870912_time:0": 4184.2, "nccl-bw/allreduce_1073741824_busbw:0": 230.15, "nccl-bw/allreduce_1073741824_algbw:0": 131.51, "nccl-bw/allreduce_1073741824_time:0": 8164.5, "nccl-bw/allreduce_2147483648_busbw:0": 231.89, "nccl-bw/allreduce_2147483648_algbw:0": 132.51, "nccl-bw/allreduce_2147483648_time:0": 16207.0, "nccl-bw/allreduce_4294967296_busbw:0": 234.45, "nccl-bw/allreduce_4294967296_algbw:0": 133.97, "nccl-bw/allreduce_4294967296_time:0": 32059.0, "nccl-bw/allreduce_8589934592_busbw:0": 235.36, "nccl-bw/allreduce_8589934592_algbw:0": 134.49, "nccl-bw/allreduce_8589934592_time:0": 63870.0, "resnet_models/pytorch-resnet50/steptime_train_float32": 253.95522732287645, "resnet_models/pytorch-resnet50/throughput_train_float32": 760.3348099129964, "resnet_models/pytorch-resnet50/steptime_train_float16": 200.08606184273958, "resnet_models/pytorch-resnet50/throughput_train_float16": 971.0651430922575, "resnet_models/pytorch-resnet101/steptime_train_float32": 389.08605091273785, "resnet_models/pytorch-resnet101/throughput_train_float32": 496.11747409298965, "resnet_models/pytorch-resnet101/steptime_train_float16": 308.6274107918143, "resnet_models/pytorch-resnet101/throughput_train_float16": 627.2056272195069, "resnet_models/pytorch-resnet152/steptime_train_float32": 547.6558278314769, "resnet_models/pytorch-resnet152/throughput_train_float32": 352.07099543348215, "resnet_models/pytorch-resnet152/steptime_train_float16": 424.5809856802225, "resnet_models/pytorch-resnet152/throughput_train_float16": 454.8335998153649, "pytorch-sharding-matmul/allreduce": 10.574411869049072, "pytorch-sharding-matmul/allgather": 10.084696769714355, "vgg_models/pytorch-vgg11/steptime_train_float32": 40.35283671692014, "vgg_models/pytorch-vgg11/throughput_train_float32": 796.3615936949874, "vgg_models/pytorch-vgg11/steptime_train_float16": 24.133514845743775, "vgg_models/pytorch-vgg11/throughput_train_float16": 1330.411361458461, "vgg_models/pytorch-vgg13/steptime_train_float32": 55.466310936026275, "vgg_models/pytorch-vgg13/throughput_train_float32": 580.234107444399, "vgg_models/pytorch-vgg13/steptime_train_float16": 33.35228993091732, "vgg_models/pytorch-vgg13/throughput_train_float16": 962.5332023901524, "vgg_models/pytorch-vgg16/steptime_train_float32": 65.22519944701344, "vgg_models/pytorch-vgg16/throughput_train_float32": 493.4268638875934, "vgg_models/pytorch-vgg16/steptime_train_float16": 39.25287735182792, "vgg_models/pytorch-vgg16/throughput_train_float16": 817.2008546147621, "vgg_models/pytorch-vgg19/steptime_train_float32": 74.93487105239183, "vgg_models/pytorch-vgg19/throughput_train_float32": 429.80921583106164, "vgg_models/pytorch-vgg19/steptime_train_float16": 45.20330624654889, "vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377091} {"node": "sb-validation-02","bert_models/pytorch-bert-base/steptime_train_float32": 114.59167010616511,"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591105, 
"bert_models/pytorch-bert-base/steptime_train_float16": 83.88951083179563, "bert_models/pytorch-bert-base/throughput_train_float16": 382.0672582741963, "bert_models/pytorch-bert-large/steptime_train_float32": 307.9359371913597, "bert_models/pytorch-bert-large/throughput_train_float32": 103.94876097417632, "bert_models/pytorch-bert-large/steptime_train_float16": 206.81141689419746, "bert_models/pytorch-bert-large/throughput_train_float16": 154.84089117113942, "pytorch-computation-communication-overlap/mul_cost:0": 44.182206214372854, "pytorch-computation-communication-overlap/mul_cost:1": 44.18221393893873, "pytorch-computation-communication-overlap/mul_cost:2": 43.970147078084665, "pytorch-computation-communication-overlap/mul_cost:3": 43.97014787559783, "pytorch-computation-communication-overlap/mul_cost:4": 43.97017793166924, "pytorch-computation-communication-overlap/mul_cost:5": 43.97015716062924, "pytorch-computation-communication-overlap/mul_cost:6": 43.97016519828867, "pytorch-computation-communication-overlap/mul_cost:7": 44.179544478538446, "pytorch-computation-communication-overlap/matmul_cost:0": 137.04773705558182, "pytorch-computation-communication-overlap/matmul_cost:1": 137.0478344692856, "pytorch-computation-communication-overlap/matmul_cost:2": 137.04777220902997, "pytorch-computation-communication-overlap/matmul_cost:3": 137.04779697263803, "pytorch-computation-communication-overlap/matmul_cost:4": 137.04813674314664, "pytorch-computation-communication-overlap/matmul_cost:5": 137.04821988767435, "pytorch-computation-communication-overlap/matmul_cost:6": 137.04775322366913, "pytorch-computation-communication-overlap/matmul_cost:7": 137.0478081606734, "densenet_models/pytorch-densenet169/steptime_train_float32": 150.64155543223023, "densenet_models/pytorch-densenet169/throughput_train_float32": 212.47017192425312, "densenet_models/pytorch-densenet169/steptime_train_float16": 145.30819572973996, "densenet_models/pytorch-densenet169/throughput_train_float16": 220.293233730535, "densenet_models/pytorch-densenet201/steptime_train_float32": 182.91117786429822, "densenet_models/pytorch-densenet201/throughput_train_float32": 174.9742278232251, "densenet_models/pytorch-densenet201/steptime_train_float16": 176.3489063596353, "densenet_models/pytorch-densenet201/throughput_train_float16": 181.54465736033902, "gemm-flops/FP64:0": 9031.23, "gemm-flops/FP64:1": 9040.85, "gemm-flops/FP64:2": 9010.56, "gemm-flops/FP64:3": 9041.26, "gemm-flops/FP64:4": 9039.19, "gemm-flops/FP64:5": 9015.69, "gemm-flops/FP64:6": 9022.19, "gemm-flops/FP64:7": 9030.2, "gemm-flops/FP32:0": 18362.1, "gemm-flops/FP32:1": 18375.6, "gemm-flops/FP32:2": 18314.9, "gemm-flops/FP32:3": 18375.6, "gemm-flops/FP32:4": 18368.6, "gemm-flops/FP32:5": 18347.1, "gemm-flops/FP32:6": 18247.4, "gemm-flops/FP32:7": 18318.4, "gemm-flops/FP16:0": 33878.0, "gemm-flops/FP16:1": 33911.1, "gemm-flops/FP16:2": 33769.3, "gemm-flops/FP16:3": 33909.9, "gemm-flops/FP16:4": 33896.5, "gemm-flops/FP16:5": 33798.1, "gemm-flops/FP16:6": 33647.3, "gemm-flops/FP16:7": 33764.8, "gemm-flops/FP64_TC:0": 18963.6, "gemm-flops/FP64_TC:1": 18924.2, "gemm-flops/FP64_TC:2": 18930.3, "gemm-flops/FP64_TC:3": 18971.9, "gemm-flops/FP64_TC:4": 18946.0, "gemm-flops/FP64_TC:5": 18945.0, "gemm-flops/FP64_TC:6": 18822.9, "gemm-flops/FP64_TC:7": 18911.1, "gemm-flops/TF32_TC:0": 127900.0, "gemm-flops/TF32_TC:1": 129094.0, "gemm-flops/TF32_TC:2": 127831.0, "gemm-flops/TF32_TC:3": 128709.0, "gemm-flops/TF32_TC:4": 127388.0, "gemm-flops/TF32_TC:5": 127861.0, 
"gemm-flops/TF32_TC:6": 128492.0, "gemm-flops/TF32_TC:7": 127720.0, "gemm-flops/BF16_TC:0": 264965.0, "gemm-flops/BF16_TC:1": 266638.0, "gemm-flops/BF16_TC:2": 263151.0, "gemm-flops/BF16_TC:3": 264752.0, "gemm-flops/BF16_TC:4": 263049.0, "gemm-flops/BF16_TC:5": 266605.0, "gemm-flops/BF16_TC:6": 267501.0, "gemm-flops/BF16_TC:7": 263880.0, "gemm-flops/FP16_TC:0": 279474.0, "gemm-flops/FP16_TC:1": 281256.0, "gemm-flops/FP16_TC:2": 277403.0, "gemm-flops/FP16_TC:3": 279147.0, "gemm-flops/FP16_TC:4": 277587.0, "gemm-flops/FP16_TC:5": 281537.0, "gemm-flops/FP16_TC:6": 282132.0, "gemm-flops/FP16_TC:7": 277788.0, "gemm-flops/INT8_TC:0": 475160.0, "gemm-flops/INT8_TC:1": 477725.0, "gemm-flops/INT8_TC:2": 471621.0, "gemm-flops/INT8_TC:3": 473716.0, "gemm-flops/INT8_TC:4": 472124.0, "gemm-flops/INT8_TC:5": 479972.0, "gemm-flops/INT8_TC:6": 481327.0, "gemm-flops/INT8_TC:7": 474710.0, "gemm-flops/INT4_TC:0": 970330.0, "gemm-flops/INT4_TC:1": 976837.0, "gemm-flops/INT4_TC:2": 966003.0, "gemm-flops/INT4_TC:3": 971315.0, "gemm-flops/INT4_TC:4": 964441.0, "gemm-flops/INT4_TC:5": 982461.0, "gemm-flops/INT4_TC:6": 979610.0, "gemm-flops/INT4_TC:7": 968359.0, "gpt_models/pytorch-gpt2-large/steptime_train_float32": 295.0526971835643, "gpt_models/pytorch-gpt2-large/throughput_train_float32": 27.115454396866486, "gpt_models/pytorch-gpt2-large/steptime_train_float16": 194.4957742234692, "gpt_models/pytorch-gpt2-large/throughput_train_float16": 41.139449941061585, "gpu-sm-copy-bw/dtoh:0": 3.91755, "gpu-sm-copy-bw/dtoh:1": 4.45414, "gpu-sm-copy-bw/dtoh:2": 1.26483, "gpu-sm-copy-bw/dtoh:3": 1.30041, "gpu-sm-copy-bw/dtoh:4": 1.31577, "gpu-sm-copy-bw/dtoh:5": 1.27968, "gpu-sm-copy-bw/dtoh:6": 4.47849, "gpu-sm-copy-bw/dtoh:7": 3.96231, "gpu-sm-copy-bw/dtoh:8": 3.91705, "gpu-sm-copy-bw/dtoh:9": 4.45487, "gpu-sm-copy-bw/dtoh:10": 1.26352, "gpu-sm-copy-bw/dtoh:11": 1.2999, "gpu-sm-copy-bw/dtoh:12": 1.31677, "gpu-sm-copy-bw/dtoh:13": 1.27885, "gpu-sm-copy-bw/dtoh:14": 4.47913, "gpu-sm-copy-bw/dtoh:15": 3.95893, "gpu-sm-copy-bw/dtoh:16": 3.91729, "gpu-sm-copy-bw/dtoh:17": 4.45627, "gpu-sm-copy-bw/dtoh:18": 1.26437, "gpu-sm-copy-bw/dtoh:19": 1.30144, "gpu-sm-copy-bw/dtoh:20": 1.31704, "gpu-sm-copy-bw/dtoh:21": 1.27857, "gpu-sm-copy-bw/dtoh:22": 4.47889, "gpu-sm-copy-bw/dtoh:23": 3.95984, "gpu-sm-copy-bw/dtoh:24": 3.92025, "gpu-sm-copy-bw/dtoh:25": 4.45423, "gpu-sm-copy-bw/dtoh:26": 1.26449, "gpu-sm-copy-bw/dtoh:27": 1.29954, "gpu-sm-copy-bw/dtoh:28": 1.31731, "gpu-sm-copy-bw/dtoh:29": 1.27916, "gpu-sm-copy-bw/dtoh:30": 4.4797, "gpu-sm-copy-bw/dtoh:31": 3.96124, "gpu-sm-copy-bw/htod:0": 23.9685, "gpu-sm-copy-bw/htod:1": 23.967, "gpu-sm-copy-bw/htod:2": 19.9898, "gpu-sm-copy-bw/htod:3": 20.2848, "gpu-sm-copy-bw/htod:4": 20.3931, "gpu-sm-copy-bw/htod:5": 20.0888, "gpu-sm-copy-bw/htod:6": 23.9766, "gpu-sm-copy-bw/htod:7": 23.9792, "gpu-sm-copy-bw/htod:8": 23.9497, "gpu-sm-copy-bw/htod:9": 23.9438, "gpu-sm-copy-bw/htod:10": 20.0039, "gpu-sm-copy-bw/htod:11": 20.2469, "gpu-sm-copy-bw/htod:12": 20.4344, "gpu-sm-copy-bw/htod:13": 20.1005, "gpu-sm-copy-bw/htod:14": 23.9491, "gpu-sm-copy-bw/htod:15": 23.9898, "gpu-sm-copy-bw/htod:16": 23.962, "gpu-sm-copy-bw/htod:17": 23.97, "gpu-sm-copy-bw/htod:18": 19.987, "gpu-sm-copy-bw/htod:19": 20.1549, "gpu-sm-copy-bw/htod:20": 20.3931, "gpu-sm-copy-bw/htod:21": 20.1013, "gpu-sm-copy-bw/htod:22": 23.9865, "gpu-sm-copy-bw/htod:23": 23.972, "gpu-sm-copy-bw/htod:24": 23.9569, "gpu-sm-copy-bw/htod:25": 23.9762, "gpu-sm-copy-bw/htod:26": 19.92, "gpu-sm-copy-bw/htod:27": 20.2638, 
"gpu-sm-copy-bw/htod:28": 20.4419, "gpu-sm-copy-bw/htod:29": 20.1054, "gpu-sm-copy-bw/htod:30": 23.9752, "gpu-sm-copy-bw/htod:31": 23.9631, "ib-loopback/IB_write_512_Avg_0:0": 1492.89, "ib-loopback/IB_write_1024_Avg_0:0": 3224.92, "ib-loopback/IB_write_2048_Avg_0:0": 6714.5, "ib-loopback/IB_write_4096_Avg_0:0": 12871.93, "ib-loopback/IB_write_8192_Avg_0:0": 19990.78, "ib-loopback/IB_write_16384_Avg_0:0": 22172.25, "ib-loopback/IB_write_32768_Avg_0:0": 23073.19, "ib-loopback/IB_write_65536_Avg_0:0": 23527.09, "ib-loopback/IB_write_131072_Avg_0:0": 23805.92, "ib-loopback/IB_write_262144_Avg_0:0": 23380.51, "ib-loopback/IB_write_524288_Avg_0:0": 23856.63, "ib-loopback/IB_write_1048576_Avg_0:0": 23869.58, "ib-loopback/IB_write_2097152_Avg_0:0": 23885.7, "ib-loopback/IB_write_4194304_Avg_0:0": 23914.56, "ib-loopback/IB_write_8388608_Avg_0:0": 23935.21, "ib-loopback/IB_write_512_Avg_1:0": 1479.1, "ib-loopback/IB_write_1024_Avg_1:0": 3246.48, "ib-loopback/IB_write_2048_Avg_1:0": 6754.94, "ib-loopback/IB_write_4096_Avg_1:0": 13101.09, "ib-loopback/IB_write_8192_Avg_1:0": 19945.81, "ib-loopback/IB_write_16384_Avg_1:0": 22631.62, "ib-loopback/IB_write_32768_Avg_1:0": 23381.58, "ib-loopback/IB_write_65536_Avg_1:0": 23515.29, "ib-loopback/IB_write_131072_Avg_1:0": 23756.01, "ib-loopback/IB_write_262144_Avg_1:0": 23875.91, "ib-loopback/IB_write_524288_Avg_1:0": 23911.13, "ib-loopback/IB_write_1048576_Avg_1:0": 23935.42, "ib-loopback/IB_write_2097152_Avg_1:0": 23941.25, "ib-loopback/IB_write_4194304_Avg_1:0": 23922.06, "ib-loopback/IB_write_8388608_Avg_1:0": 23928.59, "ib-loopback/IB_write_512_Avg_2:0": 1505.29, "ib-loopback/IB_write_1024_Avg_2:0": 3215.97, "ib-loopback/IB_write_2048_Avg_2:0": 6745.49, "ib-loopback/IB_write_4096_Avg_2:0": 11548.88, "ib-loopback/IB_write_8192_Avg_2:0": 19432.15, "ib-loopback/IB_write_16384_Avg_2:0": 22765.51, "ib-loopback/IB_write_32768_Avg_2:0": 23235.07, "ib-loopback/IB_write_65536_Avg_2:0": 23620.08, "ib-loopback/IB_write_131072_Avg_2:0": 23759.08, "ib-loopback/IB_write_262144_Avg_2:0": 23859.82, "ib-loopback/IB_write_524288_Avg_2:0": 23775.01, "ib-loopback/IB_write_1048576_Avg_2:0": 23885.96, "ib-loopback/IB_write_2097152_Avg_2:0": 23894.73, "ib-loopback/IB_write_4194304_Avg_2:0": 23909.98, "ib-loopback/IB_write_8388608_Avg_2:0": 23927.21, "ib-loopback/IB_write_512_Avg_3:0": 1474.47, "ib-loopback/IB_write_1024_Avg_3:0": 3344.68, "ib-loopback/IB_write_2048_Avg_3:0": 6606.35, "ib-loopback/IB_write_4096_Avg_3:0": 12071.4, "ib-loopback/IB_write_8192_Avg_3:0": 18408.33, "ib-loopback/IB_write_16384_Avg_3:0": 20789.27, "ib-loopback/IB_write_32768_Avg_3:0": 22469.43, "ib-loopback/IB_write_65536_Avg_3:0": 22777.1, "ib-loopback/IB_write_131072_Avg_3:0": 23461.03, "ib-loopback/IB_write_262144_Avg_3:0": 23397.19, "ib-loopback/IB_write_524288_Avg_3:0": 23526.55, "ib-loopback/IB_write_1048576_Avg_3:0": 23854.76, "ib-loopback/IB_write_2097152_Avg_3:0": 23862.23, "ib-loopback/IB_write_4194304_Avg_3:0": 23931.15, "ib-loopback/IB_write_8388608_Avg_3:0": 23924.44, "ib-loopback/IB_write_512_Avg_4:0": 1523.33, "ib-loopback/IB_write_1024_Avg_4:0": 3233.23, "ib-loopback/IB_write_2048_Avg_4:0": 6792.88, "ib-loopback/IB_write_4096_Avg_4:0": 12616.05, "ib-loopback/IB_write_8192_Avg_4:0": 19324.05, "ib-loopback/IB_write_16384_Avg_4:0": 22082.51, "ib-loopback/IB_write_32768_Avg_4:0": 23294.23, "ib-loopback/IB_write_65536_Avg_4:0": 23546.22, "ib-loopback/IB_write_131072_Avg_4:0": 23727.91, "ib-loopback/IB_write_262144_Avg_4:0": 23843.93, "ib-loopback/IB_write_524288_Avg_4:0": 23905.96, 
"ib-loopback/IB_write_1048576_Avg_4:0": 23902.37, "ib-loopback/IB_write_2097152_Avg_4:0": 23921.03, "ib-loopback/IB_write_4194304_Avg_4:0": 23921.06, "ib-loopback/IB_write_8388608_Avg_4:0": 23922.4, "ib-loopback/IB_write_512_Avg_5:0": 1506.63, "ib-loopback/IB_write_1024_Avg_5:0": 3261.71, "ib-loopback/IB_write_2048_Avg_5:0": 6752.38, "ib-loopback/IB_write_4096_Avg_5:0": 13592.41, "ib-loopback/IB_write_8192_Avg_5:0": 19989.31, "ib-loopback/IB_write_16384_Avg_5:0": 22560.57, "ib-loopback/IB_write_32768_Avg_5:0": 23389.29, "ib-loopback/IB_write_65536_Avg_5:0": 23503.32, "ib-loopback/IB_write_131072_Avg_5:0": 23741.51, "ib-loopback/IB_write_262144_Avg_5:0": 23866.43, "ib-loopback/IB_write_524288_Avg_5:0": 23898.95, "ib-loopback/IB_write_1048576_Avg_5:0": 23876.36, "ib-loopback/IB_write_2097152_Avg_5:0": 23919.63, "ib-loopback/IB_write_4194304_Avg_5:0": 23924.68, "ib-loopback/IB_write_8388608_Avg_5:0": 23930.37, "ib-loopback/IB_write_512_Avg_6:0": 1467.69, "ib-loopback/IB_write_1024_Avg_6:0": 3157.04, "ib-loopback/IB_write_2048_Avg_6:0": 6494.61, "ib-loopback/IB_write_4096_Avg_6:0": 12883.51, "ib-loopback/IB_write_8192_Avg_6:0": 19207.67, "ib-loopback/IB_write_16384_Avg_6:0": 22519.39, "ib-loopback/IB_write_32768_Avg_6:0": 23323.46, "ib-loopback/IB_write_65536_Avg_6:0": 23523.6, "ib-loopback/IB_write_131072_Avg_6:0": 23626.67, "ib-loopback/IB_write_262144_Avg_6:0": 23836.99, "ib-loopback/IB_write_524288_Avg_6:0": 23904.51, "ib-loopback/IB_write_1048576_Avg_6:0": 23919.07, "ib-loopback/IB_write_2097152_Avg_6:0": 23943.82, "ib-loopback/IB_write_4194304_Avg_6:0": 23936.77, "ib-loopback/IB_write_8388608_Avg_6:0": 23941.57, "ib-loopback/IB_write_512_Avg_7:0": 1505.63, "ib-loopback/IB_write_1024_Avg_7:0": 3259.93, "ib-loopback/IB_write_2048_Avg_7:0": 6738.6, "ib-loopback/IB_write_4096_Avg_7:0": 13352.06, "ib-loopback/IB_write_8192_Avg_7:0": 19941.35, "ib-loopback/IB_write_16384_Avg_7:0": 22566.09, "ib-loopback/IB_write_32768_Avg_7:0": 23244.77, "ib-loopback/IB_write_65536_Avg_7:0": 23377.67, "ib-loopback/IB_write_131072_Avg_7:0": 23736.17, "ib-loopback/IB_write_262144_Avg_7:0": 23829.25, "ib-loopback/IB_write_524288_Avg_7:0": 23879.6, "ib-loopback/IB_write_1048576_Avg_7:0": 23895.1, "ib-loopback/IB_write_2097152_Avg_7:0": 23930.64, "ib-loopback/IB_write_4194304_Avg_7:0": 23845.63, "ib-loopback/IB_write_8388608_Avg_7:0": 23896.94, "kernel-launch/return_code":0, "kernel-launch/event_overhead:0": 0.00595, "kernel-launch/event_overhead:1": 0.00595, "kernel-launch/event_overhead:2": 0.00557, "kernel-launch/event_overhead:3": 0.0055, "kernel-launch/event_overhead:4": 0.00592, "kernel-launch/event_overhead:5": 0.00589, "kernel-launch/event_overhead:6": 0.00572, "kernel-launch/event_overhead:7": 0.0059, "kernel-launch/wall_overhead:0": 0.01026, "kernel-launch/wall_overhead:1": 0.01026, "kernel-launch/wall_overhead:2": 0.01046, "kernel-launch/wall_overhead:3": 0.01049, "kernel-launch/wall_overhead:4": 0.01063, "kernel-launch/wall_overhead:5": 0.01006, "kernel-launch/wall_overhead:6": 0.01045, "kernel-launch/wall_overhead:7": 0.01071, "lstm_models/pytorch-lstm/steptime_train_float32": 48.07024518959224, "lstm_models/pytorch-lstm/throughput_train_float32": 4806.472441132788, "lstm_models/pytorch-lstm/steptime_train_float16": 25.95312986522913, "lstm_models/pytorch-lstm/throughput_train_float16": 9069.90809255883, "pytorch-matmul/nosharding": 34.64499759674072, "mem-bw/return_code":0, "mem-bw/H2D_Mem_BW:0": 25.6, "mem-bw/H2D_Mem_BW:1": 25.8, "mem-bw/H2D_Mem_BW:2": 26.0, "mem-bw/H2D_Mem_BW:3": 26.1, 
"mem-bw/H2D_Mem_BW:4": 26.2, "mem-bw/H2D_Mem_BW:5": 25.8, "mem-bw/H2D_Mem_BW:6": 25.3, "mem-bw/H2D_Mem_BW:7": 26.1, "mem-bw/D2H_Mem_BW:0": 24.3, "mem-bw/D2H_Mem_BW:1": 24.6, "mem-bw/D2H_Mem_BW:2": 24.5, "mem-bw/D2H_Mem_BW:3": 24.6, "mem-bw/D2H_Mem_BW:4": 24.3, "mem-bw/D2H_Mem_BW:5": 24.3, "mem-bw/D2H_Mem_BW:6": 23.9, "mem-bw/D2H_Mem_BW:7": 24.6, "mem-bw/D2D_Mem_BW:0": 1118.0, "mem-bw/D2D_Mem_BW:1": 1114.6, "mem-bw/D2D_Mem_BW:2": 1119.7, "mem-bw/D2D_Mem_BW:3": 1121.9, "mem-bw/D2D_Mem_BW:4": 1109.7, "mem-bw/D2D_Mem_BW:5": 1110.1, "mem-bw/D2D_Mem_BW:6": 1123.3, "mem-bw/D2D_Mem_BW:7": 1117.6, "nccl-bw/allreduce_8_busbw:0": 0.0, "nccl-bw/allreduce_8_algbw:0": 0.0, "nccl-bw/allreduce_8_time:0": 37.84, "nccl-bw/allreduce_16_busbw:0": 0.0, "nccl-bw/allreduce_16_algbw:0": 0.0, "nccl-bw/allreduce_16_time:0": 36.42, "nccl-bw/allreduce_32_busbw:0": 0.0, "nccl-bw/allreduce_32_algbw:0": 0.0, "nccl-bw/allreduce_32_time:0": 36.87, "nccl-bw/allreduce_64_busbw:0": 0.0, "nccl-bw/allreduce_64_algbw:0": 0.0, "nccl-bw/allreduce_64_time:0": 35.83, "nccl-bw/allreduce_128_busbw:0": 0.01, "nccl-bw/allreduce_128_algbw:0": 0.0, "nccl-bw/allreduce_128_time:0": 36.91, "nccl-bw/allreduce_256_busbw:0": 0.01, "nccl-bw/allreduce_256_algbw:0": 0.01, "nccl-bw/allreduce_256_time:0": 37.58, "nccl-bw/allreduce_512_busbw:0": 0.02, "nccl-bw/allreduce_512_algbw:0": 0.01, "nccl-bw/allreduce_512_time:0": 36.98, "nccl-bw/allreduce_1024_busbw:0": 0.05, "nccl-bw/allreduce_1024_algbw:0": 0.03, "nccl-bw/allreduce_1024_time:0": 36.93, "nccl-bw/allreduce_2048_busbw:0": 0.1, "nccl-bw/allreduce_2048_algbw:0": 0.06, "nccl-bw/allreduce_2048_time:0": 36.06, "nccl-bw/allreduce_4096_busbw:0": 0.19, "nccl-bw/allreduce_4096_algbw:0": 0.11, "nccl-bw/allreduce_4096_time:0": 37.2, "nccl-bw/allreduce_8192_busbw:0": 0.39, "nccl-bw/allreduce_8192_algbw:0": 0.22, "nccl-bw/allreduce_8192_time:0": 37.04, "nccl-bw/allreduce_16384_busbw:0": 0.77, "nccl-bw/allreduce_16384_algbw:0": 0.44, "nccl-bw/allreduce_16384_time:0": 37.46, "nccl-bw/allreduce_32768_busbw:0": 1.52, "nccl-bw/allreduce_32768_algbw:0": 0.87, "nccl-bw/allreduce_32768_time:0": 37.64, "nccl-bw/allreduce_65536_busbw:0": 3.0, "nccl-bw/allreduce_65536_algbw:0": 1.71, "nccl-bw/allreduce_65536_time:0": 38.22, "nccl-bw/allreduce_131072_busbw:0": 5.31, "nccl-bw/allreduce_131072_algbw:0": 3.04, "nccl-bw/allreduce_131072_time:0": 43.17, "nccl-bw/allreduce_262144_busbw:0": 9.5, "nccl-bw/allreduce_262144_algbw:0": 5.43, "nccl-bw/allreduce_262144_time:0": 48.29, "nccl-bw/allreduce_524288_busbw:0": 15.11, "nccl-bw/allreduce_524288_algbw:0": 8.64, "nccl-bw/allreduce_524288_time:0": 60.71, "nccl-bw/allreduce_1048576_busbw:0": 24.1, "nccl-bw/allreduce_1048576_algbw:0": 13.77, "nccl-bw/allreduce_1048576_time:0": 76.13, "nccl-bw/allreduce_2097152_busbw:0": 38.12, "nccl-bw/allreduce_2097152_algbw:0": 21.78, "nccl-bw/allreduce_2097152_time:0": 96.28, "nccl-bw/allreduce_4194304_busbw:0": 65.75, "nccl-bw/allreduce_4194304_algbw:0": 37.57, "nccl-bw/allreduce_4194304_time:0": 111.6, "nccl-bw/allreduce_8388608_busbw:0": 89.51, "nccl-bw/allreduce_8388608_algbw:0": 51.15, "nccl-bw/allreduce_8388608_time:0": 164.0, "nccl-bw/allreduce_16777216_busbw:0": 114.38, "nccl-bw/allreduce_16777216_algbw:0": 65.36, "nccl-bw/allreduce_16777216_time:0": 256.7, "nccl-bw/allreduce_33554432_busbw:0": 154.89, "nccl-bw/allreduce_33554432_algbw:0": 88.51, "nccl-bw/allreduce_33554432_time:0": 379.1, "nccl-bw/allreduce_67108864_busbw:0": 200.01, "nccl-bw/allreduce_67108864_algbw:0": 114.29, "nccl-bw/allreduce_67108864_time:0": 587.2, 
"nccl-bw/allreduce_134217728_busbw:0": 202.97, "nccl-bw/allreduce_134217728_algbw:0": 115.98, "nccl-bw/allreduce_134217728_time:0": 1157.2, "nccl-bw/allreduce_268435456_busbw:0": 221.82, "nccl-bw/allreduce_268435456_algbw:0": 126.75, "nccl-bw/allreduce_268435456_time:0": 2117.8, "nccl-bw/allreduce_536870912_busbw:0": 224.54, "nccl-bw/allreduce_536870912_algbw:0": 128.31, "nccl-bw/allreduce_536870912_time:0": 4184.2, "nccl-bw/allreduce_1073741824_busbw:0": 230.15, "nccl-bw/allreduce_1073741824_algbw:0": 131.51, "nccl-bw/allreduce_1073741824_time:0": 8164.5, "nccl-bw/allreduce_2147483648_busbw:0": 231.89, "nccl-bw/allreduce_2147483648_algbw:0": 132.51, "nccl-bw/allreduce_2147483648_time:0": 16207.0, "nccl-bw/allreduce_4294967296_busbw:0": 234.45, "nccl-bw/allreduce_4294967296_algbw:0": 133.97, "nccl-bw/allreduce_4294967296_time:0": 32059.0, "nccl-bw/allreduce_8589934592_busbw:0": 235.36, "nccl-bw/allreduce_8589934592_algbw:0": 134.49, "nccl-bw/allreduce_8589934592_time:0": 63870.0, "resnet_models/pytorch-resnet50/steptime_train_float32": 253.95522732287645, "resnet_models/pytorch-resnet50/throughput_train_float32": 760.3348099129964, "resnet_models/pytorch-resnet50/steptime_train_float16": 200.08606184273958, "resnet_models/pytorch-resnet50/throughput_train_float16": 971.0651430922575, "resnet_models/pytorch-resnet101/steptime_train_float32": 389.08605091273785, "resnet_models/pytorch-resnet101/throughput_train_float32": 496.11747409298965, "resnet_models/pytorch-resnet101/steptime_train_float16": 308.6274107918143, "resnet_models/pytorch-resnet101/throughput_train_float16": 627.2056272195069, "resnet_models/pytorch-resnet152/steptime_train_float32": 547.6558278314769, "resnet_models/pytorch-resnet152/throughput_train_float32": 352.07099543348215, "resnet_models/pytorch-resnet152/steptime_train_float16": 424.5809856802225, "resnet_models/pytorch-resnet152/throughput_train_float16": 454.8335998153649, "pytorch-sharding-matmul/allreduce": 10.574411869049072, "pytorch-sharding-matmul/allgather": 10.084696769714355, "vgg_models/pytorch-vgg11/steptime_train_float32": 40.35283671692014, "vgg_models/pytorch-vgg11/throughput_train_float32": 796.3615936949874, "vgg_models/pytorch-vgg11/steptime_train_float16": 24.133514845743775, "vgg_models/pytorch-vgg11/throughput_train_float16": 1330.411361458461, "vgg_models/pytorch-vgg13/steptime_train_float32": 55.466310936026275, "vgg_models/pytorch-vgg13/throughput_train_float32": 580.234107444399, "vgg_models/pytorch-vgg13/steptime_train_float16": 33.35228993091732, "vgg_models/pytorch-vgg13/throughput_train_float16": 962.5332023901524, "vgg_models/pytorch-vgg16/steptime_train_float32": 65.22519944701344, "vgg_models/pytorch-vgg16/throughput_train_float32": 493.4268638875934, "vgg_models/pytorch-vgg16/steptime_train_float16": 39.25287735182792, "vgg_models/pytorch-vgg16/throughput_train_float16": 817.2008546147621, "vgg_models/pytorch-vgg19/steptime_train_float32": 74.93487105239183, "vgg_models/pytorch-vgg19/throughput_train_float32": 429.80921583106164, "vgg_models/pytorch-vgg19/steptime_train_float16": 45.20330624654889, "vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377091}
{"node": "sb-validation-03","bert_models/pytorch-bert-base/steptime_train_float32": 114.59167010616511,"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591105, "bert_models/pytorch-bert-base/steptime_train_float16": 83.88951083179563, "bert_models/pytorch-bert-base/throughput_train_float16": 382.0672582741963, "bert_models/pytorch-bert-large/steptime_train_float32": 307.9359371913597, "bert_models/pytorch-bert-large/throughput_train_float32": 103.94876097417632, "bert_models/pytorch-bert-large/steptime_train_float16": 206.81141689419746, "bert_models/pytorch-bert-large/throughput_train_float16": 154.84089117113942, "pytorch-computation-communication-overlap/mul_cost:0": 44.182206214372854, "pytorch-computation-communication-overlap/mul_cost:1": 44.18221393893873, "pytorch-computation-communication-overlap/mul_cost:2": 43.970147078084665, "pytorch-computation-communication-overlap/mul_cost:3": 43.97014787559783, "pytorch-computation-communication-overlap/mul_cost:4": 43.97017793166924, "pytorch-computation-communication-overlap/mul_cost:5": 43.97015716062924, "pytorch-computation-communication-overlap/mul_cost:6": 43.97016519828867, "pytorch-computation-communication-overlap/mul_cost:7": 44.179544478538446, "pytorch-computation-communication-overlap/matmul_cost:0": 137.04773705558182, "pytorch-computation-communication-overlap/matmul_cost:1": 137.0478344692856, "pytorch-computation-communication-overlap/matmul_cost:2": 137.04777220902997, "pytorch-computation-communication-overlap/matmul_cost:3": 137.04779697263803, "pytorch-computation-communication-overlap/matmul_cost:4": 137.04813674314664, "pytorch-computation-communication-overlap/matmul_cost:5": 137.04821988767435, "pytorch-computation-communication-overlap/matmul_cost:6": 137.04775322366913, "pytorch-computation-communication-overlap/matmul_cost:7": 137.0478081606734, "densenet_models/pytorch-densenet169/steptime_train_float32": 150.64155543223023, "densenet_models/pytorch-densenet169/throughput_train_float32": 212.47017192425312, "densenet_models/pytorch-densenet169/steptime_train_float16": 145.30819572973996, "densenet_models/pytorch-densenet169/throughput_train_float16": 220.293233730535, "densenet_models/pytorch-densenet201/steptime_train_float32": 182.91117786429822, "densenet_models/pytorch-densenet201/throughput_train_float32": 174.9742278232251, "densenet_models/pytorch-densenet201/steptime_train_float16": 176.3489063596353, "densenet_models/pytorch-densenet201/throughput_train_float16": 181.54465736033902, "gemm-flops/FP64:0": 9031.23, "gemm-flops/FP64:1": 9040.85, "gemm-flops/FP64:2": 9010.56, "gemm-flops/FP64:3": 9041.26, "gemm-flops/FP64:4": 9039.19, "gemm-flops/FP64:5": 9015.69, "gemm-flops/FP64:6": 9022.19, "gemm-flops/FP64:7": 9030.2, "gemm-flops/FP32:0": 18362.1, "gemm-flops/FP32:1": 18375.6, "gemm-flops/FP32:2": 18314.9, "gemm-flops/FP32:3": 18375.6, "gemm-flops/FP32:4": 18368.6, "gemm-flops/FP32:5": 18347.1, "gemm-flops/FP32:6": 18247.4, "gemm-flops/FP32:7": 18318.4, "gemm-flops/FP16:0": 33878.0, "gemm-flops/FP16:1": 33911.1, "gemm-flops/FP16:2": 33769.3, "gemm-flops/FP16:3": 33909.9, "gemm-flops/FP16:4": 33896.5, "gemm-flops/FP16:5": 33798.1, "gemm-flops/FP16:6": 33647.3, "gemm-flops/FP16:7": 33764.8, "gemm-flops/FP64_TC:0": 18963.6, "gemm-flops/FP64_TC:1": 18924.2, "gemm-flops/FP64_TC:2": 18930.3, "gemm-flops/FP64_TC:3": 18971.9, "gemm-flops/FP64_TC:4": 18946.0, "gemm-flops/FP64_TC:5": 18945.0, "gemm-flops/FP64_TC:6": 18822.9, "gemm-flops/FP64_TC:7": 18911.1, "gemm-flops/TF32_TC:0": 127900.0, 
"gemm-flops/TF32_TC:1": 129094.0, "gemm-flops/TF32_TC:2": 127831.0, "gemm-flops/TF32_TC:3": 128709.0, "gemm-flops/TF32_TC:4": 127388.0, "gemm-flops/TF32_TC:5": 127861.0, "gemm-flops/TF32_TC:6": 128492.0, "gemm-flops/TF32_TC:7": 127720.0, "gemm-flops/BF16_TC:0": 264965.0, "gemm-flops/BF16_TC:1": 266638.0, "gemm-flops/BF16_TC:2": 263151.0, "gemm-flops/BF16_TC:3": 264752.0, "gemm-flops/BF16_TC:4": 263049.0, "gemm-flops/BF16_TC:5": 266605.0, "gemm-flops/BF16_TC:6": 267501.0, "gemm-flops/BF16_TC:7": 263880.0, "gemm-flops/FP16_TC:0": 279474.0, "gemm-flops/FP16_TC:1": 281256.0, "gemm-flops/FP16_TC:2": 277403.0, "gemm-flops/FP16_TC:3": 279147.0, "gemm-flops/FP16_TC:4": 277587.0, "gemm-flops/FP16_TC:5": 281537.0, "gemm-flops/FP16_TC:6": 282132.0, "gemm-flops/FP16_TC:7": 277788.0, "gemm-flops/INT8_TC:0": 475160.0, "gemm-flops/INT8_TC:1": 477725.0, "gemm-flops/INT8_TC:2": 471621.0, "gemm-flops/INT8_TC:3": 473716.0, "gemm-flops/INT8_TC:4": 472124.0, "gemm-flops/INT8_TC:5": 479972.0, "gemm-flops/INT8_TC:6": 481327.0, "gemm-flops/INT8_TC:7": 474710.0, "gemm-flops/INT4_TC:0": 970330.0, "gemm-flops/INT4_TC:1": 976837.0, "gemm-flops/INT4_TC:2": 966003.0, "gemm-flops/INT4_TC:3": 971315.0, "gemm-flops/INT4_TC:4": 964441.0, "gemm-flops/INT4_TC:5": 982461.0, "gemm-flops/INT4_TC:6": 979610.0, "gemm-flops/INT4_TC:7": 968359.0, "gpt_models/pytorch-gpt2-large/steptime_train_float32": 295.0526971835643, "gpt_models/pytorch-gpt2-large/throughput_train_float32": 27.115454396866486, "gpt_models/pytorch-gpt2-large/steptime_train_float16": 194.4957742234692, "gpt_models/pytorch-gpt2-large/throughput_train_float16": 41.139449941061585, "gpu-sm-copy-bw/dtoh:0": 3.91755, "gpu-sm-copy-bw/dtoh:1": 4.45414, "gpu-sm-copy-bw/dtoh:2": 1.26483, "gpu-sm-copy-bw/dtoh:3": 1.30041, "gpu-sm-copy-bw/dtoh:4": 1.31577, "gpu-sm-copy-bw/dtoh:5": 1.27968, "gpu-sm-copy-bw/dtoh:6": 4.47849, "gpu-sm-copy-bw/dtoh:7": 3.96231, "gpu-sm-copy-bw/dtoh:8": 3.91705, "gpu-sm-copy-bw/dtoh:9": 4.45487, "gpu-sm-copy-bw/dtoh:10": 1.26352, "gpu-sm-copy-bw/dtoh:11": 1.2999, "gpu-sm-copy-bw/dtoh:12": 1.31677, "gpu-sm-copy-bw/dtoh:13": 1.27885, "gpu-sm-copy-bw/dtoh:14": 4.47913, "gpu-sm-copy-bw/dtoh:15": 3.95893, "gpu-sm-copy-bw/dtoh:16": 3.91729, "gpu-sm-copy-bw/dtoh:17": 4.45627, "gpu-sm-copy-bw/dtoh:18": 1.26437, "gpu-sm-copy-bw/dtoh:19": 1.30144, "gpu-sm-copy-bw/dtoh:20": 1.31704, "gpu-sm-copy-bw/dtoh:21": 1.27857, "gpu-sm-copy-bw/dtoh:22": 4.47889, "gpu-sm-copy-bw/dtoh:23": 3.95984, "gpu-sm-copy-bw/dtoh:24": 3.92025, "gpu-sm-copy-bw/dtoh:25": 4.45423, "gpu-sm-copy-bw/dtoh:26": 1.26449, "gpu-sm-copy-bw/dtoh:27": 1.29954, "gpu-sm-copy-bw/dtoh:28": 1.31731, "gpu-sm-copy-bw/dtoh:29": 1.27916, "gpu-sm-copy-bw/dtoh:30": 4.4797, "gpu-sm-copy-bw/dtoh:31": 3.96124, "gpu-sm-copy-bw/htod:0": 23.9685, "gpu-sm-copy-bw/htod:1": 23.967, "gpu-sm-copy-bw/htod:2": 19.9898, "gpu-sm-copy-bw/htod:3": 20.2848, "gpu-sm-copy-bw/htod:4": 20.3931, "gpu-sm-copy-bw/htod:5": 20.0888, "gpu-sm-copy-bw/htod:6": 23.9766, "gpu-sm-copy-bw/htod:7": 23.9792, "gpu-sm-copy-bw/htod:8": 23.9497, "gpu-sm-copy-bw/htod:9": 23.9438, "gpu-sm-copy-bw/htod:10": 20.0039, "gpu-sm-copy-bw/htod:11": 20.2469, "gpu-sm-copy-bw/htod:12": 20.4344, "gpu-sm-copy-bw/htod:13": 20.1005, "gpu-sm-copy-bw/htod:14": 23.9491, "gpu-sm-copy-bw/htod:15": 23.9898, "gpu-sm-copy-bw/htod:16": 23.962, "gpu-sm-copy-bw/htod:17": 23.97, "gpu-sm-copy-bw/htod:18": 19.987, "gpu-sm-copy-bw/htod:19": 20.1549, "gpu-sm-copy-bw/htod:20": 20.3931, "gpu-sm-copy-bw/htod:21": 20.1013, "gpu-sm-copy-bw/htod:22": 23.9865, 
"gpu-sm-copy-bw/htod:23": 23.972, "gpu-sm-copy-bw/htod:24": 23.9569, "gpu-sm-copy-bw/htod:25": 23.9762, "gpu-sm-copy-bw/htod:26": 19.92, "gpu-sm-copy-bw/htod:27": 20.2638, "gpu-sm-copy-bw/htod:28": 20.4419, "gpu-sm-copy-bw/htod:29": 20.1054, "gpu-sm-copy-bw/htod:30": 23.9752, "gpu-sm-copy-bw/htod:31": 23.9631, "ib-loopback/IB_write_512_Avg_0:0": 1492.89, "ib-loopback/IB_write_1024_Avg_0:0": 3224.92, "ib-loopback/IB_write_2048_Avg_0:0": 6714.5, "ib-loopback/IB_write_4096_Avg_0:0": 12871.93, "ib-loopback/IB_write_8192_Avg_0:0": 19990.78, "ib-loopback/IB_write_16384_Avg_0:0": 22172.25, "ib-loopback/IB_write_32768_Avg_0:0": 23073.19, "ib-loopback/IB_write_65536_Avg_0:0": 23527.09, "ib-loopback/IB_write_131072_Avg_0:0": 23805.92, "ib-loopback/IB_write_262144_Avg_0:0": 23380.51, "ib-loopback/IB_write_524288_Avg_0:0": 23856.63, "ib-loopback/IB_write_1048576_Avg_0:0": 23869.58, "ib-loopback/IB_write_2097152_Avg_0:0": 23885.7, "ib-loopback/IB_write_4194304_Avg_0:0": 23914.56, "ib-loopback/IB_write_8388608_Avg_0:0": 23935.21, "ib-loopback/IB_write_512_Avg_1:0": 1479.1, "ib-loopback/IB_write_1024_Avg_1:0": 3246.48, "ib-loopback/IB_write_2048_Avg_1:0": 6754.94, "ib-loopback/IB_write_4096_Avg_1:0": 13101.09, "ib-loopback/IB_write_8192_Avg_1:0": 19945.81, "ib-loopback/IB_write_16384_Avg_1:0": 22631.62, "ib-loopback/IB_write_32768_Avg_1:0": 23381.58, "ib-loopback/IB_write_65536_Avg_1:0": 23515.29, "ib-loopback/IB_write_131072_Avg_1:0": 23756.01, "ib-loopback/IB_write_262144_Avg_1:0": 23875.91, "ib-loopback/IB_write_524288_Avg_1:0": 23911.13, "ib-loopback/IB_write_1048576_Avg_1:0": 23935.42, "ib-loopback/IB_write_2097152_Avg_1:0": 23941.25, "ib-loopback/IB_write_4194304_Avg_1:0": 23922.06, "ib-loopback/IB_write_8388608_Avg_1:0": 23928.59, "ib-loopback/IB_write_512_Avg_2:0": 1505.29, "ib-loopback/IB_write_1024_Avg_2:0": 3215.97, "ib-loopback/IB_write_2048_Avg_2:0": 6745.49, "ib-loopback/IB_write_4096_Avg_2:0": 11548.88, "ib-loopback/IB_write_8192_Avg_2:0": 19432.15, "ib-loopback/IB_write_16384_Avg_2:0": 22765.51, "ib-loopback/IB_write_32768_Avg_2:0": 23235.07, "ib-loopback/IB_write_65536_Avg_2:0": 23620.08, "ib-loopback/IB_write_131072_Avg_2:0": 23759.08, "ib-loopback/IB_write_262144_Avg_2:0": 23859.82, "ib-loopback/IB_write_524288_Avg_2:0": 23775.01, "ib-loopback/IB_write_1048576_Avg_2:0": 23885.96, "ib-loopback/IB_write_2097152_Avg_2:0": 23894.73, "ib-loopback/IB_write_4194304_Avg_2:0": 23909.98, "ib-loopback/IB_write_8388608_Avg_2:0": 23927.21, "ib-loopback/IB_write_512_Avg_3:0": 1474.47, "ib-loopback/IB_write_1024_Avg_3:0": 3344.68, "ib-loopback/IB_write_2048_Avg_3:0": 6606.35, "ib-loopback/IB_write_4096_Avg_3:0": 12071.4, "ib-loopback/IB_write_8192_Avg_3:0": 18408.33, "ib-loopback/IB_write_16384_Avg_3:0": 20789.27, "ib-loopback/IB_write_32768_Avg_3:0": 22469.43, "ib-loopback/IB_write_65536_Avg_3:0": 22777.1, "ib-loopback/IB_write_131072_Avg_3:0": 23461.03, "ib-loopback/IB_write_262144_Avg_3:0": 23397.19, "ib-loopback/IB_write_524288_Avg_3:0": 23526.55, "ib-loopback/IB_write_1048576_Avg_3:0": 23854.76, "ib-loopback/IB_write_2097152_Avg_3:0": 23862.23, "ib-loopback/IB_write_4194304_Avg_3:0": 23931.15, "ib-loopback/IB_write_8388608_Avg_3:0": 23924.44, "ib-loopback/IB_write_512_Avg_4:0": 1523.33, "ib-loopback/IB_write_1024_Avg_4:0": 3233.23, "ib-loopback/IB_write_2048_Avg_4:0": 6792.88, "ib-loopback/IB_write_4096_Avg_4:0": 12616.05, "ib-loopback/IB_write_8192_Avg_4:0": 19324.05, "ib-loopback/IB_write_16384_Avg_4:0": 22082.51, "ib-loopback/IB_write_32768_Avg_4:0": 23294.23, 
"ib-loopback/IB_write_65536_Avg_4:0": 23546.22, "ib-loopback/IB_write_131072_Avg_4:0": 23727.91, "ib-loopback/IB_write_262144_Avg_4:0": 23843.93, "ib-loopback/IB_write_524288_Avg_4:0": 23905.96, "ib-loopback/IB_write_1048576_Avg_4:0": 23902.37, "ib-loopback/IB_write_2097152_Avg_4:0": 23921.03, "ib-loopback/IB_write_4194304_Avg_4:0": 23921.06, "ib-loopback/IB_write_8388608_Avg_4:0": 23922.4, "ib-loopback/IB_write_512_Avg_5:0": 1506.63, "ib-loopback/IB_write_1024_Avg_5:0": 3261.71, "ib-loopback/IB_write_2048_Avg_5:0": 6752.38, "ib-loopback/IB_write_4096_Avg_5:0": 13592.41, "ib-loopback/IB_write_8192_Avg_5:0": 19989.31, "ib-loopback/IB_write_16384_Avg_5:0": 22560.57, "ib-loopback/IB_write_32768_Avg_5:0": 23389.29, "ib-loopback/IB_write_65536_Avg_5:0": 23503.32, "ib-loopback/IB_write_131072_Avg_5:0": 23741.51, "ib-loopback/IB_write_262144_Avg_5:0": 23866.43, "ib-loopback/IB_write_524288_Avg_5:0": 23898.95, "ib-loopback/IB_write_1048576_Avg_5:0": 23876.36, "ib-loopback/IB_write_2097152_Avg_5:0": 23919.63, "ib-loopback/IB_write_4194304_Avg_5:0": 23924.68, "ib-loopback/IB_write_8388608_Avg_5:0": 23930.37, "ib-loopback/IB_write_512_Avg_6:0": 1467.69, "ib-loopback/IB_write_1024_Avg_6:0": 3157.04, "ib-loopback/IB_write_2048_Avg_6:0": 6494.61, "ib-loopback/IB_write_4096_Avg_6:0": 12883.51, "ib-loopback/IB_write_8192_Avg_6:0": 19207.67, "ib-loopback/IB_write_16384_Avg_6:0": 22519.39, "ib-loopback/IB_write_32768_Avg_6:0": 23323.46, "ib-loopback/IB_write_65536_Avg_6:0": 23523.6, "ib-loopback/IB_write_131072_Avg_6:0": 23626.67, "ib-loopback/IB_write_262144_Avg_6:0": 23836.99, "ib-loopback/IB_write_524288_Avg_6:0": 23904.51, "ib-loopback/IB_write_1048576_Avg_6:0": 23919.07, "ib-loopback/IB_write_2097152_Avg_6:0": 23943.82, "ib-loopback/IB_write_4194304_Avg_6:0": 23936.77, "ib-loopback/IB_write_8388608_Avg_6:0": 23941.57, "ib-loopback/IB_write_512_Avg_7:0": 1505.63, "ib-loopback/IB_write_1024_Avg_7:0": 3259.93, "ib-loopback/IB_write_2048_Avg_7:0": 6738.6, "ib-loopback/IB_write_4096_Avg_7:0": 13352.06, "ib-loopback/IB_write_8192_Avg_7:0": 19941.35, "ib-loopback/IB_write_16384_Avg_7:0": 22566.09, "ib-loopback/IB_write_32768_Avg_7:0": 23244.77, "ib-loopback/IB_write_65536_Avg_7:0": 23377.67, "ib-loopback/IB_write_131072_Avg_7:0": 23736.17, "ib-loopback/IB_write_262144_Avg_7:0": 23829.25, "ib-loopback/IB_write_524288_Avg_7:0": 23879.6, "ib-loopback/IB_write_1048576_Avg_7:0": 23895.1, "ib-loopback/IB_write_2097152_Avg_7:0": 23930.64, "ib-loopback/IB_write_4194304_Avg_7:0": 23845.63, "ib-loopback/IB_write_8388608_Avg_7:0": 23896.94, "kernel-launch/return_code":0, "kernel-launch/event_overhead:0": 0.00596, "kernel-launch/event_overhead:1": 0.00595, "kernel-launch/event_overhead:2": 0.00557, "kernel-launch/event_overhead:3": 0.0055, "kernel-launch/event_overhead:4": 0.00592, "kernel-launch/event_overhead:5": 0.00589, "kernel-launch/event_overhead:6": 0.00572, "kernel-launch/event_overhead:7": 0.0059, "kernel-launch/wall_overhead:0": 0.01026, "kernel-launch/wall_overhead:1": 0.01026, "kernel-launch/wall_overhead:2": 0.01046, "kernel-launch/wall_overhead:3": 0.01049, "kernel-launch/wall_overhead:4": 0.01063, "kernel-launch/wall_overhead:5": 0.01006, "kernel-launch/wall_overhead:6": 0.01045, "kernel-launch/wall_overhead:7": 0.01071, "lstm_models/pytorch-lstm/steptime_train_float32": 48.07024518959224, "lstm_models/pytorch-lstm/throughput_train_float32": 4806.472441132788, "lstm_models/pytorch-lstm/steptime_train_float16": 25.95312986522913, "lstm_models/pytorch-lstm/throughput_train_float16": 9069.90809255883, 
"pytorch-matmul/nosharding": 34.64499759674072, "mem-bw/return_code":1, "nccl-bw/allreduce_8_busbw:0": 0.0, "nccl-bw/allreduce_8_algbw:0": 0.0, "nccl-bw/allreduce_8_time:0": 37.84, "nccl-bw/allreduce_16_busbw:0": 0.0, "nccl-bw/allreduce_16_algbw:0": 0.0, "nccl-bw/allreduce_16_time:0": 36.42, "nccl-bw/allreduce_32_busbw:0": 0.0, "nccl-bw/allreduce_32_algbw:0": 0.0, "nccl-bw/allreduce_32_time:0": 36.87, "nccl-bw/allreduce_64_busbw:0": 0.0, "nccl-bw/allreduce_64_algbw:0": 0.0, "nccl-bw/allreduce_64_time:0": 35.83, "nccl-bw/allreduce_128_busbw:0": 0.01, "nccl-bw/allreduce_128_algbw:0": 0.0, "nccl-bw/allreduce_128_time:0": 36.91, "nccl-bw/allreduce_256_busbw:0": 0.01, "nccl-bw/allreduce_256_algbw:0": 0.01, "nccl-bw/allreduce_256_time:0": 37.58, "nccl-bw/allreduce_512_busbw:0": 0.02, "nccl-bw/allreduce_512_algbw:0": 0.01, "nccl-bw/allreduce_512_time:0": 36.98, "nccl-bw/allreduce_1024_busbw:0": 0.05, "nccl-bw/allreduce_1024_algbw:0": 0.03, "nccl-bw/allreduce_1024_time:0": 36.93, "nccl-bw/allreduce_2048_busbw:0": 0.1, "nccl-bw/allreduce_2048_algbw:0": 0.06, "nccl-bw/allreduce_2048_time:0": 36.06, "nccl-bw/allreduce_4096_busbw:0": 0.19, "nccl-bw/allreduce_4096_algbw:0": 0.11, "nccl-bw/allreduce_4096_time:0": 37.2, "nccl-bw/allreduce_8192_busbw:0": 0.39, "nccl-bw/allreduce_8192_algbw:0": 0.22, "nccl-bw/allreduce_8192_time:0": 37.04, "nccl-bw/allreduce_16384_busbw:0": 0.77, "nccl-bw/allreduce_16384_algbw:0": 0.44, "nccl-bw/allreduce_16384_time:0": 37.46, "nccl-bw/allreduce_32768_busbw:0": 1.52, "nccl-bw/allreduce_32768_algbw:0": 0.87, "nccl-bw/allreduce_32768_time:0": 37.64, "nccl-bw/allreduce_65536_busbw:0": 3.0, "nccl-bw/allreduce_65536_algbw:0": 1.71, "nccl-bw/allreduce_65536_time:0": 38.22, "nccl-bw/allreduce_131072_busbw:0": 5.31, "nccl-bw/allreduce_131072_algbw:0": 3.04, "nccl-bw/allreduce_131072_time:0": 43.17, "nccl-bw/allreduce_262144_busbw:0": 9.5, "nccl-bw/allreduce_262144_algbw:0": 5.43, "nccl-bw/allreduce_262144_time:0": 48.29, "nccl-bw/allreduce_524288_busbw:0": 15.11, "nccl-bw/allreduce_524288_algbw:0": 8.64, "nccl-bw/allreduce_524288_time:0": 60.71, "nccl-bw/allreduce_1048576_busbw:0": 24.1, "nccl-bw/allreduce_1048576_algbw:0": 13.77, "nccl-bw/allreduce_1048576_time:0": 76.13, "nccl-bw/allreduce_2097152_busbw:0": 38.12, "nccl-bw/allreduce_2097152_algbw:0": 21.78, "nccl-bw/allreduce_2097152_time:0": 96.28, "nccl-bw/allreduce_4194304_busbw:0": 65.75, "nccl-bw/allreduce_4194304_algbw:0": 37.57, "nccl-bw/allreduce_4194304_time:0": 111.6, "nccl-bw/allreduce_8388608_busbw:0": 89.51, "nccl-bw/allreduce_8388608_algbw:0": 51.15, "nccl-bw/allreduce_8388608_time:0": 164.0, "nccl-bw/allreduce_16777216_busbw:0": 114.38, "nccl-bw/allreduce_16777216_algbw:0": 65.36, "nccl-bw/allreduce_16777216_time:0": 256.7, "nccl-bw/allreduce_33554432_busbw:0": 154.89, "nccl-bw/allreduce_33554432_algbw:0": 88.51, "nccl-bw/allreduce_33554432_time:0": 379.1, "nccl-bw/allreduce_67108864_busbw:0": 200.01, "nccl-bw/allreduce_67108864_algbw:0": 114.29, "nccl-bw/allreduce_67108864_time:0": 587.2, "nccl-bw/allreduce_134217728_busbw:0": 202.97, "nccl-bw/allreduce_134217728_algbw:0": 115.98, "nccl-bw/allreduce_134217728_time:0": 1157.2, "nccl-bw/allreduce_268435456_busbw:0": 221.82, "nccl-bw/allreduce_268435456_algbw:0": 126.75, "nccl-bw/allreduce_268435456_time:0": 2117.8, "nccl-bw/allreduce_536870912_busbw:0": 224.54, "nccl-bw/allreduce_536870912_algbw:0": 128.31, "nccl-bw/allreduce_536870912_time:0": 4184.2, "nccl-bw/allreduce_1073741824_busbw:0": 230.15, "nccl-bw/allreduce_1073741824_algbw:0": 131.51, 
"nccl-bw/allreduce_1073741824_time:0": 8164.5, "nccl-bw/allreduce_2147483648_busbw:0": 231.89, "nccl-bw/allreduce_2147483648_algbw:0": 132.51, "nccl-bw/allreduce_2147483648_time:0": 16207.0, "nccl-bw/allreduce_4294967296_busbw:0": 234.45, "nccl-bw/allreduce_4294967296_algbw:0": 133.97, "nccl-bw/allreduce_4294967296_time:0": 32059.0, "nccl-bw/allreduce_8589934592_busbw:0": 235.36, "nccl-bw/allreduce_8589934592_algbw:0": 134.49, "nccl-bw/allreduce_8589934592_time:0": 63870.0, "resnet_models/pytorch-resnet50/steptime_train_float32": 253.95522732287645, "resnet_models/pytorch-resnet50/throughput_train_float32": 760.3348099129964, "resnet_models/pytorch-resnet50/steptime_train_float16": 200.08606184273958, "resnet_models/pytorch-resnet50/throughput_train_float16": 971.0651430922575, "resnet_models/pytorch-resnet101/steptime_train_float32": 389.08605091273785, "resnet_models/pytorch-resnet101/throughput_train_float32": 496.11747409298965, "resnet_models/pytorch-resnet101/steptime_train_float16": 308.6274107918143, "resnet_models/pytorch-resnet101/throughput_train_float16": 627.2056272195069, "resnet_models/pytorch-resnet152/steptime_train_float32": 547.6558278314769, "resnet_models/pytorch-resnet152/throughput_train_float32": 352.07099543348215, "resnet_models/pytorch-resnet152/steptime_train_float16": 424.5809856802225, "resnet_models/pytorch-resnet152/throughput_train_float16": 454.8335998153649, "pytorch-sharding-matmul/allreduce": 10.574411869049072, "pytorch-sharding-matmul/allgather": 10.084696769714355, "vgg_models/pytorch-vgg11/steptime_train_float32": 40.35283671692014, "vgg_models/pytorch-vgg11/throughput_train_float32": 796.3615936949874, "vgg_models/pytorch-vgg11/steptime_train_float16": 24.133514845743775, "vgg_models/pytorch-vgg11/throughput_train_float16": 1330.411361458461, "vgg_models/pytorch-vgg13/steptime_train_float32": 55.466310936026275, "vgg_models/pytorch-vgg13/throughput_train_float32": 580.234107444399, "vgg_models/pytorch-vgg13/steptime_train_float16": 33.35228993091732, "vgg_models/pytorch-vgg13/throughput_train_float16": 962.5332023901524, "vgg_models/pytorch-vgg16/steptime_train_float32": 65.22519944701344, "vgg_models/pytorch-vgg16/throughput_train_float32": 493.4268638875934, "vgg_models/pytorch-vgg16/steptime_train_float16": 39.25287735182792, "vgg_models/pytorch-vgg16/throughput_train_float16": 817.2008546147621, "vgg_models/pytorch-vgg19/steptime_train_float32": 74.93487105239183, "vgg_models/pytorch-vgg19/throughput_train_float32": 429.80921583106164, "vgg_models/pytorch-vgg19/steptime_train_float16": 45.20330624654889, "vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377091} {"node": "sb-validation-03","bert_models/pytorch-bert-base/steptime_train_float32": 114.59167010616511,"bert_models/pytorch-bert-base/throughput_train_float32": 279.8794623591105, "bert_models/pytorch-bert-base/steptime_train_float16": 83.88951083179563, "bert_models/pytorch-bert-base/throughput_train_float16": 382.0672582741963, "bert_models/pytorch-bert-large/steptime_train_float32": 307.9359371913597, "bert_models/pytorch-bert-large/throughput_train_float32": 103.94876097417632, "bert_models/pytorch-bert-large/steptime_train_float16": 206.81141689419746, "bert_models/pytorch-bert-large/throughput_train_float16": 154.84089117113942, "pytorch-computation-communication-overlap/mul_cost:0": 44.182206214372854, "pytorch-computation-communication-overlap/mul_cost:1": 44.18221393893873, "pytorch-computation-communication-overlap/mul_cost:2": 43.970147078084665, 
"pytorch-computation-communication-overlap/mul_cost:3": 43.97014787559783, "pytorch-computation-communication-overlap/mul_cost:4": 43.97017793166924, "pytorch-computation-communication-overlap/mul_cost:5": 43.97015716062924, "pytorch-computation-communication-overlap/mul_cost:6": 43.97016519828867, "pytorch-computation-communication-overlap/mul_cost:7": 44.179544478538446, "pytorch-computation-communication-overlap/matmul_cost:0": 137.04773705558182, "pytorch-computation-communication-overlap/matmul_cost:1": 137.0478344692856, "pytorch-computation-communication-overlap/matmul_cost:2": 137.04777220902997, "pytorch-computation-communication-overlap/matmul_cost:3": 137.04779697263803, "pytorch-computation-communication-overlap/matmul_cost:4": 137.04813674314664, "pytorch-computation-communication-overlap/matmul_cost:5": 137.04821988767435, "pytorch-computation-communication-overlap/matmul_cost:6": 137.04775322366913, "pytorch-computation-communication-overlap/matmul_cost:7": 137.0478081606734, "densenet_models/pytorch-densenet169/steptime_train_float32": 150.64155543223023, "densenet_models/pytorch-densenet169/throughput_train_float32": 212.47017192425312, "densenet_models/pytorch-densenet169/steptime_train_float16": 145.30819572973996, "densenet_models/pytorch-densenet169/throughput_train_float16": 220.293233730535, "densenet_models/pytorch-densenet201/steptime_train_float32": 182.91117786429822, "densenet_models/pytorch-densenet201/throughput_train_float32": 174.9742278232251, "densenet_models/pytorch-densenet201/steptime_train_float16": 176.3489063596353, "densenet_models/pytorch-densenet201/throughput_train_float16": 181.54465736033902, "gemm-flops/FP64:0": 9031.23, "gemm-flops/FP64:1": 9040.85, "gemm-flops/FP64:2": 9010.56, "gemm-flops/FP64:3": 9041.26, "gemm-flops/FP64:4": 9039.19, "gemm-flops/FP64:5": 9015.69, "gemm-flops/FP64:6": 9022.19, "gemm-flops/FP64:7": 9030.2, "gemm-flops/FP32:0": 18362.1, "gemm-flops/FP32:1": 18375.6, "gemm-flops/FP32:2": 18314.9, "gemm-flops/FP32:3": 18375.6, "gemm-flops/FP32:4": 18368.6, "gemm-flops/FP32:5": 18347.1, "gemm-flops/FP32:6": 18247.4, "gemm-flops/FP32:7": 18318.4, "gemm-flops/FP16:0": 33878.0, "gemm-flops/FP16:1": 33911.1, "gemm-flops/FP16:2": 33769.3, "gemm-flops/FP16:3": 33909.9, "gemm-flops/FP16:4": 33896.5, "gemm-flops/FP16:5": 33798.1, "gemm-flops/FP16:6": 33647.3, "gemm-flops/FP16:7": 33764.8, "gemm-flops/FP64_TC:0": 18963.6, "gemm-flops/FP64_TC:1": 18924.2, "gemm-flops/FP64_TC:2": 18930.3, "gemm-flops/FP64_TC:3": 18971.9, "gemm-flops/FP64_TC:4": 18946.0, "gemm-flops/FP64_TC:5": 18945.0, "gemm-flops/FP64_TC:6": 18822.9, "gemm-flops/FP64_TC:7": 18911.1, "gemm-flops/TF32_TC:0": 127900.0, "gemm-flops/TF32_TC:1": 129094.0, "gemm-flops/TF32_TC:2": 127831.0, "gemm-flops/TF32_TC:3": 128709.0, "gemm-flops/TF32_TC:4": 127388.0, "gemm-flops/TF32_TC:5": 127861.0, "gemm-flops/TF32_TC:6": 128492.0, "gemm-flops/TF32_TC:7": 127720.0, "gemm-flops/BF16_TC:0": 264965.0, "gemm-flops/BF16_TC:1": 266638.0, "gemm-flops/BF16_TC:2": 263151.0, "gemm-flops/BF16_TC:3": 264752.0, "gemm-flops/BF16_TC:4": 263049.0, "gemm-flops/BF16_TC:5": 266605.0, "gemm-flops/BF16_TC:6": 267501.0, "gemm-flops/BF16_TC:7": 263880.0, "gemm-flops/FP16_TC:0": 279474.0, "gemm-flops/FP16_TC:1": 281256.0, "gemm-flops/FP16_TC:2": 277403.0, "gemm-flops/FP16_TC:3": 279147.0, "gemm-flops/FP16_TC:4": 277587.0, "gemm-flops/FP16_TC:5": 281537.0, "gemm-flops/FP16_TC:6": 282132.0, "gemm-flops/FP16_TC:7": 277788.0, "gemm-flops/INT8_TC:0": 475160.0, "gemm-flops/INT8_TC:1": 477725.0, 
"gemm-flops/INT8_TC:2": 471621.0, "gemm-flops/INT8_TC:3": 473716.0, "gemm-flops/INT8_TC:4": 472124.0, "gemm-flops/INT8_TC:5": 479972.0, "gemm-flops/INT8_TC:6": 481327.0, "gemm-flops/INT8_TC:7": 474710.0, "gemm-flops/INT4_TC:0": 970330.0, "gemm-flops/INT4_TC:1": 976837.0, "gemm-flops/INT4_TC:2": 966003.0, "gemm-flops/INT4_TC:3": 971315.0, "gemm-flops/INT4_TC:4": 964441.0, "gemm-flops/INT4_TC:5": 982461.0, "gemm-flops/INT4_TC:6": 979610.0, "gemm-flops/INT4_TC:7": 968359.0, "gpt_models/pytorch-gpt2-large/steptime_train_float32": 295.0526971835643, "gpt_models/pytorch-gpt2-large/throughput_train_float32": 27.115454396866486, "gpt_models/pytorch-gpt2-large/steptime_train_float16": 194.4957742234692, "gpt_models/pytorch-gpt2-large/throughput_train_float16": 41.139449941061585, "gpu-sm-copy-bw/dtoh:0": 3.91755, "gpu-sm-copy-bw/dtoh:1": 4.45414, "gpu-sm-copy-bw/dtoh:2": 1.26483, "gpu-sm-copy-bw/dtoh:3": 1.30041, "gpu-sm-copy-bw/dtoh:4": 1.31577, "gpu-sm-copy-bw/dtoh:5": 1.27968, "gpu-sm-copy-bw/dtoh:6": 4.47849, "gpu-sm-copy-bw/dtoh:7": 3.96231, "gpu-sm-copy-bw/dtoh:8": 3.91705, "gpu-sm-copy-bw/dtoh:9": 4.45487, "gpu-sm-copy-bw/dtoh:10": 1.26352, "gpu-sm-copy-bw/dtoh:11": 1.2999, "gpu-sm-copy-bw/dtoh:12": 1.31677, "gpu-sm-copy-bw/dtoh:13": 1.27885, "gpu-sm-copy-bw/dtoh:14": 4.47913, "gpu-sm-copy-bw/dtoh:15": 3.95893, "gpu-sm-copy-bw/dtoh:16": 3.91729, "gpu-sm-copy-bw/dtoh:17": 4.45627, "gpu-sm-copy-bw/dtoh:18": 1.26437, "gpu-sm-copy-bw/dtoh:19": 1.30144, "gpu-sm-copy-bw/dtoh:20": 1.31704, "gpu-sm-copy-bw/dtoh:21": 1.27857, "gpu-sm-copy-bw/dtoh:22": 4.47889, "gpu-sm-copy-bw/dtoh:23": 3.95984, "gpu-sm-copy-bw/dtoh:24": 3.92025, "gpu-sm-copy-bw/dtoh:25": 4.45423, "gpu-sm-copy-bw/dtoh:26": 1.26449, "gpu-sm-copy-bw/dtoh:27": 1.29954, "gpu-sm-copy-bw/dtoh:28": 1.31731, "gpu-sm-copy-bw/dtoh:29": 1.27916, "gpu-sm-copy-bw/dtoh:30": 4.4797, "gpu-sm-copy-bw/dtoh:31": 3.96124, "gpu-sm-copy-bw/htod:0": 23.9685, "gpu-sm-copy-bw/htod:1": 23.967, "gpu-sm-copy-bw/htod:2": 19.9898, "gpu-sm-copy-bw/htod:3": 20.2848, "gpu-sm-copy-bw/htod:4": 20.3931, "gpu-sm-copy-bw/htod:5": 20.0888, "gpu-sm-copy-bw/htod:6": 23.9766, "gpu-sm-copy-bw/htod:7": 23.9792, "gpu-sm-copy-bw/htod:8": 23.9497, "gpu-sm-copy-bw/htod:9": 23.9438, "gpu-sm-copy-bw/htod:10": 20.0039, "gpu-sm-copy-bw/htod:11": 20.2469, "gpu-sm-copy-bw/htod:12": 20.4344, "gpu-sm-copy-bw/htod:13": 20.1005, "gpu-sm-copy-bw/htod:14": 23.9491, "gpu-sm-copy-bw/htod:15": 23.9898, "gpu-sm-copy-bw/htod:16": 23.962, "gpu-sm-copy-bw/htod:17": 23.97, "gpu-sm-copy-bw/htod:18": 19.987, "gpu-sm-copy-bw/htod:19": 20.1549, "gpu-sm-copy-bw/htod:20": 20.3931, "gpu-sm-copy-bw/htod:21": 20.1013, "gpu-sm-copy-bw/htod:22": 23.9865, "gpu-sm-copy-bw/htod:23": 23.972, "gpu-sm-copy-bw/htod:24": 23.9569, "gpu-sm-copy-bw/htod:25": 23.9762, "gpu-sm-copy-bw/htod:26": 19.92, "gpu-sm-copy-bw/htod:27": 20.2638, "gpu-sm-copy-bw/htod:28": 20.4419, "gpu-sm-copy-bw/htod:29": 20.1054, "gpu-sm-copy-bw/htod:30": 23.9752, "gpu-sm-copy-bw/htod:31": 23.9631, "ib-loopback/IB_write_512_Avg_0:0": 1492.89, "ib-loopback/IB_write_1024_Avg_0:0": 3224.92, "ib-loopback/IB_write_2048_Avg_0:0": 6714.5, "ib-loopback/IB_write_4096_Avg_0:0": 12871.93, "ib-loopback/IB_write_8192_Avg_0:0": 19990.78, "ib-loopback/IB_write_16384_Avg_0:0": 22172.25, "ib-loopback/IB_write_32768_Avg_0:0": 23073.19, "ib-loopback/IB_write_65536_Avg_0:0": 23527.09, "ib-loopback/IB_write_131072_Avg_0:0": 23805.92, "ib-loopback/IB_write_262144_Avg_0:0": 23380.51, "ib-loopback/IB_write_524288_Avg_0:0": 23856.63, "ib-loopback/IB_write_1048576_Avg_0:0": 
23869.58, "ib-loopback/IB_write_2097152_Avg_0:0": 23885.7, "ib-loopback/IB_write_4194304_Avg_0:0": 23914.56, "ib-loopback/IB_write_8388608_Avg_0:0": 23935.21, "ib-loopback/IB_write_512_Avg_1:0": 1479.1, "ib-loopback/IB_write_1024_Avg_1:0": 3246.48, "ib-loopback/IB_write_2048_Avg_1:0": 6754.94, "ib-loopback/IB_write_4096_Avg_1:0": 13101.09, "ib-loopback/IB_write_8192_Avg_1:0": 19945.81, "ib-loopback/IB_write_16384_Avg_1:0": 22631.62, "ib-loopback/IB_write_32768_Avg_1:0": 23381.58, "ib-loopback/IB_write_65536_Avg_1:0": 23515.29, "ib-loopback/IB_write_131072_Avg_1:0": 23756.01, "ib-loopback/IB_write_262144_Avg_1:0": 23875.91, "ib-loopback/IB_write_524288_Avg_1:0": 23911.13, "ib-loopback/IB_write_1048576_Avg_1:0": 23935.42, "ib-loopback/IB_write_2097152_Avg_1:0": 23941.25, "ib-loopback/IB_write_4194304_Avg_1:0": 23922.06, "ib-loopback/IB_write_8388608_Avg_1:0": 23928.59, "ib-loopback/IB_write_512_Avg_2:0": 1505.29, "ib-loopback/IB_write_1024_Avg_2:0": 3215.97, "ib-loopback/IB_write_2048_Avg_2:0": 6745.49, "ib-loopback/IB_write_4096_Avg_2:0": 11548.88, "ib-loopback/IB_write_8192_Avg_2:0": 19432.15, "ib-loopback/IB_write_16384_Avg_2:0": 22765.51, "ib-loopback/IB_write_32768_Avg_2:0": 23235.07, "ib-loopback/IB_write_65536_Avg_2:0": 23620.08, "ib-loopback/IB_write_131072_Avg_2:0": 23759.08, "ib-loopback/IB_write_262144_Avg_2:0": 23859.82, "ib-loopback/IB_write_524288_Avg_2:0": 23775.01, "ib-loopback/IB_write_1048576_Avg_2:0": 23885.96, "ib-loopback/IB_write_2097152_Avg_2:0": 23894.73, "ib-loopback/IB_write_4194304_Avg_2:0": 23909.98, "ib-loopback/IB_write_8388608_Avg_2:0": 23927.21, "ib-loopback/IB_write_512_Avg_3:0": 1474.47, "ib-loopback/IB_write_1024_Avg_3:0": 3344.68, "ib-loopback/IB_write_2048_Avg_3:0": 6606.35, "ib-loopback/IB_write_4096_Avg_3:0": 12071.4, "ib-loopback/IB_write_8192_Avg_3:0": 18408.33, "ib-loopback/IB_write_16384_Avg_3:0": 20789.27, "ib-loopback/IB_write_32768_Avg_3:0": 22469.43, "ib-loopback/IB_write_65536_Avg_3:0": 22777.1, "ib-loopback/IB_write_131072_Avg_3:0": 23461.03, "ib-loopback/IB_write_262144_Avg_3:0": 23397.19, "ib-loopback/IB_write_524288_Avg_3:0": 23526.55, "ib-loopback/IB_write_1048576_Avg_3:0": 23854.76, "ib-loopback/IB_write_2097152_Avg_3:0": 23862.23, "ib-loopback/IB_write_4194304_Avg_3:0": 23931.15, "ib-loopback/IB_write_8388608_Avg_3:0": 23924.44, "ib-loopback/IB_write_512_Avg_4:0": 1523.33, "ib-loopback/IB_write_1024_Avg_4:0": 3233.23, "ib-loopback/IB_write_2048_Avg_4:0": 6792.88, "ib-loopback/IB_write_4096_Avg_4:0": 12616.05, "ib-loopback/IB_write_8192_Avg_4:0": 19324.05, "ib-loopback/IB_write_16384_Avg_4:0": 22082.51, "ib-loopback/IB_write_32768_Avg_4:0": 23294.23, "ib-loopback/IB_write_65536_Avg_4:0": 23546.22, "ib-loopback/IB_write_131072_Avg_4:0": 23727.91, "ib-loopback/IB_write_262144_Avg_4:0": 23843.93, "ib-loopback/IB_write_524288_Avg_4:0": 23905.96, "ib-loopback/IB_write_1048576_Avg_4:0": 23902.37, "ib-loopback/IB_write_2097152_Avg_4:0": 23921.03, "ib-loopback/IB_write_4194304_Avg_4:0": 23921.06, "ib-loopback/IB_write_8388608_Avg_4:0": 23922.4, "ib-loopback/IB_write_512_Avg_5:0": 1506.63, "ib-loopback/IB_write_1024_Avg_5:0": 3261.71, "ib-loopback/IB_write_2048_Avg_5:0": 6752.38, "ib-loopback/IB_write_4096_Avg_5:0": 13592.41, "ib-loopback/IB_write_8192_Avg_5:0": 19989.31, "ib-loopback/IB_write_16384_Avg_5:0": 22560.57, "ib-loopback/IB_write_32768_Avg_5:0": 23389.29, "ib-loopback/IB_write_65536_Avg_5:0": 23503.32, "ib-loopback/IB_write_131072_Avg_5:0": 23741.51, "ib-loopback/IB_write_262144_Avg_5:0": 23866.43, "ib-loopback/IB_write_524288_Avg_5:0": 
23898.95, "ib-loopback/IB_write_1048576_Avg_5:0": 23876.36, "ib-loopback/IB_write_2097152_Avg_5:0": 23919.63, "ib-loopback/IB_write_4194304_Avg_5:0": 23924.68, "ib-loopback/IB_write_8388608_Avg_5:0": 23930.37, "ib-loopback/IB_write_512_Avg_6:0": 1467.69, "ib-loopback/IB_write_1024_Avg_6:0": 3157.04, "ib-loopback/IB_write_2048_Avg_6:0": 6494.61, "ib-loopback/IB_write_4096_Avg_6:0": 12883.51, "ib-loopback/IB_write_8192_Avg_6:0": 19207.67, "ib-loopback/IB_write_16384_Avg_6:0": 22519.39, "ib-loopback/IB_write_32768_Avg_6:0": 23323.46, "ib-loopback/IB_write_65536_Avg_6:0": 23523.6, "ib-loopback/IB_write_131072_Avg_6:0": 23626.67, "ib-loopback/IB_write_262144_Avg_6:0": 23836.99, "ib-loopback/IB_write_524288_Avg_6:0": 23904.51, "ib-loopback/IB_write_1048576_Avg_6:0": 23919.07, "ib-loopback/IB_write_2097152_Avg_6:0": 23943.82, "ib-loopback/IB_write_4194304_Avg_6:0": 23936.77, "ib-loopback/IB_write_8388608_Avg_6:0": 23941.57, "ib-loopback/IB_write_512_Avg_7:0": 1505.63, "ib-loopback/IB_write_1024_Avg_7:0": 3259.93, "ib-loopback/IB_write_2048_Avg_7:0": 6738.6, "ib-loopback/IB_write_4096_Avg_7:0": 13352.06, "ib-loopback/IB_write_8192_Avg_7:0": 19941.35, "ib-loopback/IB_write_16384_Avg_7:0": 22566.09, "ib-loopback/IB_write_32768_Avg_7:0": 23244.77, "ib-loopback/IB_write_65536_Avg_7:0": 23377.67, "ib-loopback/IB_write_131072_Avg_7:0": 23736.17, "ib-loopback/IB_write_262144_Avg_7:0": 23829.25, "ib-loopback/IB_write_524288_Avg_7:0": 23879.6, "ib-loopback/IB_write_1048576_Avg_7:0": 23895.1, "ib-loopback/IB_write_2097152_Avg_7:0": 23930.64, "ib-loopback/IB_write_4194304_Avg_7:0": 23845.63, "ib-loopback/IB_write_8388608_Avg_7:0": 23896.94, "kernel-launch/return_code":0, "kernel-launch/event_overhead:0": 0.00596, "kernel-launch/event_overhead:1": 0.00595, "kernel-launch/event_overhead:2": 0.00557, "kernel-launch/event_overhead:3": 0.0055, "kernel-launch/event_overhead:4": 0.00592, "kernel-launch/event_overhead:5": 0.00589, "kernel-launch/event_overhead:6": 0.00572, "kernel-launch/event_overhead:7": 0.0059, "kernel-launch/wall_overhead:0": 0.01026, "kernel-launch/wall_overhead:1": 0.01026, "kernel-launch/wall_overhead:2": 0.01046, "kernel-launch/wall_overhead:3": 0.01049, "kernel-launch/wall_overhead:4": 0.01063, "kernel-launch/wall_overhead:5": 0.01006, "kernel-launch/wall_overhead:6": 0.01045, "kernel-launch/wall_overhead:7": 0.01071, "lstm_models/pytorch-lstm/steptime_train_float32": 48.07024518959224, "lstm_models/pytorch-lstm/throughput_train_float32": 4806.472441132788, "lstm_models/pytorch-lstm/steptime_train_float16": 25.95312986522913, "lstm_models/pytorch-lstm/throughput_train_float16": 9069.90809255883, "pytorch-matmul/nosharding": 34.64499759674072, "mem-bw/return_code":1, "nccl-bw/allreduce_8_busbw:0": 0.0, "nccl-bw/allreduce_8_algbw:0": 0.0, "nccl-bw/allreduce_8_time:0": 37.84, "nccl-bw/allreduce_16_busbw:0": 0.0, "nccl-bw/allreduce_16_algbw:0": 0.0, "nccl-bw/allreduce_16_time:0": 36.42, "nccl-bw/allreduce_32_busbw:0": 0.0, "nccl-bw/allreduce_32_algbw:0": 0.0, "nccl-bw/allreduce_32_time:0": 36.87, "nccl-bw/allreduce_64_busbw:0": 0.0, "nccl-bw/allreduce_64_algbw:0": 0.0, "nccl-bw/allreduce_64_time:0": 35.83, "nccl-bw/allreduce_128_busbw:0": 0.01, "nccl-bw/allreduce_128_algbw:0": 0.0, "nccl-bw/allreduce_128_time:0": 36.91, "nccl-bw/allreduce_256_busbw:0": 0.01, "nccl-bw/allreduce_256_algbw:0": 0.01, "nccl-bw/allreduce_256_time:0": 37.58, "nccl-bw/allreduce_512_busbw:0": 0.02, "nccl-bw/allreduce_512_algbw:0": 0.01, "nccl-bw/allreduce_512_time:0": 36.98, "nccl-bw/allreduce_1024_busbw:0": 0.05, 
"nccl-bw/allreduce_1024_algbw:0": 0.03, "nccl-bw/allreduce_1024_time:0": 36.93, "nccl-bw/allreduce_2048_busbw:0": 0.1, "nccl-bw/allreduce_2048_algbw:0": 0.06, "nccl-bw/allreduce_2048_time:0": 36.06, "nccl-bw/allreduce_4096_busbw:0": 0.19, "nccl-bw/allreduce_4096_algbw:0": 0.11, "nccl-bw/allreduce_4096_time:0": 37.2, "nccl-bw/allreduce_8192_busbw:0": 0.39, "nccl-bw/allreduce_8192_algbw:0": 0.22, "nccl-bw/allreduce_8192_time:0": 37.04, "nccl-bw/allreduce_16384_busbw:0": 0.77, "nccl-bw/allreduce_16384_algbw:0": 0.44, "nccl-bw/allreduce_16384_time:0": 37.46, "nccl-bw/allreduce_32768_busbw:0": 1.52, "nccl-bw/allreduce_32768_algbw:0": 0.87, "nccl-bw/allreduce_32768_time:0": 37.64, "nccl-bw/allreduce_65536_busbw:0": 3.0, "nccl-bw/allreduce_65536_algbw:0": 1.71, "nccl-bw/allreduce_65536_time:0": 38.22, "nccl-bw/allreduce_131072_busbw:0": 5.31, "nccl-bw/allreduce_131072_algbw:0": 3.04, "nccl-bw/allreduce_131072_time:0": 43.17, "nccl-bw/allreduce_262144_busbw:0": 9.5, "nccl-bw/allreduce_262144_algbw:0": 5.43, "nccl-bw/allreduce_262144_time:0": 48.29, "nccl-bw/allreduce_524288_busbw:0": 15.11, "nccl-bw/allreduce_524288_algbw:0": 8.64, "nccl-bw/allreduce_524288_time:0": 60.71, "nccl-bw/allreduce_1048576_busbw:0": 24.1, "nccl-bw/allreduce_1048576_algbw:0": 13.77, "nccl-bw/allreduce_1048576_time:0": 76.13, "nccl-bw/allreduce_2097152_busbw:0": 38.12, "nccl-bw/allreduce_2097152_algbw:0": 21.78, "nccl-bw/allreduce_2097152_time:0": 96.28, "nccl-bw/allreduce_4194304_busbw:0": 65.75, "nccl-bw/allreduce_4194304_algbw:0": 37.57, "nccl-bw/allreduce_4194304_time:0": 111.6, "nccl-bw/allreduce_8388608_busbw:0": 89.51, "nccl-bw/allreduce_8388608_algbw:0": 51.15, "nccl-bw/allreduce_8388608_time:0": 164.0, "nccl-bw/allreduce_16777216_busbw:0": 114.38, "nccl-bw/allreduce_16777216_algbw:0": 65.36, "nccl-bw/allreduce_16777216_time:0": 256.7, "nccl-bw/allreduce_33554432_busbw:0": 154.89, "nccl-bw/allreduce_33554432_algbw:0": 88.51, "nccl-bw/allreduce_33554432_time:0": 379.1, "nccl-bw/allreduce_67108864_busbw:0": 200.01, "nccl-bw/allreduce_67108864_algbw:0": 114.29, "nccl-bw/allreduce_67108864_time:0": 587.2, "nccl-bw/allreduce_134217728_busbw:0": 202.97, "nccl-bw/allreduce_134217728_algbw:0": 115.98, "nccl-bw/allreduce_134217728_time:0": 1157.2, "nccl-bw/allreduce_268435456_busbw:0": 221.82, "nccl-bw/allreduce_268435456_algbw:0": 126.75, "nccl-bw/allreduce_268435456_time:0": 2117.8, "nccl-bw/allreduce_536870912_busbw:0": 224.54, "nccl-bw/allreduce_536870912_algbw:0": 128.31, "nccl-bw/allreduce_536870912_time:0": 4184.2, "nccl-bw/allreduce_1073741824_busbw:0": 230.15, "nccl-bw/allreduce_1073741824_algbw:0": 131.51, "nccl-bw/allreduce_1073741824_time:0": 8164.5, "nccl-bw/allreduce_2147483648_busbw:0": 231.89, "nccl-bw/allreduce_2147483648_algbw:0": 132.51, "nccl-bw/allreduce_2147483648_time:0": 16207.0, "nccl-bw/allreduce_4294967296_busbw:0": 234.45, "nccl-bw/allreduce_4294967296_algbw:0": 133.97, "nccl-bw/allreduce_4294967296_time:0": 32059.0, "nccl-bw/allreduce_8589934592_busbw:0": 235.36, "nccl-bw/allreduce_8589934592_algbw:0": 134.49, "nccl-bw/allreduce_8589934592_time:0": 63870.0, "resnet_models/pytorch-resnet50/steptime_train_float32": 253.95522732287645, "resnet_models/pytorch-resnet50/throughput_train_float32": 760.3348099129964, "resnet_models/pytorch-resnet50/steptime_train_float16": 200.08606184273958, "resnet_models/pytorch-resnet50/throughput_train_float16": 971.0651430922575, "resnet_models/pytorch-resnet101/steptime_train_float32": 389.08605091273785, 
"resnet_models/pytorch-resnet101/throughput_train_float32": 496.11747409298965, "resnet_models/pytorch-resnet101/steptime_train_float16": 308.6274107918143, "resnet_models/pytorch-resnet101/throughput_train_float16": 627.2056272195069, "resnet_models/pytorch-resnet152/steptime_train_float32": 547.6558278314769, "resnet_models/pytorch-resnet152/throughput_train_float32": 352.07099543348215, "resnet_models/pytorch-resnet152/steptime_train_float16": 424.5809856802225, "resnet_models/pytorch-resnet152/throughput_train_float16": 454.8335998153649, "pytorch-sharding-matmul/allreduce": 10.574411869049072, "pytorch-sharding-matmul/allgather": 10.084696769714355, "vgg_models/pytorch-vgg11/steptime_train_float32": 40.35283671692014, "vgg_models/pytorch-vgg11/throughput_train_float32": 796.3615936949874, "vgg_models/pytorch-vgg11/steptime_train_float16": 24.133514845743775, "vgg_models/pytorch-vgg11/throughput_train_float16": 1330.411361458461, "vgg_models/pytorch-vgg13/steptime_train_float32": 55.466310936026275, "vgg_models/pytorch-vgg13/throughput_train_float32": 580.234107444399, "vgg_models/pytorch-vgg13/steptime_train_float16": 33.35228993091732, "vgg_models/pytorch-vgg13/throughput_train_float16": 962.5332023901524, "vgg_models/pytorch-vgg16/steptime_train_float32": 65.22519944701344, "vgg_models/pytorch-vgg16/throughput_train_float32": 493.4268638875934, "vgg_models/pytorch-vgg16/steptime_train_float16": 39.25287735182792, "vgg_models/pytorch-vgg16/throughput_train_float16": 817.2008546147621, "vgg_models/pytorch-vgg19/steptime_train_float32": 74.93487105239183, "vgg_models/pytorch-vgg19/throughput_train_float32": 429.80921583106164, "vgg_models/pytorch-vgg19/steptime_train_float16": 45.20330624654889, "vgg_models/pytorch-vgg19/throughput_train_float16": 709.1127328377091}
\ No newline at end of file
...@@ -99,21 +99,101 @@ def test_rule_op(self): ...@@ -99,21 +99,101 @@ def test_rule_op(self):
# variance # variance
data = {'kernel-launch/event_overhead:0': 3.1, 'kernel-launch/event_overhead:1': 2} data = {'kernel-launch/event_overhead:0': 3.1, 'kernel-launch/event_overhead:1': 2}
data_row = pd.Series(data) data_row = pd.Series(data)
pass_rule = rule_op(data_row, true_baselines[0], summary_data_row, details, categories) violated_metric_num = rule_op(data_row, true_baselines[0], summary_data_row, details, categories)
assert (not pass_rule) assert (violated_metric_num == 1)
assert (categories == {'KernelLaunch'}) assert (categories == {'KernelLaunch'})
assert (details == ['kernel-launch/event_overhead:0(B/L: 2.0000 VAL: 3.1000 VAR: 55.00% Rule:lambda x:x>0.5)']) assert (details == ['kernel-launch/event_overhead:0(B/L: 2.0000 VAL: 3.1000 VAR: 55.00% Rule:lambda x:x>0.5)'])
data = {'kernel-launch/event_overhead:0': 1.5, 'kernel-launch/event_overhead:1': 1.5} data = {'kernel-launch/event_overhead:0': 1.5, 'kernel-launch/event_overhead:1': 1.5}
data_row = pd.Series(data) data_row = pd.Series(data)
pass_rule = rule_op(data_row, true_baselines[1], summary_data_row, details, categories) violated_metric_num = rule_op(data_row, true_baselines[1], summary_data_row, details, categories)
assert (pass_rule) assert (violated_metric_num == 0)
assert (categories == {'KernelLaunch'}) assert (categories == {'KernelLaunch'})
# value # value
rule_op = RuleOp.get_rule_func(DiagnosisRuleType.VALUE) rule_op = RuleOp.get_rule_func(DiagnosisRuleType.VALUE)
pass_rule = rule_op(data_row, true_baselines[2], summary_data_row, details, categories) violated_metric_num = rule_op(data_row, true_baselines[2], summary_data_row, details, categories)
assert (not pass_rule)
assert (categories == {'KernelLaunch', 'KernelLaunch2'}) assert (categories == {'KernelLaunch', 'KernelLaunch2'})
assert ('kernel-launch/event_overhead:0(VAL: 1.5000 Rule:lambda x:x>0)' in details) assert ('kernel-launch/event_overhead:0(VAL: 1.5000 Rule:lambda x:x>0)' in details)
assert ('kernel-launch/event_overhead:0(B/L: 2.0000 VAL: 3.1000 VAR: 55.00% Rule:lambda x:x>0.5)' in details) assert ('kernel-launch/event_overhead:0(B/L: 2.0000 VAL: 3.1000 VAR: 55.00% Rule:lambda x:x>0.5)' in details)
def test_multi_rules_op(self):
    """multi-rule check."""
    details = []
    categories = set()
    data_row = pd.Series()
    summary_data_row = pd.Series(index=['kernel-launch/event_overhead:0'], dtype=float)
    false_baselines = [
        {
            'categories': 'KernelLaunch',
            'criteria': 'lambda label:True if label["rule2"]>=2 else False',
            'function': 'multi_rules'
        }
    ]
    label = {}
    for rule in false_baselines:
        self.assertRaises(Exception, RuleOp.multi_rules, rule, details, categories, label)
    true_baselines = [
        {
            'name': 'rule1',
            'categories': 'CNN',
            'criteria': 'lambda x:x<-0.5',
            'store': True,
            'function': 'variance',
            'metrics': {
                'resnet_models/pytorch-resnet152/throughput_train_float32': 300,
            }
        }, {
            'name': 'rule2',
            'categories': 'CNN',
            'criteria': 'lambda x:x<-0.5',
            'store': True,
            'function': 'variance',
            'metrics': {
                'vgg_models/pytorch-vgg11/throughput_train_float32': 300
            }
        }, {
            'name': 'rule3',
            'categories': 'KernelLaunch',
            'criteria': 'lambda label:True if label["rule1"]+label["rule2"]>=2 else False',
            'store': False,
            'function': 'multi_rules'
        }
    ]
    # label["rule1"]+label["rule2"]=1, rule3 pass
    data = {
        'resnet_models/pytorch-resnet152/throughput_train_float32': 300,
        'vgg_models/pytorch-vgg11/throughput_train_float32': 100
    }
    data_row = pd.Series(data)
    rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[0]['function']))
    label[true_baselines[0]['name']] = rule_op(data_row, true_baselines[0], summary_data_row, details, categories)
    label[true_baselines[1]['name']] = rule_op(data_row, true_baselines[1], summary_data_row, details, categories)
    rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[2]['function']))
    violated_metric_num = rule_op(true_baselines[2], details, categories, label)
    assert (violated_metric_num == 0)
    # label["rule1"]+label["rule2"]=2, rule3 not pass
    data = {
        'resnet_models/pytorch-resnet152/throughput_train_float32': 100,
        'vgg_models/pytorch-vgg11/throughput_train_float32': 100
    }
    data_row = pd.Series(data)
    details = []
    categories = set()
    rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[0]['function']))
    label[true_baselines[0]['name']] = rule_op(data_row, true_baselines[0], summary_data_row, details, categories)
    label[true_baselines[1]['name']] = rule_op(data_row, true_baselines[1], summary_data_row, details, categories)
    rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[2]['function']))
    violated_metric_num = rule_op(true_baselines[2], details, categories, label)
    assert (violated_metric_num)
    assert ('CNN' in categories)
    assert (
        details == [
            'resnet_models/pytorch-resnet152/throughput_train_float32' +
            '(B/L: 300.0000 VAL: 100.0000 VAR: -66.67% Rule:lambda x:x<-0.5)',
            'vgg_models/pytorch-vgg11/throughput_train_float32' +
            '(B/L: 300.0000 VAL: 100.0000 VAR: -66.67% Rule:lambda x:x<-0.5)',
            'rule3:lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
        ]
    )
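
The test above exercises the new `multi_rules` operator only through `RuleOp.get_rule_func(DiagnosisRuleType('multi_rules'))` and the call `rule_op(rule, details, categories, label)`. As a reading aid, the snippet below is a minimal sketch of how such an operator could be evaluated, assuming only the call signature, the `label` dict of stored per-rule results, and the `rule3:<criteria>` detail format visible in the test; `multi_rules_sketch` and its internals are illustrative assumptions, not the SuperBench implementation.

```python
# Minimal sketch (illustrative only, not the actual SuperBench code) of a
# 'multi_rules' operator matching the call shape used in the test above:
#     violated = rule_op(rule, details, categories, label)
def multi_rules_sketch(rule, details, categories, label):
    """Evaluate rule['criteria'] against the per-rule label dict.

    label maps a stored rule's name to the number of metrics it flagged,
    e.g. {'rule1': 1, 'rule2': 1}. Returns 1 if the combined criteria is
    violated, otherwise 0.
    """
    # 'criteria' is a Python lambda string, e.g.
    # 'lambda label:True if label["rule1"]+label["rule2"]>=2 else False'.
    # A missing label key (as in the false_baselines case) raises an
    # exception here, which the test expects via assertRaises.
    violated = eval(rule['criteria'])(label)
    if not isinstance(violated, bool):
        raise Exception('invalid criteria in multi_rules rule: {}'.format(rule))
    if violated:
        # Assumption: record the rule's categories and a 'name:criteria'
        # detail entry, mirroring the 'rule3:lambda ...' string asserted above.
        categories.add(rule['categories'])
        details.append('{}:{}'.format(rule['name'], rule['criteria']))
    return 1 if violated else 0
```

Under these assumptions, `multi_rules_sketch(true_baselines[2], details, categories, {'rule1': 1, 'rule2': 1})` returns 1 and appends the `rule3:...` entry to `details`, which is the behavior the assertions at the end of the test check for.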