1. 02 Sep, 2022 1 commit
    • Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399) · db842892
      Yuting Jiang authored
      **Description**
       Make baseline check optional in data diagnosis and fix bugs.
      
      **Major Revision**
      - make baseline file optional in data diagnosis
      - fix output bugs in md and excel formats when 'function' is not in the rule
      - fix bug in the multi_rules function where a missing/failed test could fail the whole process
      
      **Minor Revision**
      - revise docs related to data diagnosis
      - resolve the warning message from the baseline-not-found check; only raise an exception if the baseline is not found in the 'variance' function
      - move summary fields to the top of the json file
      - unify 'Index', 'machine' -> 'index' in the output file
  2. 23 Aug, 2022 1 commit
    • Analyzer - Add support to store values of metrics in data diagnosis (#392) · 733860d7
      Yuting Jiang authored
      **Description**
      Add support to store values of metrics in data diagnosis.
      
      Take the following rules as an example:
      ```
          nccl_store_rule:
            categories: NCCL_DIS
            store: True
            metrics:
              - nccl-bw:allreduce-run0/allreduce_1073741824_busbw
              - nccl-bw:allreduce-run1/allreduce_1073741824_busbw
              - nccl-bw:allreduce-run2/allreduce_1073741824_busbw
              - nccl-bw:allreduce-run3/allreduce_1073741824_busbw
              - nccl-bw:allreduce-run4/allreduce_1073741824_busbw
          nccl_rule:
            function: multi_rules
            criteria: 'lambda label:True if min(label["nccl_store_rule"].values())/max(label["nccl_store_rule"].values())<0.95 else False'
            categories: NCCL_DIS
      ```
      **nccl_store_rule** stores the values of its metrics in a dict and saves them into `label["nccl_store_rule"]`; **nccl_rule** can then use those values through `label["nccl_store_rule"].values()` in its criteria.
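
How the criteria string consumes the stored values can be sketched in a few lines of Python. This is only an illustration, not the analyzer's actual code: the helper `evaluate_criteria` is hypothetical, the bandwidth numbers are made up, and using `eval` on the criteria string is an assumption that is only reasonable for trusted rule files.

```python
# Made-up values standing in for what the store rule would save per metric.
label = {
    "nccl_store_rule": {
        "nccl-bw:allreduce-run0/allreduce_1073741824_busbw": 230.0,
        "nccl-bw:allreduce-run1/allreduce_1073741824_busbw": 231.0,
        "nccl-bw:allreduce-run2/allreduce_1073741824_busbw": 229.0,
        "nccl-bw:allreduce-run3/allreduce_1073741824_busbw": 210.0,
        "nccl-bw:allreduce-run4/allreduce_1073741824_busbw": 232.0,
    }
}

# The criteria string, copied from the rule file above.
criteria = (
    'lambda label:True if min(label["nccl_store_rule"].values())'
    '/max(label["nccl_store_rule"].values())<0.95 else False'
)


def evaluate_criteria(criteria_str, label):
    """Hypothetical helper: turn the criteria string into a callable and apply it.

    Assumes the rule file is trusted, since eval executes arbitrary code.
    """
    rule = eval(criteria_str)
    return bool(rule(label))


# 210.0 / 232.0 is below 0.95, so the rule flags this node.
violated = evaluate_criteria(criteria, label)
```

With these numbers the slowest run is more than 5% below the fastest, so `violated` is `True`; if all runs were within 5% of each other it would be `False`.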
  3. 01 Aug, 2022 1 commit
    • Analyzer - Add failure check feature in data diagnosis (#378) · ec16d425
      Yuting Jiang authored
      **Description**
      Add failure check feature in data diagnosis.
      
      **Major Revision**
      - Add a failure check rule op: if any metric_regex is not matched by any metric in the result, label the node as failedtest
      - Split performance issue and failedtest in categories
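
The failure check described above can be sketched roughly as follows. The function names, the `example-bench` pattern, and the choice of `re.search` for matching are assumptions made for illustration; the real rule op may match and report differently.

```python
import re


def unmatched_patterns(metric_regexes, result_metrics):
    """Return the patterns that match none of the metric names in the result."""
    return [
        pattern
        for pattern in metric_regexes
        if not any(re.search(pattern, metric) for metric in result_metrics)
    ]


def failure_check(metric_regexes, result_metrics):
    """Hypothetical wrapper: return 'failedtest' if any pattern is unmatched, else None."""
    missing = unmatched_patterns(metric_regexes, result_metrics)
    return "failedtest" if missing else None


# Illustrative inputs: the second pattern has no matching metric in the result.
patterns = [r"nccl-bw:allreduce-run\d+/allreduce_\d+_busbw", r"example-bench/.*"]
result_metrics = ["nccl-bw:allreduce-run0/allreduce_1073741824_busbw"]

category = failure_check(patterns, result_metrics)
```

Keeping this label separate from the performance categories means a node whose benchmark never produced output is not mistaken for one that merely ran slowly.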
      
      
      **Minor Revision**
      - replace DataFrame.append() with pd.concat() since append() will be removed in a later version of pandas
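
A minimal sketch of this kind of replacement, using made-up frames rather than the analyzer's real data (the column names and values here are illustrative only):

```python
import pandas as pd

# Illustrative frames; the real code builds these from diagnosis results.
summary = pd.DataFrame({"index": ["node0"], "category": ["NCCL_DIS"]})
new_row = pd.DataFrame({"index": ["node1"], "category": ["failedtest"]})

# Old form, deprecated in pandas 1.4 and removed in pandas 2.0:
#   summary = summary.append(new_row, ignore_index=True)
# Equivalent with pd.concat, which takes a list of frames:
summary = pd.concat([summary, new_row], ignore_index=True)
```

`ignore_index=True` renumbers the rows of the combined frame, which matches what `append(..., ignore_index=True)` did.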
  4. 20 Feb, 2022 1 commit
  5. 08 Dec, 2021 1 commit