sh: /opt/rocm/llvm/bin/clang++: No such file or directory
Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/dtk/bin/hipconfig line 71.
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
sh: /opt/rocm/llvm/bin/clang++: No such file or directory
Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/dtk/bin/hipconfig line 71.
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
sh: /opt/rocm/llvm/bin/clang++: No such file or directory
Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/dtk/bin/hipconfig line 71.
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
sh: /opt/rocm/llvm/bin/clang++: No such file or directory
Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/dtk/bin/hipconfig line 71.
using world size: 1024, data-parallel size: 8, context-parallel size: 2, hierarchical context-parallel sizes: None, tensor-model-parallel size: 4, encoder-tensor-model-parallel size: 0, pipeline-model-parallel size: 16, encoder-pipeline-model-parallel size: 0
WARNING: overriding default arguments for tokenizer_type:GPT2BPETokenizer with tokenizer_type:Llama2Tokenizer
WARNING: Setting args.overlap_p2p_comm and args.align_param_gather to False since non-interleaved schedule does not support overlapping p2p communication and aligned param AG
Number of virtual stages per pipeline stage: None
accumulate and all-reduce gradients in fp32 for bfloat16 data type.
using torch.bfloat16 for parameters ...
------------------------ arguments ------------------------
  account_for_embedding_in_pipeline_split ......... False
  account_for_loss_in_pipeline_split .............. False
  accumulate_allreduce_grads_in_fp32 .............. True
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  add_bias_linear ................................. False
  add_position_embedding .......................... False
  add_qkv_bias .................................... False
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  align_grad_reduce ............................... True
  align_param_gather .............................. False
  app_tag_run_name ................................ None
  app_tag_run_version ............................. 0.0.0
  apply_layernorm_1p .............................. False
  apply_query_key_layer_scaling ................... False
  apply_residual_connection_post_layernorm ........ False
  apply_rope_fusion ............................... True
  async_save ...................................... None
  async_tensor_model_parallel_allreduce ........... True
  attention_backend ............................... AttnBackend.auto
  attention_dropout ............................... 0.0
  attention_softmax_in_fp32 ....................... False
  auto_detect_ckpt_format ......................... False
  barrier_with_L1_time ............................ True
  bert_binary_head ................................ True
  bert_embedder_type .............................. megatron
  bert_load ....................................... None
  bf16 ............................................ True
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ False
  bias_swiglu_fusion .............................. True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  calc_ft_timeouts ................................ False
  calculate_per_token_loss ........................ False
  check_for_large_grads ........................... False
  check_for_nan_in_loss_and_grad .................. True
  check_for_spiky_loss ............................ False
  check_weight_hash_across_dp_replicas_interval ... None
  ckpt_assume_constant_structure .................. False
  ckpt_convert_format ............................. None
  ckpt_convert_save ............................... None
  ckpt_convert_update_legacy_dist_opt_format ...... False
  ckpt_format ..................................... torch
  ckpt_fully_parallel_load ........................ False
  ckpt_fully_parallel_save ........................ True
  ckpt_fully_parallel_save_deprecated ............. False
  ckpt_step ....................................... None
  classes_fraction ................................ 1.0
  clip_grad ....................................... 1.0
  clone_scatter_output_in_embedding ............... True
  config_logger_dir ............................... 
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  context_parallel_size ........................... 2
  cp_comm_type .................................... ['p2p']
  create_attention_mask_in_dataloader ............. True
  cross_entropy_loss_fusion ....................... False
  cuda_graph_scope ................................ full
  cuda_graph_warmup_steps ......................... 3
  data_args_path .................................. None
  data_cache_path ................................. None
  data_parallel_random_init ....................... False
  data_parallel_sharding_strategy ................. no_shard
  data_parallel_size .............................. 8
  data_path ....................................... ['/public/home/xingjl/datasets/gpt_dataset/redpajama_text_document']
  data_per_class_fraction ......................... 1.0
  data_sharding ................................... True
  dataloader_type ................................. single
  ddp_average_in_collective ....................... False
  ddp_bucket_size ................................. None
  ddp_num_buckets ................................. None
  ddp_pad_buckets_for_high_nccl_busbw ............. False
  decoder_first_pipeline_num_layers ............... None
  decoder_last_pipeline_num_layers ................ None
  decoder_num_layers .............................. None
  decoder_seq_length .............................. None
  decoupled_lr .................................... None
  decoupled_min_lr ................................ None
  decrease_batch_size_if_needed ................... False
  defer_embedding_wgrad_compute ................... False
  deprecated_use_mcore_models ..................... True
  deterministic_mode .............................. False
  dino_bottleneck_size ............................ 256
  dino_freeze_last_layer .......................... 1
  dino_head_hidden_size ........................... 2048
  dino_local_crops_number ......................... 10
  dino_local_img_size ............................. 96
  dino_norm_last_layer ............................ False
  dino_teacher_temp ............................... 0.07
  dino_warmup_teacher_temp ........................ 0.04
  dino_warmup_teacher_temp_epochs ................. 30
  disable_straggler_on_startup .................... False
  dist_ckpt_format_deprecated ..................... None
  dist_ckpt_strictness ............................ assume_ok_unexpected
  dist_url ........................................ tcp://node001:25900
  distribute_saved_activations .................... False
  distributed_backend ............................. nccl
  distributed_timeout_minutes ..................... 10
  embedding_path .................................. None
  empty_unused_memory_level ....................... 0
  enable_cuda_graph ............................... False
  enable_ft_package ............................... False
  enable_gloo_process_groups ...................... True
  enable_one_logger ............................... True
  encoder_num_layers .............................. 64
  encoder_pipeline_model_parallel_size ............ 0
  encoder_seq_length .............................. 8192
  encoder_tensor_model_parallel_size .............. 0
  end_weight_decay ................................ 0.1
  eod_mask_loss ................................... False
  error_injection_rate ............................ 0
  error_injection_type ............................ transient_error
  eval_interval ................................... 10000
  eval_iters ...................................... 3
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... None
  exit_interval ................................... None
  exit_on_missing_checkpoint ...................... False
  exit_signal_handler ............................. False
  exp_avg_dtype ................................... torch.float32
  exp_avg_sq_dtype ................................ torch.float32
  expert_model_parallel_size ...................... 16
  expert_tensor_parallel_size ..................... 4
  external_cuda_graph ............................. False
  ffn_hidden_size ................................. 32768
  finetune ........................................ False
  first_last_layers_bf16 .......................... False
  flash_decode .................................... False
  fp16 ............................................ False
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  fp8 ............................................. None
  fp8_amax_compute_algo ........................... most_recent
  fp8_amax_history_len ............................ 1
  fp8_interval .................................... 1
  fp8_margin ...................................... 0
  fp8_param_gather ................................ False
  fp8_recipe ...................................... delayed
  fp8_wgrad ....................................... True
  global_batch_size ............................... 2048
  grad_reduce_in_bf16 ............................. False
  gradient_accumulation_fusion .................... True
  gradient_reduce_div_fusion ...................... True
  group_query_attention ........................... False
  head_lr_mult .................................... 1.0
  hidden_dropout .................................. 0.0
  hidden_size ..................................... 8192
  hierarchical_context_parallel_sizes ............. None
  hybrid_attention_ratio .......................... 0.0
  hybrid_mlp_ratio ................................ 0.0
  hybrid_override_pattern ......................... None
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_h ........................................... 224
  img_w ........................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  inference_batch_times_seqlen_threshold .......... -1
  inference_dynamic_batching ...................... False
  inference_dynamic_batching_buffer_guaranteed_fraction  0.2
  inference_dynamic_batching_buffer_overflow_factor  None
  inference_dynamic_batching_buffer_size_gb ....... 40.0
  inference_dynamic_batching_max_requests_override  None
  inference_dynamic_batching_max_tokens_override .. None
  inference_max_batch_size ........................ 8
  inference_max_seq_length ........................ 2560
  inference_rng_tracker ........................... False
  init_method_std ................................. 0.01
  init_method_xavier_uniform ...................... False
  init_model_with_meta_device ..................... False
  initial_loss_scale .............................. 4294967296
  is_hybrid_model ................................. False
  iter_per_epoch .................................. 1250
  iterations_to_skip .............................. []
  keep_fp8_transpose_cache_when_using_custom_fsdp . False
  kv_channels ..................................... 128
  kv_lora_rank .................................... 32
  lazy_mpu_init ................................... None
  load ............................................ None
  local_rank ...................................... 0
  log_interval .................................... 1
  log_loss_scale_to_tensorboard ................... True
  log_memory_to_tensorboard ....................... False
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_progress .................................... False
  log_straggler ................................... False
  log_throughput .................................. True
  log_timers_to_tensorboard ....................... False
  log_validation_ppl_to_tensorboard ............... False
  log_world_size_to_tensorboard ................... False
  logging_level ................................... None
  loss_scale ...................................... None
  loss_scale_window ............................... 1000
  lr .............................................. 0.0001
  lr_decay_iters .................................. 10000
  lr_decay_samples ................................ None
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_init .................................. 0.0
  lr_warmup_iters ................................. 2000
  lr_warmup_samples ............................... 0
  lr_wsd_decay_iters .............................. None
  lr_wsd_decay_samples ............................ None
  lr_wsd_decay_style .............................. exponential
  main_grads_dtype ................................ torch.float32
  main_params_dtype ............................... torch.float32
  make_vocab_size_divisible_by .................... 128
  mamba_head_dim .................................. 64
  mamba_num_groups ................................ 8
  mamba_state_dim ................................. 128
  manual_gc ....................................... False
  manual_gc_eval .................................. True
  manual_gc_interval .............................. 0
  mask_factor ..................................... 1.0
  mask_prob ....................................... 0.15
  mask_type ....................................... random
  masked_softmax_fusion ........................... False
  max_position_embeddings ......................... 32768
  max_tokens_to_oom ............................... 12000
  memory_snapshot_path ............................ snapshot.pickle
  merge_file ...................................... None
  micro_batch_size ................................ 1
  microbatch_group_size_per_vp_stage .............. None
  min_loss_scale .................................. 1.0
  min_lr .......................................... 1e-07
  mmap_bin_files .................................. True
  mock_data ....................................... False
  moe_aux_loss_coeff .............................. 0.01
  moe_enable_deepep ............................... False
  moe_expert_capacity_factor ...................... 0.5
  moe_extended_tp ................................. False
  moe_ffn_hidden_size ............................. 32768
  moe_grouped_gemm ................................ False
  moe_input_jitter_eps ............................ None
  moe_layer_freq .................................. 1
  moe_layer_recompute ............................. False
  moe_pad_expert_input_to_capacity ................ True
  moe_per_layer_logging ........................... False
  moe_permute_fusion .............................. False
  moe_router_bias_update_rate ..................... 0.001
  moe_router_dtype ................................ None
  moe_router_enable_expert_bias ................... False
  moe_router_group_topk ........................... None
  moe_router_load_balancing_type .................. aux_loss
  moe_router_num_groups ........................... None
  moe_router_pre_softmax .......................... False
  moe_router_score_function ....................... softmax
  moe_router_topk ................................. 2
  moe_router_topk_scaling_factor .................. None
  moe_shared_expert_intermediate_size ............. None
  moe_shared_expert_overlap ....................... False
  moe_token_dispatcher_type ....................... alltoall
  moe_token_drop_policy ........................... probs
  moe_use_legacy_grouped_gemm ..................... False
  moe_use_upcycling ............................... False
  moe_z_loss_coeff ................................ None
  mrope_section ................................... None
  mscale .......................................... 1.0
  mscale_all_dim .................................. 1.0
  mtp_loss_scaling_factor ......................... 0.1
  mtp_num_layers .................................. None
  multi_latent_attention .......................... False
  nccl_communicator_config_path ................... None
  no_load_optim ................................... True
  no_load_rng ..................................... True
  no_persist_layer_norm ........................... False
  no_save_optim ................................... None
  no_save_rng ..................................... None
  non_persistent_ckpt_type ........................ None
  non_persistent_global_ckpt_dir .................. None
  non_persistent_local_ckpt_algo .................. fully_parallel
  non_persistent_local_ckpt_dir ................... None
  non_persistent_save_interval .................... None
  norm_epsilon .................................... 1e-05
  normalization ................................... RMSNorm
  num_attention_heads ............................. 64
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_dataset_builder_threads ..................... 1
  num_distributed_optimizer_instances ............. 1
  num_experts ..................................... 16
  num_layers ...................................... 64
  num_layers_at_end_in_bf16 ....................... 1
  num_layers_at_start_in_bf16 ..................... 1
  num_layers_per_virtual_pipeline_stage ........... None
  num_query_groups ................................ 1
  num_virtual_stages_per_pipeline_rank ............ None
  num_workers ..................................... 2
  one_logger_async ................................ False
  one_logger_project .............................. megatron-lm
  one_logger_run_name ............................. None
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  optimizer_cpu_offload ........................... False
  optimizer_offload_fraction ...................... 1.0
  output_bert_embeddings .......................... False
  overlap_cpu_optimizer_d2h_h2d ................... False
  overlap_grad_reduce ............................. True
  overlap_p2p_comm ................................ False
  overlap_p2p_comm_warmup_flush ................... False
  overlap_param_gather ............................ True
  overlap_param_gather_with_optimizer_step ........ False
  override_opt_param_scheduler .................... False
  params_dtype .................................... torch.bfloat16
  patch_dim ....................................... 16
  per_split_data_args_path ........................ None
  perform_initialization .......................... True
  pin_cpu_grads ................................... True
  pin_cpu_params .................................. True
  pipeline_model_parallel_comm_backend ............ None
  pipeline_model_parallel_size .................... 16
  pipeline_model_parallel_split_rank .............. None
  position_embedding_type ......................... rope
  pretrained_checkpoint ........................... None
  profile ......................................... False
  profile_dir ..................................... ./
  profile_ranks ................................... [0]
  profile_step_end ................................ 12
  profile_step_start .............................. 10
  q_lora_rank ..................................... None
  qk_head_dim ..................................... 128
  qk_layernorm .................................... False
  qk_pos_emb_head_dim ............................. 64
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... None
  rank ............................................ 0
  recompute_granularity ........................... None
  recompute_method ................................ None
  recompute_num_layers ............................ None
  record_memory_history ........................... False
  relative_attention_max_distance ................. 128
  relative_attention_num_buckets .................. 32
  replication ..................................... False
  replication_factor .............................. 2
  replication_jump ................................ None
  rerun_mode ...................................... disabled
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  result_rejected_tracker_filename ................ None
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  retro_add_retriever ............................. False
  retro_attention_gate ............................ 1
  retro_cyclic_train_iters ........................ None
  retro_encoder_attention_dropout ................. 0.1
  retro_encoder_hidden_dropout .................... 0.1
  retro_encoder_layers ............................ 2
  retro_num_neighbors ............................. 2
  retro_num_retrieved_chunks ...................... 2
  retro_project_dir ............................... None
  retro_verify_neighbor_count ..................... True
  rope_scaling_factor ............................. 8.0
  rotary_base ..................................... 1000000
  rotary_interleaved .............................. False
  rotary_percent .................................. 1.0
  rotary_scaling_factor ........................... 1.0
  rotary_seq_len_interpolation_factor ............. None
  run_workload_inspector_server ................... False
  s3_cache_path ................................... None
  sample_rate ..................................... 1.0
  save ............................................ None
  save_interval ................................... 500000
  scatter_gather_tensors_in_pipeline .............. True
  seed ............................................ 1234
  seq_length ...................................... 8192
  sequence_parallel ............................... True
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  skip_train ...................................... False
  skipped_train_samples ........................... 0
  spec ............................................ None
  split ........................................... 98,2,0
  squared_relu .................................... False
  start_weight_decay .............................. 0.1
  straggler_ctrlr_port ............................ 65535
  straggler_minmax_count .......................... 1
  suggested_communication_unit_size ............... 400000000
  swiglu .......................................... False
  swin_backbone_type .............................. tiny
  te_rng_tracker .................................. False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. ./CKPT/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 1000
  test_data_path .................................. None
  test_mode ....................................... False
  tiktoken_num_special_tokens ..................... 1000
  tiktoken_pattern ................................ None
  tiktoken_special_tokens ......................... None
  timing_log_level ................................ 0
  timing_log_option ............................... minmax
  titles_data_path ................................ None
  tokenizer_model ................................. /public/home/xingjl/datasets/gpt_dataset/tokenizer.model
  tokenizer_type .................................. Llama2Tokenizer
  tp_comm_bootstrap_backend ....................... nccl
  tp_comm_bulk_dgrad .............................. True
  tp_comm_bulk_wgrad .............................. True
  tp_comm_overlap ................................. False
  tp_comm_overlap_ag .............................. True
  tp_comm_overlap_cfg ............................. None
  tp_comm_overlap_rs .............................. True
  tp_comm_overlap_rs_dgrad ........................ False
  tp_comm_split_ag ................................ True
  tp_comm_split_rs ................................ True
  train_data_path ................................. None
  train_iters ..................................... 5
  train_samples ................................... None
  train_sync_interval ............................. None
  transformer_impl ................................ transformer_engine
  transformer_pipeline_model_parallel_size ........ 16
  untie_embeddings_and_output_weights ............. True
  use_checkpoint_args ............................. False
  use_checkpoint_opt_param_scheduler .............. False
  use_cpu_initialization .......................... None
  use_custom_fsdp ................................. False
  use_dist_ckpt ................................... False
  use_dist_ckpt_deprecated ........................ False
  use_distributed_optimizer ....................... True
  use_flash_attn .................................. False
  use_hip_profiler ................................ False
  use_legacy_models ............................... False
  use_mp_args_from_checkpoint_args ................ False
  use_one_sent_docs ............................... False
  use_persistent_ckpt_worker ...................... False
  use_precision_aware_optimizer ................... False
  use_pytorch_profiler ............................ False
  use_ring_exchange_p2p ........................... False
  use_rope_scaling ................................ False
  use_rotary_position_embeddings .................. False
  use_tokenizer_model_from_checkpoint_args ........ True
  use_torch_fsdp2 ................................. False
  use_torch_optimizer_for_cpu_offload ............. False
  use_tp_pp_dp_mapping ............................ False
  v_head_dim ...................................... 128
  valid_data_path ................................. None
  variable_seq_lengths ............................ False
  virtual_pipeline_model_parallel_size ............ None
  vision_backbone_type ............................ vit
  vision_pretraining .............................. False
  vision_pretraining_type ......................... classify
  vocab_extra_ids ................................. 0
  vocab_file ...................................... None
  vocab_size ...................................... None
  wandb_exp_name .................................. 
  wandb_project ................................... 
  wandb_save_dir .................................. 
  weight_decay .................................... 0.1
  weight_decay_incr_style ......................... constant
  wgrad_deferral_limit ............................ 0
  world_size ...................................... 1024
  yaml_cfg ........................................ None
-------------------- end of arguments ---------------------
> building Llama2Tokenizer tokenizer ...
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
 > padded vocab (size: 32000) with 256 dummy tokens (new size: 32256)
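The padded-vocab line above reflects Megatron's usual rounding of the tokenizer vocabulary up to a multiple of `--make-vocab-size-divisible-by` times the tensor-model-parallel size, so the embedding table splits evenly across tensor-parallel ranks. A minimal sketch of that rule, assuming the default divisor of 128 and a tensor-parallel size of 4 (a combination that reproduces the 32000 → 32256 padding seen here; the function name is illustrative, not Megatron's actual internal API):

```python
import math

def pad_vocab_size(orig_vocab_size: int, divisible_by: int, tp_size: int) -> int:
    """Round the vocabulary up to a multiple of (divisible_by * tp_size)
    so the embedding table partitions evenly across tensor-parallel ranks.
    Illustrative sketch of Megatron's padding rule, not its actual code."""
    multiple = divisible_by * tp_size
    return math.ceil(orig_vocab_size / multiple) * multiple

# With divisible_by=128 and tp_size=4 the multiple is 512, and
# 32000 rounds up to 32256, i.e. 256 dummy tokens, matching the log.
print(pad_vocab_size(32000, 128, 4))
```

The dummy tokens are never produced by the tokenizer; they only exist so each tensor-parallel shard of the embedding and output layers has the same width.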
sh: /opt/rocm/llvm/bin/clang++: No such file or directory
Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/dtk/bin/hipconfig line 71.
> initializing torch distributed ...
WARNING: TensorBoard writing requested but is not available (are you using PyTorch 1.1.0 or later?), no TensorBoard logs will be written.
WARNING: one_logger package is required to enable e2e metrics tracking. please go to https://confluence.nvidia.com/display/MLWFO/Package+Repositories for details to install it
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
[WARNING  | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode disabled
> initialized tensor model parallel with size 4
> initialized pipeline model parallel with size 16
> setting random seeds to 1234 ...
> compiling dataset index builder ...
make: Entering directory '/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/datasets'
make: Nothing to be done for 'default'.
make: Leaving directory '/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/datasets'
>>> done with dataset index builder. Compilation time: 0.055 seconds
WARNING: constraints for invoking optimized fused softmax kernel are not met. We default back to unfused kernel invocations.
> compiling and loading fused kernels ...
>>> done with compiling and loading fused kernels. Compilation time: 9.275 seconds
time to initialize megatron (seconds): 26.607
[after megatron is initialized] datetime: 2025-04-29 13:52:05 
building GPT model ...
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 13): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 12): 805896192
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 805896192
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 871956480
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 871956480
 > number of parameters on (tensor, pipeline) model parallel rank (3, 6): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 10): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 11): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 5): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 14): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 14): 805896192
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
 > number of parameters on (tensor, pipeline) model parallel rank (2, 7): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 14): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 14): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 871956480
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 871956480
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 10): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 15): 871964672
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 871956480
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 871956480
 > number of parameters on (tensor, pipeline) model parallel rank (2, 13): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 871956480
 > number of parameters on (tensor, pipeline) model parallel rank (3, 11): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 9): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 12): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 11): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 5): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 4): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 13): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 10): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 13): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 10): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 12): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 15): 871964672
 > number of parameters on (tensor, pipeline) model parallel rank (1, 6): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 7): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 10): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 13): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 6): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 4): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 6): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 13): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 11): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 12): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 13): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 11): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 15): 871964672
 > number of parameters on (tensor, pipeline) model parallel rank (1, 12): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 6): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 6): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 871956480
 > number of parameters on (tensor, pipeline) model parallel rank (1, 6): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 15): 871964672
 > number of parameters on (tensor, pipeline) model parallel rank (2, 9): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 7): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 4): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 11): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 4): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 4): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 9): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 9): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 15): 871964672
 > number of parameters on (tensor, pipeline) model parallel rank (1, 9): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 10): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 11): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 5): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 4): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 11): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 13): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 12): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 12): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 9): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 7): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 7): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 12): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 7): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 9): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 9): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 15): 871964672
 > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 5): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 4): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 15): 871964672
 > number of parameters on (tensor, pipeline) model parallel rank (2, 15): 871964672
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 10): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 6): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 4): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 10): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 14): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 14): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 14): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 14): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 8): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 8): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 8): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (0, 8): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 8): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (1, 8): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (3, 8): 805896192
 > number of parameters on (tensor, pipeline) model parallel rank (2, 8): 805896192
[after model, optimizer, and learning rate scheduler are built] datetime: 2025-04-29 13:52:06 
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train:      10240
    validation: 6144
    test:       6144
> building train, validation, and test datasets for GPT ...
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2025-04-29 13:54:01 
done with setup ...
(min, max) time across ranks (ms):
    model-and-optimizer-setup ......................: (118.45, 231.13)
    train/valid/test-data-iterators-setup ..........: (115613.70, 115767.07)
training ...
Setting rerun_state_machine.current_iteration to 0...
[before the start of training step] datetime: 2025-04-29 13:54:01 
[rank13]:W0429 13:54:21.228000 22706215462720 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank14]:W0429 13:54:21.230000 23313449944896 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank13]:W0429 13:54:21.230000 22706215462720 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank15]:W0429 13:54:21.230000 23381296039744 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank14]:W0429 13:54:21.231000 23313449944896 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank15]:W0429 13:54:21.231000 23381296039744 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank12]:W0429 13:54:21.231000 22485468759872 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank53]:W0429 13:54:21.232000 23042043643712 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank54]:W0429 13:54:21.232000 22658028410688 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank55]:W0429 13:54:21.232000 23104896481088 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank12]:W0429 13:54:21.233000 22485468759872 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank53]:W0429 13:54:21.233000 23042043643712 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank52]:W0429 13:54:21.233000 22480712988480 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank54]:W0429 13:54:21.233000 22658028410688 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank55]:W0429 13:54:21.234000 23104896481088 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank52]:W0429 13:54:21.234000 22480712988480 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank23]:W0429 13:54:21.235000 22938088867648 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank21]:W0429 13:54:21.235000 23332464195392 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank22]:W0429 13:54:21.236000 22555352008512 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank23]:W0429 13:54:21.236000 22938088867648 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank20]:W0429 13:54:21.236000 22363135022912 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank21]:W0429 13:54:21.237000 23332464195392 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank22]:W0429 13:54:21.237000 22555352008512 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank20]:W0429 13:54:21.238000 22363135022912 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank41]:W0429 13:54:21.240000 22566937286464 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank9]:W0429 13:54:21.240000 22663587964736 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank37]:W0429 13:54:21.241000 23084348028736 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank42]:W0429 13:54:21.241000 23017199159104 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank11]:W0429 13:54:21.241000 23045636052800 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank41]:W0429 13:54:21.241000 22566937286464 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank17]:W0429 13:54:21.241000 23355753162560 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank25]:W0429 13:54:21.241000 23181093459776 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank9]:W0429 13:54:21.242000 22663587964736 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank24]:W0429 13:54:21.242000 22362361513792 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank40]:W0429 13:54:21.242000 22580976768832 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank43]:W0429 13:54:21.242000 23142359189312 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank37]:W0429 13:54:21.242000 23084348028736 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank42]:W0429 13:54:21.243000 23017199159104 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank38]:W0429 13:54:21.243000 22531696125760 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank11]:W0429 13:54:21.243000 23045636052800 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank25]:W0429 13:54:21.243000 23181093459776 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank17]:W0429 13:54:21.243000 23355753162560 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank27]:W0429 13:54:21.243000 22663514097472 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank8]:W0429 13:54:21.243000 23112552056640 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank39]:W0429 13:54:21.243000 22899500726080 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank35]:W0429 13:54:21.243000 23425161594688 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank6]:W0429 13:54:21.243000 22730029471552 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank10]:W0429 13:54:21.243000 22844050634560 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank24]:W0429 13:54:21.243000 22362361513792 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank31]:W0429 13:54:21.243000 23412359505728 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank40]:W0429 13:54:21.243000 22580976768832 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank26]:W0429 13:54:21.243000 22409523705664 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank43]:W0429 13:54:21.243000 23142359189312 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank29]:W0429 13:54:21.243000 23056213575488 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank38]:W0429 13:54:21.244000 22531696125760 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank27]:W0429 13:54:21.244000 22663514097472 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank7]:W0429 13:54:21.244000 22685919659840 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank6]:W0429 13:54:21.244000 22730029471552 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank39]:W0429 13:54:21.244000 22899500726080 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank13]:W0429 13:54:21.258000 22706215462720 torch/fx/experimental/symbolic_shapes.py:4449] [13/0] tmp6 is not in var_ranges, defaulting to unknown range.
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank102]:W0429 13:54:39.005000 22683342313280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank100]:W0429 13:54:39.005000 22747544479552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank103]:W0429 13:54:39.006000 22896949737280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank101]:W0429 13:54:39.006000 22833374287680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank102]:W0429 13:54:39.007000 22683342313280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank100]:W0429 13:54:39.007000 22747544479552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank103]:W0429 13:54:39.007000 22896949737280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank101]:W0429 13:54:39.007000 22833374287680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank109]:W0429 13:54:39.010000 22833432659776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank108]:W0429 13:54:39.010000 22677074855744 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank110]:W0429 13:54:39.010000 22915543336768 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank109]:W0429 13:54:39.011000 22833432659776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank111]:W0429 13:54:39.011000 23057271183168 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank108]:W0429 13:54:39.011000 22677074855744 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank110]:W0429 13:54:39.012000 22915543336768 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank111]:W0429 13:54:39.012000 23057271183168 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank68]:W0429 13:54:39.013000 23271114356544 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank70]:W0429 13:54:39.014000 22653057701696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank69]:W0429 13:54:39.014000 22866598868800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank71]:W0429 13:54:39.014000 22610169476928 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank79]:W0429 13:54:39.014000 22460722431808 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank77]:W0429 13:54:39.014000 23389024790336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank76]:W0429 13:54:39.015000 22901582837568 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank68]:W0429 13:54:39.015000 23271114356544 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank78]:W0429 13:54:39.015000 22865390917440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank70]:W0429 13:54:39.015000 22653057701696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank69]:W0429 13:54:39.015000 22866598868800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank71]:W0429 13:54:39.015000 22610169476928 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank79]:W0429 13:54:39.016000 22460722431808 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank77]:W0429 13:54:39.016000 23389024790336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank76]:W0429 13:54:39.016000 22901582837568 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank78]:W0429 13:54:39.016000 22865390917440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank66]:W0429 13:54:39.019000 22849277118272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank64]:W0429 13:54:39.019000 23170953406272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank65]:W0429 13:54:39.019000 23390719784768 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank67]:W0429 13:54:39.019000 22645195020096 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank66]:W0429 13:54:39.020000 22849277118272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank64]:W0429 13:54:39.021000 23170953406272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank67]:W0429 13:54:39.021000 22645195020096 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank65]:W0429 13:54:39.021000 23390719784768 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank73]:W0429 13:54:39.024000 22681645172544 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank72]:W0429 13:54:39.025000 22795362211648 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank85]:W0429 13:54:39.025000 22440907654976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank75]:W0429 13:54:39.025000 22826330203968 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank124]:W0429 13:54:39.025000 23376087562048 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank74]:W0429 13:54:39.025000 22750528702272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank80]:W0429 13:54:39.025000 23048164755264 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank126]:W0429 13:54:39.025000 22508424849216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank73]:W0429 13:54:39.025000 22681645172544 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank122]:W0429 13:54:39.026000 23003642836800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank127]:W0429 13:54:39.026000 22958566483776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank125]:W0429 13:54:39.026000 22667013142336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank105]:W0429 13:54:39.026000 23029667600192 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank87]:W0429 13:54:39.026000 23424151160640 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank107]:W0429 13:54:39.026000 22434413557568 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank81]:W0429 13:54:39.026000 22691049772864 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank72]:W0429 13:54:39.026000 22795362211648 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank85]:W0429 13:54:39.026000 22440907654976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank124]:W0429 13:54:39.026000 23376087562048 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank83]:W0429 13:54:39.026000 23115430680384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank75]:W0429 13:54:39.026000 22826330203968 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank74]:W0429 13:54:39.026000 22750528702272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank123]:W0429 13:54:39.026000 23349361293120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank84]:W0429 13:54:39.026000 22394382317376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank97]:W0429 13:54:39.026000 22994466142016 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank82]:W0429 13:54:39.026000 23411502409536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank86]:W0429 13:54:39.027000 23204399363904 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank99]:W0429 13:54:39.027000 22631651673920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank121]:W0429 13:54:39.027000 22492902229824 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank80]:W0429 13:54:39.027000 23048164755264 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank126]:W0429 13:54:39.027000 22508424849216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank120]:W0429 13:54:39.027000 22568908642112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank104]:W0429 13:54:39.027000 22755938076480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank98]:W0429 13:54:39.027000 23346847893312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank115]:W0429 13:54:39.027000 22841424561984 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank127]:W0429 13:54:39.027000 22958566483776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank125]:W0429 13:54:39.027000 22667013142336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank122]:W0429 13:54:39.027000 23003642836800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank87]:W0429 13:54:39.027000 23424151160640 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank112]:W0429 13:54:39.027000 22470722582336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank105]:W0429 13:54:39.027000 23029667600192 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank114]:W0429 13:54:39.027000 22682294355776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank113]:W0429 13:54:39.027000 23447124985664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank107]:W0429 13:54:39.027000 22434413557568 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank81]:W0429 13:54:39.027000 22691049772864 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank83]:W0429 13:54:39.028000 23115430680384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank123]:W0429 13:54:39.028000 23349361293120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank82]:W0429 13:54:39.028000 23411502409536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank84]:W0429 13:54:39.028000 22394382317376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank97]:W0429 13:54:39.028000 22994466142016 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank106]:W0429 13:54:39.028000 23161180411712 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank86]:W0429 13:54:39.028000 23204399363904 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank99]:W0429 13:54:39.028000 22631651673920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank96]:W0429 13:54:39.028000 23009833490240 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank121]:W0429 13:54:39.028000 22492902229824 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank120]:W0429 13:54:39.028000 22568908642112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank115]:W0429 13:54:39.028000 22841424561984 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank104]:W0429 13:54:39.028000 22755938076480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank98]:W0429 13:54:39.028000 23346847893312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank112]:W0429 13:54:39.029000 22470722582336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank114]:W0429 13:54:39.029000 22682294355776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank113]:W0429 13:54:39.029000 23447124985664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank106]:W0429 13:54:39.029000 23161180411712 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank96]:W0429 13:54:39.029000 23009833490240 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank93]:W0429 13:54:39.029000 23258577889088 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank92]:W0429 13:54:39.030000 23330007242560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank94]:W0429 13:54:39.030000 23087302526784 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank95]:W0429 13:54:39.030000 22976426383168 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank93]:W0429 13:54:39.031000 23258577889088 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank92]:W0429 13:54:39.031000 23330007242560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank94]:W0429 13:54:39.032000 23087302526784 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank95]:W0429 13:54:39.032000 22976426383168 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank117]:W0429 13:54:39.033000 23093348345664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank90]:W0429 13:54:39.033000 23150435538752 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank116]:W0429 13:54:39.034000 22558691170112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank119]:W0429 13:54:39.035000 23088243783488 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank117]:W0429 13:54:39.035000 23093348345664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank90]:W0429 13:54:39.035000 23150435538752 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank118]:W0429 13:54:39.035000 22559094351680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank89]:W0429 13:54:39.035000 23336834197312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank102]:W0429 13:54:39.035000 22683342313280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank140]:W0429 13:54:55.892000 22717280204608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank143]:W0429 13:54:55.892000 22871709730624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank142]:W0429 13:54:55.892000 23375852779328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank141]:W0429 13:54:55.892000 22906869233472 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank140]:W0429 13:54:55.893000 22717280204608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank143]:W0429 13:54:55.893000 22871709730624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank142]:W0429 13:54:55.893000 23375852779328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank141]:W0429 13:54:55.894000 22906869233472 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank165]:W0429 13:54:55.902000 23098084124480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank166]:W0429 13:54:55.903000 22902185314112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank164]:W0429 13:54:55.903000 22499531257664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank180]:W0429 13:54:55.903000 23408962176832 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank167]:W0429 13:54:55.903000 23240198584128 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank181]:W0429 13:54:55.903000 22969048364864 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank165]:W0429 13:54:55.904000 23098084124480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank182]:W0429 13:54:55.904000 22818674796352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank189]:W0429 13:54:55.904000 22806575716160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank132]:W0429 13:54:55.904000 23059399808832 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank133]:W0429 13:54:55.904000 22466904856384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank183]:W0429 13:54:55.904000 23284536387392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank166]:W0429 13:54:55.904000 22902185314112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank148]:W0429 13:54:55.904000 22839605245760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank191]:W0429 13:54:55.904000 23255313188672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank164]:W0429 13:54:55.904000 22499531257664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank180]:W0429 13:54:55.905000 23408962176832 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank188]:W0429 13:54:55.904000 22851270379328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank149]:W0429 13:54:55.905000 22910322153280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank181]:W0429 13:54:55.905000 22969048364864 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank190]:W0429 13:54:55.905000 23155344385856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank167]:W0429 13:54:55.905000 23240198584128 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank189]:W0429 13:54:55.905000 22806575716160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank132]:W0429 13:54:55.905000 23059399808832 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank182]:W0429 13:54:55.905000 22818674796352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank151]:W0429 13:54:55.905000 22968116860736 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank133]:W0429 13:54:55.905000 22466904856384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank148]:W0429 13:54:55.905000 22839605245760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank183]:W0429 13:54:55.905000 23284536387392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank135]:W0429 13:54:55.905000 22973797103424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank191]:W0429 13:54:55.906000 23255313188672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank150]:W0429 13:54:55.906000 23437876565824 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank130]:W0429 13:54:55.906000 22823484159808 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank149]:W0429 13:54:55.906000 22910322153280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank188]:W0429 13:54:55.906000 22851270379328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank134]:W0429 13:54:55.906000 22643123144512 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank190]:W0429 13:54:55.906000 23155344385856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank131]:W0429 13:54:55.906000 22432856721216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank129]:W0429 13:54:55.906000 23329076684608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank128]:W0429 13:54:55.906000 22535945967424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank185]:W0429 13:54:55.906000 22432356706112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank151]:W0429 13:54:55.907000 22968116860736 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank135]:W0429 13:54:55.907000 22973797103424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank130]:W0429 13:54:55.907000 22823484159808 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank150]:W0429 13:54:55.907000 23437876565824 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank134]:W0429 13:54:55.907000 22643123144512 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank131]:W0429 13:54:55.907000 22432856721216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank128]:W0429 13:54:55.907000 22535945967424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank129]:W0429 13:54:55.907000 23329076684608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank185]:W0429 13:54:55.908000 22432356706112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank186]:W0429 13:54:55.908000 23033743685440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank184]:W0429 13:54:55.908000 23296706598720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank187]:W0429 13:54:55.909000 23069882242880 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank186]:W0429 13:54:55.910000 23033743685440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank184]:W0429 13:54:55.910000 23296706598720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank187]:W0429 13:54:55.910000 23069882242880 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank174]:W0429 13:54:55.914000 22801339721536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank172]:W0429 13:54:55.914000 22566063155008 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank173]:W0429 13:54:55.914000 22826477733696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank175]:W0429 13:54:55.914000 22994211190592 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank174]:W0429 13:54:55.915000 22801339721536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank172]:W0429 13:54:55.915000 22566063155008 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank173]:W0429 13:54:55.915000 22826477733696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank175]:W0429 13:54:55.916000 22994211190592 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank177]:W0429 13:54:55.917000 23268739430208 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank178]:W0429 13:54:55.917000 23022631487296 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank179]:W0429 13:54:55.917000 22567540295488 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank177]:W0429 13:54:55.918000 23268739430208 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank176]:W0429 13:54:55.918000 22642024879936 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank178]:W0429 13:54:55.918000 23022631487296 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank179]:W0429 13:54:55.918000 22567540295488 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank138]:W0429 13:54:55.919000 22629980055360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank137]:W0429 13:54:55.919000 23280845530944 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank176]:W0429 13:54:55.920000 22642024879936 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank136]:W0429 13:54:55.920000 22819909936960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank139]:W0429 13:54:55.920000 22855062943552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank138]:W0429 13:54:55.921000 22629980055360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank137]:W0429 13:54:55.921000 23280845530944 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank140]:W0429 13:54:55.921000 22717280204608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank136]:W0429 13:54:55.921000 22819909936960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank142]:W0429 13:54:55.921000 23375852779328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank143]:W0429 13:54:55.922000 22871709730624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank139]:W0429 13:54:55.922000 22855062943552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank141]:W0429 13:54:55.922000 22906869233472 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank140]:W0429 13:54:55.923000 22717280204608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank142]:W0429 13:54:55.923000 23375852779328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank143]:W0429 13:54:55.923000 22871709730624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank155]:W0429 13:54:55.923000 22540949755712 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank168]:W0429 13:54:55.923000 22911190701888 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank154]:W0429 13:54:55.923000 22438334134080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank141]:W0429 13:54:55.923000 22906869233472 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank171]:W0429 13:54:55.924000 22935535294272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank169]:W0429 13:54:55.924000 22836995888960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank153]:W0429 13:54:55.924000 22729323710272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank170]:W0429 13:54:55.924000 22471370512192 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank155]:W0429 13:54:55.924000 22540949755712 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank154]:W0429 13:54:55.925000 22438334134080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank168]:W0429 13:54:55.925000 22911190701888 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank160]:W0429 13:54:55.925000 22357124683584 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank171]:W0429 13:54:55.925000 22935535294272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank169]:W0429 13:54:55.925000 22836995888960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank153]:W0429 13:54:55.925000 22729323710272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank162]:W0429 13:54:55.925000 23041638762304 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank152]:W0429 13:54:55.925000 23183139522368 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank170]:W0429 13:54:55.926000 22471370512192 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank163]:W0429 13:54:55.926000 22630359861056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank160]:W0429 13:54:55.926000 22357124683584 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank161]:W0429 13:54:55.926000 22560101046080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank145]:W0429 13:54:55.927000 23447154751296 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank162]:W0429 13:54:55.927000 23041638762304 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank152]:W0429 13:54:55.927000 23183139522368 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank163]:W0429 13:54:55.927000 22630359861056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank157]:W0429 13:54:55.927000 23310792869696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank147]:W0429 13:54:55.928000 23372908009280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank146]:W0429 13:54:55.928000 23276978042688 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank161]:W0429 13:54:55.928000 22560101046080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank145]:W0429 13:54:55.928000 23447154751296 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank158]:W0429 13:54:55.928000 22763414529856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank144]:W0429 13:54:55.928000 23425238271808 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank159]:W0429 13:54:55.929000 23280951838528 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank157]:W0429 13:54:55.929000 23310792869696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank156]:W0429 13:54:55.929000 23220075747136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank147]:W0429 13:54:55.929000 23372908009280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank146]:W0429 13:54:55.929000 23276978042688 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank158]:W0429 13:54:55.930000 22763414529856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank144]:W0429 13:54:55.930000 23425238271808 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank159]:W0429 13:54:55.930000 23280951838528 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank156]:W0429 13:54:55.930000 23220075747136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank164]:W0429 13:54:55.933000 22499531257664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank166]:W0429 13:54:55.933000 22902185314112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank165]:W0429 13:54:55.933000 23098084124480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank180]:W0429 13:54:55.933000 23408962176832 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank189]:W0429 13:54:55.933000 22806575716160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank132]:W0429 13:54:55.933000 23059399808832 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank181]:W0429 13:54:55.933000 22969048364864 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank133]:W0429 13:54:55.933000 22466904856384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank182]:W0429 13:54:55.933000 22818674796352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank167]:W0429 13:54:55.934000 23240198584128 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank183]:W0429 13:54:55.934000 23284536387392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank191]:W0429 13:54:55.934000 23255313188672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank164]:W0429 13:54:55.934000 22499531257664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank166]:W0429 13:54:55.934000 22902185314112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank148]:W0429 13:54:55.934000 22839605245760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank149]:W0429 13:54:55.934000 22910322153280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank188]:W0429 13:54:55.934000 22851270379328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank165]:W0429 13:54:55.934000 23098084124480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank180]:W0429 13:54:55.934000 23408962176832 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank189]:W0429 13:54:55.934000 22806575716160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank132]:W0429 13:54:55.934000 23059399808832 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank181]:W0429 13:54:55.935000 22969048364864 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank190]:W0429 13:54:55.935000 23155344385856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank133]:W0429 13:54:55.935000 22466904856384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank182]:W0429 13:54:55.935000 22818674796352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank167]:W0429 13:54:55.935000 23240198584128 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank183]:W0429 13:54:55.935000 23284536387392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank135]:W0429 13:54:55.935000 22973797103424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank150]:W0429 13:54:55.935000 23437876565824 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank191]:W0429 13:54:55.935000 23255313188672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank130]:W0429 13:54:55.935000 22823484159808 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank151]:W0429 13:54:55.935000 22968116860736 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank148]:W0429 13:54:55.935000 22839605245760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank149]:W0429 13:54:55.936000 22910322153280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[editor's note: the same "tmp6 is not in var_ranges, defaulting to unknown range" warning from torch/fx/experimental/symbolic_shapes.py:4449 was emitted once or twice per rank (ranks 128–190) between 13:54:55.935 and 13:54:55.960; the duplicate lines are omitted here]
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[editor's note: the UserWarning above was emitted verbatim by every rank; the remaining identical copies are omitted]
[rank242]:W0429 13:55:19.177000 22606900881216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank243]:W0429 13:55:19.177000 23373992761152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank240]:W0429 13:55:19.178000 22608052918080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank241]:W0429 13:55:19.178000 23440443119424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank250]:W0429 13:55:19.178000 22565538338624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank242]:W0429 13:55:19.178000 22606900881216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank243]:W0429 13:55:19.179000 23373992761152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank240]:W0429 13:55:19.179000 22608052918080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank241]:W0429 13:55:19.179000 23440443119424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank250]:W0429 13:55:19.179000 22565538338624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank249]:W0429 13:55:19.180000 22968744855360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank251]:W0429 13:55:19.180000 22730104407872 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank248]:W0429 13:55:19.180000 23423837947712 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank249]:W0429 13:55:19.181000 22968744855360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank251]:W0429 13:55:19.181000 22730104407872 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank248]:W0429 13:55:19.182000 23423837947712 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank214]:W0429 13:55:19.184000 23259383994176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank215]:W0429 13:55:19.184000 23370938468160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank212]:W0429 13:55:19.185000 23047693469504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank213]:W0429 13:55:19.185000 22485635147584 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank214]:W0429 13:55:19.186000 23259383994176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank215]:W0429 13:55:19.186000 23370938468160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank212]:W0429 13:55:19.186000 23047693469504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[... same warning repeated once or twice per rank for ranks 192–255, 13:55:19.187–13:55:19.204 ...]
[rank242]:W0429 13:55:19.207000 22606900881216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank243]:W0429 13:55:19.207000 23373992761152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank240]:W0429 13:55:19.208000 22608052918080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank241]:W0429 13:55:19.208000 23440443119424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank250]:W0429 13:55:19.208000 22565538338624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank242]:W0429 13:55:19.208000 22606900881216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank243]:W0429 13:55:19.209000 23373992761152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank240]:W0429 13:55:19.209000 22608052918080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank241]:W0429 13:55:19.209000 23440443119424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank250]:W0429 13:55:19.209000 22565538338624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank249]:W0429 13:55:19.210000 22968744855360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank248]:W0429 13:55:19.210000 23423837947712 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank251]:W0429 13:55:19.210000 22730104407872 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank249]:W0429 13:55:19.211000 22968744855360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank248]:W0429 13:55:19.211000 23423837947712 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank251]:W0429 13:55:19.211000 22730104407872 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank214]:W0429 13:55:19.214000 23259383994176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank215]:W0429 13:55:19.215000 23370938468160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank212]:W0429 13:55:19.215000 23047693469504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank213]:W0429 13:55:19.215000 22485635147584 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank214]:W0429 13:55:19.215000 23259383994176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank215]:W0429 13:55:19.216000 23370938468160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank212]:W0429 13:55:19.216000 23047693469504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank213]:W0429 13:55:19.216000 22485635147584 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank245]:W0429 13:55:19.219000 22579940398912 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank230]:W0429 13:55:19.219000 23362597832512 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank229]:W0429 13:55:19.219000 22379685750592 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank236]:W0429 13:55:19.220000 23309950756672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank246]:W0429 13:55:19.220000 22593186535232 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank228]:W0429 13:55:19.220000 23312000137024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank238]:W0429 13:55:19.220000 22524005943104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank237]:W0429 13:55:19.220000 22400234260288 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank245]:W0429 13:55:19.220000 22579940398912 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank209]:W0429 13:55:19.220000 23449618081600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank244]:W0429 13:55:19.220000 22539528558400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank230]:W0429 13:55:19.220000 23362597832512 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank203]:W0429 13:55:19.220000 23280651978560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank247]:W0429 13:55:19.220000 23415333144384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank229]:W0429 13:55:19.221000 22379685750592 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank202]:W0429 13:55:19.221000 23049067001664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank210]:W0429 13:55:19.221000 22739709818688 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank239]:W0429 13:55:19.221000 23169933915968 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank236]:W0429 13:55:19.221000 23309950756672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank246]:W0429 13:55:19.221000 22593186535232 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank231]:W0429 13:55:19.221000 22557449443136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank228]:W0429 13:55:19.221000 23312000137024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank238]:W0429 13:55:19.221000 22524005943104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank237]:W0429 13:55:19.222000 22400234260288 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank201]:W0429 13:55:19.222000 23241289725760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank222]:W0429 13:55:19.222000 22564048418624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank244]:W0429 13:55:19.222000 22539528558400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank200]:W0429 13:55:19.222000 23017507551040 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank209]:W0429 13:55:19.222000 23449618081600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank203]:W0429 13:55:19.222000 23280651978560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank247]:W0429 13:55:19.222000 23415333144384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank208]:W0429 13:55:19.222000 22602569500480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank220]:W0429 13:55:19.222000 22956469024576 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank224]:W0429 13:55:19.222000 23068930123584 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank211]:W0429 13:55:19.222000 23163315607360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank202]:W0429 13:55:19.222000 23049067001664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank210]:W0429 13:55:19.222000 22739709818688 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank239]:W0429 13:55:19.222000 23169933915968 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank226]:W0429 13:55:19.222000 23245218654016 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank227]:W0429 13:55:19.222000 22956983322432 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank225]:W0429 13:55:19.222000 22928326457152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank231]:W0429 13:55:19.223000 22557449443136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank223]:W0429 13:55:19.223000 22707747858240 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank200]:W0429 13:55:19.223000 23017507551040 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank222]:W0429 13:55:19.223000 22564048418624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank201]:W0429 13:55:19.223000 23241289725760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank208]:W0429 13:55:19.223000 22602569500480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank221]:W0429 13:55:19.223000 22438741571392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank220]:W0429 13:55:19.223000 22956469024576 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank211]:W0429 13:55:19.223000 23163315607360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank224]:W0429 13:55:19.223000 23068930123584 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank226]:W0429 13:55:19.224000 23245218654016 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank227]:W0429 13:55:19.224000 22956983322432 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank225]:W0429 13:55:19.224000 22928326457152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank223]:W0429 13:55:19.224000 22707747858240 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank221]:W0429 13:55:19.224000 22438741571392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank253]:W0429 13:55:19.225000 23273910597440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank252]:W0429 13:55:19.226000 22421345261376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank253]:W0429 13:55:19.227000 23273910597440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank254]:W0429 13:55:19.227000 22929567725376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank193]:W0429 13:55:19.227000 23358991128384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank255]:W0429 13:55:19.227000 22396454532928 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank195]:W0429 13:55:19.227000 22464448079680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank252]:W0429 13:55:19.228000 22421345261376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank192]:W0429 13:55:19.228000 23169496680256 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank254]:W0429 13:55:19.228000 22929567725376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank193]:W0429 13:55:19.228000 23358991128384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank255]:W0429 13:55:19.228000 22396454532928 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank195]:W0429 13:55:19.229000 22464448079680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank194]:W0429 13:55:19.229000 22924905109312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank217]:W0429 13:55:19.229000 22801800075072 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank192]:W0429 13:55:19.229000 23169496680256 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank216]:W0429 13:55:19.230000 22434596960064 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank219]:W0429 13:55:19.230000 22660315203392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank304]:W0429 13:55:36.152000 23127762617344 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank307]:W0429 13:55:36.153000 22875396473856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank306]:W0429 13:55:36.153000 22875074102272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank301]:W0429 13:55:36.153000 22546113193792 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank318]:W0429 13:55:36.153000 22602851030848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank319]:W0429 13:55:36.153000 22755457058624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank303]:W0429 13:55:36.153000 23351588300608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank304]:W0429 13:55:36.153000 23127762617344 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank316]:W0429 13:55:36.153000 23314159970112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank305]:W0429 13:55:36.154000 22726035870720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank317]:W0429 13:55:36.154000 22480827434816 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank300]:W0429 13:55:36.154000 22396027983680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank307]:W0429 13:55:36.154000 22875396473856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank301]:W0429 13:55:36.154000 22546113193792 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank318]:W0429 13:55:36.154000 22602851030848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank319]:W0429 13:55:36.154000 22755457058624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank306]:W0429 13:55:36.154000 22875074102272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank302]:W0429 13:55:36.154000 22514947888960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank267]:W0429 13:55:36.154000 22736532809536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank266]:W0429 13:55:36.154000 22544854738752 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank303]:W0429 13:55:36.154000 23351588300608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank315]:W0429 13:55:36.154000 22786684450624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank264]:W0429 13:55:36.155000 23225003415360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank313]:W0429 13:55:36.155000 23218534467392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank316]:W0429 13:55:36.155000 23314159970112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank265]:W0429 13:55:36.155000 22663757563712 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank305]:W0429 13:55:36.155000 22726035870720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank269]:W0429 13:55:36.155000 23410760816448 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank300]:W0429 13:55:36.155000 22396027983680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank317]:W0429 13:55:36.155000 22480827434816 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank267]:W0429 13:55:36.155000 22736532809536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank314]:W0429 13:55:36.155000 22958924527424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank266]:W0429 13:55:36.156000 22544854738752 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank302]:W0429 13:55:36.156000 22514947888960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank270]:W0429 13:55:36.156000 22864214685504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank315]:W0429 13:55:36.156000 22786684450624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank273]:W0429 13:55:36.156000 23278099240768 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank264]:W0429 13:55:36.156000 23225003415360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank275]:W0429 13:55:36.156000 23282051221312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank313]:W0429 13:55:36.156000 23218534467392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank265]:W0429 13:55:36.156000 22663757563712 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank271]:W0429 13:55:36.156000 22488593782592 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank269]:W0429 13:55:36.157000 23410760816448 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank312]:W0429 13:55:36.156000 22896953481024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank272]:W0429 13:55:36.157000 22649208649536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank268]:W0429 13:55:36.157000 22858997614400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank314]:W0429 13:55:36.157000 22958924527424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank274]:W0429 13:55:36.157000 22367477663552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank270]:W0429 13:55:36.157000 22864214685504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank273]:W0429 13:55:36.157000 23278099240768 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank275]:W0429 13:55:36.158000 23282051221312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank290]:W0429 13:55:36.157000 23136802756416 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank289]:W0429 13:55:36.158000 23213093029696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank271]:W0429 13:55:36.158000 22488593782592 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank312]:W0429 13:55:36.158000 22896953481024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank272]:W0429 13:55:36.158000 22649208649536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank268]:W0429 13:55:36.158000 22858997614400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank274]:W0429 13:55:36.158000 22367477663552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank291]:W0429 13:55:36.158000 22541384697664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank288]:W0429 13:55:36.159000 22932850296640 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank290]:W0429 13:55:36.159000 23136802756416 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank289]:W0429 13:55:36.159000 23213093029696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank291]:W0429 13:55:36.160000 22541384697664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank311]:W0429 13:55:36.160000 22459137106944 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank288]:W0429 13:55:36.160000 22932850296640 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank308]:W0429 13:55:36.160000 22999437267968 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank309]:W0429 13:55:36.160000 23002111820800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank298]:W0429 13:55:36.161000 22582818535232 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank297]:W0429 13:55:36.161000 22847062742848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank295]:W0429 13:55:36.161000 23045115094848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank299]:W0429 13:55:36.161000 22449627662144 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank311]:W0429 13:55:36.161000 22459137106944 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank294]:W0429 13:55:36.161000 22564104423232 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank308]:W0429 13:55:36.161000 22999437267968 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank310]:W0429 13:55:36.162000 22678955025408 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank292]:W0429 13:55:36.162000 22632747714368 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank309]:W0429 13:55:36.162000 23002111820800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank304]:W0429 13:55:36.182000 23127762617344 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank383]:W0429 13:55:54.199000 22556736902976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
(last message repeated twice per rank by the remaining ranks in 320–383)
[rank380]:W0429 13:55:54.229000 22870728668992 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank383]:W0429 13:55:54.229000 22556736902976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank382]:W0429 13:55:54.229000 23264265488192 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank358]:W0429 13:55:54.230000 23169085376320 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank380]:W0429 13:55:54.230000 22870728668992 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank383]:W0429 13:55:54.230000 22556736902976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank382]:W0429 13:55:54.230000 23264265488192 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank367]:W0429 13:55:54.230000 22608362129216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank356]:W0429 13:55:54.230000 22806514345792 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank365]:W0429 13:55:54.231000 22627828692800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank359]:W0429 13:55:54.231000 22512566675264 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank357]:W0429 13:55:54.231000 22729659287360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank381]:W0429 13:55:54.231000 22614743889728 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
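The `indirect0`/`tmp6 is not in var_ranges, defaulting to unknown range` lines above come from Inductor's symbolic-shape reasoning and repeat once per rank per compile step; they are warnings, not errors. If the noise is unwanted, one option is to raise the threshold of the emitting logger. This is a minimal sketch, assuming torch names its loggers after module paths (the path shown in the log), which may differ across torch builds:

```python
import logging

# The warning originates in torch/fx/experimental/symbolic_shapes.py.
# Assumption: torch's internal loggers follow dotted module names, so the
# logger below matches the emitting module. Raising its level above
# WARNING suppresses these per-rank messages without touching errors.
logging.getLogger("torch.fx.experimental.symbolic_shapes").setLevel(logging.ERROR)
```

Run this early in the training script, before the first `torch.compile` invocation, so the filter is in place when compilation starts.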
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank431]:W0429 13:56:11.219000 22951535425344 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank431]:W0429 13:56:11.248000 22951535425344 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank429]:W0429 13:56:11.249000 22670585063232 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank428]:W0429 13:56:11.249000 22357477680960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank430]:W0429 13:56:11.249000 22952957794112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank442]:W0429 13:56:11.249000 22649695659840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank431]:W0429 13:56:11.250000 22951535425344 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank441]:W0429 13:56:11.250000 22453216909120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank429]:W0429 13:56:11.250000 22670585063232 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank438]:W0429 13:56:11.250000 22942593591104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank428]:W0429 13:56:11.251000 22357477680960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank442]:W0429 13:56:11.251000 22649695659840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank430]:W0429 13:56:11.251000 22952957794112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank436]:W0429 13:56:11.251000 23252806960960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank440]:W0429 13:56:11.251000 23098667796288 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank422]:W0429 13:56:11.251000 22712689346368 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank443]:W0429 13:56:11.251000 22637748520768 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank437]:W0429 13:56:11.251000 22689634854720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank420]:W0429 13:56:11.251000 22761717598016 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank421]:W0429 13:56:11.251000 22437896886080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank444]:W0429 13:56:11.251000 22512078235456 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank441]:W0429 13:56:11.251000 22453216909120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank438]:W0429 13:56:11.252000 22942593591104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank385]:W0429 13:56:11.252000 22366113105728 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank384]:W0429 13:56:11.252000 22937400198976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank423]:W0429 13:56:11.252000 23231419569984 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank445]:W0429 13:56:11.252000 22768583878464 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank439]:W0429 13:56:11.252000 22905536538432 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank436]:W0429 13:56:11.252000 23252806960960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank386]:W0429 13:56:11.252000 22646063871808 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank440]:W0429 13:56:11.252000 23098667796288 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank422]:W0429 13:56:11.252000 22712689346368 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank443]:W0429 13:56:11.252000 22637748520768 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank437]:W0429 13:56:11.252000 22689634854720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank420]:W0429 13:56:11.252000 22761717598016 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank387]:W0429 13:56:11.252000 23215811479360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank421]:W0429 13:56:11.253000 22437896886080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank446]:W0429 13:56:11.253000 22891032381248 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank444]:W0429 13:56:11.253000 22512078235456 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank385]:W0429 13:56:11.253000 22366113105728 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank384]:W0429 13:56:11.253000 22937400198976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank397]:W0429 13:56:11.253000 22949218973504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank447]:W0429 13:56:11.253000 22404247447360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank396]:W0429 13:56:11.253000 22512262661952 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank445]:W0429 13:56:11.253000 22768583878464 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank423]:W0429 13:56:11.253000 23231419569984 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank439]:W0429 13:56:11.253000 22905536538432 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank386]:W0429 13:56:11.253000 22646063871808 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank387]:W0429 13:56:11.254000 23215811479360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank446]:W0429 13:56:11.254000 22891032381248 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank398]:W0429 13:56:11.254000 23046882916160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank399]:W0429 13:56:11.254000 23221098444608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank397]:W0429 13:56:11.254000 22949218973504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank447]:W0429 13:56:11.255000 22404247447360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank396]:W0429 13:56:11.255000 22512262661952 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank398]:W0429 13:56:11.256000 23046882916160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank399]:W0429 13:56:11.256000 23221098444608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank412]:W0429 13:56:11.257000 22455788693312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank415]:W0429 13:56:11.257000 22481465296704 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[... same warning repeated by ranks 388–435 ...]
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[... same UserWarning repeated by the remaining ranks ...]
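The UserWarning above comes from transformer_engine's validation of sliding-window attention settings: with `attn_mask_type=causal`, the right half of the window must be 0 so no future positions are attended. A minimal sketch of that check, using a hypothetical helper (not the library's actual function):

```python
def check_causal_window_size(window_size):
    """Return True if window_size is valid for attn_mask_type=causal.

    Per the warning, a causal mask requires window_size to be
    (-1, 0) (full causal) or (>=0, 0) (sliding-window causal):
    the right bound must be 0 so no future tokens are visible.
    """
    left, right = window_size
    return right == 0 and (left == -1 or left >= 0)

# (-1, 0): attend to all past tokens       -> valid
# (128, 0): sliding window of 128 past     -> valid
# (128, 64): would expose future tokens    -> triggers the warning
```

Passing a `window_size` that fails this check does not abort training; transformer_engine only warns, which is why the message appears once per rank here.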
[rank507]:W0429 13:56:34.445000 22361245509440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank467]:W0429 13:56:34.446000 23324186007360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank504]:W0429 13:56:34.446000 22746618603328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank505]:W0429 13:56:34.446000 23027986097984 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank464]:W0429 13:56:34.446000 22443972241216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank506]:W0429 13:56:34.446000 22417658017600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank466]:W0429 13:56:34.446000 23017173817152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank473]:W0429 13:56:34.446000 22784136845120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank507]:W0429 13:56:34.447000 22361245509440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank465]:W0429 13:56:34.447000 22907261994816 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank467]:W0429 13:56:34.447000 23324186007360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank505]:W0429 13:56:34.447000 23027986097984 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank504]:W0429 13:56:34.447000 22746618603328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank464]:W0429 13:56:34.447000 22443972241216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank475]:W0429 13:56:34.447000 23070958032704 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank474]:W0429 13:56:34.447000 23152234903360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank506]:W0429 13:56:34.448000 22417658017600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank473]:W0429 13:56:34.448000 22784136845120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank466]:W0429 13:56:34.448000 23017173817152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank472]:W0429 13:56:34.448000 22423906285376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank465]:W0429 13:56:34.448000 22907261994816 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank475]:W0429 13:56:34.449000 23070958032704 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank474]:W0429 13:56:34.449000 23152234903360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank448]:W0429 13:56:34.449000 23176198432576 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank472]:W0429 13:56:34.449000 22423906285376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank450]:W0429 13:56:34.449000 22641597495104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank451]:W0429 13:56:34.449000 23310950033216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank448]:W0429 13:56:34.451000 23176198432576 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank450]:W0429 13:56:34.451000 22641597495104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank489]:W0429 13:56:34.451000 22745110079296 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank449]:W0429 13:56:34.451000 22752745822016 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank451]:W0429 13:56:34.451000 23310950033216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank490]:W0429 13:56:34.451000 22676382320448 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank468]:W0429 13:56:34.451000 22763188746048 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank491]:W0429 13:56:34.451000 22722845161280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank502]:W0429 13:56:34.451000 23016363562816 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank471]:W0429 13:56:34.451000 22426093360960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank501]:W0429 13:56:34.452000 22659964262208 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[... identical "indirect0 is not in var_ranges, defaulting to unknown range" warning repeated (once or twice per rank) by the remaining ranks 448-511, timestamps 13:56:34.452-13:56:34.470, elided ...]
[rank507]:W0429 13:56:34.475000 22361245509440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[... identical "tmp6 is not in var_ranges, defaulting to unknown range" warning repeated (once or twice per rank) by the remaining ranks 448-511, timestamps 13:56:34.475-13:56:34.500, elided ...]
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank565]:W0429 13:56:57.755000 22565773408064 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank516]:W0429 13:56:57.770000 23291955988288 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank517]:W0429 13:56:57.770000 23053019723584 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank514]:W0429 13:56:57.770000 22687983380288 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank519]:W0429 13:56:57.771000 22363046254400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank518]:W0429 13:56:57.771000 22434595665728 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank515]:W0429 13:56:57.771000 23431350937408 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank512]:W0429 13:56:57.771000 22450596362048 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank513]:W0429 13:56:57.771000 23267651802944 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank517]:W0429 13:56:57.772000 23053019723584 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank514]:W0429 13:56:57.772000 22687983380288 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank515]:W0429 13:56:57.772000 23431350937408 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank512]:W0429 13:56:57.773000 22450596362048 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank565]:W0429 13:56:57.784000 22565773408064 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[... identical "tmp6 is not in var_ranges" warning repeated across ranks 512-575, 13:56:57.784-13:56:57.803 ...]
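The `W0429 ... symbolic_shapes.py:4449` lines above come from torch's logging framework (not the Python `warnings` module), so they can be demoted per logger. A minimal sketch, assuming the logger name mirrors the module path shown in the log (an assumption, not verified against this install):

```python
import logging

# Logger name assumed from the "torch/fx/experimental/symbolic_shapes.py"
# path printed in the warnings above.
SYMBOLIC_SHAPES_LOGGER = "torch.fx.experimental.symbolic_shapes"

def silence_symbolic_shapes_warnings(level: int = logging.ERROR) -> logging.Logger:
    """Raise the logger's threshold so WARNING-level messages are dropped."""
    logger = logging.getLogger(SYMBOLIC_SHAPES_LOGGER)
    logger.setLevel(level)
    return logger

log = silence_symbolic_shapes_warnings()
print(log.isEnabledFor(logging.WARNING))  # False once the level is raised
```

Recent PyTorch releases also expose the `TORCH_LOGS` environment variable for tuning these internal components; either approach only hides the message, it does not change the "unknown range" fallback itself.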
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank596]:W0429 13:57:15.176000 22893184579392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank596]:W0429 13:57:15.205000 22893184579392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank634]:W0429 13:57:15.205000 23371288901440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank599]:W0429 13:57:15.206000 23059337033536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank632]:W0429 13:57:15.206000 23335599806272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank633]:W0429 13:57:15.206000 22683296884544 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank631]:W0429 13:57:15.206000 23325429204800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank601]:W0429 13:57:15.206000 22931483809600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank635]:W0429 13:57:15.206000 22857665959744 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank597]:W0429 13:57:15.207000 23014165710656 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank598]:W0429 13:57:15.207000 22881780889408 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank628]:W0429 13:57:15.207000 23298908313408 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank603]:W0429 13:57:15.207000 22818365855552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank596]:W0429 13:57:15.207000 22893184579392 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank599]:W0429 13:57:15.207000 23059337033536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank629]:W0429 13:57:15.207000 22667882944320 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank631]:W0429 13:57:15.207000 23325429204800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank602]:W0429 13:57:15.207000 22739419236160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank600]:W0429 13:57:15.207000 23018332428096 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank601]:W0429 13:57:15.207000 22931483809600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank630]:W0429 13:57:15.208000 22655651465024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank598]:W0429 13:57:15.208000 22881780889408 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank597]:W0429 13:57:15.208000 23014165710656 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank628]:W0429 13:57:15.208000 23298908313408 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank603]:W0429 13:57:15.208000 22818365855552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank629]:W0429 13:57:15.208000 22667882944320 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank602]:W0429 13:57:15.208000 22739419236160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank600]:W0429 13:57:15.209000 23018332428096 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank630]:W0429 13:57:15.209000 22655651465024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank638]:W0429 13:57:15.209000 22882082166592 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank639]:W0429 13:57:15.210000 23241892407104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank699]:W0429 13:57:32.100000 22465743038272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank700]:W0429 13:57:32.109000 22391208015680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank703]:W0429 13:57:32.109000 22565326714688 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank647]:W0429 13:57:32.109000 22885132781376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank671]:W0429 13:57:32.109000 22971257812800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank644]:W0429 13:57:32.109000 22367670413120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank646]:W0429 13:57:32.109000 22772984190784 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank668]:W0429 13:57:32.109000 22480624805696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank702]:W0429 13:57:32.110000 23380567582528 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank670]:W0429 13:57:32.110000 22659587987264 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank669]:W0429 13:57:32.110000 23043526330176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank645]:W0429 13:57:32.110000 22661711648576 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank671]:W0429 13:57:32.110000 22971257812800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank647]:W0429 13:57:32.110000 22885132781376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank644]:W0429 13:57:32.111000 22367670413120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank670]:W0429 13:57:32.111000 22659587987264 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank645]:W0429 13:57:32.111000 22661711648576 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank663]:W0429 13:57:32.112000 22561982809920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank640]:W0429 13:57:32.112000 22683135678272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank660]:W0429 13:57:32.112000 22603111139136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank661]:W0429 13:57:32.112000 22853391746880 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank662]:W0429 13:57:32.112000 22719805949760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank642]:W0429 13:57:32.113000 22768890541888 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank663]:W0429 13:57:32.113000 22561982809920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank643]:W0429 13:57:32.113000 23172481980224 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank640]:W0429 13:57:32.113000 22683135678272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank660]:W0429 13:57:32.114000 22603111139136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank661]:W0429 13:57:32.114000 22853391746880 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank662]:W0429 13:57:32.114000 22719805949760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank641]:W0429 13:57:32.114000 23040549812032 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank642]:W0429 13:57:32.114000 22768890541888 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank643]:W0429 13:57:32.115000 23172481980224 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank641]:W0429 13:57:32.115000 23040549812032 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank656]:W0429 13:57:32.117000 22959686764352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank648]:W0429 13:57:32.118000 22994089736000 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank657]:W0429 13:57:32.118000 23018153670464 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank650]:W0429 13:57:32.119000 23428159215424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank649]:W0429 13:57:32.119000 22756031973184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank656]:W0429 13:57:32.119000 22959686764352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank651]:W0429 13:57:32.119000 22746668730176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank658]:W0429 13:57:32.119000 22785037907776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank648]:W0429 13:57:32.120000 22994089736000 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank657]:W0429 13:57:32.120000 23018153670464 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank650]:W0429 13:57:32.120000 23428159215424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank649]:W0429 13:57:32.120000 22756031973184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank659]:W0429 13:57:32.120000 23315971893056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank651]:W0429 13:57:32.120000 22746668730176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank658]:W0429 13:57:32.120000 22785037907776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank659]:W0429 13:57:32.121000 23315971893056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank673]:W0429 13:57:32.123000 22376880006976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank675]:W0429 13:57:32.123000 23016493020992 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank674]:W0429 13:57:32.124000 22608000313152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank673]:W0429 13:57:32.124000 22376880006976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank667]:W0429 13:57:32.125000 22698582361920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank672]:W0429 13:57:32.125000 22468806145856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank675]:W0429 13:57:32.125000 23016493020992 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank674]:W0429 13:57:32.125000 22608000313152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank664]:W0429 13:57:32.125000 22797967705920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank665]:W0429 13:57:32.126000 23199421679424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank667]:W0429 13:57:32.126000 22698582361920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank672]:W0429 13:57:32.126000 22468806145856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank677]:W0429 13:57:32.126000 23189427386176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank666]:W0429 13:57:32.126000 23305199331136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank664]:W0429 13:57:32.127000 22797967705920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank687]:W0429 13:57:32.127000 22503642937152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank686]:W0429 13:57:32.127000 22965896619840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank678]:W0429 13:57:32.127000 22698811938624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank665]:W0429 13:57:32.127000 23199421679424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank684]:W0429 13:57:32.127000 22534547220288 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank685]:W0429 13:57:32.127000 22457898661696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank677]:W0429 13:57:32.128000 23189427386176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank666]:W0429 13:57:32.128000 23305199331136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank676]:W0429 13:57:32.128000 23098472863552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank679]:W0429 13:57:32.128000 22685470562112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank687]:W0429 13:57:32.128000 22503642937152 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank686]:W0429 13:57:32.129000 22965896619840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank678]:W0429 13:57:32.129000 22698811938624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank684]:W0429 13:57:32.129000 22534547220288 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank685]:W0429 13:57:32.129000 22457898661696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank676]:W0429 13:57:32.129000 23098472863552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank679]:W0429 13:57:32.129000 22685470562112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank699]:W0429 13:57:32.130000 22465743038272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank691]:W0429 13:57:32.129000 22602586548032 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank688]:W0429 13:57:32.130000 23359325636416 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank696]:W0429 13:57:32.130000 23215235180352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank681]:W0429 13:57:32.130000 22666878543680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank689]:W0429 13:57:32.130000 23443945326400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank697]:W0429 13:57:32.130000 22640711280448 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank690]:W0429 13:57:32.130000 23364865677120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank698]:W0429 13:57:32.131000 22897939339072 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank683]:W0429 13:57:32.131000 23009617766208 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank680]:W0429 13:57:32.131000 22801333237568 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank699]:W0429 13:57:32.131000 22465743038272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank691]:W0429 13:57:32.131000 22602586548032 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank688]:W0429 13:57:32.131000 23359325636416 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank696]:W0429 13:57:32.131000 23215235180352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank682]:W0429 13:57:32.131000 22823303997248 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank681]:W0429 13:57:32.131000 22666878543680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank689]:W0429 13:57:32.131000 23443945326400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank652]:W0429 13:57:32.131000 23040079927104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank697]:W0429 13:57:32.132000 22640711280448 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank690]:W0429 13:57:32.132000 23364865677120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank698]:W0429 13:57:32.132000 22897939339072 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank680]:W0429 13:57:32.132000 22801333237568 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank683]:W0429 13:57:32.132000 23009617766208 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank693]:W0429 13:57:32.132000 22763958310720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank694]:W0429 13:57:32.132000 23162688546624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank654]:W0429 13:57:32.133000 22732551247680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank682]:W0429 13:57:32.133000 22823303997248 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank652]:W0429 13:57:32.133000 23040079927104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank653]:W0429 13:57:32.133000 22683508963136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank693]:W0429 13:57:32.133000 22763958310720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank692]:W0429 13:57:32.133000 22680662316864 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank655]:W0429 13:57:32.134000 22727150806848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank694]:W0429 13:57:32.134000 23162688546624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank695]:W0429 13:57:32.134000 22837797455680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank654]:W0429 13:57:32.134000 22732551247680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank653]:W0429 13:57:32.135000 22683508963136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank692]:W0429 13:57:32.135000 22680662316864 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank655]:W0429 13:57:32.135000 22727150806848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank695]:W0429 13:57:32.135000 22837797455680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank701]:W0429 13:57:32.137000 22871379601216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank700]:W0429 13:57:32.137000 22391208015680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank703]:W0429 13:57:32.137000 22565326714688 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank668]:W0429 13:57:32.138000 22480624805696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank646]:W0429 13:57:32.138000 22772984190784 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank701]:W0429 13:57:32.138000 22871379601216 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank702]:W0429 13:57:32.138000 23380567582528 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank669]:W0429 13:57:32.138000 23043526330176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank703]:W0429 13:57:32.138000 22565326714688 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank700]:W0429 13:57:32.138000 22391208015680 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank671]:W0429 13:57:32.138000 22971257812800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank647]:W0429 13:57:32.138000 22885132781376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank668]:W0429 13:57:32.139000 22480624805696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank646]:W0429 13:57:32.139000 22772984190784 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank644]:W0429 13:57:32.139000 22367670413120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank669]:W0429 13:57:32.139000 23043526330176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank702]:W0429 13:57:32.139000 23380567582528 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank670]:W0429 13:57:32.139000 22659587987264 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank645]:W0429 13:57:32.140000 22661711648576 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank671]:W0429 13:57:32.140000 22971257812800 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank647]:W0429 13:57:32.140000 22885132781376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank644]:W0429 13:57:32.140000 22367670413120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank670]:W0429 13:57:32.141000 22659587987264 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank645]:W0429 13:57:32.141000 22661711648576 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank663]:W0429 13:57:32.141000 22561982809920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank640]:W0429 13:57:32.142000 22683135678272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank662]:W0429 13:57:32.142000 22719805949760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank661]:W0429 13:57:32.142000 22853391746880 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank660]:W0429 13:57:32.142000 22603111139136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank663]:W0429 13:57:32.143000 22561982809920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank642]:W0429 13:57:32.143000 22768890541888 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank643]:W0429 13:57:32.143000 23172481980224 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank640]:W0429 13:57:32.143000 22683135678272 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank662]:W0429 13:57:32.143000 22719805949760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank661]:W0429 13:57:32.144000 22853391746880 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank641]:W0429 13:57:32.144000 23040549812032 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank660]:W0429 13:57:32.144000 22603111139136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank642]:W0429 13:57:32.144000 22768890541888 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank643]:W0429 13:57:32.144000 23172481980224 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank641]:W0429 13:57:32.145000 23040549812032 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank656]:W0429 13:57:32.147000 22959686764352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank648]:W0429 13:57:32.148000 22994089736000 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank657]:W0429 13:57:32.148000 23018153670464 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank656]:W0429 13:57:32.148000 22959686764352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank649]:W0429 13:57:32.149000 22756031973184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank651]:W0429 13:57:32.149000 22746668730176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank650]:W0429 13:57:32.149000 23428159215424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank658]:W0429 13:57:32.149000 22785037907776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank648]:W0429 13:57:32.149000 22994089736000 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank657]:W0429 13:57:32.150000 23018153670464 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank649]:W0429 13:57:32.150000 22756031973184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank651]:W0429 13:57:32.150000 22746668730176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank650]:W0429 13:57:32.150000 23428159215424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank659]:W0429 13:57:32.150000 23315971893056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank658]:W0429 13:57:32.150000 22785037907776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank659]:W0429 13:57:32.151000 23315971893056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank673]:W0429 13:57:32.153000 22376880006976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank751]:W0429 13:57:49.194000 22924163565376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank749]:W0429 13:57:49.194000 23448572438336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank748]:W0429 13:57:49.195000 22760511526720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank750]:W0429 13:57:49.195000 22606476175168 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank751]:W0429 13:57:49.196000 22924163565376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank749]:W0429 13:57:49.196000 23448572438336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank748]:W0429 13:57:49.197000 22760511526720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank750]:W0429 13:57:49.197000 22606476175168 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank753]:W0429 13:57:49.198000 22763120146240 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank755]:W0429 13:57:49.198000 22505148270400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank710]:W0429 13:57:49.198000 22376353933120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank708]:W0429 13:57:49.198000 23179490940736 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank721]:W0429 13:57:49.198000 23155126424576 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank722]:W0429 13:57:49.198000 22776563545088 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank711]:W0429 13:57:49.198000 23322357270336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank752]:W0429 13:57:49.198000 23168544749376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank723]:W0429 13:57:49.199000 22490603176960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank754]:W0429 13:57:49.199000 22715288688448 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank710]:W0429 13:57:49.199000 22376353933120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank753]:W0429 13:57:49.199000 22763120146240 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank709]:W0429 13:57:49.199000 22654180480832 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank755]:W0429 13:57:49.199000 22505148270400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank744]:W0429 13:57:49.199000 22668953737024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank720]:W0429 13:57:49.199000 23315191362560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank708]:W0429 13:57:49.199000 23179490940736 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank745]:W0429 13:57:49.199000 22738073614144 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank746]:W0429 13:57:49.199000 22818566858560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank721]:W0429 13:57:49.199000 23155126424576 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank747]:W0429 13:57:49.199000 23404107892544 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank722]:W0429 13:57:49.200000 22776563545088 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank711]:W0429 13:57:49.200000 23322357270336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank752]:W0429 13:57:49.200000 23168544749376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank739]:W0429 13:57:49.200000 22943909345088 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank737]:W0429 13:57:49.200000 22488513161024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank723]:W0429 13:57:49.200000 22490603176960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank709]:W0429 13:57:49.200000 22654180480832 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank736]:W0429 13:57:49.200000 23263566952256 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank754]:W0429 13:57:49.200000 22715288688448 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank744]:W0429 13:57:49.200000 22668953737024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank720]:W0429 13:57:49.200000 23315191362560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank745]:W0429 13:57:49.201000 22738073614144 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank746]:W0429 13:57:49.201000 22818566858560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank747]:W0429 13:57:49.201000 23404107892544 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank738]:W0429 13:57:49.201000 22911507720000 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank739]:W0429 13:57:49.201000 22943909345088 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank737]:W0429 13:57:49.202000 22488513161024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank736]:W0429 13:57:49.202000 23263566952256 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank718]:W0429 13:57:49.202000 22854449001472 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank738]:W0429 13:57:49.202000 22911507720000 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank705]:W0429 13:57:49.203000 22569517750080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank719]:W0429 13:57:49.203000 22812748506112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank716]:W0429 13:57:49.203000 23016966423552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank718]:W0429 13:57:49.203000 22854449001472 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank751]:W0429 13:57:49.224000 22924163565376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank819]:W0429 13:58:06.093000 22574078154560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank789]:W0429 13:58:06.123000 23073957353280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank802]:W0429 13:58:06.126000 23148012209984 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank818]:W0429 13:58:06.127000 23344249673536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank794]:W0429 13:58:06.126000 22455880943424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank821]:W0429 13:58:06.127000 23082485663552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank829]:W0429 13:58:06.127000 22574389172032 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank822]:W0429 13:58:06.127000 22890255869760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank793]:W0429 13:58:06.127000 23282207594304 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank820]:W0429 13:58:06.127000 22896429434688 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank778]:W0429 13:58:06.127000 22473889339200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank831]:W0429 13:58:06.127000 22748608661312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank776]:W0429 13:58:06.127000 23026485540672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank823]:W0429 13:58:06.127000 23314345244480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank783]:W0429 13:58:06.127000 23302576711488 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank780]:W0429 13:58:06.127000 22945714526016 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank800]:W0429 13:58:06.127000 22503706830656 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank779]:W0429 13:58:06.127000 22446705723200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank781]:W0429 13:58:06.127000 22676889581376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank830]:W0429 13:58:06.127000 23078621550400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank792]:W0429 13:58:06.128000 23213449094976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank795]:W0429 13:58:06.128000 23347344480064 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank777]:W0429 13:58:06.128000 23034635990848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank828]:W0429 13:58:06.128000 22396546631488 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank794]:W0429 13:58:06.128000 22455880943424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank829]:W0429 13:58:06.128000 22574389172032 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank782]:W0429 13:58:06.128000 23356045911872 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank793]:W0429 13:58:06.128000 23282207594304 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank778]:W0429 13:58:06.128000 22473889339200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank776]:W0429 13:58:06.128000 23026485540672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank783]:W0429 13:58:06.128000 23302576711488 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank780]:W0429 13:58:06.129000 22945714526016 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank781]:W0429 13:58:06.129000 22676889581376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank806]:W0429 13:58:06.129000 22516294928192 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank830]:W0429 13:58:06.129000 23078621550400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank777]:W0429 13:58:06.129000 23034635990848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank828]:W0429 13:58:06.129000 22396546631488 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank782]:W0429 13:58:06.129000 23356045911872 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank804]:W0429 13:58:06.130000 23167002027840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank805]:W0429 13:58:06.130000 23006929237824 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank806]:W0429 13:58:06.130000 22516294928192 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank807]:W0429 13:58:06.130000 22381591254848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank804]:W0429 13:58:06.131000 23167002027840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank805]:W0429 13:58:06.131000 23006929237824 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank807]:W0429 13:58:06.132000 22381591254848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank787]:W0429 13:58:06.132000 23058257061696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank786]:W0429 13:58:06.132000 23283476203328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank808]:W0429 13:58:06.133000 22955019372352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank784]:W0429 13:58:06.133000 23368013875008 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank775]:W0429 13:58:06.133000 22482860742464 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank771]:W0429 13:58:06.133000 22507125675840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank810]:W0429 13:58:06.133000 22753685813056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank809]:W0429 13:58:06.133000 22618185955136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank787]:W0429 13:58:06.134000 23058257061696 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank786]:W0429 13:58:06.134000 23283476203328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank811]:W0429 13:58:06.134000 23078087501632 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank785]:W0429 13:58:06.134000 23134760998720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank808]:W0429 13:58:06.134000 22955019372352 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank784]:W0429 13:58:06.134000 23368013875008 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank775]:W0429 13:58:06.134000 22482860742464 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank768]:W0429 13:58:06.135000 23028473091904 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank771]:W0429 13:58:06.135000 22507125675840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank810]:W0429 13:58:06.135000 22753685813056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank814]:W0429 13:58:06.135000 23031310522176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank809]:W0429 13:58:06.135000 22618185955136 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank815]:W0429 13:58:06.135000 22411102697280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank773]:W0429 13:58:06.135000 22837105370944 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank811]:W0429 13:58:06.135000 23078087501632 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank769]:W0429 13:58:06.135000 23095610795840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank785]:W0429 13:58:06.135000 23134760998720 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank774]:W0429 13:58:06.135000 22721912244032 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank812]:W0429 13:58:06.135000 23320168867648 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank770]:W0429 13:58:06.136000 22759967074112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank772]:W0429 13:58:06.136000 22544536000320 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank768]:W0429 13:58:06.136000 23028473091904 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank814]:W0429 13:58:06.136000 23031310522176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank815]:W0429 13:58:06.136000 22411102697280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank773]:W0429 13:58:06.136000 22837105370944 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank769]:W0429 13:58:06.136000 23095610795840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank774]:W0429 13:58:06.136000 22721912244032 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank813]:W0429 13:58:06.136000 23390003689280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank812]:W0429 13:58:06.137000 23320168867648 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank770]:W0429 13:58:06.137000 22759967074112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank772]:W0429 13:58:06.137000 22544536000320 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank813]:W0429 13:58:06.138000 23390003689280 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank797]:W0429 13:58:06.138000 23430557505344 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank799]:W0429 13:58:06.140000 23401575118656 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank797]:W0429 13:58:06.140000 23430557505344 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank796]:W0429 13:58:06.140000 23422297270080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank799]:W0429 13:58:06.141000 23401575118656 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank798]:W0429 13:58:06.141000 23270422480704 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank796]:W0429 13:58:06.141000 23422297270080 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank798]:W0429 13:58:06.142000 23270422480704 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank827]:W0429 13:58:06.149000 22522735339328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank824]:W0429 13:58:06.149000 23060925871936 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank825]:W0429 13:58:06.149000 23346514876224 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank826]:W0429 13:58:06.150000 22536101156672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank827]:W0429 13:58:06.150000 22522735339328 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank824]:W0429 13:58:06.151000 23060925871936 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank825]:W0429 13:58:06.151000 23346514876224 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank826]:W0429 13:58:06.151000 22536101156672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank821]:W0429 13:58:06.155000 23082485663552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank820]:W0429 13:58:06.155000 22896429434688 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank822]:W0429 13:58:06.155000 22890255869760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank823]:W0429 13:58:06.155000 23314345244480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank779]:W0429 13:58:06.156000 22446705723200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank821]:W0429 13:58:06.156000 23082485663552 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank820]:W0429 13:58:06.156000 22896429434688 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank822]:W0429 13:58:06.156000 22890255869760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank823]:W0429 13:58:06.157000 23314345244480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank778]:W0429 13:58:06.157000 22473889339200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank776]:W0429 13:58:06.157000 23026485540672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank779]:W0429 13:58:06.157000 22446705723200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank777]:W0429 13:58:06.158000 23034635990848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank778]:W0429 13:58:06.158000 22473889339200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank776]:W0429 13:58:06.158000 23026485540672 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank806]:W0429 13:58:06.158000 22516294928192 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank777]:W0429 13:58:06.159000 23034635990848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank804]:W0429 13:58:06.159000 23167002027840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank806]:W0429 13:58:06.160000 22516294928192 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank805]:W0429 13:58:06.160000 23006929237824 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank804]:W0429 13:58:06.161000 23167002027840 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank807]:W0429 13:58:06.161000 22381591254848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank805]:W0429 13:58:06.161000 23006929237824 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank807]:W0429 13:58:06.163000 22381591254848 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
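The `var_ranges` lines above are per-rank torch.compile diagnostics, not errors. A minimal sketch to quiet them, assuming (as the `torch/fx/experimental/symbolic_shapes.py` prefix suggests) they are emitted through Python's standard logging under that module's logger name:

```python
import logging

# Assumption: the [7/0] symbolic-shapes warnings are routed through the
# standard logging module under the emitting module's dotted name.
# Raising that logger's threshold above WARNING hides the repeats
# without touching other torch logging.
logging.getLogger("torch.fx.experimental.symbolic_shapes").setLevel(logging.ERROR)
```

This should be run on every rank (e.g. near the top of the training script) before compilation starts, since each rank prints its own copy of the warning.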
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
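The `window_size` UserWarning above is emitted once per layer per rank, which floods the log. If the warning is expected and benign, it can be silenced process-wide with the standard-library `warnings` filter; a minimal sketch (the message regex below is copied from the log and may need adjusting if Transformer Engine changes its wording):

```python
import warnings

# Ignore the repeated Transformer Engine window_size warning.
# The message pattern is matched with re.match against the warning text,
# so it only needs to cover the start of the message.
warnings.filterwarnings(
    "ignore",
    message=r"window_size should be \(-1, 0\) or \(>=0, 0\)",
    category=UserWarning,
)
```

This should run before the model is constructed (e.g. at the top of the training entry point) so the filter is installed before the first layer triggers the warning.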
[rank852]:W0429 13:58:23.120000 22699394413568 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[identical `indirect0` warning emitted once or twice per rank by ranks 832-895 between 13:58:23.120 and 13:58:23.150; repeats omitted]
[rank852]:W0429 13:58:23.150000 22699394413568 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank843]:W0429 13:58:23.150000 23154309743616 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank842]:W0429 13:58:23.150000 22832683090944 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank840]:W0429 13:58:23.150000 23337563440128 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank854]:W0429 13:58:23.150000 22829625181184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank841]:W0429 13:58:23.150000 22544575788032 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank855]:W0429 13:58:23.150000 22965652505600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank852]:W0429 13:58:23.151000 22699394413568 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank843]:W0429 13:58:23.151000 23154309743616 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank854]:W0429 13:58:23.151000 22829625181184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank842]:W0429 13:58:23.152000 22832683090944 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank855]:W0429 13:58:23.152000 22965652505600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank853]:W0429 13:58:23.152000 22608505652224 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank853]:W0429 13:58:23.153000 22608505652224 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank835]:W0429 13:58:23.155000 22707314362176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank832]:W0429 13:58:23.155000 22809478039360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank847]:W0429 13:58:23.155000 22399796749312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank845]:W0429 13:58:23.156000 23262231270400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank833]:W0429 13:58:23.156000 22618889299776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank846]:W0429 13:58:23.156000 23008300094464 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank844]:W0429 13:58:23.156000 22931603690496 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank835]:W0429 13:58:23.156000 22707314362176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank832]:W0429 13:58:23.156000 22809478039360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank834]:W0429 13:58:23.156000 23453282465600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank863]:W0429 13:58:23.156000 23097138701312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank847]:W0429 13:58:23.157000 22399796749312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank889]:W0429 13:58:23.157000 22750308484096 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank884]:W0429 13:58:23.157000 23311081465664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank845]:W0429 13:58:23.157000 23262231270400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank833]:W0429 13:58:23.157000 22618889299776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank846]:W0429 13:58:23.157000 23008300094464 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank844]:W0429 13:58:23.157000 22931603690496 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank861]:W0429 13:58:23.157000 23300022789120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank834]:W0429 13:58:23.157000 23453282465600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank862]:W0429 13:58:23.157000 22720517260288 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank886]:W0429 13:58:23.157000 22752896284480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank885]:W0429 13:58:23.157000 23158484838208 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank888]:W0429 13:58:23.157000 23105447822336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank890]:W0429 13:58:23.158000 22646504805376 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank860]:W0429 13:58:23.158000 22399275426816 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank953]:W0429 13:58:40.094000 23223695312704 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank954]:W0429 13:58:40.094000 22797695973184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank955]:W0429 13:58:40.095000 22890705356608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank952]:W0429 13:58:40.095000 22725868603200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank953]:W0429 13:58:40.096000 23223695312704 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank954]:W0429 13:58:40.096000 22797695973184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank955]:W0429 13:58:40.096000 22890705356608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank952]:W0429 13:58:40.096000 22725868603200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank935]:W0429 13:58:40.098000 22685452445504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank934]:W0429 13:58:40.098000 22784713582400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank932]:W0429 13:58:40.098000 22655721269056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank933]:W0429 13:58:40.098000 22623158191936 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank935]:W0429 13:58:40.099000 22685452445504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank932]:W0429 13:58:40.100000 22655721269056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank934]:W0429 13:58:40.100000 22784713582400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[... same "indirect0 is not in var_ranges" warning repeated by ranks 896-959, 13:58:40.100-13:58:40.126 ...]
[rank954]:W0429 13:58:40.124000 22797695973184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank953]:W0429 13:58:40.124000 23223695312704 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank955]:W0429 13:58:40.125000 22890705356608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank921]:W0429 13:58:40.125000 23339427132416 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank952]:W0429 13:58:40.125000 22725868603200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank922]:W0429 13:58:40.125000 22982202336256 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank923]:W0429 13:58:40.126000 22904657105920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank954]:W0429 13:58:40.126000 22797695973184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank953]:W0429 13:58:40.126000 23223695312704 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank955]:W0429 13:58:40.126000 22890705356608 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank952]:W0429 13:58:40.126000 22725868603200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank935]:W0429 13:58:40.128000 22685452445504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank932]:W0429 13:58:40.128000 22655721269056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank933]:W0429 13:58:40.128000 22623158191936 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank934]:W0429 13:58:40.129000 22784713582400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank935]:W0429 13:58:40.129000 22685452445504 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank932]:W0429 13:58:40.130000 22655721269056 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank933]:W0429 13:58:40.130000 22623158191936 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank934]:W0429 13:58:40.130000 22784713582400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank924]:W0429 13:58:40.132000 23405940089856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank908]:W0429 13:58:40.133000 23441259521856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank927]:W0429 13:58:40.133000 23336345216000 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank926]:W0429 13:58:40.133000 23407458108416 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank924]:W0429 13:58:40.133000 23405940089856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank902]:W0429 13:58:40.133000 22679124031296 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank918]:W0429 13:58:40.134000 23020709418816 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank908]:W0429 13:58:40.134000 23441259521856 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank958]:W0429 13:58:40.134000 22870853293888 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank909]:W0429 13:58:40.134000 22935420569408 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank925]:W0429 13:58:40.134000 22694552794112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank910]:W0429 13:58:40.134000 23383061079872 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank900]:W0429 13:58:40.134000 22517473498944 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank927]:W0429 13:58:40.134000 23336345216000 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank903]:W0429 13:58:40.134000 22751880521536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank926]:W0429 13:58:40.134000 23407458108416 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank916]:W0429 13:58:40.135000 23304456582976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank902]:W0429 13:58:40.135000 22679124031296 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank917]:W0429 13:58:40.135000 23289045014336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank919]:W0429 13:58:40.135000 23110418003776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank918]:W0429 13:58:40.135000 23020709418816 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank911]:W0429 13:58:40.135000 22717559043904 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank957]:W0429 13:58:40.135000 23327077742400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank910]:W0429 13:58:40.135000 23383061079872 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank958]:W0429 13:58:40.135000 22870853293888 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank909]:W0429 13:58:40.135000 22935420569408 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank925]:W0429 13:58:40.135000 22694552794112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank900]:W0429 13:58:40.135000 22517473498944 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank903]:W0429 13:58:40.135000 22751880521536 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank901]:W0429 13:58:40.135000 22406951683904 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank956]:W0429 13:58:40.136000 22698717177664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank916]:W0429 13:58:40.136000 23304456582976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank917]:W0429 13:58:40.136000 23289045014336 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank919]:W0429 13:58:40.136000 23110418003776 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank959]:W0429 13:58:40.136000 22963720988480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank911]:W0429 13:58:40.136000 22717559043904 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank957]:W0429 13:58:40.136000 23327077742400 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank901]:W0429 13:58:40.137000 22406951683904 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank956]:W0429 13:58:40.137000 22698717177664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank959]:W0429 13:58:40.137000 22963720988480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank915]:W0429 13:58:40.140000 22364138645312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank912]:W0429 13:58:40.140000 23112400152384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank914]:W0429 13:58:40.140000 22428392048448 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank897]:W0429 13:58:40.140000 22565241096000 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank898]:W0429 13:58:40.140000 22622955063104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank915]:W0429 13:58:40.141000 22364138645312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank912]:W0429 13:58:40.141000 23112400152384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank914]:W0429 13:58:40.141000 22428392048448 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank950]:W0429 13:58:40.141000 23301453461312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank896]:W0429 13:58:40.141000 22371599644480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank951]:W0429 13:58:40.142000 22722420795200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank897]:W0429 13:58:40.142000 22565241096000 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank948]:W0429 13:58:40.142000 22990459074368 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank913]:W0429 13:58:40.142000 22931408488256 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank898]:W0429 13:58:40.142000 22622955063104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank949]:W0429 13:58:40.142000 23419075888960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank899]:W0429 13:58:40.142000 22674843531072 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank939]:W0429 13:58:40.142000 23362374690624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank938]:W0429 13:58:40.142000 23062963967808 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank937]:W0429 13:58:40.142000 22596117464896 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank928]:W0429 13:58:40.142000 22934972323648 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank931]:W0429 13:58:40.143000 23186610702144 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank950]:W0429 13:58:40.143000 23301453461312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank896]:W0429 13:58:40.143000 22371599644480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank930]:W0429 13:58:40.143000 22547424954176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank951]:W0429 13:58:40.143000 22722420795200 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank948]:W0429 13:58:40.143000 22990459074368 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank913]:W0429 13:58:40.143000 22931408488256 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank949]:W0429 13:58:40.143000 23419075888960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank899]:W0429 13:58:40.143000 22674843531072 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank939]:W0429 13:58:40.143000 23362374690624 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank938]:W0429 13:58:40.143000 23062963967808 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
[rank936]:W0429 13:58:40.143000 22945763895104 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
[rank983]:W0429 13:58:58.958000 23390511535936 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank982]:W0429 13:58:58.958000 22992638469952 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank981]:W0429 13:58:58.958000 22381584279360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank983]:W0429 13:58:58.959000 23390511535936 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank982]:W0429 13:58:58.960000 22992638469952 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank981]:W0429 13:58:58.960000 22381584279360 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank980]:W0429 13:58:58.960000 23023642474304 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank980]:W0429 13:58:58.961000 23023642474304 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank974]:W0429 13:58:58.962000 22530680424256 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank973]:W0429 13:58:58.962000 22709272409920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank989]:W0429 13:58:58.963000 23149217986368 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank975]:W0429 13:58:58.964000 22471110117184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank974]:W0429 13:58:58.964000 22530680424256 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank972]:W0429 13:58:58.964000 22947873224512 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank973]:W0429 13:58:58.964000 22709272409920 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank997]:W0429 13:58:58.964000 23319370315584 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank998]:W0429 13:58:58.964000 23251748144960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank989]:W0429 13:58:58.964000 23149217986368 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank991]:W0429 13:58:58.964000 22461948413760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank999]:W0429 13:58:58.964000 22645721544512 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank975]:W0429 13:58:58.965000 22471110117184 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank990]:W0429 13:58:58.965000 23331667580736 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank988]:W0429 13:58:58.965000 23259824633664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank972]:W0429 13:58:58.965000 22947873224512 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank997]:W0429 13:58:58.965000 23319370315584 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank998]:W0429 13:58:58.965000 23251748144960 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank991]:W0429 13:58:58.966000 22461948413760 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank999]:W0429 13:58:58.966000 22645721544512 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank996]:W0429 13:58:58.966000 22693047498560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank990]:W0429 13:58:58.966000 23331667580736 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank988]:W0429 13:58:58.966000 23259824633664 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank996]:W0429 13:58:58.967000 22693047498560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1018]:W0429 13:58:58.973000 22471091808064 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1019]:W0429 13:58:58.974000 22369298634560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1018]:W0429 13:58:58.975000 22471091808064 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1019]:W0429 13:58:58.975000 22369298634560 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1017]:W0429 13:58:58.975000 23145896904512 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1016]:W0429 13:58:58.975000 22562114799424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1017]:W0429 13:58:58.977000 23145896904512 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1016]:W0429 13:58:58.977000 22562114799424 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank979]:W0429 13:58:58.981000 23127567460160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank967]:W0429 13:58:58.982000 22496164706112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank979]:W0429 13:58:58.983000 23127567460160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank978]:W0429 13:58:58.983000 23351234529088 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank995]:W0429 13:58:58.983000 22703062665024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank967]:W0429 13:58:58.984000 22496164706112 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank994]:W0429 13:58:58.984000 23159564437312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank977]:W0429 13:58:58.984000 23019045050176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank966]:W0429 13:58:58.984000 22930020632384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank993]:W0429 13:58:58.984000 23249233692480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank978]:W0429 13:58:58.985000 23351234529088 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank995]:W0429 13:58:58.985000 22703062665024 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank962]:W0429 13:58:58.985000 23373573269312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank965]:W0429 13:58:58.985000 22879340152640 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank963]:W0429 13:58:58.985000 23086182278976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank994]:W0429 13:58:58.985000 23159564437312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank977]:W0429 13:58:58.985000 23019045050176 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1006]:W0429 13:58:58.986000 22640241010496 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank966]:W0429 13:58:58.986000 22930020632384 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank964]:W0429 13:58:58.986000 22653189793600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank961]:W0429 13:58:58.986000 23014074238784 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1005]:W0429 13:58:58.986000 22909662533440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank993]:W0429 13:58:58.986000 23249233692480 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1021]:W0429 13:58:58.986000 23040113157952 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank962]:W0429 13:58:58.986000 23373573269312 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank965]:W0429 13:58:58.986000 22879340152640 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank963]:W0429 13:58:58.986000 23086182278976 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1007]:W0429 13:58:58.986000 22507261069120 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank992]:W0429 13:58:58.986000 22901287376704 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1022]:W0429 13:58:58.987000 22820806195008 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1006]:W0429 13:58:58.987000 22640241010496 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank976]:W0429 13:58:58.987000 23031650916160 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1023]:W0429 13:58:58.987000 22806638126912 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank964]:W0429 13:58:58.987000 22653189793600 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank961]:W0429 13:58:58.987000 23014074238784 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1021]:W0429 13:58:58.987000 23040113157952 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank1005]:W0429 13:58:58.987000 22909662533440 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] indirect0 is not in var_ranges, defaulting to unknown range.
[rank983]:W0429 13:58:58.988000 23390511535936 torch/fx/experimental/symbolic_shapes.py:4449] [7/0] tmp6 is not in var_ranges, defaulting to unknown range.
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py:151: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /data/jenkins_workspace/workspace/pytorch@6/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(
/public/home/xingjl/dependency/anaconda3/envs/megatron_train/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py:3121: UserWarning: window_size should be (-1, 0) or (>=0, 0) for attn_mask_type=causal
  warnings.warn(
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
(previous warning repeated 69 more times)
[rank418]: Traceback (most recent call last):
[rank418]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank418]:     pretrain(
[rank418]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank418]:     iteration, num_floating_point_operations_so_far = train(
[rank418]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank418]:     train_step(forward_step_func,
[rank418]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank418]:     losses_reduced = forward_backward_func(
[rank418]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank418]:     config.grad_sync_func(model.parameters())
[rank418]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank418]:     bucket_group.start_grad_sync()
[rank418]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank418]:     self.check_grads(
[rank418]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank418]:     rerun_state_machine.validate_result(
[rank418]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank418]:     raise RuntimeError(full_message)
[rank418]: RuntimeError: Rank 418, node node053, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank422]: Traceback (most recent call last):
[rank422]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank422]:     pretrain(
[rank422]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank422]:     iteration, num_floating_point_operations_so_far = train(
[rank422]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank422]:     train_step(forward_step_func,
[rank422]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank422]:     losses_reduced = forward_backward_func(
[rank422]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank422]:     config.grad_sync_func(model.parameters())
[rank422]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank422]:     bucket_group.start_grad_sync()
[rank422]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank422]:     self.check_grads(
[rank422]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank422]:     rerun_state_machine.validate_result(
[rank422]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank422]:     raise RuntimeError(full_message)
[rank422]: RuntimeError: Rank 422, node node053, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank416]: Traceback (most recent call last):
[rank416]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank416]:     pretrain(
[rank416]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank416]:     iteration, num_floating_point_operations_so_far = train(
[rank416]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank416]:     train_step(forward_step_func,
[rank416]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank416]:     losses_reduced = forward_backward_func(
[rank416]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank416]:     config.grad_sync_func(model.parameters())
[rank416]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank416]:     bucket_group.start_grad_sync()
[rank416]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank416]:     self.check_grads(
[rank416]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank416]:     rerun_state_machine.validate_result(
[rank416]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank416]:     raise RuntimeError(full_message)
[rank416]: RuntimeError: Rank 416, node node053, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank421]: Traceback (most recent call last):
[rank421]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank421]:     pretrain(
[rank421]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank421]:     iteration, num_floating_point_operations_so_far = train(
[rank421]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank421]:     train_step(forward_step_func,
[rank421]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank421]:     losses_reduced = forward_backward_func(
[rank421]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank421]:     config.grad_sync_func(model.parameters())
[rank421]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank421]:     bucket_group.start_grad_sync()
[rank421]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank421]:     self.check_grads(
[rank421]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank421]:     rerun_state_machine.validate_result(
[rank421]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank421]:     raise RuntimeError(full_message)
[rank421]: RuntimeError: Rank 421, node node053, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank420]: Traceback (most recent call last):
[rank420]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank420]:     pretrain(
[rank420]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank420]:     iteration, num_floating_point_operations_so_far = train(
[rank420]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank420]:     train_step(forward_step_func,
[rank420]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank420]:     losses_reduced = forward_backward_func(
[rank420]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank420]:     config.grad_sync_func(model.parameters())
[rank420]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank420]:     bucket_group.start_grad_sync()
[rank420]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank420]:     self.check_grads(
[rank420]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank420]:     rerun_state_machine.validate_result(
[rank420]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank420]:     raise RuntimeError(full_message)
[rank420]: RuntimeError: Rank 420, node node053, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank419]: Traceback (most recent call last):
[rank419]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank419]:     pretrain(
[rank419]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank419]:     iteration, num_floating_point_operations_so_far = train(
[rank419]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank419]:     train_step(forward_step_func,
[rank419]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank419]:     losses_reduced = forward_backward_func(
[rank419]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank419]:     config.grad_sync_func(model.parameters())
[rank419]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank419]:     bucket_group.start_grad_sync()
[rank419]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank419]:     self.check_grads(
[rank419]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank419]:     rerun_state_machine.validate_result(
[rank419]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank419]:     raise RuntimeError(full_message)
[rank419]: RuntimeError: Rank 419, node node053, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank417]: Traceback (most recent call last):
[rank417]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank417]:     pretrain(
[rank417]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank417]:     iteration, num_floating_point_operations_so_far = train(
[rank417]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank417]:     train_step(forward_step_func,
[rank417]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank417]:     losses_reduced = forward_backward_func(
[rank417]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank417]:     config.grad_sync_func(model.parameters())
[rank417]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank417]:     bucket_group.start_grad_sync()
[rank417]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank417]:     self.check_grads(
[rank417]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank417]:     rerun_state_machine.validate_result(
[rank417]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank417]:     raise RuntimeError(full_message)
[rank417]: RuntimeError: Rank 417, node node053, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank423]: Traceback (most recent call last):
[rank423]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank423]:     pretrain(
[rank423]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank423]:     iteration, num_floating_point_operations_so_far = train(
[rank423]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank423]:     train_step(forward_step_func,
[rank423]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank423]:     losses_reduced = forward_backward_func(
[rank423]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank423]:     config.grad_sync_func(model.parameters())
[rank423]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank423]:     bucket_group.start_grad_sync()
[rank423]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank423]:     self.check_grads(
[rank423]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank423]:     rerun_state_machine.validate_result(
[rank423]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank423]:     raise RuntimeError(full_message)
[rank423]: RuntimeError: Rank 423, node node053, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
(launch-bounds warning repeated 12 more times)
[rank356]: Traceback (most recent call last):
[rank356]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank356]:     pretrain(
[rank356]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank356]:     iteration, num_floating_point_operations_so_far = train(
[rank356]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank356]:     train_step(forward_step_func,
[rank356]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank356]:     losses_reduced = forward_backward_func(
[rank356]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank356]:     config.grad_sync_func(model.parameters())
[rank356]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank356]:     bucket_group.start_grad_sync()
[rank356]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank356]:     self.check_grads(
[rank356]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank356]:     rerun_state_machine.validate_result(
[rank356]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank356]:     raise RuntimeError(full_message)
[rank356]: RuntimeError: Rank 356, node node045, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank352]: Traceback (most recent call last):
[rank352]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank352]:     pretrain(
[rank352]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank352]:     iteration, num_floating_point_operations_so_far = train(
[rank352]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank352]:     train_step(forward_step_func,
[rank352]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank352]:     losses_reduced = forward_backward_func(
[rank352]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank352]:     config.grad_sync_func(model.parameters())
[rank352]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank352]:     bucket_group.start_grad_sync()
[rank352]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank352]:     self.check_grads(
[rank352]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank352]:     rerun_state_machine.validate_result(
[rank352]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank352]:     raise RuntimeError(full_message)
[rank352]: RuntimeError: Rank 352, node node045, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank354]: Traceback (most recent call last):
[rank354]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank354]:     pretrain(
[rank354]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank354]:     iteration, num_floating_point_operations_so_far = train(
[rank354]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank354]:     train_step(forward_step_func,
[rank354]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank354]:     losses_reduced = forward_backward_func(
[rank354]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank354]:     config.grad_sync_func(model.parameters())
[rank354]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank354]:     bucket_group.start_grad_sync()
[rank354]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank354]:     self.check_grads(
[rank354]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank354]:     rerun_state_machine.validate_result(
[rank354]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank354]:     raise RuntimeError(full_message)
[rank354]: RuntimeError: Rank 354, node node045, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank359]: Traceback (most recent call last):
[rank359]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank359]:     pretrain(
[rank359]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank359]:     iteration, num_floating_point_operations_so_far = train(
[rank359]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank359]:     train_step(forward_step_func,
[rank359]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank359]:     losses_reduced = forward_backward_func(
[rank359]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank359]:     config.grad_sync_func(model.parameters())
[rank359]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank359]:     bucket_group.start_grad_sync()
[rank359]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank359]:     self.check_grads(
[rank359]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank359]:     rerun_state_machine.validate_result(
[rank359]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank359]:     raise RuntimeError(full_message)
[rank359]: RuntimeError: Rank 359, node node045, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank358]: Traceback (most recent call last):
[rank358]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank358]:     pretrain(
[rank358]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank358]:     iteration, num_floating_point_operations_so_far = train(
[rank358]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank358]:     train_step(forward_step_func,
[rank358]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank358]:     losses_reduced = forward_backward_func(
[rank358]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank358]:     config.grad_sync_func(model.parameters())
[rank358]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank358]:     bucket_group.start_grad_sync()
[rank358]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank358]:     self.check_grads(
[rank358]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank358]:     rerun_state_machine.validate_result(
[rank358]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank358]:     raise RuntimeError(full_message)
[rank358]: RuntimeError: Rank 358, node node045, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
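The warning above names a mangled kernel, `_Z7cleanupPfS_S_S_bi`, which demangles to `cleanup(float*, float*, float*, float*, bool, int)`: the kernel was compiled with a 256-thread bound but is being launched with 512 threads. The sketch below is hypothetical — the real kernel lives in a compiled HIP extension and its body and parameter names are unknown — but it shows the fix the warning asks for: annotating the kernel with `__launch_bounds__(512)` (the alternative is recompiling with `--gpu-max-threads-per-block=512`, or launching with at most 256 threads per block). The preprocessor guard stubs out the HIP qualifiers so the sketch also compiles with a plain host compiler.

```cpp
// Hypothetical reconstruction of the kernel named in the warning.
// Signature recovered by demangling _Z7cleanupPfS_S_S_bi:
//   cleanup(float*, float*, float*, float*, bool, int)
// Parameter names and the body are placeholders, not the real code.

#ifdef __HIPCC__
#define LAUNCH_BOUNDS(n) __launch_bounds__(n)
#else
// Host-compiler stubs so the sketch compiles without a HIP toolchain.
#define __global__
#define LAUNCH_BOUNDS(n)
#endif

// Raising the bound to 512 matches the (512, 1, 1) launch configuration
// in the log and silences "larger than launch bounds (256)".
__global__ void LAUNCH_BOUNDS(512)
cleanup(float* a, float* b, float* c, float* d, bool flag, int n) {
    // Real body unknown; placeholder that only touches its arguments.
    (void)a; (void)b; (void)c; (void)d; (void)flag; (void)n;
}
```

The warning is non-fatal (HIP clamps or rejects the launch depending on runtime), but it signals a kernel compiled for a smaller block size than the caller requests, so the bound and the launch configuration should be made consistent.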
[rank355]: Traceback (most recent call last):
[rank355]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank355]:     pretrain(
[rank355]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank355]:     iteration, num_floating_point_operations_so_far = train(
[rank355]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank355]:     train_step(forward_step_func,
[rank355]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank355]:     losses_reduced = forward_backward_func(
[rank355]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank355]:     config.grad_sync_func(model.parameters())
[rank355]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank355]:     bucket_group.start_grad_sync()
[rank355]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank355]:     self.check_grads(
[rank355]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank355]:     rerun_state_machine.validate_result(
[rank355]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank355]:     raise RuntimeError(full_message)
[rank355]: RuntimeError: Rank 355, node node045, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank357]: Traceback (most recent call last):
[rank357]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank357]:     pretrain(
[rank357]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank357]:     iteration, num_floating_point_operations_so_far = train(
[rank357]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank357]:     train_step(forward_step_func,
[rank357]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank357]:     losses_reduced = forward_backward_func(
[rank357]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank357]:     config.grad_sync_func(model.parameters())
[rank357]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank357]:     bucket_group.start_grad_sync()
[rank357]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank357]:     self.check_grads(
[rank357]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank357]:     rerun_state_machine.validate_result(
[rank357]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank357]:     raise RuntimeError(full_message)
[rank357]: RuntimeError: Rank 357, node node045, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank353]: Traceback (most recent call last):
[rank353]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank353]:     pretrain(
[rank353]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank353]:     iteration, num_floating_point_operations_so_far = train(
[rank353]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank353]:     train_step(forward_step_func,
[rank353]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank353]:     losses_reduced = forward_backward_func(
[rank353]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank353]:     config.grad_sync_func(model.parameters())
[rank353]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank353]:     bucket_group.start_grad_sync()
[rank353]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank353]:     self.check_grads(
[rank353]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank353]:     rerun_state_machine.validate_result(
[rank353]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank353]:     raise RuntimeError(full_message)
[rank353]: RuntimeError: Rank 353, node node045, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank295]: Traceback (most recent call last):
[rank295]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank295]:     pretrain(
[rank295]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank295]:     iteration, num_floating_point_operations_so_far = train(
[rank295]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank295]:     train_step(forward_step_func,
[rank295]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank295]:     losses_reduced = forward_backward_func(
[rank295]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank295]:     config.grad_sync_func(model.parameters())
[rank295]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank295]:     bucket_group.start_grad_sync()
[rank295]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank295]:     self.check_grads(
[rank295]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank295]:     rerun_state_machine.validate_result(
[rank295]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank295]:     raise RuntimeError(full_message)
[rank295]: RuntimeError: Rank 295, node node037, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank293]: Traceback (most recent call last):
[rank293]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank293]:     pretrain(
[rank293]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank293]:     iteration, num_floating_point_operations_so_far = train(
[rank293]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank293]:     train_step(forward_step_func,
[rank293]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank293]:     losses_reduced = forward_backward_func(
[rank293]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank293]:     config.grad_sync_func(model.parameters())
[rank293]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank293]:     bucket_group.start_grad_sync()
[rank293]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank293]:     self.check_grads(
[rank293]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank293]:     rerun_state_machine.validate_result(
[rank293]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank293]:     raise RuntimeError(full_message)
[rank293]: RuntimeError: Rank 293, node node037, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[ranks 292, 289, 288, 290, 294, 291]: traceback identical to rank 293 above; final error lines:
[rank292]: RuntimeError: Rank 292, node node037, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank289]: RuntimeError: Rank 289, node node037, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank288]: RuntimeError: Rank 288, node node037, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank290]: RuntimeError: Rank 290, node node037, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank294]: RuntimeError: Rank 294, node node037, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank291]: RuntimeError: Rank 291, node node037, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program !
(last message repeated 95 more times)
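The warning names its own fix: the HIP kernel `cleanup` was compiled with the default launch bound of 256 threads per block but is launched with 512. Demangling the symbol `_Z7cleanupPfS_S_S_bi` gives `cleanup(float*, float*, float*, float*, bool, int)`. A hedged sketch of the annotation (the real kernel body lives in whichever library emitted this warning, and the parameter names below are placeholders):

```cuda
// Sketch only: signature reconstructed by demangling _Z7cleanupPfS_S_S_bi.
// __launch_bounds__(512) tells the compiler to allow up to 512 threads/block.
__global__ void __launch_bounds__(512)
cleanup(float* a, float* b, float* c, float* d, bool flag, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // ... original cleanup logic ...
    }
}
```

Alternatively, as the message suggests, recompiling the offending library with a larger default cap (e.g. `--gpu-max-threads-per-block=512` for hipcc) avoids touching the kernel source. Note this warning means the kernel launch was rejected or misconfigured, which can itself produce garbage output and may be related to the NaN gradients above.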
[rank229]: Traceback (most recent call last):
[rank229]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank229]:     pretrain(
[rank229]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank229]:     iteration, num_floating_point_operations_so_far = train(
[rank229]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank229]:     train_step(forward_step_func,
[rank229]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank229]:     losses_reduced = forward_backward_func(
[rank229]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank229]:     config.grad_sync_func(model.parameters())
[rank229]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank229]:     bucket_group.start_grad_sync()
[rank229]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank229]:     self.check_grads(
[rank229]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank229]:     rerun_state_machine.validate_result(
[rank229]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank229]:     raise RuntimeError(full_message)
[rank229]: RuntimeError: Rank 229, node node029, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
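Every failing rank hits the same assertion: Megatron's rerun state machine finds a NaN in the local grad norm for bucket #0 before the data-parallel all-reduce, meaning the gradients were already corrupted on that rank at iteration 1, prior to any communication. A minimal Python sketch of this style of fail-fast validation (illustrative only — not Megatron's actual `check_grads` implementation, which operates on torch tensors):

```python
import math

def check_local_grad_norm(grads):
    """Compute the local gradient norm and fail fast on NaN/Inf,
    so a corrupted bucket never poisons the collective that follows.
    Illustrative stand-in for the check Megatron performs per bucket."""
    norm = math.sqrt(sum(g * g for g in grads))
    if math.isnan(norm) or math.isinf(norm):
        raise RuntimeError(
            f"found NaN/Inf in local grad norm ({norm}) "
            "before data-parallel communication collective"
        )
    return norm
```

Failing before the all-reduce is deliberate: once a NaN enters the collective, every rank in the data-parallel group receives it and the true origin is lost.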
[ranks 224-228, 230, 231]: Traceback identical to rank 229 above; final errors:
[rank230]: RuntimeError: Rank 230, node node029, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank228]: RuntimeError: Rank 228, node node029, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank231]: RuntimeError: Rank 231, node node029, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank224]: RuntimeError: Rank 224, node node029, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank225]: RuntimeError: Rank 225, node node029, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank226]: RuntimeError: Rank 226, node node029, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank227]: RuntimeError: Rank 227, node node029, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program !
(message repeated 28 times)
[rank166]: Traceback (most recent call last):
[rank166]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank166]:     pretrain(
[rank166]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank166]:     iteration, num_floating_point_operations_so_far = train(
[rank166]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank166]:     train_step(forward_step_func,
[rank166]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank166]:     losses_reduced = forward_backward_func(
[rank166]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank166]:     config.grad_sync_func(model.parameters())
[rank166]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank166]:     bucket_group.start_grad_sync()
[rank166]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank166]:     self.check_grads(
[rank166]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank166]:     rerun_state_machine.validate_result(
[rank166]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank166]:     raise RuntimeError(full_message)
[rank166]: RuntimeError: Rank 166, node node021, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[ranks 161-165]: Traceback identical to rank 166 above; final errors:
[rank161]: RuntimeError: Rank 161, node node021, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank162]: RuntimeError: Rank 162, node node021, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank163]: RuntimeError: Rank 163, node node021, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank164]: RuntimeError: Rank 164, node node021, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank165]: RuntimeError: Rank 165, node node021, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank160]: Traceback (most recent call last):
[rank160]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank160]:     pretrain(
[rank160]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank160]:     iteration, num_floating_point_operations_so_far = train(
[rank160]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank160]:     train_step(forward_step_func,
[rank160]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank160]:     losses_reduced = forward_backward_func(
[rank160]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank160]:     config.grad_sync_func(model.parameters())
[rank160]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank160]:     bucket_group.start_grad_sync()
[rank160]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank160]:     self.check_grads(
[rank160]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank160]:     rerun_state_machine.validate_result(
[rank160]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank160]:     raise RuntimeError(full_message)
[rank160]: RuntimeError: Rank 160, node node021, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
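The HIP runtime warning above names its own fix: annotate the kernel with `__launch_bounds__` (or recompile with a higher per-block thread limit). A minimal sketch of what that annotation looks like — the signature is a hypothetical reconstruction demangled from `_Z7cleanupPfS_S_S_bi`, i.e. `cleanup(float*, float*, float*, float*, bool, int)`; the actual parameter names and kernel body in the source will differ:

```cuda
// Hypothetical reconstruction of the `cleanup` kernel from the demangled
// symbol _Z7cleanupPfS_S_S_bi. __launch_bounds__(512) promises the compiler
// the kernel is never launched with more than 512 threads per block, so a
// (512, 1, 1) launch no longer exceeds the default bound of 256.
__global__ void __launch_bounds__(512) cleanup(
    float* a, float* b, float* c, float* d, bool flag, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        // ... original kernel body ...
    }
}
```

Alternatively, as the warning suggests, recompiling the offending extension with hipcc's `--gpu-max-threads-per-block=512` raises the bound without touching the kernel source.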
[rank167]: Traceback (most recent call last):
[rank167]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank167]:     pretrain(
[rank167]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank167]:     iteration, num_floating_point_operations_so_far = train(
[rank167]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank167]:     train_step(forward_step_func,
[rank167]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank167]:     losses_reduced = forward_backward_func(
[rank167]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank167]:     config.grad_sync_func(model.parameters())
[rank167]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank167]:     bucket_group.start_grad_sync()
[rank167]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank167]:     self.check_grads(
[rank167]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank167]:     rerun_state_machine.validate_result(
[rank167]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank167]:     raise RuntimeError(full_message)
[rank167]: RuntimeError: Rank 167, node node021, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.216394 824196 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.230665 824165 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
[rank99]: Traceback (most recent call last):
[rank99]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank99]:     pretrain(
[rank99]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank99]:     iteration, num_floating_point_operations_so_far = train(
[rank99]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank99]:     train_step(forward_step_func,
[rank99]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank99]:     losses_reduced = forward_backward_func(
[rank99]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank99]:     config.grad_sync_func(model.parameters())
[rank99]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank99]:     bucket_group.start_grad_sync()
[rank99]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank99]:     self.check_grads(
[rank99]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank99]:     rerun_state_machine.validate_result(
[rank99]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank99]:     raise RuntimeError(full_message)
[rank99]: RuntimeError: Rank 99, node node013, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank100]: Traceback (most recent call last):
[rank100]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank100]:     pretrain(
[rank100]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank100]:     iteration, num_floating_point_operations_so_far = train(
[rank100]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank100]:     train_step(forward_step_func,
[rank100]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank100]:     losses_reduced = forward_backward_func(
[rank100]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank100]:     config.grad_sync_func(model.parameters())
[rank100]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank100]:     bucket_group.start_grad_sync()
[rank100]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank100]:     self.check_grads(
[rank100]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank100]:     rerun_state_machine.validate_result(
[rank100]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank100]:     raise RuntimeError(full_message)
[rank100]: RuntimeError: Rank 100, node node013, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank96]: Traceback (most recent call last):
[rank96]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank96]:     pretrain(
[rank96]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank96]:     iteration, num_floating_point_operations_so_far = train(
[rank96]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank96]:     train_step(forward_step_func,
[rank96]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank96]:     losses_reduced = forward_backward_func(
[rank96]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank96]:     config.grad_sync_func(model.parameters())
[rank96]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank96]:     bucket_group.start_grad_sync()
[rank96]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank96]:     self.check_grads(
[rank96]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank96]:     rerun_state_machine.validate_result(
[rank96]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank96]:     raise RuntimeError(full_message)
[rank96]: RuntimeError: Rank 96, node node013, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank97]: Traceback (most recent call last):
[rank97]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank97]:     pretrain(
[rank97]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank97]:     iteration, num_floating_point_operations_so_far = train(
[rank97]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank97]:     train_step(forward_step_func,
[rank97]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank97]:     losses_reduced = forward_backward_func(
[rank97]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank97]:     config.grad_sync_func(model.parameters())
[rank97]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank97]:     bucket_group.start_grad_sync()
[rank97]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank97]:     self.check_grads(
[rank97]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank97]:     rerun_state_machine.validate_result(
[rank97]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank97]:     raise RuntimeError(full_message)
[rank97]: RuntimeError: Rank 97, node node013, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank103]: Traceback (most recent call last):
[rank103]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank103]:     pretrain(
[rank103]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank103]:     iteration, num_floating_point_operations_so_far = train(
[rank103]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank103]:     train_step(forward_step_func,
[rank103]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank103]:     losses_reduced = forward_backward_func(
[rank103]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank103]:     config.grad_sync_func(model.parameters())
[rank103]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank103]:     bucket_group.start_grad_sync()
[rank103]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank103]:     self.check_grads(
[rank103]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank103]:     rerun_state_machine.validate_result(
[rank103]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank103]:     raise RuntimeError(full_message)
[rank103]: RuntimeError: Rank 103, node node013, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank101]: Traceback (most recent call last):
[rank101]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank101]:     pretrain(
[rank101]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank101]:     iteration, num_floating_point_operations_so_far = train(
[rank101]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank101]:     train_step(forward_step_func,
[rank101]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank101]:     losses_reduced = forward_backward_func(
[rank101]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1958, in forward_backward_pipelining_without_interleaving
[rank101]:     config.grad_sync_func(model.parameters())
[rank101]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 468, in start_grad_sync
[rank101]:     bucket_group.start_grad_sync()
[rank101]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank101]:     self.check_grads(
[rank101]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank101]:     rerun_state_machine.validate_result(
[rank101]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank101]:     raise RuntimeError(full_message)
[rank101]: RuntimeError: Rank 101, node node013, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank102]: RuntimeError: Rank 102, node node013, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank98]: RuntimeError: Rank 98, node node013, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
[rank23]: Traceback (most recent call last):
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank23]:     pretrain(
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank23]:     iteration, num_floating_point_operations_so_far = train(
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank23]:     train_step(forward_step_func,
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank23]:     losses_reduced = forward_backward_func(
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank23]:     input_tensor_grad = backward_step(
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank23]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank23]:     Variable._execution_engine.run_backward(
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank23]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank23]:     self.start_grad_sync()
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank23]:     self.check_grads(
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank23]:     rerun_state_machine.validate_result(
[rank23]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank23]:     raise RuntimeError(full_message)
[rank23]: RuntimeError: Rank 23, node node003, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
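Both stacks end in `rerun_state_machine.validate_result`, which turns a bad numeric result into a `RuntimeError` carrying enough context to locate the failing worker in a multi-node job. A hypothetical sketch of that pattern (names and signature assumed for illustration, not Megatron's actual API):

```python
import math
import socket

def validate_result(result, rejection_func, message, rank=0, device=0, iteration=1):
    """If rejection_func flags the result, raise with rank/node/device/iteration
    context, mirroring the error format seen in the log above."""
    if rejection_func(result):
        full_message = (
            f"Rank {rank}, node {socket.gethostname()}, device {device}, "
            f"iteration {iteration}: Unexpected result {result} "
            f"(message='{message}')"
        )
        raise RuntimeError(full_message)

# A finite grad norm passes silently; NaN raises with full context.
validate_result(1.5, math.isnan, "found NaN in local grad norm")
```

Embedding rank, node, and device in the message is what makes logs like this one actionable: here it shows the NaNs are spread across nodes (node003, node005, node013) at iteration 1, pointing at a job-wide numerical problem rather than a single bad device.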
[rank17]: RuntimeError: Rank 17, node node003, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank19]: RuntimeError: Rank 19, node node003, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank34]: RuntimeError: Rank 34, node node005, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank18]: RuntimeError: Rank 18, node node003, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank33]: RuntimeError: Rank 33, node node005, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank36]: RuntimeError: Rank 36, node node005, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank22]: RuntimeError: Rank 22, node node003, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank39]: RuntimeError: Rank 39, node node005, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank16]: RuntimeError: Rank 16, node node003, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank35]: RuntimeError: Rank 35, node node005, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank32]: RuntimeError: Rank 32, node node005, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank38]: RuntimeError: Rank 38, node node005, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank20]: RuntimeError: Rank 20, node node003, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank37]: RuntimeError: Rank 37, node node005, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank21]: RuntimeError: Rank 21, node node003, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
[rank57]: RuntimeError: Rank 57, node node008, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank60]: RuntimeError: Rank 60, node node008, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank59]: RuntimeError: Rank 59, node node008, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank61]: RuntimeError: Rank 61, node node008, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank63]: RuntimeError: Rank 63, node node008, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank56]: RuntimeError: Rank 56, node node008, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank62]: Traceback (most recent call last):
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank62]:     pretrain(
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank62]:     iteration, num_floating_point_operations_so_far = train(
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank62]:     train_step(forward_step_func,
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank62]:     losses_reduced = forward_backward_func(
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank62]:     input_tensor_grad = backward_step(
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank62]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank62]:     Variable._execution_engine.run_backward(
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank62]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank62]:     self.start_grad_sync()
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank62]:     self.check_grads(
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank62]:     rerun_state_machine.validate_result(
[rank62]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank62]:     raise RuntimeError(full_message)
[rank62]: RuntimeError: Rank 62, node node008, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank58]: Traceback (most recent call last):
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank58]:     pretrain(
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank58]:     iteration, num_floating_point_operations_so_far = train(
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank58]:     train_step(forward_step_func,
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank58]:     losses_reduced = forward_backward_func(
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank58]:     input_tensor_grad = backward_step(
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank58]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank58]:     Variable._execution_engine.run_backward(
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank58]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank58]:     self.start_grad_sync()
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank58]:     self.check_grads(
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank58]:     rerun_state_machine.validate_result(
[rank58]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank58]:     raise RuntimeError(full_message)
[rank58]: RuntimeError: Rank 58, node node008, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank31]: Traceback (most recent call last):
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank31]:     pretrain(
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank31]:     iteration, num_floating_point_operations_so_far = train(
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank31]:     train_step(forward_step_func,
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank31]:     losses_reduced = forward_backward_func(
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank31]:     input_tensor_grad = backward_step(
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank31]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank31]:     Variable._execution_engine.run_backward(
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank31]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank31]:     self.start_grad_sync()
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank31]:     self.check_grads(
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank31]:     rerun_state_machine.validate_result(
[rank31]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank31]:     raise RuntimeError(full_message)
[rank31]: RuntimeError: Rank 31, node node004, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank27]: Traceback (most recent call last):
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank27]:     pretrain(
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank27]:     iteration, num_floating_point_operations_so_far = train(
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank27]:     train_step(forward_step_func,
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank27]:     losses_reduced = forward_backward_func(
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank27]:     input_tensor_grad = backward_step(
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank27]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank27]:     Variable._execution_engine.run_backward(
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank27]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank27]:     self.start_grad_sync()
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank27]:     self.check_grads(
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank27]:     rerun_state_machine.validate_result(
[rank27]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank27]:     raise RuntimeError(full_message)
[rank27]: RuntimeError: Rank 27, node node004, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank25]: Traceback (most recent call last):
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank25]:     pretrain(
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank25]:     iteration, num_floating_point_operations_so_far = train(
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank25]:     train_step(forward_step_func,
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank25]:     losses_reduced = forward_backward_func(
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank25]:     input_tensor_grad = backward_step(
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank25]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank25]:     Variable._execution_engine.run_backward(
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank25]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank25]:     self.start_grad_sync()
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank25]:     self.check_grads(
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank25]:     rerun_state_machine.validate_result(
[rank25]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank25]:     raise RuntimeError(full_message)
[rank25]: RuntimeError: Rank 25, node node004, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank30]: Traceback (most recent call last):
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank30]:     pretrain(
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank30]:     iteration, num_floating_point_operations_so_far = train(
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank30]:     train_step(forward_step_func,
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank30]:     losses_reduced = forward_backward_func(
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank30]:     input_tensor_grad = backward_step(
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank30]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank30]:     Variable._execution_engine.run_backward(
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank30]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank30]:     self.start_grad_sync()
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank30]:     self.check_grads(
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank30]:     rerun_state_machine.validate_result(
[rank30]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank30]:     raise RuntimeError(full_message)
[rank30]: RuntimeError: Rank 30, node node004, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank28]: Traceback (most recent call last):
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank28]:     pretrain(
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank28]:     iteration, num_floating_point_operations_so_far = train(
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank28]:     train_step(forward_step_func,
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank28]:     losses_reduced = forward_backward_func(
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank28]:     input_tensor_grad = backward_step(
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank28]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank28]:     Variable._execution_engine.run_backward(
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank28]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank28]:     self.start_grad_sync()
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank28]:     self.check_grads(
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank28]:     rerun_state_machine.validate_result(
[rank28]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank28]:     raise RuntimeError(full_message)
[rank28]: RuntimeError: Rank 28, node node004, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank26]: Traceback (most recent call last):
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank26]:     pretrain(
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank26]:     iteration, num_floating_point_operations_so_far = train(
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank26]:     train_step(forward_step_func,
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank26]:     losses_reduced = forward_backward_func(
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank26]:     input_tensor_grad = backward_step(
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank26]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank26]:     Variable._execution_engine.run_backward(
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank26]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank26]:     self.start_grad_sync()
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank26]:     self.check_grads(
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank26]:     rerun_state_machine.validate_result(
[rank26]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank26]:     raise RuntimeError(full_message)
[rank26]: RuntimeError: Rank 26, node node004, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank24]: Traceback (most recent call last):
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank24]:     pretrain(
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank24]:     iteration, num_floating_point_operations_so_far = train(
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank24]:     train_step(forward_step_func,
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank24]:     losses_reduced = forward_backward_func(
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank24]:     input_tensor_grad = backward_step(
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank24]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank24]:     Variable._execution_engine.run_backward(
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank24]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank24]:     self.start_grad_sync()
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank24]:     self.check_grads(
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank24]:     rerun_state_machine.validate_result(
[rank24]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank24]:     raise RuntimeError(full_message)
[rank24]: RuntimeError: Rank 24, node node004, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank29]: Traceback (most recent call last):
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank29]:     pretrain(
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank29]:     iteration, num_floating_point_operations_so_far = train(
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank29]:     train_step(forward_step_func,
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank29]:     losses_reduced = forward_backward_func(
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank29]:     input_tensor_grad = backward_step(
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank29]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank29]:     Variable._execution_engine.run_backward(
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank29]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank29]:     self.start_grad_sync()
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank29]:     self.check_grads(
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank29]:     rerun_state_machine.validate_result(
[rank29]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank29]:     raise RuntimeError(full_message)
[rank29]: RuntimeError: Rank 29, node node004, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank43]: Traceback (most recent call last):
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank43]:     pretrain(
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank43]:     iteration, num_floating_point_operations_so_far = train(
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank43]:     train_step(forward_step_func,
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank43]:     losses_reduced = forward_backward_func(
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank43]:     input_tensor_grad = backward_step(
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank43]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank43]:     Variable._execution_engine.run_backward(
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank43]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank43]:     self.start_grad_sync()
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank43]:     self.check_grads(
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank43]:     rerun_state_machine.validate_result(
[rank43]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank43]:     raise RuntimeError(full_message)
[rank43]: RuntimeError: Rank 43, node node006, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank44]: Traceback (most recent call last):
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank44]:     pretrain(
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank44]:     iteration, num_floating_point_operations_so_far = train(
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank44]:     train_step(forward_step_func,
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank44]:     losses_reduced = forward_backward_func(
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank44]:     input_tensor_grad = backward_step(
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank44]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank44]:     Variable._execution_engine.run_backward(
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank44]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank44]:     self.start_grad_sync()
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank44]:     self.check_grads(
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank44]:     rerun_state_machine.validate_result(
[rank44]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank44]:     raise RuntimeError(full_message)
[rank44]: RuntimeError: Rank 44, node node006, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
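The failure above is Megatron-LM's gradient sanity check: before the data-parallel communication collective, each rank computes the local norm of a gradient bucket and hands it to the rerun state machine for validation (`check_grads` → `rerun_state_machine.validate_result` in the traceback). A minimal pure-Python sketch of that kind of guard — function and parameter names here are illustrative, not Megatron's; the real check operates on torch tensors in `megatron/core/distributed/param_and_grad_buffer.py`:

```python
import math

def check_local_grad_norm(grads, bucket_id=0):
    """Raise if the local grad norm is NaN/inf, mimicking in spirit the
    check_grads -> validate_result path seen in the traceback.
    `grads` is a flat list of floats standing in for a gradient bucket."""
    norm = math.sqrt(sum(g * g for g in grads))
    if math.isnan(norm) or math.isinf(norm):
        raise RuntimeError(
            f"found NaN in local grad norm for bucket #{bucket_id} "
            "in backward pass before data-parallel communication collective"
        )
    return norm

# A single NaN anywhere in the bucket poisons the whole norm:
check_local_grad_norm([0.1, -0.2, 0.3])        # passes
# check_local_grad_norm([0.1, float("nan")])   # raises RuntimeError
```

Because every rank in the data-parallel group hits the same bad bucket at iteration 1, each rank raises independently, which is why the same traceback repeats once per rank below.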
[rank47]: RuntimeError: Rank 47, node node006, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank40]: RuntimeError: Rank 40, node node006, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank41]: RuntimeError: Rank 41, node node006, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank46]: RuntimeError: Rank 46, node node006, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank45]: RuntimeError: Rank 45, node node006, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank42]: RuntimeError: Rank 42, node node006, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
[rank54]: RuntimeError: Rank 54, node node007, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank51]: RuntimeError: Rank 51, node node007, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank53]: RuntimeError: Rank 53, node node007, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank48]: RuntimeError: Rank 48, node node007, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank52]: RuntimeError: Rank 52, node node007, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank55]: RuntimeError: Rank 55, node node007, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank50]: RuntimeError: Rank 50, node node007, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank49]: RuntimeError: Rank 49, node node007, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
[rank1]: RuntimeError: Rank 1, node node001, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank4]: RuntimeError: Rank 4, node node001, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank2]: RuntimeError: Rank 2, node node001, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank0]: RuntimeError: Rank 0, node node001, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank7]: RuntimeError: Rank 7, node node001, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank10]: RuntimeError: Rank 10, node node002, device 2, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank3]: RuntimeError: Rank 3, node node001, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank6]: RuntimeError: Rank 6, node node001, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank5]: RuntimeError: Rank 5, node node001, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank15]: Traceback (most recent call last):
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank15]:     pretrain(
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank15]:     iteration, num_floating_point_operations_so_far = train(
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank15]:     train_step(forward_step_func,
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank15]:     losses_reduced = forward_backward_func(
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank15]:     input_tensor_grad = backward_step(
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank15]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank15]:     Variable._execution_engine.run_backward(
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank15]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank15]:     self.start_grad_sync()
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank15]:     self.check_grads(
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank15]:     rerun_state_machine.validate_result(
[rank15]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank15]:     raise RuntimeError(full_message)
[rank15]: RuntimeError: Rank 15, node node002, device 7, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank14]: Traceback (most recent call last):
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank14]:     pretrain(
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank14]:     iteration, num_floating_point_operations_so_far = train(
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank14]:     train_step(forward_step_func,
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank14]:     losses_reduced = forward_backward_func(
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank14]:     input_tensor_grad = backward_step(
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank14]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank14]:     Variable._execution_engine.run_backward(
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank14]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank14]:     self.start_grad_sync()
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank14]:     self.check_grads(
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank14]:     rerun_state_machine.validate_result(
[rank14]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank14]:     raise RuntimeError(full_message)
[rank14]: RuntimeError: Rank 14, node node002, device 6, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank9]: Traceback (most recent call last):
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank9]:     pretrain(
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank9]:     iteration, num_floating_point_operations_so_far = train(
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank9]:     train_step(forward_step_func,
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank9]:     losses_reduced = forward_backward_func(
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank9]:     input_tensor_grad = backward_step(
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank9]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank9]:     Variable._execution_engine.run_backward(
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank9]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank9]:     self.start_grad_sync()
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank9]:     self.check_grads(
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank9]:     rerun_state_machine.validate_result(
[rank9]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank9]:     raise RuntimeError(full_message)
[rank9]: RuntimeError: Rank 9, node node002, device 1, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank11]: Traceback (most recent call last):
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank11]:     pretrain(
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank11]:     iteration, num_floating_point_operations_so_far = train(
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank11]:     train_step(forward_step_func,
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank11]:     losses_reduced = forward_backward_func(
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank11]:     input_tensor_grad = backward_step(
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank11]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank11]:     Variable._execution_engine.run_backward(
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank11]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank11]:     self.start_grad_sync()
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank11]:     self.check_grads(
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank11]:     rerun_state_machine.validate_result(
[rank11]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank11]:     raise RuntimeError(full_message)
[rank11]: RuntimeError: Rank 11, node node002, device 3, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank13]: Traceback (most recent call last):
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank13]:     pretrain(
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank13]:     iteration, num_floating_point_operations_so_far = train(
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank13]:     train_step(forward_step_func,
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank13]:     losses_reduced = forward_backward_func(
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank13]:     input_tensor_grad = backward_step(
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank13]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank13]:     Variable._execution_engine.run_backward(
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank13]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank13]:     self.start_grad_sync()
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank13]:     self.check_grads(
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank13]:     rerun_state_machine.validate_result(
[rank13]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank13]:     raise RuntimeError(full_message)
[rank13]: RuntimeError: Rank 13, node node002, device 5, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank12]: Traceback (most recent call last):
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank12]:     pretrain(
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank12]:     iteration, num_floating_point_operations_so_far = train(
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank12]:     train_step(forward_step_func,
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank12]:     losses_reduced = forward_backward_func(
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank12]:     input_tensor_grad = backward_step(
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank12]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank12]:     Variable._execution_engine.run_backward(
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank12]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank12]:     self.start_grad_sync()
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank12]:     self.check_grads(
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank12]:     rerun_state_machine.validate_result(
[rank12]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank12]:     raise RuntimeError(full_message)
[rank12]: RuntimeError: Rank 12, node node002, device 4, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
[rank8]: Traceback (most recent call last):
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/pretrain_gpt.py", line 332, in <module>
[rank8]:     pretrain(
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 726, in pretrain
[rank8]:     iteration, num_floating_point_operations_so_far = train(
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1933, in train
[rank8]:     train_step(forward_step_func,
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/training/training.py", line 1172, in train_step
[rank8]:     losses_reduced = forward_backward_func(
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 1948, in forward_backward_pipelining_without_interleaving
[rank8]:     input_tensor_grad = backward_step(
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 390, in backward_step
[rank8]:     custom_backward(output_tensor[0], output_tensor_grad[0])
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/pipeline_parallel/schedules.py", line 151, in custom_backward
[rank8]:     Variable._execution_engine.run_backward(
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/distributed_data_parallel.py", line 419, in hook
[rank8]:     self.param_to_bucket_group[param].register_grad_ready(param)
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 436, in register_grad_ready
[rank8]:     self.start_grad_sync()
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 293, in start_grad_sync
[rank8]:     self.check_grads(
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/distributed/param_and_grad_buffer.py", line 165, in check_grads
[rank8]:     rerun_state_machine.validate_result(
[rank8]:   File "/public/home/xingjl/Pai-Megatron-Patch/Megatron-LM-250328/megatron/core/rerun_state_machine.py", line 505, in validate_result
[rank8]:     raise RuntimeError(full_message)
[rank8]: RuntimeError: Rank 8, node node002, device 0, iteration 1: Unexpected result nan (message='found NaN in local grad norm for bucket #0 in backward pass before data-parallel communication collective')
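Every rank above fails at iteration 1 with the same guard: Megatron's `check_grads` (in `param_and_grad_buffer.py`) computes the local gradient norm for each bucket before the data-parallel all-reduce and aborts via `rerun_state_machine.validate_result` if it is NaN. A minimal, framework-free sketch of that kind of pre-collective guard (names here are illustrative, not Megatron's actual API):

```python
import math

def validate_grad_norm(grads, bucket_id=0):
    """Raise if the local grad-shard norm is NaN/Inf, mimicking the log's check.

    A single NaN or Inf anywhere in the shard poisons the squared sum,
    so checking the norm catches any bad element before the all-reduce
    would propagate it to every data-parallel rank.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if math.isnan(norm) or math.isinf(norm):
        raise RuntimeError(
            f"found NaN in local grad norm for bucket #{bucket_id} "
            "in backward pass before data-parallel communication collective"
        )
    return norm
```

Checking locally before the collective is what lets the error message pinpoint a rank and device; after the all-reduce, every rank would see the same NaN and the origin would be lost.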
Launch params (512, 1, 1) are larger than launch bounds (256) for kernel _Z7cleanupPfS_S_S_bi please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! 
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.439374 798579 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.673267 795496 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.675290 795460 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.728755 835081 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.731657 835073 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.736871 835064 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.737082 835076 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.737226 835083 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.737305 835082 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.770220 817887 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.771901 817886 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.771907 805678 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.773792 817868 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.776507 817885 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.780223 805679 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.780740 805677 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.780773 805387 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.781816 805388 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.786180 817880 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.789774 817870 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.789984 805362 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.789937 805655 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.790418 805389 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.790532 241793 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.791110 805669 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.795140 241792 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.800132 241770 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.800652 810851 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.801040 241788 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0429 14:00:08.802098 810850 ProcessGroupNCCL.cpp:1185] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
WARNING: Logging before InitGoogleLogging() is written to STDERR
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[54091,1],422]
  Exit code:    1
--------------------------------------------------------------------------
