-------------------- end of arguments ---------------------
> building HuggingFaceTokenizer tokenizer ...
[WARNING | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode validate_results
> padded vocab (size: 151669) with 395 dummy tokens (new size: 152064)
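The padding reported above (151669 → 152064, 395 dummy tokens) follows Megatron's rule of rounding the vocabulary up to a multiple of `--make-vocab-size-divisible-by` (default 128) times the tensor-parallel size, so every tensor-parallel shard gets an equal, aligned slice of the embedding table. A minimal sketch of that arithmetic, assuming the default divisor of 128 and a tensor-parallel size of 4 (consistent with the model-parallel ranks printed later in this log):

```python
import math

def pad_vocab(orig_vocab_size: int, divisible_by: int = 128, tp_size: int = 4) -> int:
    """Round the vocab size up to a multiple of (divisible_by * tp_size),
    mirroring Megatron's --make-vocab-size-divisible-by padding."""
    multiple = divisible_by * tp_size
    return math.ceil(orig_vocab_size / multiple) * multiple

padded = pad_vocab(151669)
print(padded, padded - 151669)  # 152064 padded size, 395 dummy tokens
```

With these assumed values the result matches the log exactly: 152064 is the smallest multiple of 512 at or above 151669, and the difference is the 395 dummy tokens.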
> initializing torch distributed ...
WARNING: TensorBoard writing requested but is not available (are you using PyTorch 1.1.0 or later?), no TensorBoard logs will be written.
WARNING: one_logger package is required to enable e2e metrics tracking. please go to https://confluence.nvidia.com/display/MLWFO/Package+Repositories for details to install it
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0211 17:48:16.207494 216834 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 6] using GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 17:48:16.211829 216828 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 17:48:16.313028 216826 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 17:48:16.313149 216833 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 17:48:16.380734 216832 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 17:48:16.382851 216831 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 17:48:16.387229 216830 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 4] using GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
>>> done with dataset index builder. Compilation time: 0.039 seconds
> compiling and loading fused kernels ...
W0211 17:48:16.423564 216829 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
>>> done with compiling and loading fused kernels. Compilation time: 1.197 seconds
time to initialize megatron (seconds): 5.946
[after megatron is initialized] datetime: 2026-02-11 17:48:18
building GPT model ...
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 218293248
W0211 18:11:46.632072 216828 ProcessGroupNCCL.cpp:1279] Warning: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
[the same shutdown warning is emitted by the remaining 7 ranks]
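Both recurring NCCL warnings in the run above are addressable from the launch code rather than from Megatron itself: pass a device_id to init_process_group() so barriers never have to guess the rank-to-GPU mapping, and call destroy_process_group() before exit. A minimal sketch, not the training script itself; it assumes the usual torchrun LOCAL_RANK convention and PyTorch 2.3+ for the device_id argument, and falls back to a single-process gloo group so it also runs outside torchrun:

```python
import os
import torch
import torch.distributed as dist

def init_distributed() -> torch.device:
    """Bind this rank to one device up front so collectives (including
    barrier) never have to guess the rank -> GPU mapping."""
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    if torch.cuda.is_available():
        device = torch.device(f"cuda:{local_rank}")
        torch.cuda.set_device(device)
        # device_id tells NCCL which GPU this rank owns, silencing the
        # "using GPU N to perform barrier" warning.
        dist.init_process_group(backend="nccl", device_id=device)
    else:
        device = torch.device("cpu")
        dist.init_process_group(backend="gloo")
    return device

# Single-process fallback so the sketch is runnable without torchrun.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

device = init_distributed()
dist.barrier()
ran_cleanly = dist.is_initialized()
dist.destroy_process_group()  # silences the PyTorch 2.4+ shutdown warning
```

Under torchrun, the environment variables are already set per rank, so only the init_distributed() call and the final destroy_process_group() would be added to the real entry point.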
-------------------- end of arguments ---------------------
> building HuggingFaceTokenizer tokenizer ...
[WARNING | megatron.core.rerun_state_machine]: RerunStateMachine initialized in mode validate_results
> padded vocab (size: 151669) with 395 dummy tokens (new size: 152064)
> initializing torch distributed ...
WARNING: TensorBoard writing requested but is not available (are you using PyTorch 1.1.0 or later?), no TensorBoard logs will be written.
WARNING: one_logger package is required to enable e2e metrics tracking. please go to https://confluence.nvidia.com/display/MLWFO/Package+Repositories for details to install it
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0211 16:50:02.731345 186405 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 16:50:02.731526 186402 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 6] using GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 16:50:03.134291 186407 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 16:50:03.135790 186395 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 16:50:03.230739 186403 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 16:50:03.232905 186401 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
W0211 16:50:03.237195 186406 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 4] using GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
>>> done with dataset index builder. Compilation time: 0.040 seconds
> compiling and loading fused kernels ...
W0211 16:50:03.283905 186404 ProcessGroupNCCL.cpp:4232] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
>>> done with compiling and loading fused kernels. Compilation time: 1.224 seconds
time to initialize megatron (seconds): 7.497
[after megatron is initialized] datetime: 2026-02-11 16:50:07
building GPT model ...
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 2048177152
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 2048177152
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 2048177152
W0211 17:04:59.713297 186405 ProcessGroupNCCL.cpp:1279] Warning: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
[the same shutdown warning is emitted by the remaining 7 ranks]