W0202 14:46:27.442000 25644 lib/python3.10/dist-packages/torch/distributed/run.py:793] 
W0202 14:46:27.442000 25644 lib/python3.10/dist-packages/torch/distributed/run.py:793] *****************************************
W0202 14:46:27.442000 25644 lib/python3.10/dist-packages/torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0202 14:46:27.442000 25644 lib/python3.10/dist-packages/torch/distributed/run.py:793] *****************************************
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=129, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=4, ring_degree=1)
2026-02-02 14:46:32.356 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:46:32.376147 25681 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:46:32.376168 25681 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:46:32 [parallel_state.py:200] world_size=4 rank=3 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:46:32.376652 25681 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5584ce191680, SPLIT_COLOR: 1008299991543067201, PG Name: 1
I0202 14:46:32.376662 25681 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=129, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=4, ring_degree=1)
2026-02-02 14:46:32.520 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:46:32.539709 25680 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:46:32.539728 25680 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:46:32 [parallel_state.py:200] world_size=4 rank=2 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:46:32.540225 25680 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x555b1213d760, SPLIT_COLOR: 1008299991543067201, PG Name: 1
I0202 14:46:32.540235 25680 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=129, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=4, ring_degree=1)
2026-02-02 14:46:32.610 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:46:32.629705 25678 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:46:32.629725 25678 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:46:32 [parallel_state.py:200] world_size=4 rank=0 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:46:32.630208 25678 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5609d74ed4e0, SPLIT_COLOR: 1008299991543067201, PG Name: 1
I0202 14:46:32.630218 25678 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=129, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=4, ring_degree=1)
2026-02-02 14:46:32.643 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:46:32.656126 25679 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:46:32.656149 25679 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:46:32 [parallel_state.py:200] world_size=4 rank=1 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:46:32.656703 25679 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55dc94f58ff0, SPLIT_COLOR: 1008299991543067201, PG Name: 1
I0202 14:46:32.656714 25679 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.663307 25679 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 5
I0202 14:46:32.663318 25679 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.663484 25681 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 9 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 9
I0202 14:46:32.663501 25681 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 9 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.664474 25679 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 13 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 13
I0202 14:46:32.664485 25679 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 13 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.664862 25681 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 17 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 17
I0202 14:46:32.664876 25681 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 17 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.665795 25679 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 21 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 21
I0202 14:46:32.665805 25679 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 21 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.666021 25681 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 25 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 25
I0202 14:46:32.666033 25681 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 25 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.666982 25679 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 28 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 28
I0202 14:46:32.666992 25679 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 28 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.667066 25681 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 30 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 30
I0202 14:46:32.667078 25681 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 30 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.667369 25681 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 31 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5584ce191680, SPLIT_COLOR: 1008299991543067201, PG Name: 31
I0202 14:46:32.667380 25681 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 31 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.667392 25679 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 31 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55dc94f58ff0, SPLIT_COLOR: 1008299991543067201, PG Name: 31
I0202 14:46:32.667403 25679 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 31 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.667716 25679 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 33 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 33
I0202 14:46:32.667724 25679 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 33 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.667734 25681 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 35 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 35
I0202 14:46:32.667743 25681 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 35 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.668346 25681 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 36 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5584ce191680, SPLIT_COLOR: 1008299991543067201, PG Name: 36
I0202 14:46:32.668357 25681 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 36 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.668365 25679 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 36 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55dc94f58ff0, SPLIT_COLOR: 1008299991543067201, PG Name: 36
I0202 14:46:32.668375 25679 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 36 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.669420 25680 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 7 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 7
I0202 14:46:32.669442 25680 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 7 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.671015 25680 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 15 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 15
I0202 14:46:32.671026 25680 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 15 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.672633 25680 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 23 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 23
I0202 14:46:32.672645 25680 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 23 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.674185 25680 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 29 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 29
I0202 14:46:32.674197 25680 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 29 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.674296 25678 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 3
I0202 14:46:32.674309 25678 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.674491 25680 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 31 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x555b1213d760, SPLIT_COLOR: 1008299991543067201, PG Name: 31
I0202 14:46:32.674500 25680 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 31 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.674768 25680 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 34 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 34
I0202 14:46:32.674780 25680 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 34 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.675452 25678 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 11 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 11
I0202 14:46:32.675464 25678 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 11 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.675571 25680 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 36 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x555b1213d760, SPLIT_COLOR: 1008299991543067201, PG Name: 36
I0202 14:46:32.675581 25680 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 36 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.676553 25678 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 19 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 19
I0202 14:46:32.676564 25678 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 19 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.677577 25678 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 27 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 27
I0202 14:46:32.677589 25678 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 27 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.677919 25678 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 31 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5609d74ed4e0, SPLIT_COLOR: 1008299991543067201, PG Name: 31
I0202 14:46:32.677930 25678 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 31 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.678265 25678 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 32 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 32
I0202 14:46:32.678274 25678 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 32 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.678610 25678 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 36 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5609d74ed4e0, SPLIT_COLOR: 1008299991543067201, PG Name: 36
I0202 14:46:32.678620 25678 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 36 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.680130 25678 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 38 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 38
I0202 14:46:32.680141 25678 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 38 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.680330 25680 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 42 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 42
I0202 14:46:32.680341 25680 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 42 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.680353 25679 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 40 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 40
I0202 14:46:32.680363 25679 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 40 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.680583 25681 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 44 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 44
I0202 14:46:32.680593 25681 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 44 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.691612 25679 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 46 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55dc94f58ff0, SPLIT_COLOR: 1008299991543067201, PG Name: 46
I0202 14:46:32.691623 25679 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 46 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:46:32.691968 25680 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 46 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x555b1213d760, SPLIT_COLOR: 1008299991543067201, PG Name: 46
I0202 14:46:32.691979 25680 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 46 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
2026-02-02 14:46:32.691 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
2026-02-02 14:46:32.692 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
I0202 14:46:32.692821 25678 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 46 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5609d74ed4e0, SPLIT_COLOR: 1008299991543067201, PG Name: 46
I0202 14:46:32.692833 25678 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 46 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
2026-02-02 14:46:32.692 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
I0202 14:46:32.693317 25681 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 46 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5584ce191680, SPLIT_COLOR: 1008299991543067201, PG Name: 46
I0202 14:46:32.693328 25681 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 46 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
2026-02-02 14:46:32.693 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
2026-02-02 14:46:33.243 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:46:33.294 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:46:33.300 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:46:33.304 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:46:47.429 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
2026-02-02 14:46:47.966 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
2026-02-02 14:46:49.446 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2026-02-02 14:46:49.584 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
2026-02-02 14:46:49.651 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
2026-02-02 14:46:49.731 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
2026-02-02 14:46:49.861 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
2026-02-02 14:46:50.011 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
2026-02-02 14:46:52.149 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2026-02-02 14:46:52.228 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2026-02-02 14:46:52.270 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
2026-02-02 14:46:52.375 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards:  25%|██▌       | 1/4 [00:03<00:11,  3.73s/it]
Loading checkpoint shards:  25%|██▌       | 1/4 [00:03<00:11,  3.92s/it]
Loading checkpoint shards:  25%|██▌       | 1/4 [00:03<00:11,  3.90s/it]
Loading checkpoint shards:  25%|██▌       | 1/4 [00:04<00:12,  4.23s/it]
Loading checkpoint shards:  50%|█████     | 2/4 [00:08<00:08,  4.20s/it]
Loading checkpoint shards:  50%|█████     | 2/4 [00:08<00:08,  4.28s/it]
Loading checkpoint shards:  50%|█████     | 2/4 [00:07<00:07,  3.92s/it]
Loading checkpoint shards:  50%|█████     | 2/4 [00:08<00:08,  4.22s/it]
Loading checkpoint shards:  75%|███████▌  | 3/4 [00:11<00:03,  3.88s/it]
Loading checkpoint shards:  75%|███████▌  | 3/4 [00:11<00:03,  3.92s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00,  2.48s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00,  3.02s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00,  2.51s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00,  3.07s/it]
Loading checkpoint shards:  75%|███████▌  | 3/4 [00:11<00:03,  3.71s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00,  2.38s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00,  2.91s/it]
Loading checkpoint shards:  75%|███████▌  | 3/4 [00:12<00:03,  3.93s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00,  2.58s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00,  3.14s/it]
2026-02-02 14:47:07.814 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:47:08.353 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:47:10.186 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:47:10.227 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:47:10.597 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:47:10.790 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:47:10.832 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:47:10.920 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)
2026-02-02 14:47:10.938 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:47:11.011 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:47:11.347 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:47:11.533 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:47:11.579 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:47:11.664 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)
2026-02-02 14:47:11.750 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:47:11.759 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:47:12.472 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:47:12.873 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:47:13.093 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:47:13.134 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:47:13.220 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)
2026-02-02 14:47:13.306 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:47:14.102 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:47:14.513 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:47:14.734 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:47:14.775 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:47:14.864 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)
2026-02-02 14:47:14.950 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
  0%|          | 0/2 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
  0%|          | 0/2 [00:00<?, ?it/s]
I0202 14:47:18.168341 25678 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 31 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.09102 ms
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
  0%|          | 0/2 [00:00<?, ?it/s]
I0202 14:47:19.977463 25680 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 31 Rank 2] ProcessGroupNCCL broadcast unique ID through store took 0.275849 ms
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
  0%|          | 0/2 [00:00<?, ?it/s]
I0202 14:47:21.456985 25679 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 31 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.19415 ms
I0202 14:47:22.974701 25681 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 31 Rank 3] ProcessGroupNCCL broadcast unique ID through store took 0.16515 ms
I0202 14:47:23.765651 25680 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 31 Rank 2] ProcessGroupNCCL created ncclComm_ 0x555b6297b0f0 on CUDA device: 
I0202 14:47:23.765668 25680 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 31 Rank 2] NCCL_DEBUG: N/A
I0202 14:47:23.765698 25681 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 31 Rank 3] ProcessGroupNCCL created ncclComm_ 0x558521c9dec0 on CUDA device: 
I0202 14:47:23.765753 25681 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 31 Rank 3] NCCL_DEBUG: N/A
I0202 14:47:23.765947 25678 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 31 Rank 0] ProcessGroupNCCL created ncclComm_ 0x560a227239b0 on CUDA device:  
I0202 14:47:23.765985 25678 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 31 Rank 0] NCCL_DEBUG: N/A
I0202 14:47:23.766160 25679 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 31 Rank 1] ProcessGroupNCCL created ncclComm_ 0x55dcf96f9560 on CUDA device: 
I0202 14:47:23.766203 25679 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 31 Rank 1] NCCL_DEBUG: N/A
I0202 14:47:44.134783 25678 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 36 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.11513 ms
I0202 14:47:44.134876 25679 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 36 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 0.365629 ms
I0202 14:47:44.134918 25680 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 36 Rank 2] ProcessGroupNCCL broadcast unique ID through store took 0.724987 ms
I0202 14:47:44.134902 25681 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 36 Rank 3] ProcessGroupNCCL broadcast unique ID through store took 0.352799 ms
I0202 14:47:44.765512 25679 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 36 Rank 1] ProcessGroupNCCL created ncclComm_ 0x55dcf9983810 on CUDA device: 
I0202 14:47:44.765527 25679 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 36 Rank 1] NCCL_DEBUG: N/A
I0202 14:47:44.765556 25680 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 36 Rank 2] ProcessGroupNCCL created ncclComm_ 0x555b62c06b80 on CUDA device: 
I0202 14:47:44.765563 25678 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 36 Rank 0] ProcessGroupNCCL created ncclComm_ 0x560a229b00f0 on CUDA device:  
I0202 14:47:44.765569 25680 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 36 Rank 2] NCCL_DEBUG: N/A
I0202 14:47:44.765575 25678 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 36 Rank 0] NCCL_DEBUG: N/A
I0202 14:47:44.765568 25681 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 36 Rank 3] ProcessGroupNCCL created ncclComm_ 0x558521f34090 on CUDA device: 
I0202 14:47:44.765604 25681 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 36 Rank 3] NCCL_DEBUG: N/A
 50%|█████     | 1/2 [00:26<00:26, 26.74s/it]
 50%|█████     | 1/2 [00:28<00:28, 28.48s/it]
 50%|█████     | 1/2 [00:23<00:23, 23.79s/it]
 50%|█████     | 1/2 [00:25<00:25, 25.33s/it]
100%|██████████| 2/2 [00:46<00:00, 22.88s/it]
100%|██████████| 2/2 [00:48<00:00, 23.59s/it]
100%|██████████| 2/2 [00:46<00:00, 23.46s/it]
100%|██████████| 2/2 [00:45<00:00, 22.30s/it]
100%|██████████| 2/2 [00:48<00:00, 24.33s/it]
100%|██████████| 2/2 [00:43<00:00, 21.66s/it]
100%|██████████| 2/2 [00:45<00:00, 22.75s/it]
100%|██████████| 2/2 [00:43<00:00, 21.98s/it]
2026-02-02 14:49:33.988 | INFO     | hyvideo.inference:predict:671 - Success, time: 142.2374656200409
2026-02-02 14:49:33.989 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)
2026-02-02 14:49:34.120 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:49:34.334 | INFO     | hyvideo.inference:predict:671 - Success, time: 143.3227183818817
2026-02-02 14:49:34.335 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)
  0%|          | 0/20 [00:00<?, ?it/s]
2026-02-02 14:49:34.370 | INFO     | hyvideo.inference:predict:671 - Success, time: 139.41966152191162
2026-02-02 14:49:34.371 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)
2026-02-02 14:49:34.426 | INFO     | hyvideo.inference:predict:671 - Success, time: 141.11875224113464
2026-02-02 14:49:34.426 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 129)
2026-02-02 14:49:34.485 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:49:34.498 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:49:34.551 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 129
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 118800
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
  0%|          | 0/20 [00:00<?, ?it/s]
  0%|          | 0/20 [00:00<?, ?it/s]
  0%|          | 0/20 [00:00<?, ?it/s]
  5%|▌         | 1/20 [00:20<06:30, 20.55s/it]
  5%|▌         | 1/20 [00:20<06:29, 20.48s/it]
  5%|▌         | 1/20 [00:20<06:37, 20.93s/it]
  5%|▌         | 1/20 [00:20<06:30, 20.54s/it]
 10%|█         | 2/20 [00:40<06:05, 20.30s/it]
 10%|█         | 2/20 [00:40<06:04, 20.27s/it]
 10%|█         | 2/20 [00:40<06:05, 20.29s/it]
 10%|█         | 2/20 [00:41<06:08, 20.45s/it]
 15%|█▌        | 3/20 [01:00<05:44, 20.24s/it]
 15%|█▌        | 3/20 [01:00<05:43, 20.22s/it]
 15%|█▌        | 3/20 [01:00<05:43, 20.23s/it]
 15%|█▌        | 3/20 [01:01<05:45, 20.32s/it]
 20%|██        | 4/20 [01:21<05:24, 20.31s/it]
 20%|██        | 4/20 [01:21<05:24, 20.31s/it]
 20%|██        | 4/20 [01:21<05:24, 20.30s/it]
 20%|██        | 4/20 [01:21<05:25, 20.36s/it]
 25%|██▌       | 5/20 [01:41<05:03, 20.25s/it]
 25%|██▌       | 5/20 [01:41<05:03, 20.24s/it]
 25%|██▌       | 5/20 [01:41<05:03, 20.25s/it]
 25%|██▌       | 5/20 [01:41<05:04, 20.28s/it]
 30%|███       | 6/20 [02:01<04:43, 20.23s/it]
 30%|███       | 6/20 [02:01<04:43, 20.25s/it]
 30%|███       | 6/20 [02:01<04:43, 20.23s/it]
 30%|███       | 6/20 [02:01<04:43, 20.23s/it]
 35%|███▌      | 7/20 [02:21<04:23, 20.28s/it]
 35%|███▌      | 7/20 [02:21<04:23, 20.28s/it]
 35%|███▌      | 7/20 [02:21<04:23, 20.28s/it]
 35%|███▌      | 7/20 [02:22<04:23, 20.29s/it]
 40%|████      | 8/20 [02:42<04:02, 20.25s/it]
 40%|████      | 8/20 [02:42<04:02, 20.25s/it]
 40%|████      | 8/20 [02:42<04:02, 20.25s/it]
 40%|████      | 8/20 [02:42<04:03, 20.26s/it]
 45%|████▌     | 9/20 [03:02<03:42, 20.25s/it]
 45%|████▌     | 9/20 [03:02<03:42, 20.25s/it]
 45%|████▌     | 9/20 [03:02<03:42, 20.25s/it]
 45%|████▌     | 9/20 [03:02<03:42, 20.25s/it]
 50%|█████     | 10/20 [03:22<03:22, 20.26s/it]
 50%|█████     | 10/20 [03:22<03:22, 20.25s/it]
 50%|█████     | 10/20 [03:22<03:22, 20.26s/it]
 50%|█████     | 10/20 [03:23<03:22, 20.26s/it]
 55%|█████▌    | 11/20 [03:42<03:02, 20.23s/it]
 55%|█████▌    | 11/20 [03:42<03:02, 20.23s/it]
 55%|█████▌    | 11/20 [03:42<03:02, 20.23s/it]
 55%|█████▌    | 11/20 [03:43<03:02, 20.23s/it]
 60%|██████    | 12/20 [04:03<02:41, 20.21s/it]
 60%|██████    | 12/20 [04:02<02:41, 20.21s/it]
 60%|██████    | 12/20 [04:02<02:41, 20.21s/it]
 60%|██████    | 12/20 [04:03<02:41, 20.21s/it]
 65%|██████▌   | 13/20 [04:23<02:21, 20.20s/it]
 65%|██████▌   | 13/20 [04:23<02:21, 20.20s/it]
 65%|██████▌   | 13/20 [04:23<02:21, 20.20s/it]
 65%|██████▌   | 13/20 [04:23<02:21, 20.20s/it]
 70%|███████   | 14/20 [04:43<02:01, 20.21s/it]
 70%|███████   | 14/20 [04:43<02:01, 20.21s/it]
 70%|███████   | 14/20 [04:43<02:01, 20.21s/it]
 70%|███████   | 14/20 [04:43<02:01, 20.21s/it]
 75%|███████▌  | 15/20 [05:04<01:41, 20.24s/it]
 75%|███████▌  | 15/20 [05:03<01:41, 20.24s/it]
 75%|███████▌  | 15/20 [05:03<01:41, 20.24s/it]
 75%|███████▌  | 15/20 [05:03<01:41, 20.24s/it]
 80%|████████  | 16/20 [05:24<01:21, 20.26s/it]
 80%|████████  | 16/20 [05:24<01:21, 20.26s/it]
 80%|████████  | 16/20 [05:23<01:21, 20.26s/it]
 80%|████████  | 16/20 [05:24<01:21, 20.26s/it]
 85%|████████▌ | 17/20 [05:44<01:00, 20.22s/it]
 85%|████████▌ | 17/20 [05:44<01:00, 20.22s/it]
 85%|████████▌ | 17/20 [05:44<01:00, 20.22s/it]
 85%|████████▌ | 17/20 [05:44<01:00, 20.22s/it]
 90%|█████████ | 18/20 [06:04<00:40, 20.20s/it]
 90%|█████████ | 18/20 [06:04<00:40, 20.20s/it]
 90%|█████████ | 18/20 [06:04<00:40, 20.20s/it]
 90%|█████████ | 18/20 [06:04<00:40, 20.20s/it]
 95%|█████████▌| 19/20 [06:24<00:20, 20.18s/it]
 95%|█████████▌| 19/20 [06:24<00:20, 20.18s/it]
 95%|█████████▌| 19/20 [06:24<00:20, 20.18s/it]
 95%|█████████▌| 19/20 [06:24<00:20, 20.18s/it]
100%|██████████| 20/20 [06:44<00:00, 20.17s/it]
100%|██████████| 20/20 [06:44<00:00, 20.17s/it]
100%|██████████| 20/20 [06:44<00:00, 20.17s/it]
100%|██████████| 20/20 [06:44<00:00, 20.25s/it]
100%|██████████| 20/20 [06:44<00:00, 20.17s/it]
100%|██████████| 20/20 [06:44<00:00, 20.23s/it]
100%|██████████| 20/20 [06:44<00:00, 20.23s/it]
100%|██████████| 20/20 [06:44<00:00, 20.23s/it]
2026-02-02 14:57:47.664 | INFO     | hyvideo.inference:predict:671 - Success, time: 493.1126627922058
2026-02-02 14:57:47.718 | INFO     | hyvideo.inference:predict:671 - Success, time: 493.59782361984253
2026-02-02 14:57:47.891 | INFO     | hyvideo.inference:predict:671 - Success, time: 493.405969619751
2026-02-02 14:57:47.899 | INFO     | hyvideo.inference:predict:671 - Success, time: 493.40060687065125
I0202 14:57:48.608788 25679 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL destructor entered.
I0202 14:57:48.608850 25679 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 1] Launching ProcessGroupNCCL abort asynchrounously.
I0202 14:57:48.609077 25679 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 1] future is successfully executed for: ProcessGroup abort
I0202 14:57:48.609086 25679 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL aborts successfully.
I0202 14:57:48.609093 25679 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL watchdog thread joined.
I0202 14:57:48.609217 25679 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL heart beat monitor thread joined.
I0202 14:57:48.794883 25680 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL destructor entered.
I0202 14:57:48.794955 25680 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 2] Launching ProcessGroupNCCL abort asynchrounously.
I0202 14:57:48.795169 25680 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 2] future is successfully executed for: ProcessGroup abort
I0202 14:57:48.795177 25680 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL aborts successfully.
I0202 14:57:48.795186 25680 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL watchdog thread joined.
I0202 14:57:48.795332 25680 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL heart beat monitor thread joined.
I0202 14:57:49.184105 25681 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL destructor entered.
I0202 14:57:49.184168 25681 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 3] Launching ProcessGroupNCCL abort asynchrounously.
I0202 14:57:49.184396 25681 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 3] future is successfully executed for: ProcessGroup abort
I0202 14:57:49.184404 25681 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL aborts successfully.
I0202 14:57:49.184415 25681 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL watchdog thread joined.
I0202 14:57:49.184533 25681 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL heart beat monitor thread joined.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2026-02-02 14:57:51.434 | INFO     | __main__:main:72 - Sample save to: ./results/2026-02-02-14:57:48_seed42_A cat walks on the grass, realistic style..mp4
I0202 14:57:52.289321 25678 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL destructor entered.
W0202 14:57:52.289395 25678 ProcessGroupNCCL.cpp:1279] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
I0202 14:57:52.289413 25678 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 0] Launching ProcessGroupNCCL abort asynchrounously.
I0202 14:57:52.289623 25678 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 0] future is successfully executed for: ProcessGroup abort
I0202 14:57:52.289633 25678 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL aborts successfully.
I0202 14:57:52.289650 25678 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL watchdog thread joined.
I0202 14:57:52.289772 25678 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL heart beat monitor thread joined.