W0202 14:17:25.792000 12442 lib/python3.10/dist-packages/torch/distributed/run.py:793] 
W0202 14:17:25.792000 12442 lib/python3.10/dist-packages/torch/distributed/run.py:793] *****************************************
W0202 14:17:25.792000 12442 lib/python3.10/dist-packages/torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0202 14:17:25.792000 12442 lib/python3.10/dist-packages/torch/distributed/run.py:793] *****************************************
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=33, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=4, ring_degree=1)
2026-02-02 14:17:30.641 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:17:30.656993 12478 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:17:30.657011 12478 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:17:30 [parallel_state.py:200] world_size=4 rank=2 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:17:30.657588 12478 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5583223e2ae0, SPLIT_COLOR: 1008299991543067201, PG Name: 1
I0202 14:17:30.657598 12478 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=33, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=4, ring_degree=1)
2026-02-02 14:17:30.683 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:17:30.697316 12477 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:17:30.697335 12477 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:17:30 [parallel_state.py:200] world_size=4 rank=1 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:17:30.697809 12477 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55ad816c3c10, SPLIT_COLOR: 1008299991543067201, PG Name: 1
I0202 14:17:30.697819 12477 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=33, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=4, ring_degree=1)
2026-02-02 14:17:30.740 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:17:30.754350 12479 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:17:30.754374 12479 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:17:30 [parallel_state.py:200] world_size=4 rank=3 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:17:30.754978 12479 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5645d5e72940, SPLIT_COLOR: 1008299991543067201, PG Name: 1
I0202 14:17:30.754987 12479 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
Namespace(model='HYVideo-T/2-cfgdistill', latent_channels=16, precision='bf16', rope_theta=256, vae='884-16c-hy', vae_precision='fp16', vae_tiling=True, text_encoder='llm', text_encoder_precision='fp16', text_states_dim=4096, text_len=256, tokenizer='llm', prompt_template='dit-llm-encode', prompt_template_video='dit-llm-encode-video', hidden_state_skip_layer=2, apply_final_norm=False, text_encoder_2='clipL', text_encoder_precision_2='fp16', text_states_dim_2=768, tokenizer_2='clipL', text_len_2=77, denoise_type='flow', flow_shift=7.0, flow_reverse=True, flow_solver='euler', use_linear_quadratic_schedule=False, linear_schedule_end=25, model_base='ckpts', dit_weight='ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt', model_resolution='540p', load_key='module', use_cpu_offload=False, batch_size=1, infer_steps=20, disable_autocast=False, save_path='./results', save_path_suffix='', name_suffix='', num_videos=1, video_size=[1280, 720], video_length=33, prompt='A cat walks on the grass, realistic style.', seed_type='auto', seed=42, neg_prompt=None, cfg_scale=1.0, embedded_cfg_scale=6.0, use_fp8=False, reproduce=False, ulysses_degree=4, ring_degree=1)
2026-02-02 14:17:30.937 | INFO     | hyvideo.inference:from_pretrained:154 - Got text-to-video model root path: ckpts
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0202 14:17:30.950295 12476 ProcessGroupNCCL.cpp:934] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0202 14:17:30.950317 12476 ProcessGroupNCCL.cpp:943] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
DEBUG 02-02 14:17:30 [parallel_state.py:200] world_size=4 rank=0 local_rank=-1 distributed_init_method=env:// backend=nccl
I0202 14:17:30.950915 12476 ProcessGroupNCCL.cpp:934] [PG ID 1 PG GUID 1 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x557384e10e20, SPLIT_COLOR: 1008299991543067201, PG Name: 1
I0202 14:17:30.950924 12476 ProcessGroupNCCL.cpp:943] [PG ID 1 PG GUID 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.957278 12476 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 3
I0202 14:17:30.957289 12476 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.957341 12477 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 5
I0202 14:17:30.957355 12477 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.958575 12479 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 9 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 9
I0202 14:17:30.958618 12479 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 9 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.958647 12476 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 11 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 11
I0202 14:17:30.958657 12476 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 11 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.958659 12477 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 13 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 13
I0202 14:17:30.958671 12477 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 13 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.960106 12477 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 21 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 21
I0202 14:17:30.960116 12477 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 21 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.960223 12476 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 19 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 19
I0202 14:17:30.960233 12476 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 19 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.960553 12479 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 17 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 17
I0202 14:17:30.960573 12479 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 17 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.961128 12477 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 28 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 28
I0202 14:17:30.961139 12477 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 28 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.961426 12477 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 31 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55ad816c3c10, SPLIT_COLOR: 1008299991543067201, PG Name: 31
I0202 14:17:30.961436 12477 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 31 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.961640 12477 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 33 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 33
I0202 14:17:30.961649 12477 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 33 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.961952 12477 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 36 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55ad816c3c10, SPLIT_COLOR: 1008299991543067201, PG Name: 36
I0202 14:17:30.961962 12477 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 36 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.962002 12479 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 25 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 25
I0202 14:17:30.962016 12479 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 25 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.962226 12476 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 27 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 27
I0202 14:17:30.962236 12476 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 27 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.962574 12476 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 31 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x557384e10e20, SPLIT_COLOR: 1008299991543067201, PG Name: 31
I0202 14:17:30.962584 12476 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 31 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.962800 12476 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 32 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 32
I0202 14:17:30.962809 12476 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 32 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.963151 12476 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 36 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x557384e10e20, SPLIT_COLOR: 1008299991543067201, PG Name: 36
I0202 14:17:30.963160 12476 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 36 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.963318 12479 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 30 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 30
I0202 14:17:30.963336 12479 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 30 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.963627 12479 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 31 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5645d5e72940, SPLIT_COLOR: 1008299991543067201, PG Name: 31
I0202 14:17:30.963651 12479 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 31 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.963711 12478 ProcessGroupNCCL.cpp:934] [PG ID 2 PG GUID 7 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 7
I0202 14:17:30.963738 12478 ProcessGroupNCCL.cpp:943] [PG ID 2 PG GUID 7 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.964040 12479 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 35 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 35
I0202 14:17:30.964059 12479 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 35 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.964432 12479 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 36 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5645d5e72940, SPLIT_COLOR: 1008299991543067201, PG Name: 36
I0202 14:17:30.964452 12479 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 36 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.965525 12478 ProcessGroupNCCL.cpp:934] [PG ID 3 PG GUID 15 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 15
I0202 14:17:30.965536 12478 ProcessGroupNCCL.cpp:943] [PG ID 3 PG GUID 15 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.966742 12478 ProcessGroupNCCL.cpp:934] [PG ID 4 PG GUID 23 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 23
I0202 14:17:30.966751 12478 ProcessGroupNCCL.cpp:943] [PG ID 4 PG GUID 23 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.967670 12478 ProcessGroupNCCL.cpp:934] [PG ID 5 PG GUID 29 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 29
I0202 14:17:30.967681 12478 ProcessGroupNCCL.cpp:943] [PG ID 5 PG GUID 29 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.967969 12478 ProcessGroupNCCL.cpp:934] [PG ID 6 PG GUID 31 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5583223e2ae0, SPLIT_COLOR: 1008299991543067201, PG Name: 31
I0202 14:17:30.967981 12478 ProcessGroupNCCL.cpp:943] [PG ID 6 PG GUID 31 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.968223 12478 ProcessGroupNCCL.cpp:934] [PG ID 7 PG GUID 34 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 34
I0202 14:17:30.968233 12478 ProcessGroupNCCL.cpp:943] [PG ID 7 PG GUID 34 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.968502 12478 ProcessGroupNCCL.cpp:934] [PG ID 8 PG GUID 36 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5583223e2ae0, SPLIT_COLOR: 1008299991543067201, PG Name: 36
I0202 14:17:30.968514 12478 ProcessGroupNCCL.cpp:943] [PG ID 8 PG GUID 36 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.969952 12477 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 40 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 40
I0202 14:17:30.969964 12477 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 40 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.970055 12479 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 44 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 44
I0202 14:17:30.970070 12479 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 44 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.970217 12478 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 42 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 42
I0202 14:17:30.970228 12478 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 42 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:30.970767 12476 ProcessGroupNCCL.cpp:934] [PG ID 9 PG GUID 38 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 38
I0202 14:17:30.970777 12476 ProcessGroupNCCL.cpp:943] [PG ID 9 PG GUID 38 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:31.002758 12479 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 46 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5645d5e72940, SPLIT_COLOR: 1008299991543067201, PG Name: 46
I0202 14:17:31.002769 12479 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 46 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:31.002895 12477 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 46 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x55ad816c3c10, SPLIT_COLOR: 1008299991543067201, PG Name: 46
I0202 14:17:31.002907 12477 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 46 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
I0202 14:17:31.003078 12476 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 46 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x557384e10e20, SPLIT_COLOR: 1008299991543067201, PG Name: 46
I0202 14:17:31.003089 12476 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 46 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
2026-02-02 14:17:31.003 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
2026-02-02 14:17:31.002 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
2026-02-02 14:17:31.003 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
I0202 14:17:31.003579 12478 ProcessGroupNCCL.cpp:934] [PG ID 10 PG GUID 46 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0x5583223e2ae0, SPLIT_COLOR: 1008299991543067201, PG Name: 46
I0202 14:17:31.003589 12478 ProcessGroupNCCL.cpp:943] [PG ID 10 PG GUID 46 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.22.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 1, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_CUDA_EVENT_CACHE: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
2026-02-02 14:17:31.003 | INFO     | hyvideo.inference:from_pretrained:189 - Building model...
2026-02-02 14:17:31.584 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:17:31.587 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:17:31.600 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:17:31.712 | INFO     | hyvideo.inference:load_state_dict:340 - Loading torch model ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt...
/workspace/cicd/HunyuanVideo-t2v/hyvideo/inference.py:341: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location=lambda storage, loc: storage)
2026-02-02 14:17:46.074 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
2026-02-02 14:17:46.216 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
2026-02-02 14:17:48.094 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2026-02-02 14:17:48.096 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
2026-02-02 14:17:48.223 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
2026-02-02 14:17:48.311 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
2026-02-02 14:17:48.449 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
2026-02-02 14:17:50.314 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2026-02-02 14:17:50.461 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
2026-02-02 14:17:51.299 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-t2v-720p/vae
Loading checkpoint shards:  25%|██▌       | 1/4 [00:03<00:10,  3.65s/it]
Loading checkpoint shards:  25%|██▌       | 1/4 [00:03<00:10,  3.59s/it]
/workspace/cicd/HunyuanVideo-t2v/hyvideo/vae/__init__.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(vae_ckpt, map_location=vae.device)
2026-02-02 14:17:53.322 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2026-02-02 14:17:53.468 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (llm) from: ./ckpts/text_encoder
Using the `SDPA` attention implementation on multi-gpu setup with ROCM may lead to performance issues due to the FA backend. Disabling it to use alternative backends.
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]Loading checkpoint shards:  25%|██▌       | 1/4 [00:03<00:11,  3.81s/it]Loading checkpoint shards:  50%|█████     | 2/4 [00:07<00:07,  3.63s/it]Loading checkpoint shards:  50%|█████     | 2/4 [00:07<00:07,  3.58s/it]Loading checkpoint shards:  25%|██▌       | 1/4 [00:03<00:10,  3.63s/it]Loading checkpoint shards:  50%|█████     | 2/4 [00:07<00:07,  3.79s/it]Loading checkpoint shards:  75%|███████▌  | 3/4 [00:10<00:03,  3.50s/it]Loading checkpoint shards:  75%|███████▌  | 3/4 [00:10<00:03,  3.43s/it]Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.26s/it]Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.74s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.21s/it]Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.69s/it]
Loading checkpoint shards:  50%|█████     | 2/4 [00:07<00:07,  3.59s/it]Loading checkpoint shards:  75%|███████▌  | 3/4 [00:11<00:03,  3.79s/it]Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00,  2.44s/it]Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00,  2.93s/it]
Loading checkpoint shards:  75%|███████▌  | 3/4 [00:10<00:03,  3.40s/it]Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.19s/it]Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.68s/it]
2026-02-02 14:18:05.568 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:18:05.602 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:18:08.076 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:18:08.094 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:18:08.489 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:18:08.502 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:18:08.682 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:18:08.683 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:18:08.691 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:18:08.719 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:18:08.726 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:18:08.807 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:18:08.811 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:18:08.834 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:18:08.837 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:18:10.561 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:18:10.974 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:18:11.448 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:18:11.666 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:18:11.728 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2026-02-02 14:18:11.825 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:18:11.859 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:18:13.558 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (llm) from: ./ckpts/text_encoder
2026-02-02 14:18:14.041 | INFO     | hyvideo.text_encoder:load_text_encoder:28 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
2026-02-02 14:18:14.259 | INFO     | hyvideo.text_encoder:load_text_encoder:50 - Text encoder to dtype: torch.float16
2026-02-02 14:18:14.323 | INFO     | hyvideo.text_encoder:load_tokenizer:64 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
  0%|          | 0/2 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
2026-02-02 14:18:14.417 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:18:14.451 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 2
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
  0%|          | 0/2 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
  0%|          | 0/2 [00:00<?, ?it/s]
I0202 14:18:19.577989 12476 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 31 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.09373 ms
I0202 14:18:19.579623 12478 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 31 Rank 2] ProcessGroupNCCL broadcast unique ID through store took 3142.62 ms
I0202 14:18:19.583997 12477 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 31 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 3309.86 ms
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:602: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:663.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
  0%|          | 0/2 [00:00<?, ?it/s]
I0202 14:18:22.561215 12479 ProcessGroupNCCL.cpp:2291] [PG ID 6 PG GUID 31 Rank 3] ProcessGroupNCCL broadcast unique ID through store took 0.166879 ms
I0202 14:18:23.305327 12479 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 31 Rank 3] ProcessGroupNCCL created ncclComm_ 0x5645fcc9ee10 on CUDA device: 
I0202 14:18:23.305343 12479 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 31 Rank 3] NCCL_DEBUG: N/A
I0202 14:18:23.305447 12478 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 31 Rank 2] ProcessGroupNCCL created ncclComm_ 0x558371734190 on CUDA device: 
I0202 14:18:23.305467 12478 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 31 Rank 2] NCCL_DEBUG: N/A
I0202 14:18:23.305480 12476 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 31 Rank 0] ProcessGroupNCCL created ncclComm_ 0x5573d098d570 on CUDA device:  
I0202 14:18:23.305500 12476 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 31 Rank 0] NCCL_DEBUG: N/A
I0202 14:18:23.306005 12477 ProcessGroupNCCL.cpp:2330] [PG ID 6 PG GUID 31 Rank 1] ProcessGroupNCCL created ncclComm_ 0x55add284f1e0 on CUDA device: 
I0202 14:18:23.306046 12477 ProcessGroupNCCL.cpp:2335] [PG ID 6 PG GUID 31 Rank 1] NCCL_DEBUG: N/A
I0202 14:18:25.975494 12476 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 36 Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.108719 ms
I0202 14:18:25.975600 12478 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 36 Rank 2] ProcessGroupNCCL broadcast unique ID through store took 24.3712 ms
I0202 14:18:25.975625 12477 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 36 Rank 1] ProcessGroupNCCL broadcast unique ID through store took 23.5385 ms
I0202 14:18:25.975646 12479 ProcessGroupNCCL.cpp:2291] [PG ID 8 PG GUID 36 Rank 3] ProcessGroupNCCL broadcast unique ID through store took 23.4888 ms
I0202 14:18:26.609242 12477 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 36 Rank 1] ProcessGroupNCCL created ncclComm_ 0x55add2adde80 on CUDA device: 
I0202 14:18:26.609257 12477 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 36 Rank 1] NCCL_DEBUG: N/A
I0202 14:18:26.609277 12479 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 36 Rank 3] ProcessGroupNCCL created ncclComm_ 0x5645fcf305e0 on CUDA device: 
I0202 14:18:26.609283 12478 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 36 Rank 2] ProcessGroupNCCL created ncclComm_ 0x5583719c7ff0 on CUDA device: 
I0202 14:18:26.609292 12479 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 36 Rank 3] NCCL_DEBUG: N/A
I0202 14:18:26.609298 12478 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 36 Rank 2] NCCL_DEBUG: N/A
I0202 14:18:26.609301 12476 ProcessGroupNCCL.cpp:2330] [PG ID 8 PG GUID 36 Rank 0] ProcessGroupNCCL created ncclComm_ 0x5573d0c22e80 on CUDA device:  
I0202 14:18:26.609334 12476 ProcessGroupNCCL.cpp:2335] [PG ID 8 PG GUID 36 Rank 0] NCCL_DEBUG: N/A
100%|██████████| 2/2 [00:14<00:00,  7.25s/it]
100%|██████████| 2/2 [00:08<00:00,  4.24s/it]
100%|██████████| 2/2 [00:14<00:00,  7.31s/it]
100%|██████████| 2/2 [00:11<00:00,  5.64s/it]
2026-02-02 14:18:43.814 | INFO     | hyvideo.inference:predict:671 - Success, time: 29.36230969429016
2026-02-02 14:18:43.815 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:18:43.821 | INFO     | hyvideo.inference:predict:671 - Success, time: 34.98329401016235
2026-02-02 14:18:43.822 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:18:43.828 | INFO     | hyvideo.inference:predict:671 - Success, time: 34.9935405254364
2026-02-02 14:18:43.828 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:18:43.849 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:18:43.858 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:18:43.859 | INFO     | hyvideo.inference:predict:671 - Success, time: 31.99902057647705
2026-02-02 14:18:43.859 | INFO     | hyvideo.inference:predict:580 - Input (height, width, video_length) = (1280, 720, 33)
2026-02-02 14:18:43.862 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
2026-02-02 14:18:43.894 | DEBUG    | hyvideo.inference:predict:642 - 
                        height: 1280
                         width: 720
                  video_length: 33
                        prompt: ['A cat walks on the grass, realistic style.']
                    neg_prompt: ['']
                          seed: 42
                   infer_steps: 20
         num_videos_per_prompt: 1
                guidance_scale: 1.0
                      n_tokens: 32400
                    flow_shift: 7.0
       embedded_guidance_scale: 6.0
100%|██████████| 20/20 [00:47<00:00,  2.37s/it]
100%|██████████| 20/20 [00:47<00:00,  2.37s/it]
100%|██████████| 20/20 [00:47<00:00,  2.37s/it]
100%|██████████| 20/20 [00:47<00:00,  2.37s/it]
2026-02-02 14:19:45.686 | INFO     | hyvideo.inference:predict:671 - Success, time: 61.82779026031494
2026-02-02 14:19:45.687 | INFO     | hyvideo.inference:predict:671 - Success, time: 61.83767604827881
2026-02-02 14:19:45.696 | INFO     | hyvideo.inference:predict:671 - Success, time: 61.833691120147705
2026-02-02 14:19:45.717 | INFO     | hyvideo.inference:predict:671 - Success, time: 61.82240390777588
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
I0202 14:19:46.725054 12479 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL destructor entered.
I0202 14:19:46.725337 12479 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 3] Launching ProcessGroupNCCL abort asynchrounously.
I0202 14:19:46.725540 12479 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 3] future is successfully executed for: ProcessGroup abort
I0202 14:19:46.725548 12479 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL aborts successfully.
I0202 14:19:46.725613 12479 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL watchdog thread joined.
I0202 14:19:46.725665 12479 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 3] ProcessGroupNCCL heart beat monitor thread joined.
I0202 14:19:46.886549 12477 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL destructor entered.
I0202 14:19:46.886615 12477 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 1] Launching ProcessGroupNCCL abort asynchrounously.
I0202 14:19:46.886837 12477 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 1] future is successfully executed for: ProcessGroup abort
I0202 14:19:46.886847 12477 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL aborts successfully.
I0202 14:19:46.886874 12477 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL watchdog thread joined.
I0202 14:19:46.886991 12477 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 1] ProcessGroupNCCL heart beat monitor thread joined.
I0202 14:19:46.923003 12478 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL destructor entered.
I0202 14:19:46.923070 12478 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 2] Launching ProcessGroupNCCL abort asynchrounously.
I0202 14:19:46.923293 12478 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 2] future is successfully executed for: ProcessGroup abort
I0202 14:19:46.923301 12478 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL aborts successfully.
I0202 14:19:46.923332 12478 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL watchdog thread joined.
I0202 14:19:46.923460 12478 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 2] ProcessGroupNCCL heart beat monitor thread joined.
2026-02-02 14:19:47.147 | INFO     | __main__:main:72 - Sample save to: ./results/2026-02-02-14:19:45_seed42_A cat walks on the grass, realistic style..mp4
I0202 14:19:48.216811 12476 ProcessGroupNCCL.cpp:1275] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL destructor entered.
W0202 14:19:48.216894 12476 ProcessGroupNCCL.cpp:1279] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
I0202 14:19:48.216917 12476 ProcessGroupNCCL.cpp:1259] [PG ID 0 PG GUID 0 Rank 0] Launching ProcessGroupNCCL abort asynchrounously.
I0202 14:19:48.217163 12476 ProcessGroupNCCL.cpp:1145] [PG ID 0 PG GUID 0 Rank 0] future is successfully executed for: ProcessGroup abort
I0202 14:19:48.217172 12476 ProcessGroupNCCL.cpp:1266] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL aborts successfully.
I0202 14:19:48.217195 12476 ProcessGroupNCCL.cpp:1296] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL watchdog thread joined.
I0202 14:19:48.217320 12476 ProcessGroupNCCL.cpp:1300] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL heart beat monitor thread joined.